The Databricks Data Engineer

Premiere: 2026-06-15
© Jakub Lasak
12 episodes
Audio
Latest episode
The Spark Shuffle is baggage claim: why your job waits instead of computes (and more workers won't fix it)

Duration: 11:09
Your Spark job has been running for forty minutes. The dashboard shows your cluster isn't even busy. So you do the obvious thing: add more workers. And it changes nothing.
Here's why. During a shuffle, Spark is barely computing at all. It's tagging every row by destination, piling rows together, spilling the overflow to disk, and hauling data across the network between executors. It's an airport rerouting every passenger's bag to a new carousel, and more baggage handlers can't speed up a single overloaded belt.
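A rough sketch of the first two steps described above (tagging each row by destination, then piling rows together), in plain Python rather than Spark. The function names, the CRC32 hash, and the sample rows are assumptions made up for illustration; Spark uses its own partitioner and hash function, and spill and network transfer are not modeled here.

```python
from collections import defaultdict
from zlib import crc32


def shuffle_partition(rows, num_partitions):
    """Toy map side of a shuffle: tag each (key, value) row with a
    destination partition, then pile rows up per destination."""
    buckets = defaultdict(list)
    for key, value in rows:
        # Stable hash for the demo only; not the hash Spark actually uses.
        dest = crc32(str(key).encode()) % num_partitions
        buckets[dest].append((key, value))
    return dict(buckets)


def partition_sizes(buckets, num_partitions):
    """Row count per destination partition, including empty ones."""
    return [len(buckets.get(p, [])) for p in range(num_partitions)]


# Hypothetical skewed input: one "hot" key owns 9,000 of 10,000 rows.
rows = [("hot", i) for i in range(9_000)] + [(f"k{i}", i) for i in range(1_000)]

for n in (4, 16):
    sizes = partition_sizes(shuffle_partition(rows, n), n)
    print(f"{n} partitions -> largest partition holds {max(sizes)} rows")
```

With this made-up input, every row for the hot key lands on the same partition whether there are 4 or 16 of them. That is the overloaded-belt effect: adding partitions or workers does not split one key's rows.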
In this episode:
- Why your slowest wide transformation spends most of its time on logistics, not computing
- The four-step model that lets you explain the shuffle to a teammate in sixty seconds
- Why adding workers can make a skewed job slower, not faster
- The two numbers in the Spark UI that tell you whether it's skew, partition count, or spill
- The one diagnostic to run before you ever resize the cluster again
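These notes do not name the two numbers or the diagnostic, so the sketch below is only one common heuristic, not necessarily the episode's method: compare the largest task's shuffle read size with the median, two values the Spark UI lists in a stage's summary metrics. The function names and the 5x threshold are assumptions for illustration.

```python
from statistics import median


def skew_ratio(task_bytes):
    """Largest task's shuffle read divided by the median task's."""
    med = median(task_bytes)
    if med == 0:
        # Avoid dividing by zero when most tasks read nothing.
        return float("inf") if max(task_bytes) > 0 else 1.0
    return max(task_bytes) / med


def looks_skewed(task_bytes, threshold=5.0):
    """Flag a stage as skewed when max/median meets the threshold.
    The 5.0 default is an arbitrary illustrative cutoff, not a Spark setting."""
    return skew_ratio(task_bytes) >= threshold


# Hypothetical per-task shuffle read sizes in bytes.
even_stage = [100, 110, 90, 105]
skewed_stage = [100, 100, 100, 1000]

print(looks_skewed(even_stage), looks_skewed(skewed_stage))
```

A high ratio points at skew, where one task carries most of the data. A ratio near 1 with uniformly slow tasks suggests looking at partition count or spill instead.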
This episode is for Databricks data engineers whose joins and aggregations crawl for reasons the cluster size never seems to fix. Whether you're mid-level and tired of guessing, or senior and tired of paying for compute that doesn't help, you'll walk away able to read a slow shuffle instead of throwing hardware at it.
---
Helping 18,000+ Databricks data engineers become seniors: interview like seniors, execute like seniors, think like seniors.
Follow The Databricks Data Engineer for new episodes every Monday, Wednesday, and Friday.
LinkedIn: linkedin.com/in/jrlasak
Newsletter: dataengineer.wiki
#DataEngineering #Databricks #DataEngineer #CareerGrowth #ApacheSpark #DeltaLake
Episode ID: 1000772782700
GUID: 5477e284-0a3f-4b63-9283-855bac881095
Release date: 2026-06-15 11:00:00

Description

Helping 18k+ Databricks data engineers become seniors: interview like seniors, execute like seniors, think like seniors.
