The Databricks Data Engineer
Jakub Lasak
Technology
Premiere: 2026-06-15
© Jakub Lasak
Free
New
12 episodes
Audio
No. 124 in Top Podcasts > Technology (chart movement: -39)
Latest episode
The Spark Shuffle is baggage claim: why your job waits instead of computes (and more workers won't fix it)
Duration: 11:09
Your Spark job has been running for forty minutes. The dashboard shows your cluster isn't even busy. So you do the obvious thing: add more workers. And it changes nothing.
Here's why. During a shuffle, Spark is barely computing at all. It's tagging every row by destination, piling rows together, spilling the overflow to disk, and hauling data across the network between executors. It's an airport rerouting every passenger's bag to a new carousel, and more baggage handlers can't speed up a single overloaded belt.
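To make the airport picture concrete, here is a toy model of the steps just described: tag each row with a destination, pile rows per destination in memory, spill to "disk" when memory fills, and let each reduce task fetch its pile from every map task. It is a simplified sketch, not Spark's implementation. `zlib.crc32` stands in for Spark's own hash function, Python lists stand in for shuffle files and network transfer, and the names `destination`, `map_side_write`, `reduce_side_fetch` and `buffer_limit` are hypothetical. Real Spark sort-based shuffle writes one sorted file per map task rather than per-destination lists.

```python
import zlib
from collections import defaultdict


def destination(key: str, num_partitions: int) -> int:
    """Step 1: tag a row with the reduce partition it must travel to.

    crc32 is a deterministic stand-in for Spark's real hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_side_write(rows, num_partitions, buffer_limit):
    """Steps 1-3 inside one map task: tag, pile per destination, spill on overflow.

    Returns (per-partition output, number of overflow spills).
    """
    buffers = defaultdict(list)  # in-memory piles, one per destination
    on_disk = defaultdict(list)  # stands in for shuffle files on local disk
    buffered = 0
    spills = 0
    for key, value in rows:
        buffers[destination(key, num_partitions)].append((key, value))
        buffered += 1
        if buffered >= buffer_limit:  # "memory" is full: write everything out
            for part, pile in buffers.items():
                on_disk[part].extend(pile)
            buffers.clear()
            buffered = 0
            spills += 1
    for part, pile in buffers.items():  # final flush at the end of the task
        on_disk[part].extend(pile)
    return dict(on_disk), spills


def reduce_side_fetch(map_outputs, partition):
    """Step 4: one reduce task pulls its blocks from every map task's output.

    In a real cluster each of these reads can be a network transfer.
    """
    fetched = []
    for output in map_outputs:
        fetched.extend(output.get(partition, []))
    return fetched


if __name__ == "__main__":
    task_a = [("nl", 1), ("pl", 2), ("nl", 3)]
    task_b = [("pl", 4), ("us", 5)]
    outputs = [map_side_write(t, num_partitions=4, buffer_limit=2)[0] for t in (task_a, task_b)]
    for p in range(4):
        print(p, reduce_side_fetch(outputs, p))
```

Notice that nothing in the model computes a result. Every step moves, groups or copies rows, which is the episode's point: the cost of a shuffle is logistics.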
In this episode:
- Why your slowest wide transformation spends most of its time on logistics, not computing
- The four-step model that lets you explain the shuffle to a teammate in sixty seconds
- Why adding workers can make a skewed job slower, not faster
- The two numbers in the Spark UI that tell you whether it's skew, partition count, or spill
- The one diagnostic to run before you ever resize the cluster again
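The episode doesn't name its two Spark UI numbers here. One common skew check is to compare the Max and Median rows of a stage's task summary metrics (for example duration or shuffle read size). The sketch below assumes that check and illustrates why extra workers can't help a skewed stage: a stage can never finish before its largest task. `stage_time` and `skew_ratio` are hypothetical helpers. Longest-first greedy scheduling is an idealized assumption; Spark's scheduler does not order tasks by size.

```python
import heapq
from statistics import median


def stage_time(task_durations, slots):
    """Idealized stage duration: longest task first, each to the slot that frees up first.

    Whatever the schedule, the result can never be shorter than the largest task.
    """
    finish = [0.0] * slots  # time at which each task slot becomes free
    heapq.heapify(finish)
    for d in sorted(task_durations, reverse=True):
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + d)
    return max(finish)


def skew_ratio(partition_sizes):
    """Largest partition relative to the median one; well above 1 suggests skew."""
    return max(partition_sizes) / median(partition_sizes)


if __name__ == "__main__":
    skewed = [100] + [1] * 199  # one hot key, 199 small partitions
    balanced = [1.5] * 200      # same amount of work spread evenly
    for slots in (8, 64):
        print(slots, stage_time(skewed, slots), stage_time(balanced, slots))
    print(skew_ratio(skewed))
```

Going from 8 to 64 slots shortens the balanced stage (37.5 to 6.0 time units) but leaves the skewed one at 100. The fix for skew is to split the hot partition, not to add more belts.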
This episode is for Databricks data engineers whose joins and aggregations crawl for reasons the cluster size never seems to fix. Whether you're mid-level and tired of guessing, or senior and tired of paying for compute that doesn't help, you'll walk away able to read a slow shuffle instead of throwing hardware at it.
---
Helping 18,000+ Databricks data engineers become seniors: interview like seniors, execute like seniors, think like seniors.
Follow The Databricks Data Engineer for new episodes every Monday, Wednesday, and Friday.
LinkedIn: linkedin.com/in/jrlasak
Newsletter: dataengineer.wiki
#DataEngineering #Databricks #DataEngineer #CareerGrowth #ApacheSpark #DeltaLake
Episode ID: 1000772782700
GUID: 5477e284-0a3f-4b63-9283-855bac881095
Release date: 15/6/2026 11:00:00
Description
Helping 18k+ Databricks data engineers become seniors: interview like seniors, execute like seniors, think like seniors.
Feed URL
https://anchor.fm/s/110cfe0fc/podcast/rss
Apple Podcasts: Customer reviews
No reviews yet