Tag: Hadoop

Hadoop and Spark shuffling

Shuffling is generally the most costly operation we encounter in Hadoop and Spark processing. It has a huge impact on performance and can become a bottleneck when our big data requires a lot of grouping or joining. That’s why I think it’s worth spending a while to understand how the shuffle is handled by both of these engines.
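As a quick illustration (a minimal sketch using Spark's RDD API; the app name and example data are just placeholders), a wide transformation such as reduceByKey has to bring all values for the same key onto the same partition, and that repartitioning of data across the cluster is the shuffle:

```scala
import org.apache.spark.sql.SparkSession

object ShuffleExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-example")   // placeholder app name
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // A pair RDD of (word, 1) records spread across partitions.
    val pairs = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
      .map(word => (word, 1))

    // reduceByKey is a wide transformation: records sharing a key must
    // end up on the same partition, so Spark inserts a shuffle stage here.
    val counts = pairs.reduceByKey(_ + _)

    counts.collect().foreach(println)
    spark.stop()
  }
}
```

The same kind of data movement happens in Hadoop MapReduce between the map and reduce phases, which is exactly what the full post digs into.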

Read more