Scaling Spark to Large Datasets
In this post, I will share a few quick tips for scaling your Spark applications to larger datasets without resorting to large executor memory.
- Increase shuffle partitions: The default number of shuffle partitions (`spark.sql.shuffle.partitions`) is 200. For larger datasets, you are better off with a larger number of shuffle partitions. This helps in several ways: each shuffle partition holds less data, so individual tasks need less executor memory, are less likely to spill to disk or fail with out-of-memory errors, and are cheaper to retry when they do fail (see the sketch below).
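As a rough sketch, here is what raising the setting might look like in practice. The value 2000, the app name, and the toy aggregation are illustrative choices, not recommendations; tune the partition count to your data volume and cluster size:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("shuffle-partitions-example") // hypothetical app name
  .config("spark.sql.shuffle.partitions", "2000") // default is 200
  .getOrCreate()

// The setting can also be adjusted at runtime for a specific job:
spark.conf.set("spark.sql.shuffle.partitions", "2000")

// Any wide (shuffling) transformation, such as this groupBy, now writes
// 2000 shuffle partitions instead of 200, so each reduce task handles a
// smaller slice of the data and needs less executor memory.
val df = spark.range(0, 100000000L).toDF("id")
val counts = df.groupBy((col("id") % 1000).as("bucket")).count()
counts.show(5)
```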