What is the difference between coalesce and repartition?
Answer / Amit Kumar Singh
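In Apache Spark, coalesce reduces the number of partitions of a DataFrame or RDD by merging existing partitions. By default it does not perform a full shuffle, so it is usually cheaper, but the resulting partitions can be uneven in size, and requesting more partitions than currently exist leaves the partition count unchanged. repartition, on the other hand, can either increase or decrease the number of partitions. It always performs a full shuffle, redistributing the data across the new partitions, which tends to give more evenly sized partitions at the cost of extra network traffic. For RDDs, repartition(n) is equivalent to coalesce(n, shuffle = true).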
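
To make the difference in data movement concrete, here is a minimal toy model in plain Python. It is not Spark code and not Spark's actual implementation: partitions are modelled as lists, and the helper names toy_coalesce and toy_repartition are hypothetical. The sketch assumes that coalesce merges whole neighbouring partitions (Spark's real grouping also takes data locality into account) and that repartition hands out records one at a time in simple round-robin order.

```python
from typing import List

Partitions = List[List[int]]


def toy_coalesce(parts: Partitions, n: int) -> Partitions:
    """Toy model of coalesce: merge whole partitions into n groups.

    No record is separated from the other records of its original
    partition, which is why no full shuffle is needed.
    """
    if n >= len(parts):
        # Asking for more partitions than exist changes nothing.
        return [list(p) for p in parts]
    merged_parts: Partitions = []
    for i in range(n):
        # Assumption: group neighbouring partitions by position.
        start = i * len(parts) // n
        end = (i + 1) * len(parts) // n
        merged: List[int] = []
        for p in parts[start:end]:
            merged.extend(p)
        merged_parts.append(merged)
    return merged_parts


def toy_repartition(parts: Partitions, n: int) -> Partitions:
    """Toy model of repartition: move every record individually.

    Round-robin assignment stands in for the full shuffle.
    """
    out: Partitions = [[] for _ in range(n)]
    i = 0
    for p in parts:
        for record in p:
            out[i % n].append(record)
            i += 1
    return out


# A skewed input: one large partition and three small ones.
parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

coalesced = toy_coalesce(parts, 2)
repartitioned = toy_repartition(parts, 2)

print([len(p) for p in coalesced])      # partition sizes after merging
print([len(p) for p in repartitioned])  # partition sizes after the shuffle
```

In this model the coalesced result keeps the skew of the input (sizes 7 and 3), while the repartitioned result is balanced (sizes 5 and 5), but only because every record was moved. In Spark itself the corresponding calls are coalesce(n) and repartition(n) on an RDD or DataFrame.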