Difference between groupByKey vs reduceByKey in Apache Spark?

Question posted by: akhilesh kumar awasthi

Answer by: Pradeep Kumar Bhati

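groupByKey groups the values of a pair RDD by key and returns an RDD of (key, iterable of values) pairs. Every individual value is sent across the network during the shuffle, and all values for a single key must fit in memory on one executor, so a heavily skewed key can cause an OutOfMemoryError. reduceByKey instead merges the values for each key with an associative and commutative function you supply, returning an RDD of (key, reduced value) pairs. Because Spark applies that function inside each partition before the shuffle (a map-side combine), only one partial result per key per partition is transferred. The main difference, then, is that groupByKey materializes the full collection of values for every key, whereas reduceByKey combines values incrementally and never builds that collection. For aggregations such as sums or counts, reduceByKey therefore usually shuffles far less data and is the preferred choice.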
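To make the shuffle difference concrete, here is a minimal pure-Python model of the two operations. It is not Spark code: the RDD is assumed to be a plain list of partitions, and the helper names `group_by_key` and `reduce_by_key` are hypothetical stand-ins that only imitate what Spark does, counting how many records each strategy would move across the network.

```python
from collections import defaultdict
from operator import add

# Hypothetical input: an RDD modeled as a list of partitions,
# each partition being a list of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]


def group_by_key(parts):
    """Model of groupByKey: every (key, value) pair is shuffled as-is,
    then all values for a key are collected into one list."""
    shuffled = [pair for part in parts for pair in part]  # no combining before the shuffle
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return dict(groups), len(shuffled)


def reduce_by_key(parts, func):
    """Model of reduceByKey: values are first combined inside each partition
    (map-side combine), so only one partial result per key per partition
    is shuffled before the final merge."""
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result = {}
    for key, value in shuffled:
        result[key] = func(result[key], value) if key in result else value
    return result, len(shuffled)


grouped, grouped_moved = group_by_key(partitions)
reduced, reduced_moved = reduce_by_key(partitions, add)

# Summing the grouped lists gives the same totals as reduceByKey...
print({k: sum(vs) for k, vs in grouped.items()})  # {'a': 3, 'b': 3}
print(reduced)                                    # {'a': 3, 'b': 3}
# ...but groupByKey shuffled 6 records while reduceByKey shuffled 4.
print(grouped_moved, reduced_moved)               # 6 4
```

Both paths give the same totals, but the grouped path moves every record while the reduced path moves one record per key per partition; with many values per key the gap grows accordingly. In PySpark itself the corresponding calls would be `rdd.groupByKey().mapValues(sum)` and `rdd.reduceByKey(add)`.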

More Apache Spark Interview Questions

Can an RDD be shared between SparkContexts?

What are the benefits of lazy evaluation?

What is a PipelinedRDD?

What are the benefits of Spark lazy evaluation?

Explain the Spark join() operation.

Can we broadcast an RDD?

What does a Spark Engine do?

What is a Spark shuffle?

Can you define PageRank?

What is Apache Spark?

What are stages and tasks in Spark?

How Spark handles monitoring and logging in Standalone mode?

For more Apache Spark Interview Questions Click Here

Categories

Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS Hadoop Distributed File System (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)