Answer Posted / Manoranjan Kumar
"Apache Spark's Resilient Distributed Dataset (RDD) is a fundamental data structure that provides a fault-tolerant distributed collection of objects. It can be created from Hadoop files, collections, or other RDDs. RDDs support two types of operations: transformations and actions. Transformations create a new dataset from an existing one without executing the computation. Examples include map(), filter(), and groupBy(). Actions, on the other hand, return a physical result to the driver program after running the computation on the cluster. Examples include count(), first(), collect(), saveAsTextFile(), etc. Spark performs RDD transformations lazily until an action is called."n