How does an RDD work in Spark?

Question Posted / Ashish Kumar Jaiswal


Answer Posted / Ashish Kumar Jaiswal

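An RDD (Resilient Distributed Dataset) is the fundamental data structure in Apache Spark. It is an immutable, distributed collection of objects that can be processed in parallel across the nodes of a cluster. Logically it is a single dataset, but it is divided into partitions, and each partition is processed on a worker node. It is resilient because Spark records the lineage of transformations used to build each RDD, so a lost partition can be recomputed instead of being restored from a replica. RDDs can be created from external storage such as the Hadoop Distributed File System (HDFS) or a local file system, by parallelizing an in-memory collection in the driver program, or by transforming other RDDs. Transformations (such as map and filter) are lazily evaluated: they only record what should be done. When an action (such as collect or count) is called, Spark turns the recorded steps into tasks, one per partition in each stage, and executes them in parallel.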
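To make the partition and lazy-evaluation ideas concrete, here is a minimal sketch in plain Python. It is a toy model only and not Spark's implementation: the class name ToyRDD is hypothetical, the method names are chosen to mirror the map, filter and collect operations mentioned above, and the "tasks" run one after another instead of in parallel on a cluster.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD (an assumption for illustration, not Spark code)."""

    def __init__(self, compute: Callable[[int], Iterator], num_partitions: int):
        # compute(i) rebuilds partition i from scratch; this plays the role of lineage
        self._compute = compute
        self.num_partitions = num_partitions

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int) -> "ToyRDD":
        items = list(data)
        # Round-robin split of the input into partitions
        return ToyRDD(lambda i: iter(items[i::num_partitions]), num_partitions)

    def map(self, f: Callable) -> "ToyRDD":
        # Transformation: only records the step, no element is touched yet
        return ToyRDD(lambda i: (f(x) for x in self._compute(i)), self.num_partitions)

    def filter(self, pred: Callable) -> "ToyRDD":
        # Transformation: also lazy
        return ToyRDD(lambda i: (x for x in self._compute(i) if pred(x)), self.num_partitions)

    def collect(self) -> List:
        # Action: one "task" per partition (sequential here, parallel in a real cluster)
        result = []
        for i in range(self.num_partitions):
            result.extend(self._compute(i))
        return result


calls = []  # records every element the map function actually sees


def square(x):
    calls.append(x)
    return x * x


rdd = ToyRDD.parallelize(range(1, 7), num_partitions=3)
evens = rdd.map(square).filter(lambda x: x % 2 == 0)
ran_before_action = len(calls)    # still 0: nothing has executed yet
result = sorted(evens.collect())  # the action triggers the work
print(ran_before_action, result)
```

Calling collect() a second time would run square over every element again, because the toy model keeps only the recipe for each partition and not its data. That is the same reason a lost partition can be rebuilt from lineage, and it is why real Spark offers caching for RDDs that are reused.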

