What is an RDD in Apache Spark? How are RDDs computed in Spark? What are the various ways in which an RDD can be created?
Answer Posted / Neeraj Kumar Soni
An RDD (Resilient Distributed Dataset) in Apache Spark is an immutable, distributed collection of data that is manipulated using transformations (which build a new RDD lazily, e.g. map, filter) and actions (which trigger the actual computation, e.g. collect, count). Spark computes an RDD by splitting the dataset into smaller chunks called partitions; each partition resides on a single node and is processed in parallel, and the lineage of transformations lets Spark recompute a lost partition, which is what makes RDDs "resilient". An RDD can be created in three main ways: by parallelizing an existing collection in the driver program (sc.parallelize), by loading an external dataset such as a local or HDFS file (sc.textFile), or by transforming an existing RDD.
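The ideas above (data split into partitions, transformations recorded lazily, an action triggering computation) can be illustrated with a small toy sketch in plain Python. This is not real Spark code; the class and method names mirror the RDD API only for illustration:

```python
def make_partitions(data, n):
    """Split a dataset into n roughly equal chunks (Spark's partitions)."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

class ToyRDD:
    """A toy, single-machine imitation of an RDD: immutable data plus
    a list of pending (lazy) transformations."""

    def __init__(self, partitions, ops=None):
        self.partitions = partitions
        self.ops = ops or []  # transformations recorded but not yet run

    def map(self, f):
        # Transformation: returns a NEW ToyRDD, computes nothing yet.
        return ToyRDD(self.partitions, self.ops + [("map", f)])

    def filter(self, p):
        return ToyRDD(self.partitions, self.ops + [("filter", p)])

    def collect(self):
        # Action: only now is each partition actually processed.
        out = []
        for part in self.partitions:
            rows = part
            for kind, f in self.ops:
                if kind == "map":
                    rows = [f(x) for x in rows]
                else:
                    rows = [x for x in rows if f(x)]
            out.extend(rows)
        return out

rdd = ToyRDD(make_partitions(list(range(10)), 3))
result = rdd.map(lambda x: x * 2).filter(lambda x: x > 10).collect()
print(result)  # [12, 14, 16, 18]
```

In real Spark the equivalent would be sc.parallelize(range(10), 3).map(lambda x: x * 2).filter(lambda x: x > 10).collect(), with the partitions distributed across cluster nodes instead of a local list.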