Explain the concept of RDD (Resilient Distributed Dataset). Also, state how you can create RDDs in Apache Spark.
Answer Posted / Sadhana Dubey
RDD (Resilient Distributed Dataset) is an immutable, partitioned, distributed collection of objects that supports fault-tolerant parallel processing of large datasets in Apache Spark. It is Spark's fundamental data abstraction: transformations on an RDD produce new RDDs, and lineage information lets Spark recompute lost partitions on failure. RDDs can be created from several sources: an in-memory collection via SparkContext.parallelize(collection), external storage (local files, HDFS, S3, etc.) via SparkContext.textFile(path) or SparkContext.wholeTextFiles(path), or by applying transformations (such as map or filter) to an existing RDD. These methods live on SparkContext (JavaSparkContext in Java), not on SparkSession directly; in Spark 2.x and later you typically reach them through spark.sparkContext, and the same API is available in Scala, Java, and Python.
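A minimal Scala sketch of these creation paths; the app name, master URL, and input path are hypothetical placeholders, not values from the question:

```scala
import org.apache.spark.sql.SparkSession

object RddCreationExample {
  def main(args: Array[String]): Unit = {
    // Local SparkSession for illustration only; appName and master are placeholders
    val spark = SparkSession.builder()
      .appName("rdd-creation-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext   // RDD creation methods live on SparkContext

    // 1. From an in-memory collection
    val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))

    // 2. From external storage (hypothetical path), one record per line;
    //    textFile is lazy, so nothing is read until an action runs on `lines`
    val lines = sc.textFile("hdfs:///data/input.txt")

    // 3. From an existing RDD via a transformation — RDDs are immutable,
    //    so map returns a new RDD rather than modifying `numbers`
    val doubled = numbers.map(_ * 2)

    println(doubled.collect().mkString(","))  // prints 2,4,6,8,10

    spark.stop()
  }
}
```

Note that only the `collect()` action triggers computation; `parallelize`, `textFile`, and `map` merely build the lineage graph.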