What is an RDD in Apache Spark? How are RDDs computed in Spark? What are the various ways in which an RDD can be created?
Answer Posted / Neeraj Kumar Soni
An RDD (Resilient Distributed Dataset) in Apache Spark is an immutable, distributed collection of data that is manipulated using transformations (which build a new RDD lazily, e.g. map, filter) and actions (which trigger the actual computation, e.g. collect, count). Spark computes an RDD by splitting the dataset into smaller chunks called partitions; each partition resides on a single node and is processed in parallel, and the lineage of transformations lets Spark recompute a lost partition, which is what makes RDDs "resilient". An RDD can be created in three main ways: by parallelizing an existing collection in the driver program (sc.parallelize), by loading an external dataset such as a local or HDFS file (sc.textFile), or by transforming an existing RDD.
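The ideas above (data split into partitions, transformations recorded lazily, an action triggering computation) can be illustrated with a small toy sketch in plain Python. This is not real Spark code; the class and method names mirror the RDD API only for illustration:

```python
def make_partitions(data, n):
    """Split a dataset into n roughly equal chunks (Spark's partitions)."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

class ToyRDD:
    """A toy, single-machine imitation of an RDD: immutable data plus
    a list of pending (lazy) transformations."""

    def __init__(self, partitions, ops=None):
        self.partitions = partitions
        self.ops = ops or []  # transformations recorded but not yet run

    def map(self, f):
        # Transformation: returns a NEW ToyRDD, computes nothing yet.
        return ToyRDD(self.partitions, self.ops + [("map", f)])

    def filter(self, p):
        return ToyRDD(self.partitions, self.ops + [("filter", p)])

    def collect(self):
        # Action: only now is each partition actually processed.
        out = []
        for part in self.partitions:
            rows = part
            for kind, f in self.ops:
                if kind == "map":
                    rows = [f(x) for x in rows]
                else:
                    rows = [x for x in rows if f(x)]
            out.extend(rows)
        return out

rdd = ToyRDD(make_partitions(list(range(10)), 3))
result = rdd.map(lambda x: x * 2).filter(lambda x: x > 10).collect()
print(result)  # [12, 14, 16, 18]
```

In real Spark the equivalent would be sc.parallelize(range(10), 3).map(lambda x: x * 2).filter(lambda x: x > 10).collect(), with the partitions distributed across cluster nodes instead of a local list.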