Answer Posted / Manoranjan Kumar
"Apache Spark's Resilient Distributed Dataset (RDD) is a fundamental data structure that provides a fault-tolerant distributed collection of objects. It can be created from Hadoop files, collections, or other RDDs. RDDs support two types of operations: transformations and actions. Transformations create a new dataset from an existing one without executing the computation. Examples include map(), filter(), and groupBy(). Actions, on the other hand, return a physical result to the driver program after running the computation on the cluster. Examples include count(), first(), collect(), saveAsTextFile(), etc. Spark performs RDD transformations lazily until an action is called."n