What is the difference between Caching and Persistence in Apache Spark?
Answer Posted / Ravi
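In Apache Spark, caching and persistence are the same mechanism: both keep a computed RDD (Resilient Distributed Dataset), DataFrame, or Dataset around so that later actions can reuse it instead of recomputing it. The difference is the storage level. cache() is shorthand for persist() with the default storage level, which is MEMORY_ONLY for RDDs and MEMORY_AND_DISK for DataFrames and Datasets. persist() accepts an explicit storage level such as MEMORY_ONLY, MEMORY_AND_DISK, or DISK_ONLY, so you can decide whether the data lives in memory, on local disk, or both. In either case the data stays available until unpersist() is called or the application terminates, although Spark may evict cached blocks under memory pressure. Neither call saves data to an external storage system like HDFS or Cassandra; long-term storage is done by writing the data out explicitly, or by checkpointing to a reliable file system.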