What is the difference between Caching and Persistence in Apache Spark?
Answer / Ravi
Caching and persistence in Apache Spark are closely related: both store an RDD, DataFrame, or Dataset across the cluster so it can be reused by later actions without recomputation. cache() is simply persist() with the default storage level (MEMORY_ONLY for RDDs, MEMORY_AND_DISK for DataFrames and Datasets), while persist() lets you choose an explicit StorageLevel such as MEMORY_ONLY, MEMORY_AND_DISK, or DISK_ONLY, including serialized and replicated variants. Neither outlives the application: the data is dropped when unpersist() is called or when the application terminates. Saving results to an external system like HDFS or Cassandra is a separate write operation (e.g. df.write), not Spark persistence.
What is spark in python?
List down the languages supported by Apache Spark?
What is a Directed Acyclic Graph (DAG)?
What are broadcast variables in Apache Spark? Why do we need them?
Describe the run-time architecture of Spark?
Why Spark?
Describe Partition and Partitioner in Apache Spark?
What is spark slang for?
Which spark library allows reliable file sharing at memory speed across different cluster frameworks?
Explain the lookup() operation in Spark?
Can you use spark to access and analyze data stored in cassandra databases?
List the advantages of Parquet files?