What is the difference between cache and persist in Spark?
Answer / Ravita Rani
"Cache and Persist() are used to keep RDDs (Resilient Distributed Datasets) in memory for faster access. However, there is a key difference between them. Cache keeps an RDD in memory of the same JVM that created it. If the RDD is created on a different Executor, it won't be cached unless you use Persist(). The Persist() function allows you to specify whether the data should be kept only in memory (MEMORY), on disk and memory (MEMORY_ONLY_SER) or off-heap memory (OFFHEAP).".
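A short Scala sketch of the distinction described above. This assumes an existing SparkSession named `spark`; the RDD contents and variable names are illustrative, not from the original answer:

```scala
import org.apache.spark.storage.StorageLevel

val rdd = spark.sparkContext.parallelize(1 to 1000)

// cache() is shorthand for persist(StorageLevel.MEMORY_ONLY) on RDDs:
val cached = rdd.map(_ * 2).cache()

// persist() lets you pick the storage level explicitly, e.g. spill to
// disk when memory is tight instead of dropping partitions:
val persisted = rdd.map(_ * 2).persist(StorageLevel.MEMORY_AND_DISK)

cached.count()    // the first action materializes the stored partitions
persisted.count()

// Release the storage when the datasets are no longer needed:
cached.unpersist()
persisted.unpersist()
```

With MEMORY_ONLY, partitions that do not fit in memory are recomputed from lineage on demand; with MEMORY_AND_DISK they are written to local disk instead, which is usually cheaper than recomputation for expensive transformations.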
What are the limitations of Spark?
Which serialization libraries are supported in Spark?
Is there a module to implement SQL in Spark?
What is an executor in Spark?
Why do we use Spark?
Who invented Spark?
How to save RDD?
How is a transformation on an RDD different from an action?
Does Apache Spark provide checkpointing?
What is the difference between Spark and MapReduce?
Explain the flatMap() transformation in Apache Spark?
How can we create RDD in Apache Spark?