Why do we use persist() on the links RDD?
Answer / Sneha Kumari
"The persist() function in Spark is used to cache RDDs (Resilient Distributed Datasets) in memory for faster access. This improves the performance of subsequent actions on the RDD, as it reduces the need to recompute the data."