State the difference between persist() and cache() functions.
Answer / Sunanda Biswas
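Both functions mark an RDD, DataFrame, or Dataset to be kept after it is first computed, so that later actions reuse it instead of recomputing it. The difference is that cache() takes no arguments and always uses the default storage level, while persist() lets you specify the storage level. For an RDD the default is MEMORY_ONLY; for a DataFrame or Dataset it is MEMORY_AND_DISK. Calling persist() with no arguments is equivalent to calling cache(). With persist() you can pass other levels, such as MEMORY_ONLY_SER, MEMORY_AND_DISK, or DISK_ONLY.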
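As a rough illustration of the answer above, the following Scala sketch applies cache() and persist() to small RDDs and a DataFrame. It assumes a local Spark 2.1 or later installation; the object name, app name, and sample data are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; names and data are illustrative only.
object PersistVsCache {
  def main(args: Array[String]): Unit = {
    // Assumption: running locally, using all available cores.
    val spark = SparkSession.builder()
      .appName("persist-vs-cache")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // cache() takes no arguments; on an RDD it uses MEMORY_ONLY.
    val cached = sc.parallelize(1 to 1000).map(_ * 2).cache()
    println(s"RDD cache():   ${cached.getStorageLevel.description}")

    // persist() accepts an explicit storage level.
    val onDisk = sc.parallelize(1 to 1000).map(_ * 2)
      .persist(StorageLevel.DISK_ONLY)
    println(s"RDD persist(): ${onDisk.getStorageLevel.description}")

    // Serialized in-memory storage: smaller footprint, more CPU to read.
    val serialized = sc.parallelize(1 to 1000)
      .persist(StorageLevel.MEMORY_ONLY_SER)
    println(s"RDD persist(): ${serialized.getStorageLevel.description}")

    // On a DataFrame, cache() uses MEMORY_AND_DISK instead.
    val df = spark.range(1000).toDF("id").cache()
    println(s"DataFrame cache(): ${df.storageLevel.description}")

    // Nothing is stored until an action runs.
    cached.count()
    onDisk.count()

    // Release the stored data once it is no longer needed.
    cached.unpersist()
    onDisk.unpersist()
    serialized.unpersist()
    df.unpersist()
    spark.stop()
  }
}
```

Note that a storage level can only be assigned once per RDD; to change it, call unpersist() first and then persist() with the new level.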
On what basis can you differentiate RDD, DataFrame, and Dataset?
What is a Spark standalone cluster?
Is it possible to run Spark and Mesos along with Hadoop?
Is Spark distributed computing?
Explain the difference between Spark SQL and Hive.
Define partitions in Apache Spark.
What is MLlib?
What is a partition in Spark?
Why do we use Spark?
How is a transformation on an RDD different from an action?
What do you understand by transformations in Spark?
What is Spark ML?