What are the components of Spark?
What is the difference between Hadoop and Spark?
What is coalesce in Spark SQL?
If certain data is reused repeatedly across different transformations, what can be done to improve performance?
Explain the fullOuterJoin() operation in Apache Spark.
Can you mention some features of Spark?
Which would you choose for a project: Hadoop MapReduce or Apache Spark?
Can you explain Spark GraphX?
What is a shuffle in Spark?
Explain the concept of an RDD (Resilient Distributed Dataset). Also, state how you can create RDDs in Apache Spark.
Can you define RDD lineage?
How are tasks created in Spark?
Explain the Catalyst query optimizer in Apache Spark.
What is in-memory computing in Spark?
Explain the Spark countByKey() operation.