What is a lineage graph in Apache Spark?
Answer / Saurabh Mishra
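In Apache Spark, a lineage graph (also called RDD lineage or the dependency graph) records how each dataset was derived. For every RDD, Spark keeps the transformation that produced it (such as map, filter, or join) and a reference to its parent RDDs, which together form a directed acyclic graph (DAG). Because transformations are lazy, nothing is computed until an action runs. At that point Spark uses the lineage to plan the job and, if a partition is lost, to recompute it from its parents instead of relying on data replication. Caching or checkpointing an RDD lets Spark avoid replaying a long lineage each time the data is needed.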
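To make the idea concrete, here is a minimal sketch in plain Python of what a lineage graph tracks. It is a toy model, not Spark code: the `ToyRDD` class and its `lineage` and `lose_data` methods are hypothetical names made up for illustration, and each node has only one parent, so joins are not covered. In a real Spark application you can inspect the lineage of an RDD with `toDebugString()`.

```python
# Toy model of a lineage graph. ToyRDD is a hypothetical class, not part of Spark.
class ToyRDD:
    def __init__(self, source=None, parent=None, op="source", fn=None):
        self.source = source    # raw data, set only on the root dataset
        self.parent = parent    # the dataset this one was derived from
        self.op = op            # name of the transformation that produced it
        self.fn = fn            # function applied to the parent's records
        self._data = None       # materialized records, if any

    def map(self, f):
        # Lazy: only records the step, computes nothing yet.
        return ToyRDD(parent=self, op="map",
                      fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        # Lazy: only records the step, computes nothing yet.
        return ToyRDD(parent=self, op="filter",
                      fn=lambda rows: [r for r in rows if pred(r)])

    def lineage(self):
        """Walk back to the root and return the chain of operations."""
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))

    def collect(self):
        """Action: materialize by replaying the recorded steps.

        Simplification: every node keeps its result after the first
        collect(); real Spark only keeps data that was explicitly cached.
        """
        if self._data is None:
            if self.parent is None:
                self._data = list(self.source)
            else:
                self._data = self.fn(self.parent.collect())
        return self._data

    def lose_data(self):
        """Simulate losing the materialized records (e.g. a failed executor)."""
        self._data = None


numbers = ToyRDD(source=range(1, 7))
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(evens_squared.lineage())        # ['source', 'filter', 'map']
first = evens_squared.collect()       # [4, 16, 36]
evens_squared.lose_data()             # the result is gone...
recomputed = evens_squared.collect()  # ...but the lineage lets us rebuild it
print(first, recomputed)
```

The point of the sketch is that the derived dataset stores the recipe rather than a copy of the data, so a lost result can be rebuilt by reapplying the recorded steps to the parent.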