Can you define RDD lineage?
Answer / Mohit Pal
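An RDD (Resilient Distributed Dataset) in Apache Spark is an immutable, distributed collection of objects that can be processed in parallel across a cluster. RDD lineage (also called the lineage graph) is the record of the parent RDDs and the transformations from which a given RDD was derived. Spark evaluates transformations lazily and keeps this graph instead of replicating the data, so when a partition is lost it can be recomputed by replaying the recorded transformations on the source data. The lineage of an RDD can be inspected with its toDebugString method.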
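To make the idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the class ToyRDD and its methods are hypothetical names invented for this illustration. Each dataset stores only its parent and the function that derives it, so the result can always be rebuilt from the root by replaying the chain.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical stand-in for an RDD: it stores how to build the data, not the data."""
    name: str
    parent: Optional["ToyRDD"] = None                    # previous step in the lineage
    source: Optional[tuple] = None                       # set only on the root dataset
    transform: Optional[Callable[[List], List]] = None   # how to derive from the parent

    def map(self, f):
        # Record the step; nothing is computed yet (lazy, as in Spark).
        return ToyRDD("map", parent=self,
                      transform=lambda xs: [f(x) for x in xs])

    def filter(self, pred):
        return ToyRDD("filter", parent=self,
                      transform=lambda xs: [x for x in xs if pred(x)])

    def lineage(self):
        """Return the chain of steps from the root to this dataset."""
        steps = []
        node = self
        while node is not None:
            steps.append(node.name)
            node = node.parent
        return list(reversed(steps))

    def collect(self):
        """Rebuild the result from the root by replaying the lineage."""
        if self.parent is None:
            return list(self.source)
        return self.transform(self.parent.collect())


base = ToyRDD("parallelize", source=(1, 2, 3, 4, 5))
result = base.map(lambda x: x * 10).filter(lambda x: x > 20)

print(result.lineage())   # ['parallelize', 'map', 'filter']
print(result.collect())   # [30, 40, 50]
```

Because no intermediate result is stored, calling collect() again rebuilds the same output from the source, which is the same principle Spark relies on to recover a lost partition. Real Spark tracks this per partition and distinguishes narrow from wide dependencies, which this sketch leaves out.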