What are the major features/characteristics of RDDs (Resilient Distributed Datasets)?
Answer / Umesh Kumar Chaurasia
Resilient Distributed Datasets (RDDs) are Apache Spark's fundamental data abstraction: immutable, partitioned collections of records distributed across the nodes of a cluster. An RDD can be created from an external data source (for example, files in HDFS) or by transforming an existing RDD. Their main characteristics are:
1) Immutability: once an RDD is created, it cannot be modified. A transformation such as map or filter returns a new RDD instead of changing the existing one.
2) Fault tolerance: if a worker node fails, Spark automatically recomputes the lost partitions from their lineage, so results stay consistent.
3) Lineage: each RDD records the sequence of transformations that produced it from its parent RDDs, which is what makes that recomputation possible.
4) Scalable, parallel computation: an RDD is split into partitions that are processed in parallel across the cluster, so computations scale to large datasets.
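To make the immutability, lineage, and fault-tolerance points concrete, here is a toy, pure-Python model. It is not Spark's API or implementation: the class `ToyRDD` and its methods `map`, `collect`, `lose_partition`, and `lineage` are hypothetical names chosen for illustration. It only sketches the idea that a derived dataset keeps a reference to its parent plus the function applied, so a lost partition can be rebuilt by replaying that function on the parent's partition.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


class ToyRDD:
    """Hypothetical stand-in for an RDD (not Spark's class): an immutable,
    partitioned collection that remembers how it was derived."""

    def __init__(
        self,
        partitions: Optional[Sequence[Sequence[Any]]] = None,
        parent: Optional["ToyRDD"] = None,
        fn: Optional[Callable[[Any], Any]] = None,
        name: str = "source",
    ) -> None:
        # Source data is frozen into tuples so it cannot be modified in place.
        self._source: Optional[Tuple[Tuple[Any, ...], ...]] = (
            tuple(tuple(p) for p in partitions) if partitions is not None else None
        )
        self._parent = parent  # lineage: the RDD this one was derived from
        self._fn = fn          # lineage: the per-element function applied to the parent
        self.name = name
        # Simulates partitions already computed and held on worker nodes.
        self._materialized: Dict[int, Tuple[Any, ...]] = {}
        self.computations = 0  # how many times a partition had to be (re)computed

    @property
    def num_partitions(self) -> int:
        if self._source is not None:
            return len(self._source)
        assert self._parent is not None
        return self._parent.num_partitions

    def map(self, fn: Callable[[Any], Any], name: str = "map") -> "ToyRDD":
        # A transformation never changes self; it returns a new ToyRDD whose
        # lineage points back at self.
        return ToyRDD(parent=self, fn=fn, name=name)

    def compute(self, i: int) -> Tuple[Any, ...]:
        if i not in self._materialized:
            self.computations += 1
            if self._source is not None:
                self._materialized[i] = self._source[i]
            else:
                assert self._parent is not None and self._fn is not None
                # Rebuild this partition by replaying the recorded function
                # on the matching parent partition.
                self._materialized[i] = tuple(self._fn(x) for x in self._parent.compute(i))
        return self._materialized[i]

    def collect(self) -> List[Any]:
        return [x for i in range(self.num_partitions) for x in self.compute(i)]

    def lose_partition(self, i: int) -> None:
        # Simulates a worker failure: the computed copy of partition i is gone.
        self._materialized.pop(i, None)

    def lineage(self) -> List[str]:
        # Walk parent links from this RDD back to its source.
        chain: List[str] = []
        node: Optional[ToyRDD] = self
        while node is not None:
            chain.append(node.name)
            node = node._parent
        return chain
```

Usage: `nums = ToyRDD([[1, 2], [3, 4]], name="parallelize")` followed by `squared = nums.map(lambda x: x * x, name="square")` gives `squared.collect() == [1, 4, 9, 16]`. After `squared.lose_partition(1)`, calling `collect()` again returns the same list because partition 1 is recomputed from `nums`, while `nums.collect()` is still `[1, 2, 3, 4]`. Real Spark does considerably more (lazy scheduling, shuffles, wide vs. narrow dependencies), so treat this only as an illustration of the lineage idea.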