Can you define rdd?
Answer / Vishwas Mishra
Resilient Distributed Dataset (RDD) is an immutable distributed collection of data in Apache Spark. RDDs are the basic building block for constructing a complex data analysis pipeline in Spark.
| Is This Answer Correct ? | 0 Yes | 0 No |
Can you explain apache spark?
What is apache spark good for?
How Spark uses Akka?
Name some sources from where Spark streaming component can process real-time data?
What is spark in python?
What do you understand about yarn?
Define the term ‘sparse vector.’
Do I need to learn scala for spark?
Explain transformation in rdd. How is lazy evaluation helpful in reducing the complexity of the system?
Explain Spark Executor
Can spark be used without hadoop?
Is it necessary to install spark on all the nodes of a YARN cluster while running Apache Spark on YARN ?
Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS Hadoop Distributed File System (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)