How do we represent data in Spark?
Answer / Hitesh Pandey
The primary abstraction for representing data in Apache Spark is the Resilient Distributed Dataset (RDD). An RDD is an immutable, fault-tolerant collection of elements divided into logical partitions, which are distributed across the nodes of a cluster so they can be processed in parallel. Higher-level representations such as DataFrames and Datasets are built on top of RDDs.
What exactly is Spark?
What is the role of the Driver program in a Spark application?
What is an action, and how does it process data in Apache Spark?
What are the various types of transformations on DStream?
How will you implement SQL in Spark?
Which one will you choose for a project: Hadoop MapReduce or Apache Spark?
Explain the major libraries that constitute the Spark ecosystem?
Discuss the various running modes of Apache Spark?
Can you explain Spark Core?
Can you mention some features of Spark?
Define SparkContext in Apache Spark?
Please provide an explanation of DStream in Spark.