Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Can the balancer be run while Hadoop is in use?
Is it necessary to kill a running topology in order to update it?
On what basis does the NameNode decide which DataNode to write to?
What features do Pig and Hive have in common?
Explain the Pig architecture.
How do you define a partitioning key?
What is a NameNode? How many instances of the NameNode run on a Hadoop cluster?
Explain the HDFS "write once, read many" pattern.
What is the difference between Spark and Scala?
What is a bucket in Hive?
Explain the coalesce operation in Apache Spark.
Which storage level does the cache() function use?
What are the differences between the caching and persistence methods in Apache Spark?
Define SimpleStrategy (Cassandra replication).
Can we say that COGROUP is a group of more than one data set?