Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is a metastore in Hive?
Which daemons run on a master node and on slave nodes?
What is the difference between HDFS and NAS?
Why do we need Spark?
What is the use of the HColumnDescriptor class?
What is commodity hardware? Does commodity hardware include RAM?
What is Azure Spark?
What are the complex data types in Pig?
What is a NameNode in Hadoop?
How does Hive serialize and deserialize data?
How do you optimize a MapReduce job?
What is the benefit of the distributed cache? Why can't we just keep the file in HDFS and have the application read it?
Describe accumulators in Apache Spark in detail.
Explain the basic difference between Cassandra and HBase.
What is the default file format to import data using Apache Sqoop?