What can skew the mean?
When Hive is run in embedded mode
What is a session in Cassandra?
What are the benefits of lazy evaluation?
Explain the common input formats in Hadoop.
What port does Spark use?
What are "coordinator nodes" in Cassandra?
What do you understand by SchemaRDD in Apache Spark?
What is meant by streaming access?
The web UI shows that half of the DataNodes are in decommissioning mode. What does that mean? Is it safe to remove those nodes from the network?
Differentiate between FileSink and FileRollSink.
What are the applications of Apache ZooKeeper?
List some use cases of Apache Kafka.
What is a Sqoop metastore?
Where does the data of a Hive table get stored?