Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Why are nodes removed and added frequently in a Hadoop cluster?
What are the various input and output types supported by MapReduce?
If a particular file is 50 MB, will the HDFS block still consume 64 MB as the default size?
Why is Spark good?
Which types of data can HBase store?
When should you choose an "External Table" in Hive?
What is a Combiner?
How does Spark work with Python?
How to compress mapper output in Hadoop?
Can you explain logistic regression?
What load do concurrent queries produce on the NameNode?
What is a column family in HBase?
What is Thrift in Apache Cassandra?
What do you mean by InputFormat?
What is the best practice to deploy the Secondary NameNode?