Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
How does groupByKey work in Spark?
Why is MapReduce output written to local disk?
What is the importance of the eval tool?
Do I need Scala for Spark?
What is a partitioner, and how is it used?
What does the mapred.job.tracker property do?
How do you format HDFS?
Which language is better for Spark?
What are the key benefits of using Storm for real-time processing?
What is streaming in Hadoop?
What is the difference between Hadoop and other data processing tools?
What is the difference between a leader and a follower in Kafka?
How is anti-entropy associated with Merkle trees?
What is the purpose of explode in Hive?
What is a block in HDFS? What is the default size in Hadoop 1 and Hadoop 2? Can we change the block size?