Hadoop General (407)
Does the MapReduce programming model provide a way for reducers to communicate with each other? In a MapReduce job, can a reducer communicate with another reducer?
How would you tackle calculating the number of unique visitors for each hour by mining a huge Apache log? You can use post-processing on the output of the MapReduce job.
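One possible way to read the hint above, sketched in plain Python rather than against the Hadoop API: the job reduces the log to one record per distinct (date, hour, visitor) triple, and a post-processing pass counts those records per hour. The log layout (Common Log Format, with the client IP standing in for the visitor) and all function names here are assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict

# Assumed Common Log Format:
# host ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" status size
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2}):')


def map_line(line):
    """Map step: emit ((date, hour, ip), 1) for each parsable log line."""
    match = LOG_PATTERN.match(line)
    if match:
        ip, date, hour = match.groups()
        yield (date, hour, ip), 1


def reduce_pair(key, values):
    """Reduce step: collapse repeated hits into one record per key."""
    return key


def run_job(lines):
    """Simulate the shuffle locally by grouping mapper output by key."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    return [reduce_pair(key, values) for key, values in grouped.items()]


def post_process(job_output):
    """Post-processing: count distinct visitors per (date, hour)."""
    return dict(Counter((date, hour) for date, hour, _ip in job_output))
```

Calling `post_process(run_job(lines))` on an iterable of log lines returns a dictionary keyed by (date, hour). Deduplicating inside the job keeps each reducer's work small, since no reducer has to hold every visitor for an hour in memory; lines that do not match the assumed format are silently skipped.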
If reducers do not start before all mappers finish, then why does the progress of a MapReduce job show something like map(50%) reduce(10%)? Why is the reducers' progress percentage displayed when the mappers have not finished yet?
What is the difference between Kafka and MQ?
Can you use Spark to access and analyze data stored in Cassandra databases?
How do you process big data with Spark?
Is it necessary to kill the topology while updating the running topology?
Who is the intended audience for learning HCatalog?
How do you create users in Hadoop HDFS?
Mention how you can stop a partition from being queried.
What is the difference between External and Internal Table in Hive?
What is the difference between Primary, Partition and Cassandra?
What is a NameNode?
What is a Bloom filter used for in Cassandra?
What is a partition in Hive?
Is the output of the mapper or the output of the partitioner written to local disk?
What does shuffling do?
Which serialization libraries are supported in Spark?