Does the MapReduce programming model provide a way for reducers to communicate with each other? In a MapReduce job, can a reducer communicate with another reducer?
How would you calculate the number of unique visitors for each hour by mining a huge Apache log? You can use post-processing on the output of the MapReduce job.
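One way to read the hint above is sketched below in plain Python, with no Hadoop dependency. The map step emits an (hour, visitor) key per log line, the reduce step collapses duplicate keys, and a post-processing pass counts the surviving records per hour. The sketch assumes the log is in Common Log Format and that a visitor is identified by client IP address. Neither assumption comes from the question, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: Common Log Format, e.g.
# 1.1.1.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def map_line(line):
    """Map step: emit ((hour, visitor), 1) for each parsable log line."""
    match = LOG_RE.match(line)
    if not match:
        return []  # skip malformed lines
    visitor, stamp = match.groups()
    hour = stamp[:14]  # "10/Oct/2000:13" -> date plus hour of day
    return [((hour, visitor), 1)]


def reduce_pairs(mapped):
    """Reduce step: keep one record per distinct (hour, visitor) key."""
    return sorted({key for key, _ in mapped})


def count_per_hour(unique_pairs):
    """Post-processing: count the distinct visitors recorded for each hour."""
    counts = defaultdict(int)
    for hour, _visitor in unique_pairs:
        counts[hour] += 1
    return dict(counts)


def unique_visitors_per_hour(lines):
    """Run the map, reduce, and post-processing steps over an iterable of lines."""
    mapped = [pair for line in lines for pair in map_line(line)]
    return count_per_hour(reduce_pairs(mapped))
```

In a real job the reduce output would be written to HDFS, and the per-hour count could run as a second job or as a small script over that output.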
If reducers do not start before all mappers finish, why does the progress of a MapReduce job show something like map(50%) reduce(10%)? Why is a reducer progress percentage displayed when the mappers have not finished yet?
What is Distributed Cache in Hadoop?
What do you mean by logging in Cassandra?
Define the nodetool utility.
What are watches?
In which languages does Apache Spark provide APIs?
How does data transfer happen from HDFS to Hive?
What makes Apache Spark good at low-latency workloads like graph processing and machine learning?
What are the steps involved in the MapReduce framework?
Explain the ALTER TABLE statement in HCatalog.
Define consistency.
What is a YAML file in Cassandra?
Define the term Thrift.
What is reduceByKey in Spark?
What is the functionality of ObjectInspector?
Explain the countByValue() operation in Apache Spark RDD.