Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS Hadoop Distributed File System (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Because Hadoop achieves parallelism by dividing tasks across many nodes, a few slow nodes can rate-limit the rest of the program and slow the whole job down. What mechanism does Hadoop provide to combat this?
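Hadoop's mechanism for this is speculative execution. When a task runs noticeably slower than its peers, the framework may launch a backup attempt of the same task on another node, keep whichever attempt finishes first, and kill the other. In Hadoop 2 and later it is toggled per phase with `mapreduce.map.speculative` and `mapreduce.reduce.speculative`. The toy simulation below is a minimal sketch of the idea, not Hadoop's scheduler. The function name `simulate`, the check time (the median task duration), the `slowdown_factor` threshold and the backup duration are all hypothetical assumptions made for illustration.

```python
from statistics import median


def simulate(durations, speculate=True, slowdown_factor=2.0):
    """Return the job completion time for tasks that all start at t = 0.

    Toy model (hypothetical assumptions, not Hadoop's real scheduler):
    - every task runs on its own node at a constant rate;
    - at check time T (the median task duration) the scheduler reads each
      running task's progress and estimates its total time as T / progress;
    - a task whose estimate exceeds slowdown_factor * T gets a backup
      attempt on a healthy node, which starts at T and takes T time units;
    - the first attempt to finish wins and the other is killed.
    """
    check_time = median(durations)
    finish_times = []
    for d in durations:
        finish = d
        if speculate and d > check_time:
            progress = check_time / d            # fraction done at check time
            estimated_total = check_time / progress
            if estimated_total > slowdown_factor * check_time:
                backup_finish = 2 * check_time   # backup starts at T, takes T
                finish = min(d, backup_finish)   # first attempt to finish wins
        finish_times.append(finish)
    # The job ends when its slowest task (or that task's backup) ends.
    return max(finish_times)


print(simulate([10, 10, 11, 12, 100], speculate=False))  # 100
print(simulate([10, 10, 11, 12, 100]))                   # 22
```

Without speculation the single straggler sets the job time; with a backup attempt the job finishes shortly after the healthy tasks do, at the cost of some duplicated work.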
Is it possible to provide multiple inputs to Hadoop? If yes, how can you give multiple directories as input to a Hadoop job?
Can you explain what a worker node is?
What do you mean by persistence?
What is the Spark master?
What does COGROUP do in Pig?
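In Pig Latin, `COGROUP` groups two or more relations by a common key. Each output tuple holds the key followed by one bag per input relation, containing that relation's matching tuples (an empty bag where a relation has no match). The pure-Python sketch below only emulates that output shape on in-memory lists. It is not Pig code, and the `cogroup` helper and sample relations are hypothetical. It also assumes the key sits at the same field position in every relation, which Pig does not require.

```python
from collections import defaultdict


def cogroup(key_index, *relations):
    """Emulate the shape of Pig's COGROUP on lists of tuples.

    Returns a list of (key, bag_1, ..., bag_n) tuples, where bag_i holds the
    tuples of relation i whose field at key_index equals key. Keys are
    sorted here only to make the output deterministic; Pig does not
    guarantee an order.
    """
    groups = defaultdict(lambda: [[] for _ in relations])
    for i, relation in enumerate(relations):
        for row in relation:
            groups[row[key_index]][i].append(row)
    return [(key, *bags) for key, bags in sorted(groups.items())]


owners = [("alice", "cat"), ("bob", "dog")]
cities = [("alice", "paris"), ("carol", "rome")]
for group in cogroup(0, owners, cities):
    print(group)
# ('alice', [('alice', 'cat')], [('alice', 'paris')])
# ('bob', [('bob', 'dog')], [])
# ('carol', [], [('carol', 'rome')])
```

Unlike a join, COGROUP keeps every key from every input and does not form a cross product of the matching tuples.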
What do you mean by logging in Cassandra?
What happens if the preferred replica is not in the ISR (in-sync replica set)?
What is an OutputCommitter?
What happens if you alter the block size of a column family on an already populated database?
How should you handle session_expired?
Is Impala intended to handle real-time queries in low-latency applications, or is it for ad hoc queries for data exploration?
What are the key features of a NoSQL database?
What are some use cases of Apache Kafka?
How do you write MapReduce programs?
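A MapReduce program is written as two functions. A map function turns each input record into intermediate key/value pairs, and a reduce function combines all values that share a key. Between them, the framework performs the shuffle-and-sort step that groups values by key. The single-process Python sketch below illustrates that structure with the classic word count. It is not the Hadoop Java API (where you would subclass `Mapper` and `Reducer`), and the helper names `map_fn`, `reduce_fn` and `run_job` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce phase: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map, then shuffle/sort, then reduce, all in one process."""
    # Shuffle: group intermediate values by key, as the framework would.
    grouped: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    # Sort keys before reducing, mirroring the sorted input a reducer sees.
    return dict(reduce_fn(key, grouped[key]) for key in sorted(grouped))


print(run_job(["the quick fox", "the lazy dog"]))
# {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In a real cluster the map calls run in parallel on input splits and each reducer receives only its partition of the keys, but the two functions you write keep this same shape.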
What is Apache Kafka?
Explain the different types of joins in Hive.