How will you implement joins in HBase?
What is the HDFS block size?
In which modes can Apache Hadoop run?
What is meant by replication factor?
Why is checkpointing important in Hadoop?
State some DDL commands with a brief description of each.
What is the default block size in Hadoop 1 and in Hadoop 2? Can it be changed?
Explain what happens if, during the PUT operation, an HDFS block is assigned a replication factor of 1 instead of the default value of 3.
Can we change a document that is already present in HDFS?
Do I need to know Hadoop to learn Spark?
Is it legal to set the number of reducer tasks to zero? Where will the output be stored in this case?
What is an RDD lineage graph? How is it useful in achieving fault tolerance?
Explain the data model of HBase.
Which serialization libraries are supported in Spark?
Explain how you can get exactly-once messaging from Kafka during data production.