Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What are the four characteristics of Big Data?
Why is there a need for broadcast variables when working with Apache Spark?
What is the difference between a Column and a Super Column?
What is Apache Cassandra?
Explain the difference between an input split and an HDFS block.
Replication causes data redundancy, so why is it used in HDFS?
What happens in text format?
What is HBase?
Explain what combiners are and when you should use a combiner in a MapReduce job.
What option does Hadoop provide to skip bad records?
What is configured in /etc/hosts and what is its role in setting up a Hadoop cluster?
How can you manually partition an RDD?
Why do we use BloomMapFile?
What does map transformation do? Provide an example.
Can you explain how ‘map’ and ‘reduce’ work?