Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What are the identity mapper and the chain mapper?
What does Apache Spark do?
What is map in Apache Spark?
What is a document store database? Explain with an example.
What are the key components of HBase?
Explain the leftOuterJoin() and rightOuterJoin() operations in Apache Spark.
What is the difference between a Reducer and a Combiner in Hadoop MapReduce?
When should you use Cassandra?
How does an RDD work in Spark?
Have you ever run into a lopsided job that resulted in an out-of-memory error? If yes, how did you handle it?
Is it possible to run real-time analysis directly on the big data collected by Flume? If yes, explain how.
Name some Avro reference APIs.
How is RDD in Apache Spark different from Distributed Storage Management?
How does the HDFS client divide a file into blocks while storing it in HDFS?
In a given Spark program, how will you identify whether a given operation is a transformation or an action?