Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is lazy evaluation and how is it useful?
Can you do real-time processing with Spark SQL?
How can an RDD be created in Spark?
What is a CQL collection in Cassandra?
What is structured data?
What are combiners and what is their purpose?
Why are nodes removed and added frequently in a Hadoop cluster?
Why not just use ZooKeeper for everything?
What is the difference between map and flatMap?
What is the difference between a MapReduce InputSplit and HDFS block?
What is Apache Pig?
What is the unit of data that flows through a Flume agent?
If the number of DataNodes increases, do we need to upgrade the NameNode in Hadoop?
Which Ambari components can be used for automation and integration?
What is a sequence file in Hadoop?