Do we need Hadoop for Spark?
Name the two types of shared variables available in Apache Spark.
Can we have multiple entries in the master files?
What is Apache Spark, explained for beginners?
What is a PipelinedRDD?
Name the most common input formats defined in Hadoop.
What is a block in Hadoop HDFS? What should be the block size to get optimum performance from the Hadoop cluster?
Are there any problems that can be solved only by MapReduce and not by Pig? In which kinds of scenarios are MapReduce jobs more useful than Pig?
What is the difference between a Dataset and a DataFrame in Spark?
What is a MapFile?
What is IdentityMapper?
What is the distributed cache in Hadoop?
What is TextInputFormat in Hadoop?
What is the difference between Spark and MapReduce?
Why do we need Pig?