Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is the difference between a DataFrame and a Dataset in Spark?
What is a local repository and when will you use it?
How can we change the split size if our commodity hardware has less storage space?
What are the config properties of Presto?
What are the different tools used for monitoring in Ambari?
What is ZooKeeper?
How would you modify that solution to count only the number of unique words in all the documents?
Is Apache Spark a database?
What is the Spark tool?
How to restrict the number of lines to be printed in Pig?
What is a Combiner in MapReduce?
What happens if you get a ‘connection refused’ Java exception when you type hadoop fsck /?
Name the job control options specified by MapReduce.
What will be the output of CAST('XYZ' AS INT)?
What do you mean by the commit log in Cassandra?