Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is a TaskTracker in Hadoop?
What is a single-node cluster in Hadoop, and for what purposes is Hadoop run on one?
When running Spark applications, is it necessary to install Spark on all the nodes of a YARN cluster?
Explain the different types of transformations on DStreams.
When does a QueueFullException occur?
What is the use of the Context object?
What is Sqoop in Hadoop?
Is Impala production-ready?
What is Apache Spark good for?
On what basis does the NameNode decide which DataNode to write to?
What do you understand by a pair RDD?
How do you configure the number of combiners in MapReduce?
Is bigger than spark.driver.maxResultSize?
How do you write a query in Cassandra?
After increasing the replication level, I still see that data is under-replicated. What could be wrong?