Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
How many JVMs run on a slave node?
What is SparkSession in Apache Spark?
What happens if the preferred replica is not in the ISR?
Where is the hadoop-env.sh file located?
What are the types of cluster managers in Spark?
How much memory is required?
What is the need for custom serde?
What is multi-tenancy?
Does Impala use caching?
What is a cell in HBase?
How does Apache Flume work?
Why do I have to use REFRESH and INVALIDATE METADATA, and what do they do?
In which scenarios is Pig a better fit than MapReduce?
When should you use --target-dir and when --warehouse-dir while importing data?
What is the difference between an RDD and a DataFrame?