Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Does Impala performance improve as it is deployed to more hosts in a cluster, in much the same way that Hadoop performance does?
What is an RDD in Apache Spark? How are RDDs computed in Spark, and what are the various ways to create one?
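A common answer covers three points. An RDD (Resilient Distributed Dataset) is an immutable, partitioned collection. Spark computes it lazily from its lineage only when an action runs. It can be created by parallelizing an in-memory collection (`SparkContext.parallelize`), by loading external data (for example `SparkContext.textFile`), or by transforming an existing RDD.

The sketch below is a toy, pure-Python model of that idea, not the PySpark API. `ToyRDD` and all of its methods are hypothetical names chosen for illustration.

```python
# Toy model (not the PySpark API) of how an RDD is computed: transformations
# only record lineage; nothing is evaluated until an action such as collect().
class ToyRDD:
    def __init__(self, partitions=None, parent=None, fn=None):
        self._partitions = partitions  # set only for a source RDD
        self._parent = parent          # lineage: the RDD this one derives from
        self._fn = fn                  # per-partition function, applied lazily

    @classmethod
    def parallelize(cls, data, num_slices=2):
        # Way 1: build an RDD from an in-memory collection, split into partitions.
        data = list(data)
        size = -(-len(data) // num_slices) if data else 1  # ceiling division
        parts = [data[i:i + size] for i in range(0, len(data), size)] or [[]]
        return cls(partitions=parts)

    @classmethod
    def from_lines(cls, text, num_slices=2):
        # Way 2: hypothetical stand-in for reading external storage line by line.
        return cls.parallelize(text.splitlines(), num_slices)

    def map(self, f):
        # Way 3: derive a new RDD from an existing one (a lazy transformation).
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def _compute(self):
        # Rebuild partitions by walking the lineage back to the source data.
        if self._parent is None:
            return self._partitions
        return [self._fn(part) for part in self._parent._compute()]

    def collect(self):
        # Action: this is the point where the recorded lineage is evaluated.
        return [x for part in self._compute() for x in part]


squares = ToyRDD.parallelize(range(6), 3).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(squares.collect())
```

Because each derived RDD keeps only a reference to its parent and a function, a lost partition could in principle be recomputed from the source. That recoverability is the "resilient" part of the name.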
What role does a worker node play in an Apache Spark cluster? And why does a worker node need to register with the driver program?
Can Flume distribute data to multiple destinations?
What is a generic UDF in Hive?
What type of data should we put in the distributed cache? When should data be put in the distributed cache? How much data should we put in it?
Explain zero consistency.
Explain tokenize.
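This assumes the question refers to Pig's built-in TOKENIZE function, which the question does not actually state. That function splits a string into a bag of words.

The Python sketch below only approximates that behavior. The separator set it uses (space, double quote, comma, parentheses, asterisk) is an assumption modeled on the characters the Pig documentation lists. `tokenize` is a hypothetical helper, and the real UDF returns a Pig bag rather than a Python list.

```python
import re

# Hypothetical approximation of Pig's TOKENIZE. It splits on the assumed
# default separators (space, double quote, comma, parentheses, asterisk),
# drops empty tokens, and wraps each word in a 1-tuple to mimic a bag of tuples.
_SEPARATORS = re.compile(r'[ ",()*]+')

def tokenize(text):
    return [(word,) for word in _SEPARATORS.split(text) if word]

print(tokenize('open the (big) door, "now"'))
```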
What is an accumulator in Spark?
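In Spark, an accumulator is a shared variable that tasks can only add to, and only the driver can read its merged value. In PySpark it is created with `SparkContext.accumulator` and read through its `value` attribute.

The sketch below models that contract in pure Python instead of calling Spark. `ToyAccumulator` and `run_job` are hypothetical names, and the loop over partitions stands in for tasks running on executors.

```python
# Toy model (not the PySpark API) of a Spark accumulator: each task updates a
# private local copy, and the driver merges those copies as tasks finish.
class ToyAccumulator:
    def __init__(self, zero=0):
        self._zero = zero
        self._value = zero

    def local_copy(self):
        # What a task would receive: a fresh, zero-initialized copy.
        return ToyAccumulator(self._zero)

    def add(self, amount):
        # Tasks may only add; they never read the merged total.
        self._value += amount

    def merge(self, other):
        # Driver-side merge of one finished task's local updates.
        self._value += other._value

    @property
    def value(self):
        return self._value


def run_job(partitions, acc):
    """Keep non-blank lines while counting blank ones in the accumulator."""
    results = []
    for part in partitions:          # each iteration stands in for one task
        task_acc = acc.local_copy()
        for line in part:
            if line.strip():
                results.append(line)
            else:
                task_acc.add(1)
        acc.merge(task_acc)          # the driver folds in this task's updates
    return results


blank_lines = ToyAccumulator(0)
lines = run_job([["a", "", "b"], ["", " ", "c"]], blank_lines)
print(lines, blank_lines.value)
```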
What are the advantages of Spark over MapReduce?
How does the Pig platform handle data from relational systems?
How does the pipe operation write its result to standard output in Apache Spark?
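Spark's `RDD.pipe` works in three steps:

1. It hands each partition's elements to an external command's standard input, one per line.
2. The command processes them and writes its results to standard output.
3. Spark reads those output lines back as the string elements of a new RDD.

The sketch below models those steps with `subprocess` rather than Spark. `pipe_partitions` and the `UPPER` command are hypothetical; the command is a small Python one-liner so the example does not depend on any particular shell tool.

```python
import subprocess
import sys

# Toy model of the idea behind Spark's RDD.pipe(): for each partition, start an
# external process, write every element to its stdin as one line, and turn each
# line the process prints to standard output into an element of the result.
def pipe_partitions(partitions, command):
    output = []
    for part in partitions:                       # one process per partition
        stdin_text = "".join(f"{elem}\n" for elem in part)
        proc = subprocess.run(
            command,
            input=stdin_text,
            capture_output=True,
            text=True,
            check=True,                           # raise if the command fails
        )
        output.append(proc.stdout.splitlines())   # stdout lines -> new elements
    return output

# Hypothetical external "script": upper-cases each line it reads from stdin.
UPPER = [sys.executable, "-c",
         "import sys\nfor line in sys.stdin: print(line.rstrip('\\n').upper())"]

print(pipe_partitions([["spark", "hadoop"], ["hive"]], UPPER))
```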
How do you forcefully make the NameNode leave safe mode in HDFS?
Define a combiner.
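A combiner is an optional "mini-reducer" that Hadoop can run on a mapper's output before the shuffle. It pre-aggregates records that share a key. Hadoop may run it zero, one, or several times, so it must not change the final result. That is why it is typically used for associative, commutative operations such as sums.

The pure-Python sketch below simulates a word count with and without a combiner to show the reduction in shuffled records. The function names are hypothetical and no Hadoop API is used.

```python
from collections import Counter, defaultdict

# Simulated MapReduce word count. The combiner runs on each mapper's output
# locally (here it is the same summing logic as the reducer), so fewer
# (word, count) records have to cross the shuffle.
def map_phase(split):
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def shuffle_and_reduce(mapper_outputs):
    grouped = defaultdict(int)
    shuffled = 0                      # number of records sent to reducers
    for pairs in mapper_outputs:
        for word, count in pairs:
            grouped[word] += count
            shuffled += 1
    return dict(grouped), shuffled

splits = [["a b a", "b a"], ["c a c"]]          # one list of lines per mapper
plain = [map_phase(s) for s in splits]
combined = [combine(map_phase(s)) for s in splits]

counts_plain, shuffled_plain = shuffle_and_reduce(plain)
counts_comb, shuffled_comb = shuffle_and_reduce(combined)
print(counts_plain == counts_comb, shuffled_plain, shuffled_comb)
```

The final counts are identical either way; only the number of records shuffled drops, here from 8 to 4.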
What is the maximum size of a message that Kafka can receive?
How many operational commands are there in HBase?
What is the distinction between Apache driver and Apache Spark's MLlib?
How will you implement joins in HBase?
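HBase itself has no join operator. Joins are therefore usually done in application code, in a MapReduce or Spark job, or avoided altogether by denormalizing the data.

The sketch below shows one client-side approach, a lookup join. The tables, column names, and `lookup_join` helper are hypothetical, and plain Python dicts stand in for HBase tables instead of any HBase client library.

```python
# Toy model of a client-side join between two HBase-style tables. Each table
# is a dict of row key -> {"family:qualifier": value}. The join scans "orders"
# and, for each row, looks up the referenced customer row by key (the pattern
# a client would follow by issuing one Get per order row).
customers = {
    "c1": {"info:name": "Ada"},
    "c2": {"info:name": "Linus"},
}
orders = {
    "o1": {"d:customer": "c1", "d:amount": "30"},
    "o2": {"d:customer": "c2", "d:amount": "12"},
    "o3": {"d:customer": "c9", "d:amount": "7"},   # no matching customer
}

def lookup_join(left, right, fk_column):
    """Inner join: keep left rows whose foreign key exists in the right table."""
    joined = {}
    for row_key in sorted(left):              # mimic a scan in row-key order
        row = left[row_key]
        match = right.get(row[fk_column])     # stands in for a Get by row key
        if match is not None:
            joined[row_key] = {**row, **match}
    return joined

result = lookup_join(orders, customers, "d:customer")
for key, cols in result.items():
    print(key, cols["info:name"], cols["d:amount"])
```

One Get per row is simple but slow for large scans. For big tables the same logic is usually moved into a MapReduce or Spark job instead.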