Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Does Impala performance improve as it is deployed to more hosts in a cluster, in much the same way that Hadoop performance does?
What is an RDD in Apache Spark? How are RDDs computed in Spark? What are the various ways in which an RDD can be created?
What role does a worker node play in an Apache Spark cluster? And what is the need to register a worker node with the driver program?
How do you set up the local repository manually?
What are the data components used by Hadoop?
How can we create a Hadoop cluster from scratch?
What are storage and compute nodes?
What are the different file permissions in HDFS at the file and directory levels?
What is a seed node in Cassandra?
What do you understand by node redundancy, and does it exist in a Hadoop cluster?
What is a map/reduce job in Hadoop?
What are the types of transformations on RDDs in Apache Spark?
Which storage level does the cache() function use?
What are the information segments utilized by Hadoop?
How does Impala achieve its performance improvements?
What is clustering in Hive?
What are the key components of the Hive architecture?
What is SequenceFileInputFormat in Hadoop MapReduce?