What are a partition and a partitioner in Apache Spark?
What is vectorization in Spark?
What operations does an RDD support?
Name some Ambari components that can be used for automation and integration.
How can you reduce churn in the ISR? When does a broker leave the ISR?
How do you exit insert mode?
What is the difference between Apache Mahout and Cloudera Oryx?
When is it not recommended to use the MapReduce paradigm for large
Can we run Apache Spark without Hadoop?
How would you tackle counting words in several text documents?
Name the operations supported by an RDD.
What is the difference between Hadoop and other data processing tools?
What are the disadvantages of using Apache Spark instead of Hadoop MapReduce?
What is the difference between a leader and a follower in Kafka?
What is a Resilient Distributed Dataset (RDD) in Apache Spark? How does it make Spark operator-rich?