Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is distributed copy (distcp)?
What is the major difference between a local and a remote metastore?
Explain the flatMap() transformation in Apache Spark.
Why is the data node block size in HDFS 64 MB?
Is MapReduce required for Impala? Will Impala continue to work as expected if MapReduce is stopped?
What is the difference between a Dataset and a DataFrame?
What are a block and a block scanner in HDFS?
Is there any benefit of learning MapReduce, then?
What is the Reducer used for?
How does HDFS ensure the integrity of the data blocks it stores?
How does Cassandra write?
What are the independent extensions that are contributed to the Ambari codebase?
What are shared variables?
What is the method to create a DataFrame?
Explain how you can minimize data transfers when working with Spark.