Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
How do you set the number of mappers and reducers for a MapReduce job?
Explain the basic difference between Cassandra and HBase.
How can you get exactly-once messaging from Kafka during data production?
What are Hive's default read and write classes?
What is structured data?
Why can we not create the directory /user/dataflair/inpdata001 when the NameNode is in safe mode?
Give some examples of unstructured data.
Is it possible to create multiple tables in Hive for the same data?
Explain the role of the Streams API.
What features of RDD make it an important abstraction in Spark?
Explain the various transformations on Apache Spark RDDs, such as distinct(), union(), intersection(), and subtract().
How do you specify partitions in Hive?
Name any two features of Flume.
How can one set a space quota on a Hadoop (HDFS) directory?
HDFS is used for applications with large data sets; why not for many small files?