Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What alternative does HDFS provide to recover data if a NameNode without a backup fails and cannot be recovered?
If I create a folder in HDFS, will metadata be created for that folder? If yes, what will be the size of the metadata created for a directory?
What ensures load balancing of the server in Kafka?
List the collection data types in Cassandra.
When a large data set is maintained?
How is HDFS different from traditional file systems?
When is it not recommended to use the MapReduce paradigm for large-scale data processing?
What is Kafka?
Whenever we run a Hive query, a new metastore_db is created. Why?
Which command is available to show the current HBase user?
I have a relation r. How can I get the top 10 tuples from the relation r?
Explain a simple Map/Reduce problem.
What is the bottom layer of abstraction in the Spark Streaming API?
What do you mean by Schema Resolution?
What are Spark DataFrames?
What should the HDFS block size be to get maximum performance from a Hadoop cluster?
What is FLATTEN in Pig?