Why is HDFS used for applications with large data sets rather than for applications with many small files?
What is a 'block' in HDFS?
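As a hint for this one, block sizes can be inspected programmatically through the Hadoop FileSystem API. A minimal sketch, assuming a hypothetical path `/user/data/input.txt`:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; replace with a file that exists in your cluster.
        Path file = new Path("/user/data/input.txt");

        // Default block size the cluster would use for new files on this path.
        long defaultBlockSize = fs.getDefaultBlockSize(file);

        // Block size actually recorded for this particular file.
        FileStatus status = fs.getFileStatus(file);
        long fileBlockSize = status.getBlockSize();

        System.out.println("Default block size: " + defaultBlockSize + " bytes");
        System.out.println("Block size of file: " + fileBlockSize + " bytes");
    }
}
```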
What is Sqoop in Hadoop?
What is a 'key-value pair' in Hadoop MapReduce?
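For illustration, the classic word-count mapper makes the pairs concrete: it receives (byte offset, line) pairs from the input format and emits (word, 1) pairs to the shuffle. A minimal sketch using the standard `org.apache.hadoop.mapreduce` API:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input pair:  (byte offset of the line, line text)
// Output pair: (word, 1) -- one pair per word in the line
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE); // emit the (key, value) pair
            }
        }
    }
}
```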
What is the difference between an RDBMS and Hadoop?
What is the default partitioner in Hadoop?
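As a hint, the default is `HashPartitioner` (`org.apache.hadoop.mapreduce.lib.partition.HashPartitioner`), which routes each record to a reducer by the hash of its key. The sketch below reproduces its core logic; the class name here is made up:

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of the logic used by Hadoop's default HashPartitioner.
public class HashLikePartitioner<K, V> extends Partitioner<K, V> {

    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the result is non-negative,
        // then take the remainder modulo the number of reducers.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```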
What are the features of standalone (local) mode?
How would you use MapReduce to split a very large graph into smaller pieces and parallelize the computation of edges when the underlying data changes rapidly?
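One hedged sketch of the partitioning step: hash each edge's source vertex into a fixed number of buckets so that each reducer receives a complete subgraph of edges. The input format (tab-separated vertex pairs) and the partition count are assumptions, and handling of dynamically changing data is left out:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input lines are assumed to look like "srcVertex<TAB>dstVertex".
public class EdgePartitionMapper
        extends Mapper<LongWritable, Text, IntWritable, Text> {

    private static final int NUM_PARTITIONS = 16; // assumed setting
    private final IntWritable partition = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] edge = value.toString().split("\t");
        if (edge.length != 2) {
            return; // skip malformed lines
        }
        // All edges sharing a source vertex land in the same partition,
        // so per-vertex edge computations can run locally in one reducer.
        partition.set((edge[0].hashCode() & Integer.MAX_VALUE) % NUM_PARTITIONS);
        context.write(partition, value);
    }
}
```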
How can you increase the replication factor of a file to a desired value in Hadoop?
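As a hint, the shell command `hdfs dfs -setrep -w 3 /path` does this from the command line; programmatically, `FileSystem.setReplication` does the same. A minimal sketch, with a hypothetical path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file; replace with a real path in your cluster.
        Path file = new Path("/user/data/input.txt");

        // Request a replication factor of 3 for this file.
        boolean ok = fs.setReplication(file, (short) 3);
        System.out.println("Replication change requested: " + ok);
    }
}
```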
How do you keep an HDFS cluster balanced?
Have you ever used counters in Hadoop? Give an example scenario.
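For illustration, a minimal sketch of one common scenario: counting malformed input records with a custom counter instead of failing the job. The enum names and field layout here are made up:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RecordQualityMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    enum Quality { GOOD_RECORDS, BAD_RECORDS }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 3) {
            // Increment a custom counter instead of throwing; totals
            // are aggregated across all tasks and reported with the job.
            context.getCounter(Quality.BAD_RECORDS).increment(1);
            return;
        }
        context.getCounter(Quality.GOOD_RECORDS).increment(1);
        context.write(value, NullWritable.get());
    }
}
```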
What is a heartbeat in HDFS?
What is Apache Hadoop? Why is it so widely used for Big Data applications?