Why is HDFS suited to applications with large data sets, but not to a large number of small files?
What is safe mode in Hadoop?
What factors determine the block size before a file is created?
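One factor worth knowing for this question: the block size is a per-file setting that defaults to a cluster-wide value. A sketch of how it might be configured in `hdfs-site.xml` (assuming Hadoop 2.x property names, where the default is 128 MB):

```xml
<!-- hdfs-site.xml: default block size applied to newly created files -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB, expressed in bytes -->
</property>
```

A client can also override it for a single file at write time, e.g. `hadoop fs -D dfs.blocksize=268435456 -put bigfile /data/`.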
Explain the features of fully distributed mode.
Cloudera ships with a preconfigured cluster; can we also build our own cluster on Ubuntu?
Why can't we do aggregation (e.g., addition) in a mapper? Why do we need a reducer for that?
Explain the use of the .mecia class.
What are the default port numbers of the NameNode, JobTracker, and TaskTracker?
Explain InputFormat.
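To make the InputFormat question concrete, here is a pure-Python analogy (not the Hadoop API) of its two responsibilities: dividing the input into splits that mappers process in parallel, and turning each split's bytes into (key, value) records. Function names are illustrative only, and for simplicity the sketch assumes splits fall on line boundaries, whereas real InputFormats such as TextInputFormat also handle records that cross split boundaries.

```python
def get_splits(data: bytes, split_size: int):
    """Carve the raw input into fixed-size byte ranges (like FileSplits)."""
    return [(off, min(split_size, len(data) - off))
            for off in range(0, len(data), split_size)]

def record_reader(data: bytes, offset: int, length: int):
    """Yield (byte_offset, line) records from one split, in the spirit of
    TextInputFormat, whose keys are byte offsets and values are lines."""
    chunk = data[offset:offset + length]
    pos = offset
    for line in chunk.splitlines(keepends=True):
        yield pos, line.rstrip(b"\n").decode()
        pos += len(line)

data = b"alpha\nbeta\ngamma\n"
splits = get_splits(data, split_size=11)          # two splits: 11 + 6 bytes
records = [rec for off, ln in splits
           for rec in record_reader(data, off, ln)]
```

With this input, `records` comes out as `[(0, "alpha"), (6, "beta"), (11, "gamma")]`: each mapper would receive only the records of its own split.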
What is Apache Hadoop?
Explain how Hadoop is different from other data processing tools.
Have you ever built a production process in Hadoop? If so, what did you do when a Hadoop job failed?
What are combiners, and what is their purpose?
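A combiner is a "mini-reducer" that aggregates each mapper's output locally before the shuffle, cutting the data sent over the network; it must apply a commutative and associative operation (like summing counts). Below is a pure-Python simulation of the idea using the classic WordCount example; the phase names are illustrative, not the Hadoop API.

```python
from collections import Counter

def map_phase(split: str):
    """WordCount mapper: emit (word, 1) for every word in this split."""
    return [(word, 1) for word in split.split()]

def aggregate(pairs):
    """Sum counts per word. Used both as the combiner (per mapper, locally)
    and as the reducer (globally) because summing is commutative/associative."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return sorted(counts.items())

splits = ["big data big", "data big"]          # two mappers, one split each
mapped = [map_phase(s) for s in splits]        # 5 (word, 1) pairs in total
combined = [aggregate(m) for m in mapped]      # combiner runs per mapper
shuffled = [p for c in combined for p in c]    # only 4 pairs cross the network
result = dict(aggregate(shuffled))             # reducer produces final counts
```

Here the combiner shrinks the shuffle from 5 pairs to 4 while the final result, `{"big": 3, "data": 2}`, is unchanged; on real workloads with repeated keys the savings are far larger.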