What is shuffling in MapReduce?
In MapReduce, how do you change the name of the output file from part-r-00000?
What are the "mapper" and the "reducer" in Hadoop?
What are the network requirements for using Hadoop?
How to compress mapper output in Hadoop?
Where is Mapper output stored?
What is the process for changing the split size if there is limited storage space on commodity hardware?
What are the default values of the map and reduce max attempts?
What is OutputCommitter?
Explain the Reducer's reduce phase.
How many InputSplits are made by the Hadoop framework?
Explain what you understand by speculative execution.
What happens when a DataNode fails?
How would you tackle calculating the number of unique visitors for each hour by mining a huge Apache log? You can use post-processing on the output of the MapReduce job.
Which interface needs to be implemented to create a Mapper and Reducer in Hadoop?
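
For the Apache log question above, the shape of one possible approach can be sketched in plain Python, without Hadoop itself. This is only an illustrative, in-memory simulation of the map, shuffle, and reduce steps. It assumes the Apache common log format and treats the client IP address as the "visitor"; both are assumptions, and the function names (map_line, shuffle, reduce_hour, unique_visitors_per_hour) are hypothetical, not part of any Hadoop API.

```python
import re
from collections import defaultdict

# Assumption: Apache common log format, e.g.
# 10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 1
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2}):')


def map_line(line):
    """Map step: emit one (hour_key, visitor) pair per parseable log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return []  # skip malformed lines
    ip, day, hour = match.groups()
    # Assumption: the client IP identifies a visitor.
    return [(f"{day} {hour}:00", ip)]


def shuffle(pairs):
    """Shuffle step: group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_hour(hour_key, visitors):
    """Reduce step: count the distinct visitors seen in one hour."""
    return hour_key, len(set(visitors))


def unique_visitors_per_hour(lines):
    """Run the simulated map, shuffle, and reduce steps over an iterable of log lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_hour(key, values) for key, values in shuffle(pairs).items())
```

In this sketch the reducer holds every visitor for an hour in a set, which may not fit in memory for a very large log. One alternative, in line with the post-processing hint in the question, would be to emit a composite (hour, visitor) key so that the job output contains each pair once, and then count the pairs per hour in a second pass.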