Explain the partitioning, shuffle, and sort phases in MapReduce.
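To ground the answer, here is a minimal sketch of where partitioning fits, assuming the new (org.apache.hadoop.mapreduce) API; the class name FirstLetterPartitioner and the Text/IntWritable types are illustrative choices, not anything fixed by the question:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends every key starting with the same letter to the same reduce task;
// the shuffle then copies each partition to its reducer, and the framework
// sorts the keys within each partition before reduce() is called.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String s = key.toString();
        if (s.isEmpty() || numPartitions == 0) {
            return 0;
        }
        return Character.toLowerCase(s.charAt(0)) % numPartitions;
    }
}
```

A job would register it with job.setPartitionerClass(FirstLetterPartitioner.class); without a custom partitioner, Hadoop defaults to hash partitioning on the key.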
How is indexing done in HDFS?
What are an identity mapper and an identity reducer?
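In the old (org.apache.hadoop.mapred) API these ship as org.apache.hadoop.mapred.lib.IdentityMapper and IdentityReducer; in the new API the base Mapper and Reducer classes already behave as identities. A minimal sketch, with the class and job names chosen only for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class IdentityJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "identity-pass-through");
        // The default Mapper.map() emits each (key, value) pair unchanged,
        // and the default Reducer.reduce() emits every value under its key,
        // so the base classes act as identity mapper and identity reducer.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
    }
}
```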
In MapReduce, why do mappers write their output to the local disk instead of HDFS?
What is the data storage component used by Hadoop?
In MapReduce, how many mappers should ideally be configured on a slave node?
What are the data components used by Hadoop?
How does the Hadoop classpath play a vital role in starting or stopping Hadoop daemons?
Compare RDBMS with Hadoop MapReduce.
Can a MapReduce job run with no reducer?
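Yes: a map-only job is configured by setting the reducer count to zero. A minimal sketch, assuming the new API (input and output paths and a mapper would still be set as in a normal driver):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only");
        // With zero reduce tasks the shuffle and sort phases are skipped
        // and each mapper's output is written directly to HDFS.
        job.setNumReduceTasks(0);
    }
}
```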
Why does a mapper run as a heavyweight process rather than as a thread in MapReduce?
What are the main configuration parameters that a user needs to specify to run a MapReduce job?
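The usual answer covers the input and output locations in HDFS, the input and output formats, the mapper and reducer classes, the output key/value types, and the jar containing the job classes. A minimal driver sketch, assuming the new API; the identity Mapper and Reducer and the DemoDriver name stand in for real job classes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class DemoDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "demo");
        job.setJarByClass(DemoDriver.class);              // jar containing the job classes
        job.setMapperClass(Mapper.class);                 // mapper class (identity here)
        job.setReducerClass(Reducer.class);               // reducer class (identity here)
        job.setOutputKeyClass(LongWritable.class);        // output key type (byte offset from TextInputFormat)
        job.setOutputValueClass(Text.class);              // output value type (the input line)
        job.setInputFormatClass(TextInputFormat.class);   // input format
        job.setOutputFormatClass(TextOutputFormat.class); // output format
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input location in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output location in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```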
What are the methods in the Mapper interface?
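In the old (org.apache.hadoop.mapred) API, Mapper is an interface defining map(), plus configure() and close() inherited from JobConfigurable and Closeable; in the new API, Mapper is a class exposing setup(), map(), cleanup(), and run(). A minimal sketch against the new API; the TokenMapper name is illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void setup(Context context) {
        // Called once per task, before any map() call.
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Called once per input record.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    @Override
    protected void cleanup(Context context) {
        // Called once per task, after the last map() call.
    }
}
```

The fourth method, run(), drives the setup/map/cleanup lifecycle and is rarely overridden.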