How would you use MapReduce to split a very large graph into smaller pieces and parallelize the computation over edges when the underlying data changes rapidly?
What is HDFS? How is it different from traditional file systems?
Explain what Sqoop is in Hadoop.
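A typical answer mentions that Sqoop bulk-transfers data between relational databases and HDFS. A minimal sketch of an import, assuming a hypothetical MySQL host, database, table, and user (all placeholders, not from the source):

```shell
# Hypothetical example: import the "orders" table from MySQL into HDFS.
# dbhost, sales, etl_user, and the paths are illustrative assumptions.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4
```

Under the hood Sqoop runs this as a map-only MapReduce job, with `--num-mappers` controlling how many parallel slices read from the table.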
What is the HDFS block size, and which did you choose in your project?
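The block size is set per cluster (and can be overridden per file) via `dfs.blocksize` in `hdfs-site.xml`. A sketch of the relevant config fragment, assuming the common 128 MB default (value given in bytes):

```xml
<!-- hdfs-site.xml: set the default HDFS block size to 128 MB (in bytes). -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>
```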
Explain the shuffle phase in MapReduce.
What are the two main parts of the Hadoop framework?
Can we call VMs pseudo-distributed nodes?
Which is the default InputFormat in Hadoop?
What are InputSplit and RecordReader, and what do they do?
What is the meaning of replication factor?
What is the difference between Gen1 and Gen2 Hadoop with regard to the NameNode?
How do you change the replication factor of files already stored in HDFS?
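The standard answer uses the `hdfs dfs -setrep` command. A sketch, with illustrative paths (assumptions, not from the source):

```shell
# Change the replication factor of an existing file;
# -w waits until the new replication level is actually reached.
hdfs dfs -setrep -w 2 /user/data/file.txt

# Apply a new replication factor recursively to a directory tree.
hdfs dfs -setrep -R 3 /user/data/
```

Note that this only affects files already in HDFS; the default for newly written files comes from the `dfs.replication` property.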
Explain the job's OutputFormat.