Data Engineer
Given a list of followers in the format:
123, 345
234, 678
345, 123
…
Column one is the ID of the follower and column two is the ID of the followee. Find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
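One possible approach to the mutual-follow question, sketched in Python as a local simulation rather than a real Hadoop job: the mapper keys each edge by its unordered pair of IDs and records the direction, and the reducer emits the pair only if both directions were seen. The function names (map_edge, reduce_pair, find_mutual_pairs) are hypothetical, and the in-memory dictionary stands in for the shuffle phase, which on a real cluster is performed by the framework on disk and across nodes, so no single machine has to hold the whole list.

```python
from collections import defaultdict

def map_edge(follower, followee):
    """Map step: key by the unordered pair, value is the edge direction."""
    key = (min(follower, followee), max(follower, followee))
    # True means the edge points from the smaller ID to the larger one.
    yield key, follower < followee

def reduce_pair(key, directions):
    """Reduce step: a pair is mutual only if both directions were seen."""
    if len(set(directions)) == 2:
        yield key

def find_mutual_pairs(edges):
    """Local stand-in for the shuffle: group mapper output by key."""
    groups = defaultdict(list)
    for follower, followee in edges:
        for key, direction in map_edge(follower, followee):
            groups[key].append(direction)
    result = []
    for key, directions in groups.items():
        result.extend(reduce_pair(key, directions))
    return sorted(result)

if __name__ == "__main__":
    sample = [(123, 345), (234, 678), (345, 123)]
    print(find_mutual_pairs(sample))
```

Because the reducer only needs the distinct directions for one key at a time, its memory use stays constant per pair; duplicate edges and self-follows do not produce false matches.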
How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges when the data changes rapidly and dynamically?
Post New Apache Hadoop Questions
What is the configuration of a typical slave node on a Hadoop cluster? How many JVMs run on a slave node?
What is the purpose of the RawComparator interface?
How does the client communicate with HDFS?
What does the mapred.job.tracker property do?
Can the NameNode and DataNode run on commodity hardware?
Which files are used by the startup and shutdown commands?
What are active and passive "NameNodes"?
What is a Partitioner in Hadoop? Where does it run?
What is a JobTracker in Hadoop? How many instances of JobTracker run on a Hadoop Cluster?
What is a checkpoint?
Where is the Mapper's intermediate key-value output stored?
Explain the difference between gen1 and gen2 Hadoop with regard to the NameNode.
What is a spill factor with respect to RAM?
Which is the best Hadoop certification?
Why is Hadoop faster?