Which Spark library allows reliable file sharing at memory speed across different cluster frameworks?
Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
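One possible approach to the mutual-followers question, sketched in plain Python rather than in a real Map/Reduce framework: the map step keys every row by the unordered pair of IDs and emits the direction of the edge as the value, and the reduce step reports a pair only when both directions were seen. The function names (`map_edge`, `reduce_pair`, `mutual_pairs`) and the in-memory `shuffle` are illustrative assumptions; in an actual cluster the framework performs the shuffle on disk, so no single machine needs to hold the whole list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, int]


def parse_line(line: str) -> Pair:
    """Parse a 'follower, followee' row into two integer IDs."""
    follower, followee = (int(field.strip()) for field in line.split(","))
    return follower, followee


def map_edge(follower: int, followee: int) -> Iterator[Tuple[Pair, int]]:
    """Map step: key by the unordered pair, value is the edge direction."""
    key = (min(follower, followee), max(follower, followee))
    # 0 means smaller ID follows larger ID, 1 means the reverse.
    direction = 0 if follower < followee else 1
    yield key, direction


def shuffle(mapped: Iterable[Tuple[Pair, int]]) -> Dict[Pair, List[int]]:
    """Stand-in for the framework's shuffle: group values by key.

    Done in memory here only for illustration.
    """
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_pair(key: Pair, directions: List[int]) -> Iterator[Pair]:
    """Reduce step: a pair is mutual only if both directions appear."""
    # Duplicate rows and self-follows yield a single direction, so they are skipped.
    if len(set(directions)) == 2:
        yield key


def mutual_pairs(lines: Iterable[str]) -> List[Pair]:
    """Run map, shuffle, and reduce over the input rows."""
    mapped = (kv for line in lines for kv in map_edge(*parse_line(line)))
    groups = shuffle(mapped)
    return sorted(
        pair for key, dirs in groups.items() for pair in reduce_pair(key, dirs)
    )


if __name__ == "__main__":
    rows = ["123, 345", "234, 678", "345, 123"]
    print(mutual_pairs(rows))  # [(123, 345)]
```

Keying by the sorted pair sends both directions of a relationship to the same reducer, so each reducer only ever needs the handful of values for one pair.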
What does the conf class do?
Who invented Spark?
How do I use Spark with big data?
What are the different methods to set up the local repository?
Can I install Spark on Windows?
Why does HDFS store data on commodity hardware despite the higher chance of failures?
What is a column family?
What is Sqoop?
Why do we use Flume?
What is the best practice for deploying the Secondary NameNode?
What are the common input formats in Hadoop?
What is the difference between Pig Latin and Hive?
What is KeyValueTextInputFormat in Hadoop?
What do you understand by logging in Cassandra?
What is Spark SQLContext?