Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Data Engineer
Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
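A minimal sketch of one common approach, assuming comma-separated integer IDs. The mapper emits every edge under a canonical key (smaller ID first) tagged with its direction. The shuffle then brings both directions of a pair to the same reducer, which emits the pair only when both directions are present. Because each reducer sees only the values for one pair, no single machine needs the whole list in memory. The Python below simulates the map, shuffle, and reduce phases locally for illustration, using the sample edges (123, 345), (234, 678), and (345, 123). The function names `map_edge`, `reduce_pair`, and `mutual_pairs` are hypothetical, and on a cluster the grouping step would be done by the framework.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_edge(line: str) -> Iterator[tuple[tuple[int, int], str]]:
    """Map step: emit the edge under a canonical (smaller, larger) key.

    The value records the edge's direction relative to that key.
    """
    follower, followee = (int(x) for x in line.split(","))
    if follower == followee:
        return  # ignore self-follows
    if follower < followee:
        yield (follower, followee), "fwd"
    else:
        yield (followee, follower), "rev"


def reduce_pair(key: tuple[int, int], directions: Iterable[str]) -> Iterator[tuple[int, int]]:
    """Reduce step: the pair is mutual only if both directions were seen."""
    if set(directions) == {"fwd", "rev"}:
        yield key


def mutual_pairs(lines: Iterable[str]) -> list[tuple[int, int]]:
    """Local simulation of map -> shuffle (group by key) -> reduce."""
    groups: dict[tuple[int, int], list[str]] = defaultdict(list)
    for line in lines:
        for key, value in map_edge(line):
            groups[key].append(value)
    result: list[tuple[int, int]] = []
    for key in sorted(groups):
        result.extend(reduce_pair(key, groups[key]))
    return result


print(mutual_pairs(["123, 345", "234, 678", "345, 123"]))  # prints [(123, 345)]
```

Using a set in the reducer means duplicate edges in the input do not produce false positives or duplicate output pairs.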
How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges when the data changes quickly and dynamically?
Write a Hive UDF that returns a sentiment score. For example, if good = 1, bad = -1, and average = 0, then for a restaurant review stating "Good food, bad service," your score might be 1 - 1 = 0.
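Hive UDFs are usually written in Java against Hive's UDF classes. The sketch below shows only the scoring logic, in Python, assuming a hypothetical three-word lexicon taken from the example. Logic like this could be wrapped in a Java UDF or run as a streaming script through Hive's `TRANSFORM` clause.

```python
import re

# Hypothetical lexicon matching the example; a real UDF would load a larger word list.
LEXICON = {"good": 1, "bad": -1, "average": 0}


def sentiment_score(review: str) -> int:
    """Sum the lexicon score of every word in the review (case-insensitive).

    Words not in the lexicon contribute 0.
    """
    words = re.findall(r"[a-z']+", review.lower())
    return sum(LEXICON.get(word, 0) for word in words)


print(sentiment_score("Good food, bad service"))  # prints 0
```

This word-by-word sum is deliberately naive: it ignores negation ("not good") and intensifiers, which a production scorer would need to handle.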
Suppose your data is stored in collections; for instance, binary data, message data, or metadata all keyed on the same value. Would you use HBase for this?
Which node is the master node in HDFS? Can it run on commodity hardware?
What is the difference between MapReduce and Pig?
What is the best method for storing objects in Cassandra?
Can we write a MapReduce program in a language other than Java? How?
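Yes: Hadoop Streaming lets any executable that reads records from standard input and writes tab-separated key/value lines to standard output act as the mapper or reducer, and the framework sorts mapper output by key before the reduce phase. The sketch below is a word count in Python (word count is an illustrative choice, and the script name `wc.py` is hypothetical). `mapper` and `reducer` work on any iterable of lines, so they can be tested locally with `sorted()` standing in for Hadoop's shuffle.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word, as a streaming mapper would on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word; input must be sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Run as "python3 wc.py map" or "python3 wc.py reduce"; reads stdin, writes stdout.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

On a cluster, the script would be passed to the streaming jar through its `-mapper` and `-reducer` options (for example `-mapper "python3 wc.py map"`); the jar path and the options for shipping the script to the worker nodes vary by Hadoop version.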
What is an atom in Pig?
Explain how we can change the split size if our commodity hardware has little storage space.
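In Hadoop's `FileInputFormat`, the split size is computed as max(minSize, min(maxSize, blockSize)), where minSize and maxSize come from the `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize` properties. Setting the maximum below the HDFS block size therefore yields smaller splits (and more map tasks), while raising the minimum above it yields larger ones. The Python below only mirrors that rule to show the effect; it is not Hadoop code, and the helper names are hypothetical.

```python
MB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Mirror of FileInputFormat's rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def num_splits(file_size: int, split_size: int) -> int:
    """Approximate split count via ceiling division.

    Ignores the small slack Hadoop allows before cutting a final, tiny split.
    """
    return -(-file_size // split_size)


split = compute_split_size(128 * MB, 1, 64 * MB)  # cap splits at 64 MB
print(split // MB, num_splits(1024 * MB, split))  # prints 64 16
```

For example, with 128 MB blocks and a 64 MB maximum, a 1 GB file would be read as 16 splits instead of 8.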
How does Facebook use Hadoop, Hive, and HBase?
What mechanism does the Hadoop framework provide to synchronize changes made to the distributed cache during the runtime of an application?
What is the difference between Hadoop and HDFS?
What are the benefits of NoSQL over relational databases?
How do Hadoop and Spark compare?
What are the side effects of not running a Secondary NameNode?
Is Impala production-ready?
Explain the term commit log.
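In Cassandra, the commit log is an append-only, on-disk log. Each write is recorded there for durability and also applied to an in-memory memtable, and after a crash the log is replayed to rebuild memtables that had not yet been flushed to SSTables. The toy Python class below (hypothetical, not Cassandra's implementation) shows only that append-then-apply and replay-on-restart idea; it ignores memtable flushing, log segments, and compaction.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy write path: append to a commit log, then update an in-memory memtable."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.memtable: dict[str, str] = {}
        self._replay()

    def _replay(self) -> None:
        # On startup, rebuild the memtable from the log (crash recovery).
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    entry = json.loads(line)
                    self.memtable[entry["key"]] = entry["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append to the commit log and force it to disk before acknowledging.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the write to the memtable.
        self.memtable[key] = value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commitlog.jsonl")
        ToyStore(path).put("user:1", "alice")
        restarted = ToyStore(path)  # simulated restart: memtable rebuilt from the log
        print(restarted.memtable)  # prints {'user:1': 'alice'}
```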
What is the use of a DataFrame in Spark?