Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS Hadoop Distributed File System (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Data Engineer
Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
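One possible approach to the mutual-follower question, sketched in plain Python rather than as a real Hadoop job: the map step keys every edge by its unordered pair, and the reduce step reports a pair only when both directions arrived under that key. The function names (`map_edge`, `reduce_pair`, `find_mutual_pairs`) are hypothetical, and the in-memory dictionary only stands in for the shuffle; in an actual Map/Reduce job the framework groups keys on disk, so no single machine has to hold the whole list.

```python
from collections import defaultdict


def map_edge(follower, followee):
    """Map step: key each edge by its unordered pair, keep the direction as the value."""
    key = (min(follower, followee), max(follower, followee))
    yield key, (follower, followee)


def reduce_pair(key, directions):
    """Reduce step: a pair is mutual when both directions were seen for the key."""
    if len(set(directions)) == 2:
        yield key


def find_mutual_pairs(edges):
    """Simulate map -> shuffle -> reduce in memory (assumption: small demo input)."""
    grouped = defaultdict(list)  # stand-in for the framework's shuffle/sort phase
    for follower, followee in edges:
        for key, value in map_edge(follower, followee):
            grouped[key].append(value)
    mutual = []
    for key in sorted(grouped):
        mutual.extend(reduce_pair(key, grouped[key]))
    return mutual


edges = [(123, 345), (234, 678), (345, 123)]
print(find_mutual_pairs(edges))  # [(123, 345)]
```

Because each reducer sees at most two distinct directions per key, the memory needed per key stays constant regardless of the size of the input.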
How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges according to the fast/dynamic change of data?
Write a Hive UDF that returns a sentiment score. For example, if good = 1, bad = -1, and average = 0, and a restaurant review states "Good food, bad service," your score might be 1 - 1 = 0.
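The scoring rule in this question can be sketched as a small Python function. This is only an illustration of the logic, not a Hive UDF: in Hive the same rule would typically be wrapped in a Java UDF class. The word table holds just the three values given in the question, and the names `WORD_SCORES` and `sentiment_score` are hypothetical.

```python
import re

# Word scores taken from the example in the question; any other word counts as 0.
WORD_SCORES = {"good": 1, "bad": -1, "average": 0}


def sentiment_score(review):
    """Sum the scores of all known words in the review; None stays None, like SQL NULL."""
    if review is None:
        return None
    words = re.findall(r"[a-z]+", review.lower())
    return sum(WORD_SCORES.get(word, 0) for word in words)


print(sentiment_score("Good food, bad service"))  # 0
```

Lowercasing and splitting on letters keeps punctuation such as the comma after "food" from hiding a match.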
Suppose that your data is stored in collections; for instance, some binary data, message data, and metadata are all keyed on the same value. Would you use HBase for this?
Define the term "lazy evaluation" with reference to Apache Spark.
What is the physical plan in the Pig architecture?
How is a write operation performed on a Cassandra node?
What is the Catalyst framework?
What is the procedure for storing data in Cassandra?
What are the main features and characteristics of Hadoop that make it the most popular and powerful Big Data tool?
What happens if a NameNode has no data?
Why is replication critical in Kafka?
Explain some disadvantages of Avro.
Explain how you can minimize data transfers when working with Spark.
What is a generic UDF in Hive?
Why do we need Hadoop Archives? How are they created?
What is the difference between the GROUP and COGROUP operators in Pig?
What is a Memtable in Cassandra?
What are the advantages of Datasets?