What are the methods to create an RDD in Spark?
What are the key features of HDFS?
Why is comparison of types important for MapReduce?
What are the relational operations in Pig? Explain any two with examples.
What are the different methods to set up local repositories?
Name the different types of primary keys in Cassandra.
Can you explain what a recommendation engine is?
Why is there a need for the Pig language?
What is the definition of Hive?
How do you set a property in Apache Tajo?
Why does HDFS store data on commodity hardware in Hadoop despite the higher chance of failures?
Data Engineer: Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
What are the debugging tools used for Apache Pig scripts?
What is Distributed Cache in Hadoop?
How does Hive distribute rows into buckets?
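
For the mutual-followers question above, one possible approach is sketched below in plain Python. It is not an answer taken from this page, and it does not use any Hadoop API: the names map_edge, reduce_pair, and mutual_pairs are hypothetical, and the in-memory dictionary only stands in for the shuffle that a real Map/Reduce framework would perform on disk across machines. The idea is that the map step keys every edge by its unordered pair of IDs, so both directions of a relationship reach the same reducer, and the reduce step keeps a pair only if both directions were seen.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, int]


def map_edge(follower: int, followee: int) -> Tuple[Pair, int]:
    """Map step: key the edge by its unordered pair, value = direction flag."""
    key = (min(follower, followee), max(follower, followee))
    # 0 means "smaller ID follows larger ID", 1 means the reverse.
    direction = 0 if follower < followee else 1
    return key, direction


def reduce_pair(key: Pair, directions: Iterable[int]) -> Iterator[Pair]:
    """Reduce step: emit the pair only when both directions are present."""
    if set(directions) == {0, 1}:
        yield key


def mutual_pairs(edges: Iterable[Pair]) -> List[Pair]:
    # Assumption: this dictionary is a stand-in for the framework's shuffle,
    # which would group values by key without holding the input in memory.
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for follower, followee in edges:
        key, direction = map_edge(follower, followee)
        groups[key].append(direction)

    result: List[Pair] = []
    for key in sorted(groups):
        result.extend(reduce_pair(key, groups[key]))
    return result


if __name__ == "__main__":
    sample = [(123, 345), (234, 678), (345, 123)]
    print(mutual_pairs(sample))
```

Because each reducer only ever sees the values for a single pair of IDs, the per-key state stays tiny regardless of how large the full follower list is, which is what makes the approach suitable when the list does not fit in memory.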