Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS: Hadoop Distributed File System (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is a generic UDF in Hive?
What is LazyOutputFormat in Hadoop?
Explain Spark SQL caching and uncaching.
Explain various Apache Spark ecosystem components. In which scenarios can we use these components?
What do you understand by Pair RDD?
Does Cassandra work on Windows?
What are the main features of Impala?
Explain a scenario where you would use Spark Streaming.
HDFS stores data on commodity hardware, which has a higher chance of failure. How does HDFS ensure the fault tolerance of the system?
Name any two features of Flume.
What do you understand by Cassandra CQL collections?
Does Spark use YARN?
What is Cassandra?
Explain the pipe() operation in Apache Spark.
How do you create a custom key and a custom value in a MapReduce job?