What is a topic in Kafka?
Can you list a few commonly used Hive services?
Is it mandatory to set input and output type/format in MapReduce?
Which command do we use to show the version?
What is Sqoop?
Is it possible to run real-time analysis directly on the big data collected by Flume? If yes, explain how.
What is a TaskTracker in Hadoop?
What is the DataFrame API?
What happens when two clients try to access the same file in HDFS?
What is a partitioning key?
Can Spark work without Hadoop?
What is an InputSplit in Hadoop MapReduce?
How do I get better performance with Spark?
Is it necessary to install Spark on all the nodes of a YARN cluster when running Apache Spark on YARN?
Is Databricks a database?