Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
When Hive is run in embedded mode
What are accumulators and broadcast variables in Spark?
What is a 'block' in HDFS?
What tools are used in Big Data?
What is the use of Flume in Hadoop?
What is an accumulator in Spark?
What is the need for a custom SerDe?
Is it possible to provide multiple inputs to Hadoop? If yes, how?
After increasing the replication level, I still see that data is under-replicated. What could be wrong?
What is the role of “ambari-qa” user?
What is Spark SQLContext?
What is the Presto Verifier?
Is the Secondary NameNode a substitute for the NameNode?
Is there another way to check whether the NameNode is working?
Can the balancer be run while Hadoop is in use?