How will you calculate the number of executors required to do real-time processing using Apache Spark? What factors need to be considered for deciding on the number of nodes for real-time processing?
Define replication factor.
What is SequenceFileInputFormat in Hadoop MapReduce?
Explain how you can get exactly-once messaging from Kafka during data production.
What are the limitations of importing RDBMS tables into HCatalog directly?
How is HDFS different from traditional file systems?
Explain the replicating and multiplexing channel selectors in Flume.
In the Producer, when does QueueFullException occur?
How can we change the split size if our commodity hardware has less storage space?
Which languages does Apache Spark support?
What do we mean by Parquet?
What are channel selectors?
What is the difference between Hadoop and Spark?
What is a table in HBase?