How will you calculate the number of executors required to do real-time processing using Apache Spark? What factors need to be considered for deciding on the number of nodes for real-time processing?
In a given Spark program, how will you identify whether an operation is a transformation or an action?
What is multi-tenancy?
How do you configure the number of combiners in MapReduce?
What is data replication?
What are some benefits of using Impala with Hadoop?
What do the master class and the output class do?
How do you skip header rows in a Hive table?
What is the HDFS architecture, and which daemons run in an HDFS cluster?
What is a block in HDFS, and what does the block scanner do?
Does Spark SQL use Hive?
What are the various running modes of Apache Spark?
What are the differences between relational databases and Impala?
What is a Backup node in Hadoop?
What is Spark SQL used for?
What is RDD lineage?
What are the main configuration parameters a user needs to specify to run a MapReduce job?
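For the question on sizing Spark executors for real-time processing, a common rule of thumb can be worked through as arithmetic. This is a heuristic under stated assumptions, not a formula from Spark itself: it reserves 1 core and 1 GB per node for the OS and Hadoop daemons, uses about 5 cores per executor, keeps one executor slot for the YARN ApplicationMaster, and treats 7% of executor memory as off-heap overhead (check `spark.executor.memoryOverhead` for your Spark version). The function `size_spark_executors` and its parameters are hypothetical. For receiver-based streaming it also checks that there are more cores than receivers, since each receiver holds a core for the life of the job.

```python
import math


def size_spark_executors(nodes, cores_per_node, mem_gb_per_node,
                         cores_per_executor=5, overhead_fraction=0.07,
                         num_receivers=0):
    """Rule-of-thumb executor sizing; every default is an assumption."""
    # Reserve 1 core and 1 GB per node for the OS and Hadoop daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_gb_per_node - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Give up one executor slot for the YARN ApplicationMaster.
    num_executors = nodes * executors_per_node - 1
    if num_executors < 1:
        raise ValueError("cluster too small after reserving the ApplicationMaster slot")

    # Split node memory across its executors, then subtract the overhead share.
    mem_per_executor_gb = usable_mem_gb / executors_per_node
    executor_memory_gb = math.floor(mem_per_executor_gb * (1 - overhead_fraction))

    # Receiver-based streaming: each receiver pins a core for the job's lifetime,
    # so some cores must remain free to process the received batches.
    if num_receivers >= num_executors * cores_per_executor:
        raise ValueError("need more cores than receivers")

    return {"num_executors": num_executors,
            "executor_cores": cores_per_executor,
            "executor_memory_gb": executor_memory_gb}


print(size_spark_executors(nodes=10, cores_per_node=16, mem_gb_per_node=64))
# {'num_executors': 29, 'executor_cores': 5, 'executor_memory_gb': 19}
```

The three returned values correspond to the `spark-submit` options `--num-executors`, `--executor-cores` and `--executor-memory`. For real-time jobs the other deciding factors are whether each batch finishes within the batch interval and how many input partitions or receivers feed the job, so the result is a starting point to tune rather than a final answer.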
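For the questions on telling transformations from actions and on RDD lineage, the distinction can be shown with a toy model. `ToyRDD` below is a hypothetical, single-process class, not Spark's API: its transformations (`map`, `filter`) only append a step to the lineage and return a new `ToyRDD`, while its actions (`collect`, `count`) replay the lineage and return an ordinary Python value. The same test applies in a real Spark program: a call that returns an RDD is a transformation, and one that returns a value to the driver (or writes output) is an action.

```python
class ToyRDD:
    """Toy, single-process imitation of Spark's lazy RDD model (not Spark's API)."""

    def __init__(self, source, lineage=()):
        self._source = source            # base data, e.g. a list or range
        self.lineage = tuple(lineage)    # recorded steps, oldest first

    # Transformations: record a step and return a NEW ToyRDD; nothing runs.
    def map(self, fn):
        return ToyRDD(self._source, self.lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self.lineage + (("filter", pred),))

    # Actions: replay the whole lineage and return a plain Python value.
    def collect(self):
        data = list(self._source)
        for kind, fn in self.lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:  # "filter"
                data = [x for x in data if fn(x)]
        return data

    def count(self):
        return len(self.collect())

    def to_debug_string(self):
        return " -> ".join(["source"] + [kind for kind, _ in self.lineage])


def is_transformation(result):
    # A call that hands back another RDD was a transformation.
    return isinstance(result, ToyRDD)


seen = []


def square(x):
    seen.append(x)  # records when the function actually executes
    return x * x


evens_squared = ToyRDD(range(10)).filter(lambda x: x % 2 == 0).map(square)
print(is_transformation(evens_squared))          # True
print(seen)                                      # [] : nothing has executed yet
print(evens_squared.to_debug_string())           # source -> filter -> map
print(evens_squared.collect())                   # [0, 4, 16, 36, 64]
print(is_transformation(evens_squared.count()))  # False: count returns an int
```

Because nothing is cached, every action replays the lineage from the source, which is the same mechanism Spark relies on to recompute a lost partition. `to_debug_string` loosely mirrors Spark's `RDD.toDebugString()`.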
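For the combiner question, a short simulation shows what a combiner buys. In Hadoop you register a combiner class (for example with `Job.setCombinerClass` in the Java API), but the framework decides whether it runs at all and may run it several times, so the number of combiner runs is not a setting you configure. This plain-Python sketch, with hypothetical function names, simplifies that by applying the combiner exactly once per map task. It is safe here because summing counts is associative and commutative, which is also why the WordCount reducer can double as the combiner.

```python
from collections import Counter


def mapper(line):
    # Classic WordCount map step: emit (word, 1) for every word.
    return [(word, 1) for word in line.split()]


def combine(pairs):
    # Sum counts per word; used both as the combiner and as the reducer.
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return list(totals.items())


def run_job(splits, use_combiner):
    shuffled = []
    for split in splits:                  # one "map task" per input split
        pairs = [p for line in split for p in mapper(line)]
        if use_combiner:
            pairs = combine(pairs)        # local aggregation before the shuffle
        shuffled.extend(pairs)            # records that would cross the network
    return dict(combine(shuffled)), len(shuffled)


splits = [["a b a", "b a"], ["a c", "c c a"]]
plain, n_plain = run_job(splits, use_combiner=False)
combined, n_combined = run_job(splits, use_combiner=True)
print(plain == combined)    # True
print(n_plain, n_combined)  # 10 4
```

Both runs produce the same counts, but the combiner cuts the records shuffled to the reducer from 10 to 4.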
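For the questions on data replication and HDFS blocks, the storage arithmetic is easy to sketch. HDFS splits a file into fixed-size blocks and stores each block on several DataNodes. The defaults assumed here are a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); both can be changed per cluster or per file. The helper `hdfs_footprint` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128   # assumed dfs.blocksize (default in Hadoop 2.x/3.x)
REPLICATION = 3       # assumed dfs.replication (default)


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION):
    """Return (blocks, block_replicas, raw_storage_mb) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only occupies as much disk as the data it holds,
    # so raw storage is file size x replication, not blocks x block size.
    raw_storage_mb = file_size_mb * replication
    return blocks, blocks * replication, raw_storage_mb


print(hdfs_footprint(500))  # (4, 12, 1500)
```

A 500 MB file therefore becomes 4 blocks and 12 block replicas, yet uses about 1500 MB of raw disk rather than 12 × 128 MB, because the last, partial block only takes the space its data needs.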
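For the question on Spark's running modes, the mode is chosen by the `--master` URL passed to `spark-submit`. The hypothetical helper below maps the documented URL forms to the cluster manager they select: `local`, `local[N]` or `local[*]` for a single JVM, `spark://HOST:PORT` for the standalone manager, `yarn`, `mesos://HOST:PORT`, and `k8s://HOST:PORT`. Whether the driver runs on the submitting machine or inside the cluster is a separate choice made with `--deploy-mode client` or `--deploy-mode cluster`.

```python
import re


def spark_run_mode(master):
    """Map a --master URL to the running mode / cluster manager it selects."""
    # local, local[4], local[*], and the local[N,F] form with a max-failures count.
    if re.fullmatch(r"local(\[(\d+|\*)(,\d+)?\])?", master):
        return "local"
    if master.startswith("spark://"):
        return "standalone"
    if master == "yarn":
        return "yarn"
    if master.startswith("mesos://"):
        return "mesos"
    if master.startswith("k8s://"):
        return "kubernetes"
    raise ValueError(f"unrecognised master URL: {master}")


for url in ["local[*]", "spark://master:7077", "yarn", "k8s://https://host:6443"]:
    print(url, "->", spark_run_mode(url))
```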
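For the question on the main parameters a MapReduce job needs, the usual list is: the input location, the output location, the input and output formats, the map and reduce logic, and the JAR that contains the code. Hadoop Streaming exposes these directly as command-line options (`-input`, `-output`, `-inputformat`, `-outputformat`, `-mapper`, `-reducer`), so the sketch below builds such a command. The JAR path, script names, HDFS paths and the helper `streaming_command` are hypothetical, and shipping the scripts to the cluster is left out.

```python
import shlex

# Hypothetical location; the real streaming JAR path depends on the install.
STREAMING_JAR = "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar"


def streaming_command(input_path, output_path, mapper, reducer,
                      input_format="org.apache.hadoop.mapred.TextInputFormat",
                      output_format="org.apache.hadoop.mapred.TextOutputFormat",
                      jar=STREAMING_JAR):
    """Build a Hadoop Streaming argv covering the core job parameters."""
    if not input_path or not output_path:
        raise ValueError("input and output locations are required")
    return ["hadoop", "jar", jar,           # JAR containing the job driver
            "-input", input_path,           # where the job reads from in HDFS
            "-output", output_path,         # must not already exist
            "-mapper", mapper,              # map logic
            "-reducer", reducer,            # reduce logic
            "-inputformat", input_format,
            "-outputformat", output_format]


cmd = streaming_command("/data/in", "/data/out", "mapper.py", "reducer.py")
print(shlex.join(cmd))
```

The output directory must not already exist; otherwise the job fails when it is submitted.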