Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
State some key points about Apache Avro.
Why do we need Hadoop?
What is the difference between an InputSplit and a block?
What do you understand by Cassandra CQL collections?
Give the Sqoop command to see the contents of the job named myjob.
Explain a scenario where you would use Spark Streaming.
How are joins performed in Impala?
What is Distributed Cache?
What are the port numbers of the NameNode, JobTracker, and TaskTracker?
What is the default replication factor?
What is Data Locality in Hadoop?
Define Hadoop archives.
Which of the two is preferable for a project: Hadoop MapReduce or Apache Spark?
What is the function of the MapReduce partitioner?
What happens to the JobTracker when the NameNode is down?