Hadoop achieves parallelism by dividing tasks across many nodes, so a few slow nodes can rate-limit the rest of the program. What mechanism does Hadoop provide to combat this?
What is a YAML file in Cassandra?
What are the core components of Hadoop?
What is sc.parallelize?
How can an application connect to Hive when it runs as a server?
What happens when two clients try to access the same file on HDFS?
What is the distributed cache in the MapReduce framework?
What are storage and compute nodes?
How is Ambari different from ZooKeeper?
Which tools are most useful for data analysis?
How would you use MapReduce to split a very large graph into smaller pieces and parallelize the computation of edges as the data changes rapidly?
What is Azure Spark?
What kinds of applications are supported by Apache Hive?
Can an RDD be shared between SparkContexts?
What is the machine learning library in Spark?