Can you define RDD?
What are the features of Kafka?
State some command line options?
What is shuffle spill in Spark?
Explain the role of the Kafka Producer API.
What is the process of creating an Ambari client?
What is the module in HDFS?
Did you ever run into a lopsided job that resulted in an out-of-memory error? If so, how did you handle it?
What is Speculative Execution in Apache Spark?
What is a partitioned table in Hive?
List various commonly used machine learning algorithms.
What are the cases where Apache Spark surpasses Hadoop?
What are sink processors?
Explain the keys() operation in Apache Spark.
How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges according to the fast/dynamic change of data?