How is an RDD in Apache Spark different from Distributed Storage Management?
What is the DataFrame API?
What are Hadoop’s three configuration files?
What is throughput? How does HDFS provide good throughput?
When do you have to avoid secondary indexes?
What load do concurrent queries produce on the NameNode?
Why was Spark created?
What is the significance of the Cluster class in Cassandra?
What is the flatMap transformation in Apache Spark RDD?
Explain the leftOuterJoin() and rightOuterJoin() operations in Apache Spark.
Are multiline comments supported in Hive?
Can you explain the difference between Apache Mahout and Apache Spark’s MLlib?
How do you overwrite an existing output file/dir during execution of Hadoop MapReduce jobs?
Which relational operators for combining and splitting data are available in Pig Latin?
What is an identity mapper and identity reducer?