What are all the stats classes in the org.apache.pig.tools.pigstats package?
What is a single-node cluster in Hadoop? For what purposes is Hadoop run on a single-node cluster?
What is a partitioning key?
What is the problem with having lots of small files in HDFS?
What happens if the number of reducers is 0 in MapReduce?
Do I need to know Hadoop to learn Spark?
List some use cases where Spark outperforms Hadoop in processing.
How does Spark work with Python?
What is the difference between Hadoop and HDFS?
Can we run Spark without Hadoop?
What is the difference between Pig and SQL?
How does reduceByKey work in Spark?
What are the major features/characteristics of RDDs (Resilient Distributed Datasets)?
Is Spark written in Scala?
Can you explain speculative execution?