Hadoop achieves parallelism by dividing tasks across many nodes, so a few slow nodes can rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this?
Is it possible to provide multiple inputs to Hadoop? If yes, how can you give multiple directories as input to a Hadoop job?
Explain the leftOuterJoin() and rightOuterJoin() operations in Apache Spark.
In how many ways can we create an RDD in Spark?
How can Big Data help increase the revenue of businesses?
What is the difference between an Internal Table and an External Table in Hive?
How do you rename a table in Hive?
Does Spark load all data in memory?
What are some Apache Pig use cases you can think of?
Assume that an HBase table Student is disabled. How can the Student table be accessed with the Scan command once it is disabled?
How do you submit extra files (jars, static files) for a MapReduce job at runtime in Hadoop?
Elucidate the concept of the CAP theorem.
Explain the limitations of HBase.
What is a key-value store database?
How do you create an RDD?
What is Spark ETL?
What is a Virtual Node in Cassandra?