Hadoop achieves parallelism by dividing tasks across many nodes, so a few slow nodes can rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this?
Is it possible to provide multiple inputs to Hadoop? If yes, how can you give multiple directories as input to a Hadoop job?
Explain the CAP theorem.
List commonly used machine learning algorithms.
Should the region server be located on all DataNodes?
What is the role of the NameNode?
While reading data from HBase, from which three places is data reconciled before the value is returned?
How does Cassandra store data?
Define HRegionServer in HBase.
What is an action, and how does it process data in Apache Spark?
What do you mean by the High Availability of a NameNode in Hadoop HDFS?
List the files associated with metadata in HDFS.
What are the various data sources available in Spark SQL?
How can you use the AdminClient API?
What is the command to start and stop Spark in an interactive shell?
What is the difference between an RDBMS and Hadoop?
Define Apache Pig.