What is the benefit of the Distributed Cache? Why can't we just keep the file in HDFS and have the application read it from there?
Answer / Vikalp Chauhan
Hadoop's Distributed Cache lets an application ship read-only side files (for example jars, configuration files, or lookup tables) to every node on which its tasks run. It is intended for files that are not part of the job's input or output but that the tasks need during processing. The framework copies each cached file from HDFS to local disk once per node before the tasks start, so every task reads it locally instead of pulling it from HDFS over the network on each access; this cuts network traffic and repeated remote reads, and improves job performance.
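To make the trade-off concrete, here is a minimal, hypothetical Python sketch of the fetch-once, read-locally pattern the Distributed Cache applies per node. This is not Hadoop code: `read_from_hdfs`, the path, and the payload are invented stand-ins (in the real MapReduce v2 API the job driver calls `Job.addCacheFile(...)` and tasks read the localized copy).

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for reading a file out of HDFS: each call
# simulates one remote (network) read, which is exactly what the
# Distributed Cache tries to avoid repeating.
NETWORK_READS = {"count": 0}

def read_from_hdfs(path: str) -> bytes:
    NETWORK_READS["count"] += 1
    return b"lookup-table-contents"  # made-up payload for the sketch

class NodeLocalCache:
    """Copy a remote file to node-local disk once; serve later reads locally."""

    def __init__(self) -> None:
        self.dir = Path(tempfile.mkdtemp(prefix="distcache-"))
        self.local: dict[str, Path] = {}

    def get(self, hdfs_path: str) -> bytes:
        if hdfs_path not in self.local:
            # First task on this node to ask for the file: fetch it once
            # from "HDFS" and localize it on disk.
            local_path = self.dir / Path(hdfs_path).name
            local_path.write_bytes(read_from_hdfs(hdfs_path))
            self.local[hdfs_path] = local_path
        # Every subsequent read is a cheap local-disk read.
        return self.local[hdfs_path].read_bytes()

cache = NodeLocalCache()
for _ in range(100):  # e.g. 100 map tasks scheduled on the same node
    data = cache.get("/jobs/lookup.dat")

print(NETWORK_READS["count"])  # 1 remote read instead of 100
```

Without the cache, each of the 100 tasks would have issued its own remote read; with it, the file crosses the network once per node, which is the benefit the answer above describes.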
Define data integrity. How does HDFS ensure the integrity of data blocks stored in HDFS?
What is fault tolerance in HDFS?
Explain the difference between an input split and an HDFS block.
File permissions in HDFS?
What is throughput? How does HDFS provide good throughput?
What is a NameNode in Hadoop?
How is data or a file written into HDFS?
How does HDFS index data blocks? Explain.
Why is the rack awareness algorithm used in Hadoop?
How does the NameNode handle DataNode failures in HDFS?