What is distributed cache in Hadoop?

devquora

Posted On: Feb 22, 2018

 

The distributed cache is a facility of the Hadoop MapReduce framework for caching files that an application needs frequently. It is used to distribute read-only files such as text files, archives, and JAR files. Before a job runs, Hadoop copies each cached file to every data node on which the job's map/reduce tasks will execute, so a task can read the file from its local disk instead of fetching it from HDFS again and again. When the job completes, the files are deleted from the data nodes.
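As a sketch of how a file typically ends up in the distributed cache, using the standard `-files` generic option (the jar name, class name, and paths below are hypothetical):

```shell
# Hypothetical job: ship lookup.txt into the distributed cache so every
# map/reduce task can open it as a local file named "lookup.txt".
hadoop jar myjob.jar MyJob \
  -files hdfs:///user/alice/lookup.txt \
  /input /output
```

In the Java API the same effect is achieved with `job.addCacheFile(...)`, and each task then reads the file from its local working directory.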

    Related Questions

    Amazon DevOps Engineer Interview Questions

    What is an inode on Linux, and what does it store?

    An inode in Linux is an entry in a table that stores information about a regular file or directory. It can be viewed as a data structure holding a file's metadata. The following are the...

    Amazon DevOps Engineer Interview Questions

    How do you check physical memory on Linux?

    There are a number of ways to check the physical memory size on Linux. A popular one is the free command: type free to see the physical memory size (free -b gives the size in bytes) ...
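    For example (the reported values are machine-specific, so none are shown here):

```shell
# Total, used, and free physical memory in bytes
free -b
# The same figures in human-readable units
free -h
# MemTotal straight from the kernel, as a cross-check
grep MemTotal /proc/meminfo
```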

    Amazon DevOps Engineer Interview Questions

    How do you copy a local file to HDFS?

    The hadoop fs -put command (equivalently, hdfs dfs -put) is used to copy or upload a file from the local filesystem to HDFS. Syntax: hadoop fs -put <localsrc> <hdfs_dest> ...
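    A minimal sketch, with hypothetical local and HDFS paths (it requires a running cluster):

```shell
# Upload a local file into an HDFS directory (paths are hypothetical)
hadoop fs -put /tmp/data.csv /user/alice/data/
# Equivalent form using the hdfs dfs entry point
hdfs dfs -put /tmp/data.csv /user/alice/data/
# Verify that the file arrived
hadoop fs -ls /user/alice/data/
```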