What is distributed cache in Hadoop?

devquora

Posted On: Feb 22, 2018

 

The distributed cache is a facility of the Hadoop MapReduce framework for caching files that an application needs frequently. It is used to distribute read-only files such as text files, archives, and JAR files. Before a job runs, Hadoop copies each cached file to every data node on which the job's map/reduce tasks will execute, so a task can read the file from its local disk instead of fetching it from HDFS again and again. When the job completes, the files are deleted from the data nodes.
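As a sketch of how a file typically ends up in the distributed cache, using the standard `-files` generic option (the jar name, class name, and paths below are hypothetical):

```shell
# Hypothetical job: ship lookup.txt into the distributed cache so every
# map/reduce task can open it as a local file named "lookup.txt".
hadoop jar myjob.jar MyJob \
  -files hdfs:///user/alice/lookup.txt \
  /input /output
```

In the Java API the same effect is achieved with `job.addCacheFile(...)`, and each task then reads the file from its local working directory.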

    Related Questions

    Amazon DevOps Engineer Interview Questions

    What is an inode on Linux, and what does it store?

    An inode in Linux is an entry in a table that stores information about a regular file or directory. It can be viewed as a data structure holding a file's metadata. The following are the...

    Amazon DevOps Engineer Interview Questions

    How do you check physical memory on Linux?

    There are a number of ways to check the physical memory size on Linux. A popular one is the free command: type free to see the physical memory size (free -b gives the size in bytes) ...
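    For example (the reported values are machine-specific, so none are shown here):

```shell
# Total, used, and free physical memory in bytes
free -b
# The same figures in human-readable units
free -h
# MemTotal straight from the kernel, as a cross-check
grep MemTotal /proc/meminfo
```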

    Amazon DevOps Engineer Interview Questions

    How do you copy a local file to HDFS?

    The hadoop fs -put command (equivalently, hdfs dfs -put) is used to copy or upload a file from the local filesystem to HDFS. Syntax: hadoop fs -put <localsrc> <hdfs_dest> ...
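    A minimal sketch, with hypothetical local and HDFS paths (it requires a running cluster):

```shell
# Upload a local file into an HDFS directory (paths are hypothetical)
hadoop fs -put /tmp/data.csv /user/alice/data/
# Equivalent form using the hdfs dfs entry point
hdfs dfs -put /tmp/data.csv /user/alice/data/
# Verify that the file arrived
hadoop fs -ls /user/alice/data/
```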