2025-04-02
2025-03-20
2025-02-27
Abstract—The data replication is a crucial process to increase the performance of distributed systems by replicating multiple replicas of same data at different sites. With the evolution of the cloud storage, many people are transferring their crucial data to the cloud storage. Replication is used in cloud storage systems to improve availability, replication factor, access time and storage efficiency. Replication factor of a file refers to the number of replicas or copies of that file in the system. Every file in HDFS is stored in the form of blocks. HDFS has a default replication factor of 3 for all file blocks. In our proposed strategy a novel replication algorithm is proposed it decides the replication factor based on the support values of the file block then we will find the Frequent Block Access Pattern of the popular files and the placement of the frequent access pattern of the file blocks is based on the local support values at each data nodes. According to the results of the performance analysis, the proposed strategy's algorithm outperforms the Hadoop distributed file system's replication process.