How to copy files in Linux faster and safer than cp

Sometimes a simple cp -a is a painfully slow process. The -v (verbose) option lists the files as they are copied, but it does not normally show the overall progress. In fact, cp -a can be quite slow, and the same copy is sometimes faster (and safer) when done with tar, for example:
$ tar cf - . | (cd /dst && tar xvf -)
This is usually faster, and more verbose. Other commands such as pv can help too, by monitoring the progress of a copy between two directories, for example:
$ tar cf - . | pv | (cd /dst && tar xf -)
2,06GB 0:00:09 [ 194MB/s] [  <=>                     ]
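The tar and pv pipes above can be wrapped in a small helper. This is only a sketch: copy_tree and the temporary demo paths are made up for illustration, and it uses tar -C instead of a cd subshell so that a missing destination makes the extraction fail instead of unpacking somewhere else. When pv is not installed it falls back to the plain pipe.

```shell
#!/bin/sh
# Sketch: copy a directory tree through a tar pipe, showing progress with pv
# when it is installed. copy_tree is a hypothetical helper, not a standard tool.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    if command -v pv >/dev/null 2>&1; then
        # pv reports bytes, elapsed time and throughput on stderr.
        tar -C "$src" -cf - . | pv | tar -C "$dst" -xf -
    else
        # No pv available: same pipe, just without the progress meter.
        tar -C "$src" -cf - . | tar -C "$dst" -xf -
    fi
}

# Demo on throwaway directories (paths are made up for illustration).
demo=$(mktemp -d)
mkdir -p "$demo/src/sub"
printf 'hello\n' > "$demo/src/a.txt"
printf 'world\n' > "$demo/src/sub/b.txt"

copy_tree "$demo/src" "$demo/dst"

# diff -r prints nothing and exits 0 when both trees match.
diff -r "$demo/src" "$demo/dst" && echo "copy verified"
```

For a real copy you would call something like copy_tree /src /dst and remove the demo part.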
But copying several gigabytes or terabytes of data, spread over many files, between quite old NFS disks is painful with cp. Let's look at two alternatives, rsync and gcp, for:
  •  Monitoring the progress of the copy and the copied files.
  •  Skipping to the next file when an error occurs (gcp)
  •  Syncing directories (rsync)
  •  Copying files via network (rsync)
One of the best commands for copying is rsync, which synchronizes two directories. This means src/ can hold live data that is incrementally synced to dst/ over several runs of the command:
$ rsync --info=progress2 -auvz ~/Music/ /data/music/
giving a result like this:
Jake Bugg - Jake Bugg Album 2012/
Jake Bugg - Jake Bugg Album 2012/01 - Lighting Bolt.mp3
  1,913,897,967  15%   22.79MB/s    0:01:20 (xfr#277, ir-chk=1019/1825)
Jake Bugg - Jake Bugg Album 2012/05 - Simple As This.mp3
  1,936,698,070  15%   22.80MB/s    0:01:21 (xfr#281, ir-chk=1015/1825)
You can also run it with the -n option to perform a dry run (this is used more often than the Skype test call), which checks and lists the differences between the two given directories. It also works over the network with "-e ssh" and a user@host:dst/ destination, and on older versions of rsync that lack the --info option you can simply leave it out. It is slower for plain copying, but it does a lot of useful things such as syncing and verifying checksums. You will remember rsync when something goes wrong.
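A dry run followed by a real run can be tried safely on throwaway directories. The sketch below assumes rsync is installed (it skips the demo otherwise); the directory names are made up, and -i (--itemize-changes) is added only to make the dry-run listing easier to read.

```shell
#!/bin/sh
# Sketch: preview an rsync run with -n, then sync for real, then preview
# again. Directory names are hypothetical and live under a temp directory.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
printf 'one\n' > "$demo/src/a.txt"
printf 'two\n' > "$demo/src/b.txt"

if command -v rsync >/dev/null 2>&1; then
    # -n: dry run. -i itemizes what would change; nothing is written.
    rsync -aun -i "$demo/src/" "$demo/dst/"
    after_dry_run=$(ls -A "$demo/dst")   # stays empty: the dry run copied nothing

    # Real run: the trailing slash on src/ copies its contents, not the
    # directory itself.
    rsync -au "$demo/src/" "$demo/dst/"

    # Second dry run: files that are already up to date are not listed.
    rsync -aun -i "$demo/src/" "$demo/dst/"
else
    echo "rsync is not installed; skipping the demo" >&2
fi
```

The same pattern applies to the ~/Music/ example: run it once with -n, read the list, then repeat the command without -n.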
Another fantastic command for copying is gcp. Besides estimating progress, gcp does not copy a file that already exists at the destination, skips to the next file when an error occurs, and writes all failures to a journal file.
$ gcp -rv ~/Music/* /data/music/
Copying 13.53 GiB   2% |#                                  | 165.50 MB/s ETA:  0:01:25
Please check journal: /home/cesar/.gcp/journal
$ cat /home/cesar/.gcp/journal

/home/cesar/Music/Alabama Shakes-Boys & Girls (2014)/01 - Alabama Shakes - Hold On.mp3
FAILED: already exists
/home/cesar/Music/Alabama Shakes-Boys & Girls (2014)/03 - Alabama Shakes - Hang Loose.mp3
FAILED: already exists
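Where gcp is not available, the skip-and-journal idea can be roughly imitated with plain cp. This is a sketch of the behaviour only, not how gcp is implemented: copy_with_journal, the "copy error" message and the journal location are all made up, and it handles a flat list of files rather than recursing into directories.

```shell
#!/bin/sh
# Sketch: copy files one by one, skip existing targets, keep going after
# errors, and record every skipped or failed file in a journal.
# copy_with_journal and the journal layout are hypothetical, not gcp's own.
copy_with_journal() {
    dst=$1
    journal=$2
    shift 2
    : > "$journal"                      # start with an empty journal
    for f in "$@"; do
        target=$dst/$(basename "$f")
        if [ -e "$target" ]; then
            printf '%s\nFAILED: already exists\n' "$f" >> "$journal"
        elif ! cp -p "$f" "$target" 2>/dev/null; then
            printf '%s\nFAILED: copy error\n' "$f" >> "$journal"
        fi
    done
}

# Demo on throwaway files (paths are made up for illustration).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
printf 'a\n' > "$demo/src/a.txt"
printf 'b\n' > "$demo/src/b.txt"
printf 'old\n' > "$demo/dst/a.txt"      # already present at the destination

copy_with_journal "$demo/dst" "$demo/journal" \
    "$demo/src/a.txt" "$demo/src/b.txt" "$demo/src/missing.txt"
cat "$demo/journal"
```

In the demo, a.txt is left untouched at the destination, b.txt is copied, and the nonexistent missing.txt is logged without stopping the loop.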
In an Alfresco context, many simple migrations (or restore processes) are done through CIFS or WebDAV drives, and the commands above are useful in those cases. They can also help when you make a local copy on an Alfresco instance in order to run a Filesystem Bulk process in Alfresco later. From a system administrator's point of view, when restoring huge contentstores or Lucene / SOLR indices, or moving backups, these commands can save you a lot of time.
Another day we will spend some time on alternatives to scp for copies between two machines.
Some useful links for reading and just patience for copying:

NOTE: ~/Music and /data/music are simple tests on a local SSD disk. 



xxz zxzsda 3 years ago

Great information

Fred Lechat 3 years ago

You should remove the z option from rsync: it compresses and uncompresses all the content. This option should only be used when running rsync across the network (I don't even understand why this option has an effect when syncing local drives).


chris r 3 years ago

I was going crazy trying to copy 5GB~ of development files to an external SSD in Windows 10 (which I use for some legacy programs and occasional gaming).  Extremely slow copy speeds - from 0-400k(ish)/s!  And, yes, attempted most/all proposed solutions found online, without success.


Then I booted up my 'travel stick' -- Manjaro installed on a thumb drive -- and rsync'd the same files from/to the same drives at 12MB/s.


So, many thanks!  I'd sort of forgotten about Rsync after using it quite a bit several years ago in a homespun backup script.

Sam William 3 years ago

Thank you so much for the informative post. I forgot about Rsync after using it many years ago. Your post contains lots of valuable information.


Jose Antonio Gutierrez 1 year ago

Thank you, Cesar Capillas. Very helpful, and I was able to solve my copying issue.

Matt Boobs 1 year ago

Thank you for this - I implemented the tar cf - . | pv | (cd /dst; tar xf -) and 'screen' between local zfs datasets on a 4TB transfer. Getting ~200Mib/s on pooled hard disks. I wanted to use zfs send/receive but destination dataset is encrypted and will not allow it. 

Rick Ster 12 months ago

I use tar a lot, but I find that with "pv" it actually slows the xfer by ~5-10%.

Instead, I normally use tar xvf ...