How to copy files in Linux faster and safer than cp

Cesar Capillas
Sometimes a simple cp -a is a painfully slow process. The -v (verbose)
option gives you some detail about the copy, but not normally its
progress. In fact, cp -a can be quite slow, and the same copy is
sometimes faster (and safer) when done with tar, for example:
$ tar cf - . | (cd /dst && tar xvf -)
This is usually faster, and more verbose. Other commands such as pv
can also help you monitor the progress of a copy between two
directories, for example:
$ tar cf - . | pv | (cd /dst && tar xf -)
2,06GB 0:00:09 [ 194MB/s] [  <=>                     ]
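The pv run above shows only throughput, because pv does not know how much data to expect. As a rough sketch, the script below wraps the same tar pipe in a function, passes the total size to pv with its -s option so that it can print a percentage and an ETA, and checks the result with diff -r. The copy_tree name and the temporary directories are made up for the illustration, du -sb assumes GNU coreutils, and the script falls back to plain cat when pv is not installed.

```shell
#!/bin/sh
# copy_tree SRC DST: copy the contents of SRC into DST through a tar pipe,
# metered by pv when it is installed. The function name is hypothetical.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1

    # Size hint in bytes (GNU du; -b is not POSIX). The tar stream is a
    # little larger than this, so the percentage is only approximate.
    size=$(du -sb "$src" | cut -f1)

    if command -v pv >/dev/null 2>&1; then
        meter() { pv -s "$size"; }
    else
        meter() { cat; }    # no pv: copy without a progress bar
    fi

    # "&&" keeps tar from running in the wrong directory if cd fails.
    (cd "$src" && tar cf - .) | meter | (cd "$dst" && tar xf -)
}

# Demo on throwaway directories.
work=$(mktemp -d)
mkdir -p "$work/src/sub"
echo "hello" > "$work/src/a.txt"
head -c 1048576 /dev/zero > "$work/src/sub/blob.bin"

copy_tree "$work/src" "$work/dst"

# diff -r exits non-zero if the two trees differ.
if diff -r "$work/src" "$work/dst" >/dev/null; then
    echo "copy verified"
fi
```

Running each side of the pipe in its own subshell leaves the caller's working directory alone, which makes the function safe to call from a longer script.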
Still, copying several gigabytes or terabytes of data and many files
between quite old NFS disks is painful with cp. Let’s see two
alternatives for:
  •  Monitoring the progress of the copy and the copied files.
  •  Skipping to the next file when an error occurs (gcp)
  •  Syncing directories (rsync)
  •  Copying files via network (rsync)
One of the better commands for copying is rsync, which lets you
synchronize two directories. This means src/ can hold live data that
is synced incrementally to dst/ over several runs of the command:
$ rsync --info=progress2 -auvz ~/Music/ /data/music/
giving a result like this:
Jake Bugg - Jake Bugg Album 2012/
Jake Bugg - Jake Bugg Album 2012/01 - Lighting Bolt.mp3
  1,913,897,967  15%   22.79MB/s    0:01:20 (xfr#277, ir-chk=1019/1825)
Jake Bugg - Jake Bugg Album 2012/05 - Simple As This.mp3
  1,936,698,070  15%   22.80MB/s    0:01:21 (xfr#281, ir-chk=1015/1825)
You can also use it with the -n option to perform a dry run (this is
used more than the Skype test call), which checks and lists the
differences between the two given directories. You can use it too with
«-e ssh» user@host:dst/, or without the --info option in older
versions of rsync. It is slower for copying, but it does a lot of
useful things such as syncing and checking md5sums. You will remember
rsync if something goes wrong.
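A small sketch of the dry-run workflow described above: first list what would change with -n, then run the real sync. The -i (--itemize-changes) option and the temporary directories are additions for the illustration, not part of the original command, and the script does nothing useful if rsync is not installed.

```shell
#!/bin/sh
# Throwaway source and destination for the illustration.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "one" > "$work/src/one.txt"
echo "two" > "$work/src/two.txt"
# one.txt is already in sync (same size and mtime), so only two.txt
# should be transferred.
cp -p "$work/src/one.txt" "$work/dst/one.txt"

if command -v rsync >/dev/null 2>&1; then
    # Dry run: -n reports what would be transferred and changes nothing.
    rsync -aun -i "$work/src/" "$work/dst/"
    if [ ! -e "$work/dst/two.txt" ]; then
        echo "dry run changed nothing"
    fi

    # Real run: same options without -n.
    rsync -au "$work/src/" "$work/dst/"
else
    echo "rsync is not installed" >&2
fi
```

The trailing slash on the source matters: with it rsync copies the contents of the directory, without it rsync creates the directory itself inside the destination.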
Another fantastic command for copying is gcp. Besides estimating
progress, gcp does not copy a file that already exists, skips to the
next file if an error occurs, and writes all failures to a journal file.
$ gcp -rv ~/Music/* /data/music/
Copying 13.53 GiB   2% |#                                  | 165.50 MB/s ETA:  0:01:25
Please check journal: /home/cesar/.gcp/journal
$ cat /home/cesar/.gcp/journal

/home/cesar/Music/Alabama Shakes-Boys & Girls (2014)/01 - Alabama Shakes - Hold On.mp3
FAILED: already exists
/home/cesar/Music/Alabama Shakes-Boys & Girls (2014)/03 - Alabama Shakes - Hang Loose.mp3
FAILED: already exists
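Since gcp records files that already exist as failures, a long journal can hide the real errors. The sketch below filters a journal down to the entries that failed for any other reason. It assumes the journal keeps the two-line layout shown above (a path followed by a FAILED: line), which is inferred from this excerpt only. The real_failures name, the sample paths and the "permission denied" reason are invented for the demo.

```shell
#!/bin/sh
# real_failures JOURNAL: print paths whose failure is not "already exists".
# Assumes each entry is a path line followed by a "FAILED: reason" line.
real_failures() {
    awk '
        /^FAILED: / {
            if ($0 != "FAILED: already exists" && path != "")
                print path " (" substr($0, 9) ")"
            path = ""
            next
        }
        NF { path = $0 }    # remember the last non-empty line as the path
    ' "$1"
}

# Demo journal; both entries are made up, and the second reason is an
# assumption about what another kind of failure might look like.
journal=$(mktemp)
cat > "$journal" <<'EOF'

/home/user/Music/a.mp3
FAILED: already exists
/home/user/Music/b.mp3
FAILED: permission denied
EOF

real_failures "$journal"
```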
In an Alfresco context, many simple migrations (or restore
processes) are carried out via CIFS or WebDAV drives. In these cases
the above commands are useful. They can even be useful if you are
making a local copy on an Alfresco instance in order to run a later
Filesystem Bulk process in Alfresco. From a system administrator's
point of view, when restoring huge contentstores or Lucene / Solr
indices, or moving backups, these commands can save you a lot of time.
On another day we spent some time on alternatives for scp copies
between two machines.
Some useful links for reading, and a bit of patience for copying:
  • http://www.linuxtecnico.es/2014/04/benchmark-cp-vs-tar-vs-cpio-vs-rsync.html
  • http://askubuntu.com/questions/17275/progress-and-speed-with-cp

NOTE: ~/Music and /data/music are simple tests on a local SSD disk.
