Copy Files Compressed with Tar via ssh from and to a Linux Server
Reading time: 5 minutes
Copying files from a development machine or a server to another one can take a lot of time, resources and traffic, which, depending on the task, may be more or less of a problem. There are common Linux/Unix tools like scp and rsync to do the job, but depending on the task they may be the wrong choice, for example when the files to copy:
- are a lot of small files (like node_modules)
- are big, like text files or images in uncompressed formats like BMP or TIFF
- have special attributes set which you want to transport over the wire, like permissions or user/group IDs (a sketch that preserves these follows the basic commands below)
- have special characters or encodings in their names, which would otherwise be transferred with errors
The most common solutions in both directions
These are the most common combinations to copy files, using tar for compression and ssh for transport (via pipes), which should be suitable for most situations.
Copy the folder data/ from the current machine (development machine or current server) to the folder /opt on a remote server or system. The server must support ssh and have tar installed, which is the case for most Linux/Unix systems.
tar czf - data/ | ssh user@remoteserver "cd /opt && tar -xvzf - "
Copy the folder data/ from /opt on a remote server or system to the current machine (development machine or current server). The server must support ssh and have tar installed, which is the case for most Linux/Unix systems.
ssh user@remoteserver "cd /opt && tar cfz - data/" | tar xfzv -
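As mentioned above, tar can also carry file permissions and ownership over the wire. This is a minimal sketch, assuming you can log in as root on the receiving side (restoring ownership requires root there); --numeric-owner keeps the numeric user/group IDs instead of mapping them by name:
tar czf - data/ | ssh root@remoteserver "cd /opt && tar --numeric-owner -xpvzf -"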
Requirements
Both systems must support ssh and have tar and gzip installed. The remote server must have an SSH server (sshd) running so you can connect to it.
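A quick way to check these requirements from the local machine; this is just a sketch, assuming a Debian/Ubuntu remote where the OpenSSH service unit is named ssh (other distributions may call it sshd):
ssh user@remoteserver "systemctl is-active ssh && command -v tar && command -v gzip"
If everything is in place, this prints active and the paths of the tar and gzip binaries.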
Installing tar and gzip on Debian- and Ubuntu-based systems is simple:
sudo apt install tar gzip
If you want to use bzip2 for compression, install it too:
sudo apt install tar bzip2
Transferring files without compression
Sometimes you have files that you know can't be compressed any further (won't get smaller) or are already compressed. Then it is faster to transfer them without compression, but still through tar, so that many small files are still transferred quickly over the wire:
From local machine to remote:
tar cf - data/ | ssh user@remoteserver "cd /opt && tar -xvf - "
From remote machine to local:
ssh user@remoteserver "cd /opt && tar cf - data/" | tar xfv -
This will increase the transfer speed, especially on very fast connections (like from one server to another server inside a data center).
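If you are not sure whether your data would benefit from compression at all, a quick local check on a representative file gives a rough idea; data/sample.bin is just a placeholder name here:
wc -c < data/sample.bin && gzip -c data/sample.bin | wc -c
The first number is the original size in bytes, the second the gzip-compressed size; if they are almost equal, skip the compression.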
Transferring files through slow connections
Previously we used gzip for compression as it is faster than bzip2, but if the connection is slow, you should compress the data as much as possible to reduce the amount of transferred data. We have to install bzip2 on both sides and use the j parameter for bzip2 instead of z for gzip.
From local machine to remote:
tar cfj - data/ | ssh user@remoteserver "cd /opt && tar -xjvf - "
From remote machine to local:
ssh user@remoteserver "cd /opt && tar cfj - data/" | tar xfjv -
ATTENTION: If you have a very slow CPU, the compression itself will take a lot of time and can heavily reduce the effective transfer speed.
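If you want finer control over the compression level, you can also place the compressor in the pipe yourself instead of letting tar call it. A sketch using bzip2 -9 for the strongest (and slowest) compression:
tar cf - data/ | bzip2 -9 | ssh user@remoteserver "cd /opt && bunzip2 | tar xvf -"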
Disable verbose output to speed up the transfer
If you transfer a lot of small files (like node_modules), your transfer speed may be slowed down by the console output. Just remove the v parameter to stop printing the transferred files to the console.
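For example, the first command from above without the v parameter; nothing but errors is printed:
tar czf - data/ | ssh user@remoteserver "cd /opt && tar -xzf -"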
See progress of the transfer using pv
If you disable verbose output, you don't know whether the transfer is still running, and you also can't see how fast the files are being transferred. Both can be shown using Pipe Viewer (pv).
On Debian/Ubuntu it can be installed through apt:
sudo apt install pv
You can insert it into the pipeline to see the amount of data being transferred.
From local machine to remote:
tar czf - data/ | pv | ssh user@remoteserver "cd /opt && tar -xvzf - "
459MiB 0:00:08 [52.2MiB/s] [ <=> ]
From remote machine to local:
ssh user@remoteserver "cd /opt && tar cfz - data/" | pv | tar xfzv -
459MiB 0:00:09 [50.4MiB/s] [ <=> ]
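If you tell pv roughly how much data to expect, it can additionally show a percentage and an ETA. A sketch, assuming GNU du; the size from du -sb ignores the small tar header overhead, so the numbers are only an estimate:
tar cf - data/ | pv -s $(du -sb data/ | cut -f1) | gzip | ssh user@remoteserver "cd /opt && tar -xvzf -"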
See progress using the buffer command
Instead of Pipe Viewer (pv) you can use buffer, a very fast reblocking program.
On Debian/Ubuntu it can be installed through apt:
sudo apt install buffer
You can insert it into the pipeline to see the amount of data being transferred. It is used similarly to pv, but requires a parameter to print progress updates:
- -S size: After every chunk of this size has been written, print out how much has been written so far. Also prints the total throughput. By default this is not set.
From local machine to remote:
tar czf - data/ | buffer -S 10m | ssh user@remoteserver "cd /opt && tar xvzf - "
471000K, 59855K/s
From remote machine to local:
ssh user@remoteserver "cd /opt && tar cfz - data/" | buffer -S 10m | tar xfzv -
471000K, 51883K/s
Buffer can also be used to cache the data and this way smooth out the transfer on connections with fluctuating network speed.
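A sketch of this, assuming you can spare 32 MB of memory on the sending machine; the -m parameter of buffer sets how much memory it may use for buffering:
tar czf - data/ | buffer -m 32m -S 10m | ssh user@remoteserver "cd /opt && tar xvzf -"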
Conclusion
Transferring files from one machine to another using ssh and tar can greatly improve the transfer speed, especially for many small files. Depending on the task you can control the compression, using z for gzip and j for bzip2, to compress the files before transferring them over the wire. If you have a fast connection, transferring without compression is usually the fastest way to move the data.
By removing the v parameter you stop the output of the files being transferred, which can increase the transfer speed for many small files, as the console output may slow down the transfer.
To see the progress and the amount of data being transferred, you can use Pipe Viewer (pv) or buffer.