Backups with tar

tar

tar is an archiving utility: it creates and extracts (optionally compressed) archives, a.k.a. tarballs.

tar -czpf foo.tar.gz sourceFiles file1 file2 # creates a compressed archive
tar -xpf foo.tar.gz # extracts the archive
tar -xpf foo.tar.gz -C dest/ # extracts the archive into the `dest/` directory
  • c or --create creates an archive
  • x or --extract extracts an archive
  • z or --gzip/--gunzip compresses or decompresses the archive with gzip
  • p or --preserve-permissions preserves file and directory permissions
  • f or --file provides the archive file name (foo.tar.gz in the example above; the same command is spelled out with long options below)
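
The same create command can be written with long options, which reads better in scripts (a sketch of the equivalent call):

tar --create --gzip --preserve-permissions --file=foo.tar.gz sourceFiles file1 file2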

  • Compress your backups for faster transfers and lower bandwidth and disk space usage (you get charged for both if you transfer backups off-site to a service like Amazon S3).

  • Since backups are usually automated, you can skip the -v (verbose) flag.

ownership

You can optionally preserve and restore file ownership as well with the --same-owner flag (the default when extracting as the superuser). Here is the relevant excerpt from tar --help:

      --no-same-permissions  apply the user's umask when extracting permissions
                             from the archive (default for ordinary users)
      --numeric-owner        always use numbers for user/group names
      --owner=NAME           force NAME as owner for added files
  -p, --preserve-permissions, --same-permissions
                             extract information about file permissions
                             (default for superuser)
      --preserve             same as both -p and -s
      --same-owner           try extracting files with the same ownership as
                             exists in the archive (default for superuser)
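
For example, to restore a site backup with the archived owners intact, extract as the superuser (a sketch; the destination path is a placeholder):

# running as root, tar restores the owners recorded in the archive
sudo tar -xpf foo.tar.gz --same-owner -C /var/www/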

add backup dates

tar -czpf foo.$(date +%Y%m%d).tar.gz sourceFiles # e.g. foo.20170817.tar.gz
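
Note that % is a special character in crontab entries, so it must be escaped there. A hypothetical crontab line (the paths and schedule are assumptions):

# run at 02:30 every day; \% keeps cron from treating % as a line terminator
30 2 * * * tar -czpf /backups/foo.$(date +\%Y\%m\%d).tar.gz /var/www/foo.com/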

compression

  • bzip2 is the best of the two in terms of compression ratio, but it is very CPU and RAM intensive
  • gzip has a decent compression ratio and decent resource usage (see the comparison below)
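
tar selects these with the -z (gzip) and -j (bzip2) flags. A quick way to compare the two on your own data (foo/ is a placeholder directory):

tar -czpf foo.tar.gz foo/ # gzip: faster, decent ratio
tar -cjpf foo.tar.bz2 foo/ # bzip2: smaller archive, more CPU time
ls -lh foo.tar.gz foo.tar.bz2 # compare the resulting sizes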

See what’s inside a backup

You might want to do this for different reasons. Say you want to find out what date the files inside a tarball were backed up or created:

tar -tf foo.tar.gz # list the files in the tar archive
tar -tvf foo.tar # list all files in foo.tar verbosely (permissions, ownerships, file size, time)
tar --list -f foo.tar.gz # -t and --list are the same thing (equivalent of `tar -tf foo.tar.gz`)
# tar -tf foo.tar.gz
foo/
foo/file2.txt
foo/file3.txt
foo/file9.txt
foo/file4.txt
foo/file1.txt
foo/file5.txt
foo/file8.txt
foo/file7.txt
foo/file6.txt
# tar -tvf foo.tar.gz
drwxr-xr-x root/root         0 2017-08-17 06:48 foo/
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file2.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file3.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file9.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file4.txt
-rw-r--r-- root/root         0 2017-08-17 06:48 foo/file1.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file5.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file8.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file7.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file6.txt
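
If you only care about one file, pipe the verbose listing through grep instead of scanning it by eye (a sketch, using file1.txt from the listing above):

tar -tvf foo.tar.gz | grep 'file1.txt' # prints just that file's entry, with its date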

A bash script to automate the whole thing

Here’s a script I have used on one of my sites. It backs up a website's files from /var/www into a backups directory on the same server, deletes backups older than 5 days, and can optionally sync the backups to S3.

#!/bin/bash

DIR='/backups'
TIMESTAMP=$(date +%Y%b%d) # e.g. 2017Aug17
YEAR=$(date +%Y)

# Create & Compress
echo "Backing up: foo.com"
tar -czpf "${DIR}/${TIMESTAMP}.foo.com.tar.gz" /var/www/foo.com/public_html/

echo "Success: backup created"

# Delete old backups (older than 5 days)
echo "Deleting old backups.."
find ${DIR}/${YEAR}*.*.tar.gz -type f -lastmod +5 -delete
# -delete might not work on all systems
#find ${DIR}/${YEAR}*.*.tar.gz -type f -lastmod +5 -exec rm -f {} \;

# Sync to S3
# s3cmd sync /backups/ s3://s3.foo.com/
# echo "Success: backup synced with S3"