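Using pigz, a multithreaded compression tool
What is pigz?
Put simply, pigz is gzip with support for parallel compression. By default it compresses with as many threads as there are online logical CPUs; if the CPU count cannot be detected, it falls back to 8 threads. The -p option sets the thread count explicitly. Keep in mind that CPU usage rises accordingly.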
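The default described above can be inspected and overridden from a script. The following is a minimal sketch rather than anything pigz ships: `pick_threads` is a hypothetical helper, and its "use half of the CPUs" policy is only an assumption about what leaves reasonable headroom on a shared host.

```shell
#!/bin/sh
# pick_threads is a hypothetical helper, not a pigz feature.
# It returns half of the given CPU count, but never less than 1.
pick_threads() {
    half=$(( $1 / 2 ))
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# Online logical CPUs: the quantity pigz's default thread count is based on,
# according to its help text. nproc is only a fallback if getconf is missing.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || nproc)
threads=$(pick_threads "$cpus")
echo "online CPUs: $cpus, suggested pigz threads: $threads"

# Example invocation (commented out so the sketch has no side effects):
# pigz -p "$threads" big.log
```

Passing the result to -p caps pigz below its default, which is the simplest way to keep it from occupying every core.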
Installation
yum install -y pigz
Usage
~]# pigz --help
Usage: pigz [options] [files ...]
will compress files in place, adding the suffix '.gz'. If no files are
specified, stdin will be compressed to stdout. pigz does what gzip does,
but spreads the work over multiple processors and cores when compressing.
Options:
-0 to -9, -11 Compression level (11 is much slower, a few % better)
--fast, --best Compression levels 1 and 9 respectively
-b, --blocksize mmm Set compression block size to mmmK (default 128K)
-c, --stdout Write all processed output to stdout (won't delete)
-d, --decompress Decompress the compressed input
-f, --force Force overwrite, compress .gz, links, and to terminal
-F --first Do iterations first, before block split for -11
-h, --help Display a help screen and quit
-i, --independent Compress blocks independently for damage recovery
-I, --iterations n Number of iterations for -11 optimization
-k, --keep Do not delete original file after processing
-K, --zip Compress to PKWare zip (.zip) single entry format
-l, --list List the contents of the compressed input
-L, --license Display the pigz license and quit
-M, --maxsplits n Maximum number of split blocks for -11
-n, --no-name Do not store or restore file name in/from header
-N, --name Store/restore file name and mod time in/from header
-O --oneblock Do not split into smaller blocks for -11
-p, --processes n Allow up to n compression threads (default is the
number of online processors, or 8 if unknown)
-q, --quiet Print no messages, even on error
-r, --recursive Process the contents of all subdirectories
-R, --rsyncable Input-determined block locations for rsync
-S, --suffix .sss Use suffix .sss instead of .gz (for compression)
-t, --test Test the integrity of the compressed input
-T, --no-time Do not store or restore mod time in/from header
-v, --verbose Provide more verbose output
-V --version Show the version of pigz
-z, --zlib Compress to zlib (.zz) instead of gzip format
-- All arguments after "--" are treated as files
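A few of the options listed above can be tried safely on a throwaway file. This is a small sketch, assuming `mktemp` and `seq` are available; the file name `demo.txt` is made up, and the demo is skipped when pigz is not installed.

```shell
#!/bin/sh
# Exercise a few flags from the help text on a temporary file.
work=$(mktemp -d)
seq 1 20000 > "$work/demo.txt"

if command -v pigz >/dev/null 2>&1; then
    pigz -k -9 -p 2 "$work/demo.txt"          # -k keeps demo.txt, -9 is --best, -p 2 uses two threads
    pigz -t "$work/demo.txt.gz"               # -t checks the integrity of the compressed file
    pigz -l "$work/demo.txt.gz"               # -l lists the contents of the compressed file
    pigz -dc "$work/demo.txt.gz" | tail -n 1  # -d -c decompresses to stdout; the last line is shown
else
    echo "pigz not found; skipping demo"
fi
```

Without -k, pigz replaces demo.txt with demo.txt.gz, as the first line of the help text says.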
Compressing files
tar cf - <directory, file, or several files> | pigz -p 12 > file.tar.gz
tar cf - aaa.txt | pigz -p 12 > aaa.tar.gz
tar cf - aaa.txt bbb.txt | pigz -p 12 > file.tar.gz
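As an alternative to the explicit pipe, tar can launch the compressor itself through `--use-compress-program`. The sketch below assumes a tar that supports this option (GNU tar and bsdtar both document it). It works in a temporary directory and falls back to gzip when pigz is not installed, which is possible because both write the same format.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
echo "first"  > aaa.txt
echo "second" > bbb.txt

# Prefer pigz; fall back to gzip so the sketch still runs without pigz.
if command -v pigz >/dev/null 2>&1; then zprog=pigz; else zprog=gzip; fi

# tar runs $zprog as the compression filter instead of us piping by hand.
tar --use-compress-program="$zprog" -cf file.tar.gz aaa.txt bbb.txt

# Listing the archive goes through the same program in decompress mode.
listing=$(tar --use-compress-program="$zprog" -tf file.tar.gz)
echo "$listing"
```

Recent GNU tar versions should also accept arguments inside the option value, for example `--use-compress-program='pigz -p 12'`; check your tar's documentation before relying on that form.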
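Decompressing files
pigz -p 12 -dc file.tar.gz | tar xf -
You can also extract the archive with tar -xzvf file.tar.gz, which goes through tar's own gzip handling rather than pigz. Note that pigz does not parallelize decompression itself (it only uses a few extra threads for reading, writing, and check calculation), so -p matters far less here than it does for compression.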
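To confirm that an archive made this way extracts cleanly, a round trip can be scripted. The following is a sketch under assumptions: the paths and the file name are invented, and gzip stands in when pigz is absent, since the two are interchangeable for the -t and -dc flags used here.

```shell
#!/bin/sh
src=$(mktemp -d)
dst=$(mktemp -d)
echo "payload" > "$src/aaa.txt"

# Prefer pigz; fall back to gzip so the sketch still runs without pigz.
if command -v pigz >/dev/null 2>&1; then zprog=pigz; else zprog=gzip; fi

# Compress: tar writes the archive to stdout, the compressor reads stdin.
tar cf - -C "$src" aaa.txt | "$zprog" > "$dst/file.tar.gz"

# Test the integrity of the compressed file, then extract it into $dst.
"$zprog" -t "$dst/file.tar.gz"
"$zprog" -dc "$dst/file.tar.gz" | tar xf - -C "$dst"

# cmp exits with status 0 only when both files are byte-identical.
cmp "$src/aaa.txt" "$dst/aaa.txt" && echo "round trip OK"
```

Running the -t check before extraction is cheap and catches a truncated or damaged archive before tar starts writing files.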
Compression time comparison (the target file is 40 GB):
Program | Threads | Time |
---|---|---|
gzip | 1 | 5m28.824s |
pigz | 4 | 1m18.236s |
pigz | 8 | 0m42.670s |
pigz | 16 | 0m23.643s |
pigz | 32 | 0m17.523s |
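As the table shows, multithreaded compression with pigz cuts the elapsed time substantially. Moving from single-threaded gzip to pigz with 4 threads makes compression roughly 4.2 times faster (5m28s down to 1m18s). Adding more threads gives diminishing returns: doubling from 16 to 32 threads only reduces the time from about 23.6s to 17.5s. The shorter run time is paid for in CPU, so on hosts where CPU usage is already high a large thread count is not advisable; 4 or 8 threads is generally a reasonable choice.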
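The speedup figures can be recomputed from the table with a little arithmetic. This is a minimal sketch: `to_seconds` and `speedup` are hypothetical helper names, and the times are copied from the table above rather than measured again.

```shell
#!/bin/sh
# Convert a time such as "5m28.824s" into seconds (328.824).
to_seconds() {
    printf '%s\n' "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# speedup <baseline> <time>: how many times faster <time> is than <baseline>.
speedup() {
    base=$(to_seconds "$1")
    t=$(to_seconds "$2")
    LC_ALL=C awk -v b="$base" -v t="$t" 'BEGIN { printf "%.1f", b / t }'
}

gzip_time=5m28.824s
# Each entry is threads:time, taken from the table.
for row in 4:1m18.236s 8:0m42.670s 16:0m23.643s 32:0m17.523s; do
    threads=${row%%:*}
    elapsed=${row#*:}
    echo "pigz -p $threads: $(speedup "$gzip_time" "$elapsed")x faster than gzip"
done
```

Dividing each speedup by its thread count gives the per-thread efficiency, which makes the flattening between 16 and 32 threads easy to see.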