Installation notes for the CAPTCHA recognition tools GraphicsMagick 1.3.25 and Tesseract-OCR 4.0.0 on Linux
Installing GraphicsMagick 1.3.25
Install the dependencies
yum install -y gcc libpng libjpeg libpng-devel libjpeg-devel ghostscript libtiff libtiff-devel freetype freetype-devel
Download GraphicsMagick
wget ftp://ftp.graphicsmagick.org/pub/GraphicsMagick/1.3/GraphicsMagick-1.3.25.tar.gz
Extract GraphicsMagick
tar -zxvf GraphicsMagick-1.3.25.tar.gz
Build and install
cd GraphicsMagick-1.3.25
./configure
make && make install
Verify the installation
gm version
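Output like the following indicates that the installation succeeded (the version line should match the release you built; this sample shows 1.3.21):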
GraphicsMagick 1.3.21 2015-02-28 Q8 http://www.GraphicsMagick.org/
Copyright (C) 2002-2014 GraphicsMagick Group.
Additional copyrights and licenses apply to this software.
See http://www.GraphicsMagick.org/www/Copyright.html for details.
Feature Support:
Native Thread Safe yes
Large Files (> 32 bit) yes
Large Memory (> 32 bit) yes
BZIP no
DPS no
FlashPix no
FreeType yes
Ghostscript (Library) no
JBIG no
JPEG-2000 no
JPEG yes
Little CMS no
Loadable Modules no
OpenMP yes (201107)
PNG yes
TIFF yes
TRIO no
UMEM no
WebP no
WMF no
X11 no
XML no
ZLIB yes
Host type: x86_64-unknown-linux-gnu
Configured using the command:
./configure
Final Build Parameters:
CC = gcc -std=gnu99
CFLAGS = -fopenmp -g -O2 -Wall -pthread
CPPFLAGS = -I/usr/include/freetype2
CXX = g++
CXXFLAGS = -pthread
LDFLAGS = -L/usr/lib
LIBS = -ltiff -lfreetype -ljpeg -lpng15 -lz -lm -lgomp -lpthread
OK, GraphicsMagick is now installed.
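The Feature Support table above is what matters for CAPTCHA work: JPEG, PNG and TIFF should all read "yes". As a minimal sketch, the helper below checks that table from a script. The function name gm_feature_enabled is hypothetical, and it only handles single-word feature names such as JPEG or PNG.

```shell
#!/usr/bin/env bash
# Succeeds if the feature named in $1 is listed as "yes" in `gm version`
# output read from stdin. Matches the first column exactly, so "JPEG"
# does not match the "JPEG-2000" row.
gm_feature_enabled() {
  awk -v name="$1" '$1 == name && $2 == "yes" { found = 1 } END { exit !found }'
}

# Only query gm when it is actually on PATH.
if command -v gm >/dev/null 2>&1; then
  for feature in JPEG PNG TIFF; do
    if gm version | gm_feature_enabled "$feature"; then
      echo "$feature: ok"
    else
      echo "$feature: missing (install the matching -devel package and rebuild)"
    fi
  done
fi
```

A feature that shows "no" usually means its -devel package was not installed when ./configure ran.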
Installing Tesseract-OCR 4.0.0
Install the dependencies
yum install -y autoconf automake libtool libjpeg libpng libtiff zlib libjpeg-devel libpng-devel libtiff-devel zlib-devel
Download and install Leptonica
wget http://www.leptonica.org/source/leptonica-1.77.0.tar.gz
tar -zxvf leptonica-1.77.0.tar.gz
cd leptonica-1.77.0
./configure
make && make install
Download and install Tesseract-OCR
wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0.tar.gz
tar -zxvf 4.0.0.tar.gz
cd tesseract-4.0.0
./autogen.sh
./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
make && make install
At this point ./configure fails with the error:
error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.
The fix is as follows:
vi /etc/profile
Append the following lines to the end of the file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBLEPT_HEADERSDIR=/usr/local/include
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
Then run source /etc/profile so that the settings take effect immediately.
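Before re-running ./configure, it can help to confirm that the new environment actually exposes Leptonica. The sketch below assumes ./configure locates Leptonica through pkg-config under the module name lept, which the PKG_CONFIG_PATH fix suggests. The function names version_ge and check_leptonica are hypothetical, and version_ge relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Succeeds when version $1 is greater than or equal to version $2.
# Uses GNU `sort -V`: the smaller of the two versions sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether pkg-config can see a Leptonica that is at least 1.74.
check_leptonica() {
  local found
  found="$(pkg-config --modversion lept 2>/dev/null)" || {
    echo "lept.pc not found; PKG_CONFIG_PATH=${PKG_CONFIG_PATH:-<unset>}"
    return 1
  }
  if version_ge "$found" 1.74; then
    echo "Leptonica $found is visible to pkg-config"
  else
    echo "Leptonica $found is older than 1.74"
    return 1
  fi
}
```

Running check_leptonica in the same shell where /etc/profile was sourced should print the version that was just built, for example 1.77.0.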
After that, run again:
./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
make && make install
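Check which languages Tesseract-OCR supports
The tesseract command is installed at /usr/local/bin/tesseract; running tesseract --list-langs prints the language packs it can find.
Download the full tessdata_fast set from GitHub and upload it to /usr/local/share/, then rename tessdata_fast to tessdata. (Downloading only the language packs you need is recommended: eng.traineddata, chi_sim.traineddata.)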
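If only a couple of language packs are needed, they can be fetched directly instead of uploading the whole repository. This is a rough sketch under two assumptions that are not stated in the text above: that the packs live in the tesseract-ocr/tessdata_fast repository on GitHub, and that its default branch is named main. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed locations: the tessdata directory described above, and the
# "main" branch of the tesseract-ocr/tessdata_fast repository.
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"
TESSDATA_BASE_URL="https://github.com/tesseract-ocr/tessdata_fast/raw/main"

# Print the download URL for one language code, e.g. "eng".
traineddata_url() {
  printf '%s/%s.traineddata\n' "$TESSDATA_BASE_URL" "$1"
}

# Download each requested language pack into TESSDATA_DIR.
fetch_langs() {
  mkdir -p "$TESSDATA_DIR" || return 1
  local lang
  for lang in "$@"; do
    wget -q -O "$TESSDATA_DIR/$lang.traineddata" "$(traineddata_url "$lang")" || return 1
  done
}
```

Calling fetch_langs eng chi_sim as root downloads the two packs recommended above; tesseract --list-langs should then list them.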
Verify the installation. Output similar to the following indicates success:
tesseract -v
tesseract 4.00.00alpha
leptonica-1.77.1
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
Installation is now complete.
Using the tools from Node
Install the two npm packages
npm install gm
npm install node-tesr
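Code:
const gm = require('gm');
const tesseract = require('node-tesr');
// Hypothetical example path; the image is binarized and overwritten in place
const imgPath = './captcha.png';
// Apply a 62% threshold to get a black-and-white image, then run OCR on it
gm(imgPath)
  .threshold(62, true)
  .write(imgPath, (err) => {
    if (err) return console.error('GraphicsMagick failed:', err);
    tesseract(imgPath, { l: 'eng', oem: 3, psm: 3 }, (err, data) => {
      if (err) return console.error('Tesseract failed:', err);
      // Strip all whitespace from the recognized text
      const code = data.replace(/\s+/g, '');
      console.log('Recognized CAPTCHA:', code);
    });
  });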
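The same two steps can be tried straight from the shell, which is handy for tuning the threshold before wiring it into Node. This is a sketch rather than what the npm packages do internally: it assumes gm and tesseract are on PATH, writes the binarized copy to a temporary directory instead of overwriting the input, and uses the hypothetical function names strip_whitespace and recognize_captcha.

```shell
#!/usr/bin/env bash
# Remove every whitespace character from stdin
# (the shell counterpart of data.replace(/\s+/g, "")).
strip_whitespace() {
  tr -d '[:space:]'
}

# Binarize a copy of the image in $1 at 62%, OCR it, print the cleaned text.
recognize_captcha() {
  local src="$1" tmpdir text
  tmpdir="$(mktemp -d)" || return 1
  # Same threshold as .threshold(62, true) in the Node code.
  gm convert "$src" -threshold 62% "$tmpdir/bin.png" || { rm -rf "$tmpdir"; return 1; }
  # "stdout" as the output base makes tesseract print the text.
  text="$(tesseract "$tmpdir/bin.png" stdout -l eng --oem 3 --psm 3)" || { rm -rf "$tmpdir"; return 1; }
  rm -rf "$tmpdir"
  printf '%s' "$text" | strip_whitespace
  echo
}
```

For example, recognize_captcha ./captcha.png prints the recognized characters on one line; changing the 62% value and re-running is a quick way to see how the threshold affects accuracy.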