vlambda Blog
Installing and Deploying the CAPTCHA Recognition Tools GraphicsMagick 1.3.25 and Tesseract-OCR 4.0.0 on Linux: A Record

Installing GraphicsMagick 1.3.25

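  1. Install the required dependency packages

yum install -y gcc libpng libjpeg libpng-devel libjpeg-devel ghostscript libtiff libtiff-devel freetype freetype-devel
  2. Download GraphicsMagick

wget ftp://ftp.graphicsmagick.org/pub/GraphicsMagick/1.3/GraphicsMagick-1.3.25.tar.gz
  3. Extract the archive

tar -zxvf GraphicsMagick-1.3.25.tar.gz
  4. Compile and install

cd GraphicsMagick-1.3.25
./configure
make && make install
  5. Verify the installation

gm version

Output like the following means the installation succeeded (this sample shows a 1.3.21 build, so the version line will differ for 1.3.25):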

GraphicsMagick 1.3.21 2015-02-28 Q8 http://www.GraphicsMagick.org/
Copyright (C) 2002-2014 GraphicsMagick Group.
Additional copyrights and licenses apply to this software.
See http://www.GraphicsMagick.org/www/Copyright.html for details.

Feature Support:
Native Thread Safe yes
Large Files (> 32 bit) yes
Large Memory (> 32 bit) yes
BZIP no
DPS no
FlashPix no
FreeType yes
Ghostscript (Library) no
JBIG no
JPEG-2000 no
JPEG yes
Little CMS no
Loadable Modules no
OpenMP yes (201107)
PNG yes
TIFF yes
TRIO no
UMEM no
WebP no
WMF no
X11 no
XML no
ZLIB yes

Host type: x86_64-unknown-linux-gnu

Configured using the command:
./configure

Final Build Parameters:
CC = gcc -std=gnu99
CFLAGS = -fopenmp -g -O2 -Wall -pthread
CPPFLAGS = -I/usr/include/freetype2
CXX = g++
CXXFLAGS = -pthread
LDFLAGS = -L/usr/lib
LIBS = -ltiff -lfreetype -ljpeg -lpng15 -lz -lm -lgomp -lpthread

OK, GraphicsMagick is now installed.
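For CAPTCHA images, the rows that matter in the Feature Support table are the image formats. As a rough sketch, the check below reads `gm version` output on stdin and prints any required feature that is not reported as `yes`. The function name `missing_features` and the choice of JPEG, PNG and TIFF as "required" are assumptions made for this example, not something GraphicsMagick prescribes.

```shell
# Print each required feature that the `gm version` report (read from stdin)
# does not list as "yes". Prints nothing when everything is available.
missing_features() {
    awk '
        # Record a feature only when its own row says "yes" (JPEG-2000 is a different row from JPEG)
        ($1 == "JPEG" || $1 == "PNG" || $1 == "TIFF") && $2 == "yes" { ok[$1] = 1 }
        END {
            n = split("JPEG PNG TIFF", required, " ")
            for (i = 1; i <= n; i++)
                if (!(required[i] in ok)) print required[i]
        }
    '
}

# Usage (requires gm on PATH):
#   gm version | missing_features
```

If a feature shows up as missing, the usual cause is that the matching `-devel` package was not installed before `./configure` ran, so the build has to be repeated after installing it.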

Installing Tesseract-OCR 4.0.0

  1. Install the dependencies

yum install -y autoconf automake libtool libjpeg libpng libtiff zlib libjpeg-devel libpng-devel libtiff-devel zlib-devel
  2. Download and install Leptonica

wget http://www.leptonica.org/source/leptonica-1.77.0.tar.gz
tar -zxvf leptonica-1.77.0.tar.gz
cd leptonica-1.77.0
./configure
make && make install
  3. Download and install Tesseract-OCR

wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0.tar.gz
tar -zxvf 4.0.0.tar.gz
cd tesseract-4.0.0
./autogen.sh
./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
make && make install
At this point ./configure reports an error:

error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package

The fix is as follows.

Open /etc/profile (vi /etc/profile) and append the following lines to the end:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBLEPT_HEADERSDIR=/usr/local/include
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

Then run source /etc/profile so the settings take effect immediately, and run the build again:

./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
make && make install
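The error comes from configure not finding the Leptonica that was installed under /usr/local. Before rerunning the build, it can help to confirm that pkg-config now sees it. The sketch below is one way to do that; `append_path` and `leptonica_visible` are hypothetical helper names. `append_path` also avoids the empty leading entry that `$VAR:/dir` produces when the variable was unset (for LD_LIBRARY_PATH an empty entry means the current directory).

```shell
# Append a directory to a colon-separated search path unless it is already listed.
# Prints the resulting path; does not modify any variable itself.
append_path() {
    current=$1
    dir=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;        # already present, keep as is
        "::")       printf '%s\n' "$dir" ;;            # path was empty, no leading colon
        *)          printf '%s\n' "$current:$dir" ;;   # normal append
    esac
}

# Succeeds when pkg-config can find a Leptonica (module name "lept") of at least 1.74.
leptonica_visible() {
    pkg-config --atleast-version=1.74 lept
}

# Usage:
#   export LD_LIBRARY_PATH=$(append_path "$LD_LIBRARY_PATH" /usr/local/lib)
#   export PKG_CONFIG_PATH=$(append_path "$PKG_CONFIG_PATH" /usr/local/lib/pkgconfig)
#   leptonica_visible && echo "Leptonica found" || echo "Leptonica still not visible"
```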
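  4. Check the languages Tesseract-OCR supports
    cd to the directory where the Tesseract-OCR command is installed (the binary is /usr/local/bin/tesseract).
    Download the full tessdata_fast set from GitHub, upload it to /usr/local/share/, and rename the tessdata_fast folder to tessdata. (Downloading only the language packs you need is recommended: eng.traineddata, chi_sim.traineddata.)

  5. Verify the installation; output similar to the following means it succeeded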

tesseract -v

tesseract 4.00.00alpha
leptonica-1.77.1
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0

Installation is now complete.
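The language-pack step above can also be done directly on the server instead of downloading and uploading by hand. This is a minimal sketch under a few assumptions: the packs are fetched from the `main` branch of the tesseract-ocr/tessdata_fast repository on GitHub, the target is the /usr/local/share/tessdata directory used in this article, and `traineddata_url` and `fetch_languages` are hypothetical helper names.

```shell
# Where this article places the language data, and where the packs are assumed to live.
TESSDATA_DIR=/usr/local/share/tessdata
TESSDATA_URL=https://github.com/tesseract-ocr/tessdata_fast/raw/main

# Print the download URL for one language code, e.g. "eng" or "chi_sim".
traineddata_url() {
    printf '%s/%s.traineddata\n' "$TESSDATA_URL" "$1"
}

# Download the given language packs into TESSDATA_DIR (needs wget and write access).
fetch_languages() {
    mkdir -p "$TESSDATA_DIR" || return 1
    for lang in "$@"; do
        wget -O "$TESSDATA_DIR/$lang.traineddata" "$(traineddata_url "$lang")" || return 1
    done
}

# Usage:
#   fetch_languages eng chi_sim
#   tesseract --list-langs    # should now list eng and chi_sim
```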

Notes on using the tools from Node

Install the two npm packages:

npm install gm
npm install node-tesr
Code:


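const gm = require('gm');
const tesseract = require('node-tesr');

const imgPath = './captcha.png'; // placeholder path (not given in the original)

gm(imgPath)
  .threshold(62, true) // binarize at 62% (true = value is a percentage)
  .write(imgPath, (err) => { // overwrite the image, then run OCR on it
    if (err) return console.error('GraphicsMagick failed:', err);
    tesseract(imgPath, { l: 'eng', oem: 3, psm: 3 }, (ocrErr, data) => {
      if (ocrErr) return console.error('Tesseract failed:', ocrErr);
      // The recognized text arrives here; strip all whitespace
      const code = data.replace(/\s+/g, '');
      console.log('Recognized CAPTCHA:', code);
    });
  });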
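When the Node code gives odd results, it can be easier to debug the same two steps from the command line first. The sketch below mirrors the pipeline: binarize at a 62% threshold with `gm convert`, run `tesseract` with the same language, OEM and PSM settings, then drop whitespace. It assumes both tools are on PATH; the function names and the default output file name are hypothetical, and unlike the Node version it writes to a separate file rather than overwriting the input.

```shell
# Remove all whitespace from text read on stdin (mirrors data.replace(/\s+/g, "")).
strip_whitespace() {
    tr -d '[:space:]'
}

# Binarize a CAPTCHA image, OCR it, and print the cleaned result.
#   $1: input image   $2: optional path for the binarized copy
recognize_captcha() {
    src=$1
    binarized=${2:-captcha-binarized.png}
    gm convert "$src" -threshold 62% "$binarized" &&
        tesseract "$binarized" stdout -l eng --oem 3 --psm 3 | strip_whitespace
}

# Usage:
#   recognize_captcha captcha.png
```

Keeping the binarized copy around makes it possible to open it and judge whether 62% is a sensible threshold for the images at hand.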