2014/10/05

Build Tesseract OCR library 3.02.02 with Qt 5.1 on Windows

The steps in this article were carried out:
  • under Windows 7 Home Premium.
  • with Qt 5.1.


Tesseract OCR source code

Download tesseract-ocr-3.02.02.tar.gz and extract it.

Leptonica library

From the Leptonica web site:
"Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications."
Leptonica is quite tedious to build because of all its dependencies. Fortunately, someone did this work for us.

Here is the link to his repository: https://github.com/zdenop/tesseract-mingw

Many thanks to zdenop for saving us time!

Download the following libraries from the bin folder:
  • libgif-4.dll
  • libjbig-1.dll
  • libjpeg-8.dll
  • liblept-3.dll : the Leptonica library.
  • libpng15-15.dll
  • libtiff-3.dll
  • libtiffxx-3.dll
  • libwebp-2.dll
  • zlib1.dll
You may have noticed that a libtesseract-3.dll is also available. I tried to use it in my projects but it didn't work, which is why I decided to build it myself.

You must also get the Leptonica source code for its header files. I didn't use the header files in zdenop's repo, but you could try them. I used the original headers from Leptonica version 1.69.

Extract the Leptonica archive, create a bin directory in the extracted folder, then copy all the libraries listed above into it.

tesseract-ocr.pro file

This section doesn't go through the whole file, only the lines you need to understand.
#_-_-_-_-_-_SOME DIRECTORIES_-_-_-_-_-_
OCR_DIR = D:/prog/ocr
LEPTONICA_DIR = $$OCR_DIR/leptonica-1.69
MINGW_LIB_DIR = D:/Programs/Qt/Qt5.1.0/Tools/mingw48_32/i686-w64-mingw32/lib
#-_-_-_-_-_-
  • OCR_DIR : base directory for my OCR tools.
  • LEPTONICA_DIR : Leptonica extraction directory.
  • MINGW_LIB_DIR : needed to link against the Winsock 2 library (libws2_32.a).
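These paths are specific to my machine, and a wrong value usually only shows up later as a confusing linker error. As a minimal sketch, assuming the three variables defined above, the following lines could be placed right after the directory definitions to stop qmake early when a path is wrong. They are an optional addition, not something taken from the downloadable tesseract-ocr.pro, and the messages are illustrative. exists() and error() are standard qmake functions.

```qmake
# Optional sanity checks (an assumption, not taken from the original tesseract-ocr.pro).
# Stop qmake early if one of the directories above does not match your machine.
!exists($$LEPTONICA_DIR/bin/liblept-3.dll) {
    error("liblept-3.dll not found in $$LEPTONICA_DIR/bin - check LEPTONICA_DIR")
}
!exists($$MINGW_LIB_DIR/libws2_32.a) {
    error("libws2_32.a not found in $$MINGW_LIB_DIR - check MINGW_LIB_DIR")
}
```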
DESTDIR = ../tesseract-ocr_release
The build output directory.
DEFINES += _tagBLOB_DEFINED
DEFINES += USE_STD_NAMESPACE
DEFINES += WINDLLNAME=\\\"$$TARGET.dll\\\"
Here, we add preprocessor definitions.
  • _tagBLOB_DEFINED : to avoid conflicting declarations between wtypes.h (MinGW) and platform.h (tesseract) if you work with Qt.
  • USE_STD_NAMESPACE : I haven't looked into its exact purpose, but it must be defined.
  • WINDLLNAME : used by ccutil files.
#_-_-_-_-_-_LINKING_-_-_-_-_-_
win32:LIBS += $$LEPTONICA_DIR/bin/liblept-3.dll
win32:LIBS += $$MINGW_LIB_DIR/libws2_32.a
#-_-_-_-_-_-
These lines link against the Leptonica and Winsock 2 libraries.
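The two LIBS lines above give full paths to the files. A more common way to write this in qmake, shown here only as an untested sketch, is to pass a search directory with -L and a library name with -l. It assumes that the MinGW linker accepts liblept-3.dll directly when asked for -llept-3, and that it finds libws2_32.a in its default library directory, in which case MINGW_LIB_DIR would no longer be needed.

```qmake
# Sketch of an alternative LINKING section (assumption: not the original file's content).
win32 {
    # -L adds a search directory; -l names the library without its "lib" prefix and extension.
    LIBS += -L$$LEPTONICA_DIR/bin -llept-3
    # ws2_32 ships with MinGW, so no explicit path should be needed.
    LIBS += -lws2_32
}
```

If the linker cannot resolve either library this way, the full-path form used in the original file remains the safe fallback.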

tesseract-ocr.pro can be downloaded from my repository:

https://github.com/broija/tesseract_ocr_qt

8 comments:

  1. Hi, I'm able to compile the DLL. Where are the header files? I want to be able to call the Tesseract API from within my program.

    Thanks.

    Replies
    1. Hi,

      Headers are spread across different directories in the tesseract-ocr archive:
      - api
      - ccmain
      - ccstruct
      - ccutil
      - ...

      Take a look at the ".pro" file to see the complete header list: https://github.com/broija/tesseract_ocr_qt/blob/master/tesseract-ocr.pro

      You can also check out a project I'm currently working on that is based on tesseract-ocr. Maybe its project file could help you:

      https://github.com/broija/subdetection

      Let me know if you have more questions.

      Regards.

  2. This comment has been removed by the author.

  3. Hi, when I try to compile the DLL I get:
    No rule to make target 'api/baseapi.cpp', needed by '../tesseract-ocr_release/obj/baseapi.o'. Stop

    Any ideas?

    thanks

    Replies
    1. Unzip the Tesseract source code into Broija's project folder.

    2. Hello Dorian,

      Sorry for the late answer. Did you succeed in compiling the DLL?

      Thanks for helping him, Pavel!

  4. Hi,
    Initially I hit the same problem as Dorian Haye, so I unzipped the Tesseract source code into the project folder, but it still does not work. I am getting the following errors:
    1. C:/Qt/Qt5.4.1/Tools/mingw491_32/bin/../lib/gcc/i686-w64-mingw32/4.9.1/../../../../i686-w64-mingw32/bin/ld.exe:E:/ocr/leptonica-1.69/bin/liblept-3.dll: file format not recognized; treating as linker script

    2. C:/Qt/Qt5.4.1/Tools/mingw491_32/bin/../lib/gcc/i686-w64-mingw32/4.9.1/../../../../i686-w64-mingw32/bin/ld.exe:E:/ocr/leptonica-1.69/bin/liblept-3.dll:4: syntax error

    3. collect2.exe: error: ld returned 1 exit status

    I am using Qt 5.4.1 on Windows 7 Professional with the MinGW 4.9.1 32-bit compiler.
    I have kept leptonica-1.69 inside the E:/ocr folder.
    Project directory: E:\tessetact-ocr
    Compiler location: C:\Qt\Qt5.4.1\Tools\mingw491_32\bin

    I made the following changes to the .pro file:
    #_-_-_-_-_-_SOME DIRECTORIES_-_-_-_-_-_
    OCR_DIR = E:/ocr
    LEPTONICA_DIR = $$OCR_DIR/leptonica-1.69
    MINGW_LIB_DIR = C:/Qt/Qt5.4.1/Tools/mingw491_32/i686-w64-mingw32/lib
    #-_-_-_-_-_-

    Replies
    1. Hi,

      I will try with the very same versions of Qt and MinGW. I've downloaded them from this link:

      http://download.qt.io/official_releases/qt/5.4/5.4.1/qt-opensource-windows-x86-mingw491_opengl-5.4.1.exe

      Could you please confirm that it matches your versions? To be sure, you can compare the MD5 checksum of this installer with that of the one you used.
