Compiling SVN Tesseract on OSX

Should the need ever arise to compile Tesseract from SVN (version 3.01 at the time of writing), please note the following.

To fetch the source, run:

bash-3.2$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only

You have to install Leptonica beforehand, either manually or via MacPorts, as I did:

bash-3.2$ sudo port install leptonica
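
Before going further, it can help to confirm where the Leptonica headers ended up, since configure searches for them later. What follows is a minimal sketch, not part of the original procedure: it assumes the file of interest is allheaders.h (Leptonica's umbrella header) and that the candidate directories are the ones discussed in this post. The function name find_lept_headers is made up for illustration.

```shell
# Hypothetical helper: print the first directory that contains allheaders.h.
# The default candidate list is an assumption; pass your own directories
# as arguments to override it.
find_lept_headers() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/local/include /usr/include /opt/local/include/leptonica
    fi
    for dir in "$@"; do
        if [ -r "$dir/allheaders.h" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "allheaders.h not found" >&2
    return 1
}
```

Calling find_lept_headers with no arguments checks the three default locations and prints the first one that holds the header.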

If you want to use autotools and libtool from MacPorts (again, like me), you’ll have to hack the runautoconf script in the Tesseract source directory (‘tesseract-ocr-read-only/’) before running it, so that it calls glibtoolize instead of libtoolize. Rumor has it that the MacPorts maintainers renamed libtoolize to glibtoolize to avoid eclipsing Apple’s /usr/bin/libtoolize (which, conveniently enough, is not compatible with its GNU counterpart). The modified line in runautoconf follows:

.
.
.
echo "Running libtoolize"
glibtoolize
.
.
.
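
The same edit can be scripted instead of made by hand. This is a sketch only: it assumes runautoconf invokes libtoolize at the start of a line (possibly indented), and the function name patch_libtoolize is hypothetical. The -i.bak form is used because both BSD sed (as shipped with OS X) and GNU sed accept it, and it leaves a backup copy behind.

```shell
# Hypothetical helper: replace the libtoolize call with glibtoolize.
patch_libtoolize() {
    script="$1"
    # Rewrite only lines that begin with the command itself, so the
    # echo "Running libtoolize" message is left untouched.
    sed -i.bak -e 's/^\([[:space:]]*\)libtoolize/\1glibtoolize/' "$script"
}
```

From the source directory this would be called as patch_libtoolize ./runautoconf. Running it a second time changes nothing, because a line starting with glibtoolize no longer matches the pattern.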

The next step is to run the modified runautoconf:

bash-3.2$ ./runautoconf

Next, you’ll have to hack Tesseract’s ./configure script so that its header search list includes the directory where MacPorts installs Leptonica (/opt/local/include/leptonica):

.
.
.
have_lept=no
if test "$LIBLEPT_HEADERSDIR" = "" ; then
  LIBLEPT_HEADERSDIR="/usr/local/include /usr/include /opt/local/include/leptonica"
fi
.
.
.
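
Judging from the snippet above, configure only falls back to its built-in list when LIBLEPT_HEADERSDIR is empty, so presetting the variable in the environment might make editing the script unnecessary. This is an untested assumption about the rest of configure (it would not work if the script resets the variable earlier on). The wrapper name configure_with_lept is made up for illustration.

```shell
# Hypothetical wrapper: run ./configure with LIBLEPT_HEADERSDIR preset.
# First argument: directory holding the Leptonica headers.
# Remaining arguments are passed through to ./configure unchanged.
configure_with_lept() {
    lept_dir="$1"
    shift
    LIBLEPT_HEADERSDIR="$lept_dir" ./configure "$@"
}
```

With MacPorts this would be invoked as configure_with_lept /opt/local/include/leptonica from the source directory. If configure still reports that leptonica was not found, fall back to editing the script as described.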

If you skip or mess up the previous step, you’ll see the following error when running ./configure:

bash-3.2$ ./configure
checking build system type... i686-apple-darwin9.8.0
.
.
checking for Leffler libtiff library... checking linking with -ltiff... ok
setting LIBTIFF_CFLAGS=
setting LIBTIFF_LIBS=-ltiff
checking for leptonica... configure: error: leptonica not found

It’s easy to forget to run the runautoconf script before running ./configure.
And, of course, call 'make'.

Note that it is also important to call

sudo make install

so that the language files are copied to the location where Tesseract expects to find them.
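
Putting the steps together, the whole build could be driven by one small function. This is a sketch, assuming the runautoconf and configure edits described in this post have already been made and that it is run from the tesseract-ocr-read-only directory. The name build_tesseract is hypothetical.

```shell
# Hypothetical helper: run the build steps from this post in order.
build_tesseract() {
    # Each step runs only if the previous one succeeded.
    ./runautoconf &&
    ./configure &&
    make &&
    sudo make install
}
```

Chaining with && means a failed ./configure (for example the "leptonica not found" error) stops the sequence before make is attempted.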


12 Responses to Compiling SVN Tesseract on OSX

  1. kmcital says:

    All looked fine (sort of) until I did the final make install and got this:

    $ sudo make install
    Making install in ccstruct
    /bin/sh ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -I../ccutil -I../cutil -I../image -I../viewer -I/opt/local/include -I/opt/local/include/leptonica/. -g -O2 -MT blobbox.lo -MD -MP -MF .deps/blobbox.Tpo -c -o blobbox.lo blobbox.cpp
    mv -f .deps/blobbox.Tpo .deps/blobbox.Plo
    mv: rename .deps/blobbox.Tpo to .deps/blobbox.Plo: No such file or directory
    make[2]: *** [blobbox.lo] Error 1
    make[1]: *** [install-recursive] Error 1
    make: *** [install-recursive] Error 1

    • mcradle says:

      Not sure what blobbox.Plo is, but I do have such a file in my Tesseract source directory.
      cd temp/ocr/tesseract-ocr-read-only

      computer:tesseract-ocr-read-only mcradle$ find ./ -iname "blobbox.Plo"
      //ccstruct/.deps/blobbox.Plo

      • Spruce says:

        This is the problem: the script wants to move the Tpo file to the Plo file but can’t find the Tpo. After copying the existing Plo to Tpo, it compiles for quite some time.

  2. saved me some time here, thanks!

  3. Pingback: Compiling tesseract OCR | Zen of Linux

  4. Kasim says:

    Just wondering why I can’t compile in my optware device, saved me! Thx

  5. Pingback: goes Zen » Compiling tesseract OCR

  6. acsrdesign says:

    It seems that in the current SVN checkout of 3.0.2 (revision 725+), runautoconf has been replaced by autogen.sh, and there is a configure.ac instead of configure. The glibtoolize change seems to be reflected in the script!

  7. acsrdesign says:

    It seems that in the current SVN checkout (rev 725 of version 3.0.2) the mentioned files are not present. Instead of configure, there is a file named configure.ac (the autoconf input), and runautoconf seems to have been replaced by autogen.sh (which contains the ‘echo “Running libtoolize”’ line).
    In the lines that follow it, “glibtoolize” seems to be recognized!

    If I succeed, I will come back later to document this!

  8. acsrdesign says:

    Update (tested on MacOSX 10.5.8)
    As expected: first run ./autogen.sh, as described in the INSTALL docs, to create the “configure” file. There is no runautoconf anymore! It now seems to be enough to hack the Tesseract ./configure script to include the directory where MacPorts installs Leptonica, by adding “/opt/local/include/leptonica” to LIBLEPT_HEADERSDIR. Then continue as described.

    Do not forget to set the environment variable for tessdata:
    e.g.
    export TESSDATA_PREFIX="/usr/local/share/"
    Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your “tessdata” directory.

    and copy the necessary language data from
    ../tesseract-ocr-read-only/tessdata/[mylanguage].traineddata
    to
    /usr/local/share/tessdata/

  9. moath says:

    Thanks a lot, you saved my day :)
