open-source character recognition
News
- July 22, 2001
- Released version 0.7.1. The MDK will be released soon.
- June 24, 2001
- The current CVS version works end to end for the first time.
A new packaged version will be released soon.
About
libgocr is an attempt to create a library that provides all the functionality
you may need to develop an OCR engine. Instead of wasting time writing I/O
functions, linked lists, every step of the recognition process, and so on,
just code your new revolutionary algorithm right away!
libgocr is completely modular, built on a plugin system: you can use tens of
plugins to process your text, resulting in much more precise recognition.
This system also encourages recognition of non-standard text: think of
musical scores, equations, block diagrams, and so on. It can all be done,
and, best of all, TOGETHER.
Features
- File input: supports most common image formats.
- Unicode support.
- Module system allows you to develop specific code (such as segmentation
only) and integrate it with existing code, without
recompilation.
- Useful bits of code: linked lists, hash tables.
- Automatic parallelism (not implemented yet).
- Debug routines.
- Frontend communication system (not implemented yet).
- More. :)
Current state
libgocr is still a new project, in beta. It's written completely from scratch,
taking advantage of what we learned while coding the original gocr program.
Here's our plan:
- Develop libgocr until it's stable. In parallel, continue the development
of the original gocr, focusing only on the recognition engine.
- Once libgocr is stable and usable, gocr will be converted into a plugin,
probably named "gocr main module" (gmm).
- From then on, libgocr and gmm will be developed in parallel. Since libgocr
is just the API, we expect most of the work will be directed to gmm.
WE NEED DEVELOPERS, i.e., people who ACTIVELY code. And, of course, your
comments and ideas are VERY appreciated. Tell us what you think.
About the Module Development Kit (MDK) mentioned in the libgocr
documentation: it exists, but since the module system does not work (well)
yet, and the API is likely to change, the MDK has not been released. If you
want it, just ask me. UPDATE: see the Developers
section below for a preview of the MDK.
Documentation
The API is well documented: the package includes a hand-written LaTeX manual
that is almost a tutorial. You can get it as gzipped PostScript here.
You can also browse the online documentation, automatically generated with
Doxygen. It comes in the package too, but you need Doxygen to build it.
Download
You can get it here. Remember, it's a
development version, and we recommend that you do not run "make install".
Some test programs are available.
Developers
(2001/06/24): Just uploaded a preview of the Module
Development Kit, with a simple module that I have been using to test and debug
libgocr. The MDK will be officially released soon.
Contact us: see the Support page.
jOCR is at
since June 2000
(announcement mailing list, etc.)