Since Feb 2005 / Last update: May 31, 2009
Project page at freshmeat.
Announcements
IMPORTANT !
Due to maintenance of our university's power facilities,
some WeOCR servers, including this web server,
will be unavailable on Aug. 30, 2009.
Services will resume soon after the maintenance.
[May 31, 2009]
NHocr (Japanese OCR) ver. 0.17 has been released.
WeOCR service is also available.
[Apr 26, 2009]
WeOCR-toolkit ver.0.13 has been released.
Reading Assistant - online edition has been released.
It uses WeOCR as a back-end OCR server.
Dear developers,
You can help visually disabled people through WeOCR. See here:
Introduction
WeOCR is a platform for Web-enabled OCR
(Optical Character Reader/Recognition) systems
that enables people to use character recognition over networks.
A WeOCR server receives document images from users,
recognizes the text in the images, and returns the recognition results to the users.
WeOCR does not have its own character recognition engine.
Instead, it is intended to accommodate various character recognition engines.
WeOCR provides a simplified user interface
so that more people can benefit from OCR easily.
Although some people may worry about the privacy of their documents,
we believe there are still many applications of
OCR in which privacy is not a concern.
We hope WeOCR will expand the range of OCR applications further.
Objectives
- Design the architecture of WeOCR.
- Develop a toolkit that enables OCR developers and researchers
to build their own Web-based OCR sites easily.
- Encourage people to develop OCRs for various languages
and to open them to the public
either as a Web service or as Free Software.
- Make some useful tools and libraries for Web-based OCR systems.
Features
WeOCR-toolkit has the following features.
- Receive a document image from each client computer,
pass the image to the back-end OCR engine,
generate HTML data from the result data,
and send the data back to the client.
- Uncompress the incoming image file if required.
- Limit the size of the input data to protect the server
from huge data.
- Examine the integrity of image file headers.
- Convert the input image into a common image format (PNM).
- Limit the number of jobs to prevent the server from
processing too many documents at once
and to maintain acceptable server response.
- Terminate the OCR engine after a specified time has passed,
if the engine continues running (in vain) due to
unexpected input data or bugs in the engine.
- Support a server search function using spec files in XML.
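
As an illustration of the input checks and the engine time limit listed above, here is a minimal sketch in Python. It is not taken from WeOCR-toolkit: the names `MAX_INPUT_BYTES`, `ENGINE_TIMEOUT_SEC`, `prepare_image`, and `run_engine`, as well as the limit values, are assumptions made for this example. The PNM conversion and the job limit are omitted.

```python
import subprocess
import sys
import zlib

MAX_INPUT_BYTES = 2 * 1024 * 1024  # hypothetical size limit (2 MiB)
ENGINE_TIMEOUT_SEC = 30            # hypothetical engine time limit

GZIP_MAGIC = b"\x1f\x8b"
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"P4": "pbm",
    b"P5": "pgm",
    b"P6": "ppm",
}


def sniff_format(data):
    """Return a format name if the header looks like a known image, else None."""
    for magic, name in IMAGE_MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def prepare_image(raw):
    """Apply the size limit, gunzip if needed, and check the image header."""
    if len(raw) > MAX_INPUT_BYTES:
        raise ValueError("input too large")
    if raw.startswith(GZIP_MAGIC):
        try:
            # wbits = MAX_WBITS | 16 selects the gzip container; max_length
            # caps the output so a tiny file cannot expand without bound.
            inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)
            raw = inflater.decompress(raw, MAX_INPUT_BYTES + 1)
        except zlib.error as exc:
            raise ValueError("corrupt gzip data") from exc
        if len(raw) > MAX_INPUT_BYTES:
            raise ValueError("decompressed input too large")
    fmt = sniff_format(raw)
    if fmt is None:
        raise ValueError("unrecognized image header")
    return fmt, raw


def run_engine(cmd, image_bytes, timeout=ENGINE_TIMEOUT_SEC):
    """Feed the image to an OCR command on stdin; stop it if it runs too long."""
    try:
        done = subprocess.run(cmd, input=image_bytes,
                              stdout=subprocess.PIPE,
                              timeout=timeout, check=True)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child process before raising.
        raise RuntimeError("OCR engine timed out") from exc
    return done.stdout


if __name__ == "__main__":
    # Stand-in "engine": a Python one-liner that echoes the first two bytes.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read()[:2])"]
    fmt, data = prepare_image(b"P5\n1 1\n255\n\x00")
    print(fmt, run_engine(echo, data))
```

A real server would replace the stand-in command with the actual OCR engine invocation and would convert the validated image to PNM before passing it on.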
Documentation
Download
License
The license is the
Apache License, Version 2.0.
(An MIT-X derivative applies to weocr-toolkit-0.12 and older.)
You are not required to release the source code of your
OCR engine to the public unless you wish to do so.
TODO
- Deploy more WeOCR servers. (ASAP)
- Advertisement! (ASAP)
- Encourage researchers/developers to provide their own WeOCR services. (ASAP)
- Find open source OCRs for various languages. (midterm)
- Improve the UI. (midterm)
- Write documentation. (midterm)
- Develop an OCR for Japanese (ASAP)
- ... etc.
Recent changes
- [Apr 26, 2009]
- WeOCR-toolkit ver.0.13 has been released.
- [Sep 26, 2008]
-
- [Sep 9, 2008]
-
- [Aug 19, 2008]
- WeOCR-toolkit ver.0.12 has been released.
- [May 12, 2007]
- WeOCR-toolkit ver.0.11 has been released.
- [Apr 8, 2007]
-
- [Feb 12, 2007]
- The server search CGI can now produce server lists in XML.
This would be useful for various web applications using WeOCR.
Pass parameter "fmt=xml" to the CGI.
- [Aug 18, 2006]
- An automatic spec collector is now up and running.
Once your server is registered for the server list,
your spec file will be examined periodically (twice a day)
and used for updating the list.
- [Jun 26, 2006]
- WeOCR-toolkit ver.0.10 has been released.
- [Jun 9, 2006]
- Hebrew OCR (hocr) has been added (see
here).
- [Jun 7, 2006]
-
- [Feb 26, 2006]
- WeOCR-toolkit ver.0.10beta has been released.
- [Feb 19, 2006]
- The OCR engine at ocr1/e1 has been updated to ocrad-0.14.
- [Jan 22, 2006]
- WeOCR-toolkit has been released (at last).
- [Jan 18, 2006]
- The project has been renamed, since the previous name, ocrweb, was
already widely used in another community.
- [Nov 6, 2005]
- A new server with GOCR has been released.
- [Oct 14, 2005]
- The OCR engine at ocr1 has been updated to ocrad-0.13.
- A filter for adaptive thresholding has been added.
- [Oct 7, 2005]
-
- [Sep 22, 2005]
- JPEG (JFIF) support has been added.
- [Aug 28, 2005]
- Some modifications to internal code. (No new features.)
- [Jun 10, 2005]
- The OCR engine used at ocr1 has been updated to ocrad-0.12,
which runs much faster.
- ocr1 now accepts gzipped image files as well as raw files.
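
The Feb 12, 2007 entry above notes that the server search CGI returns its list in XML when given the parameter "fmt=xml". A minimal Python sketch of a client-side helper follows. Only the parameter name and value come from this page; the CGI address, the `lang` parameter, and the XML element names used in the demo are hypothetical placeholders, and the actual schema of the server list is not described here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_xml_format(search_url):
    """Return search_url with fmt=xml added to (or replaced in) its query."""
    parts = urlsplit(search_url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "fmt"]
    query.append(("fmt", "xml"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def child_records(xml_text):
    """Map each child of the root element to a dict of its sub-element texts.

    This is deliberately schema-agnostic because the real element names
    of the server list are not documented on this page.
    """
    root = ET.fromstring(xml_text)
    return [{field.tag: (field.text or "").strip() for field in item}
            for item in root]


if __name__ == "__main__":
    # Hypothetical CGI address and query parameter, for illustration only.
    print(with_xml_format("http://example.org/cgi-bin/search.cgi?lang=ja"))
    # Hypothetical response body; the element names are placeholders.
    sample = ("<servers><server><url>http://example.org/ocr</url>"
              "<engine>ocrad</engine></server></servers>")
    print(child_records(sample))
```

Fetching the list over the network (for example with `urllib.request.urlopen`) is left out so that the sketch runs without a live server.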
Comments
Send
feature requests, questions, bug reports, or other comments.
Note that, as a rule, no reply will be sent.
Answers to some common questions may appear on the website.
keywords:
Optical Character Recognition, WeOCR, OCR Web, OCRWeb, Web OCR, WebOCR,
online OCR, free OCR
© 2005-2009 Hideaki Goto
www.imglab.org