WeOCR Project

Since Feb. 2005 / Last update: Sep. 15, 2019

End of Services
Thank you very much for using the WeOCR services for more than twelve years. Since many good online OCR systems are available today, I have decided to close our services. The servers will be shut down gradually, although some will remain running for demonstration purposes only.

When I started developing an "Online OCR service" in 2004, there were only a couple of experimental ones, and they had been almost abandoned. Who at that time imagined that an online OCR service would be practical despite the privacy concerns? Who would provide free OCR services when OCR engines cost so much? We can see the outcome today. Demand is high, and some high-performance OCR engines are available for free.

I don't know whether the WeOCR project contributed to, or had any impact on, the world of web service development. But it is obvious that the services here are no longer attractive. So, it's time.

Thanks again to everyone who tried our services and programs, developed their own service, or drew some inspiration (if any) for new online services.
Aug. 25, 2017 Hideaki Goto

IMPORTANT !
Don't run bots (automated programs) against the servers on a regular basis. Server performance is limited, and other users suffer under high load! If you need regular processing, use a local OCR engine, which is generally much faster and more reliable.

WeOCR servers
- Other servers:
  Web-based/online OCR services and demos
Introduction
Objectives
Features
Documentation / API
Publications
Download (no longer available)
License
TODO
Recent changes
Comments

Introduction

WeOCR is a platform for Web-enabled OCR (Optical Character Reader/Recognition) systems that enables people to use character recognition over networks. A WeOCR server receives document images from users, recognizes the text in the images, and returns the recognition results to the users. WeOCR does not have its own character recognition engine. Instead, it is intended to accommodate various character recognition engines. WeOCR provides a simplified user interface so that more people can benefit from OCR easily.

Although some people would worry about the privacy of their documents, we think there are still a lot of applications of OCR in which privacy does not matter. We hope WeOCR will expand the range of OCR applications further.

Objectives

Design the architecture of WeOCR.
Develop a toolkit that enables OCR developers and researchers to build their own Web-based OCR sites easily.
Encourage people to develop OCRs for various languages and to open them to the public either as a Web service or as Free Software.
Make some useful tools and libraries for Web-based OCR systems.

Features

WeOCR-toolkit has the following features.

Receive a document image from each client computer, pass the image to the back-end OCR engine, generate HTML data from the result data, and send the data back to the client.
Uncompress the incoming image file if required.
Limit the size of the input data to protect the server from huge data.
Examine the integrity of image file headers.
Convert the input image into a common image format (PNM).
Limit the number of jobs to prevent the server from processing too many documents at once and to maintain acceptable server response.
Terminate the OCR engine after a specified time has passed, if the engine continues running (in vain) due to unexpected input data or bugs in the engine.
Support a server search function using spec files in XML.
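
The input checks and run-time limits listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WeOCR-toolkit code: the function names, the limit values, and the set of recognized image headers are all hypothetical, and the conversion to PNM is left out.

```python
import gzip
import io
import subprocess
import threading

# All three limits are hypothetical values chosen for illustration.
MAX_INPUT_BYTES = 4 * 1024 * 1024
MAX_JOBS = 2
ENGINE_TIMEOUT_SEC = 30

# Leading "magic" bytes of a few image formats (binary PNM, PNG, JPEG).
IMAGE_MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"P4": "pbm",
    b"P5": "pgm",
    b"P6": "ppm",
}

_job_slots = threading.BoundedSemaphore(MAX_JOBS)


def prepare_input(data: bytes):
    """Check the size, gunzip if needed, and sniff the image type.

    Returns (image_bytes, kind) or raises ValueError.
    """
    if len(data) > MAX_INPUT_BYTES:
        raise ValueError("input too large")
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        # Read at most one byte past the limit, so that a small
        # archive cannot expand without bound.
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
            data = gz.read(MAX_INPUT_BYTES + 1)
        if len(data) > MAX_INPUT_BYTES:
            raise ValueError("uncompressed input too large")
    for magic, kind in IMAGE_MAGIC.items():
        if data.startswith(magic):
            return data, kind
    raise ValueError("unrecognized image header")


def run_engine(cmd, image: bytes) -> str:
    """Feed the image to an OCR command on stdin, within the job and time limits."""
    if not _job_slots.acquire(blocking=False):
        raise RuntimeError("server busy")
    try:
        # On timeout, subprocess.run kills the child process and
        # raises subprocess.TimeoutExpired.
        done = subprocess.run(cmd, input=image, capture_output=True,
                              timeout=ENGINE_TIMEOUT_SEC, check=True)
        return done.stdout.decode("utf-8", errors="replace")
    finally:
        _job_slots.release()
```

A request handler would call prepare_input() on the uploaded bytes first and pass the result to run_engine() together with the command line of whichever back-end engine is installed. Rejecting a request when no job slot is free, rather than queuing it, is one simple way to keep the server responsive.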

Documentation

License

The license is the Apache License, Version 2.0.
(An MIT-X derivative applies to weocr-toolkit-0.12 and older.)

You are not required to release the source code of your OCR engine to the public unless you wish to do so.

TODO

Deploy more WeOCR servers. (ASAP)

Advertisement! (ASAP)

Encourage researchers/developers to provide their own WeOCR services. (ASAP)

Find open source OCRs for various languages. (midterm)

Improve the UI. (midterm)

Write documentation. (midterm)

Develop an OCR for Japanese (ASAP)
... etc.

Recent changes

[Jun 15, 2012]

WeOCR-toolkit ver.0.14 has been released.

[Apr 26, 2009]

WeOCR-toolkit ver.0.13 has been released.

[Sep 26, 2008]

A new server "Japanese character recognition - beta" has been opened. It is equipped with OCRopus and NHocr.

[Sep 9, 2008]

A new server "Japanese text line recognition - beta" has been opened.

[Aug 19, 2008]

WeOCR-toolkit ver.0.12 has been released.

[May 12, 2007]

WeOCR-toolkit ver.0.11 has been released.

[Apr 8, 2007]

A new server "Scene text recognition - beta" has been opened.

[Feb 12, 2007]

The server search CGI can now produce server lists in XML. This would be useful for various web applications using WeOCR. Pass parameter "fmt=xml" to the CGI.
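
As a rough illustration, a client could build such a request as follows. The CGI address is not given here, so the endpoint below is a placeholder, and the `lang` parameter in the usage note is purely hypothetical. Only `fmt=xml` comes from the note above.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute the real server search CGI address.
SEARCH_CGI = "http://example.org/weocr/search.cgi"


def server_list_url(base: str = SEARCH_CGI, **params: str) -> str:
    """Return the search URL with fmt=xml added to any other query parameters."""
    query = dict(params, fmt="xml")
    return base + "?" + urlencode(query)


print(server_list_url())
# prints http://example.org/weocr/search.cgi?fmt=xml
```

Other search conditions would be passed as keyword arguments, for example `server_list_url(lang="eng")` with a hypothetical `lang` parameter. The response could then be read with any XML parser.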

[Aug 18, 2006]

An automatic spec collector is now up and running. Once your server is registered in the server list, your spec file will be examined periodically (twice a day) and used for updating the list.

[Jun 26, 2006]

WeOCR-toolkit ver.0.10 has been released.

[Jun 9, 2006]

Hebrew OCR (hocr) has been added (see here).

[Jun 7, 2006]

Server search function is now available at WeOCR Server List.

[Feb 26, 2006]

WeOCR-toolkit ver.0.10beta has been released.

[Feb 19, 2006]

The OCR engine at ocr1/e1 has been updated to ocrad-0.14.

[Jan 22, 2006]

WeOCR-toolkit has been released (at last).

[Jan 18, 2006]

The project has been renamed, since the previous name ocrweb was too popular in another community.

[Nov 6, 2005]

A new server with GOCR has been released.

[Oct 14, 2005]

The OCR engine at ocr1 has been updated to ocrad-0.13.
A filter for adaptive thresholding has been added.

[Oct 7, 2005]

Another website has opened.

[Sep 22, 2005]

JPEG (JFIF) support has been added.

[Aug 28, 2005]

Some modifications to the internal code. (No new features.)

[Jun 10, 2005]

The OCR engine used at ocr1 has been updated to ocrad-0.12, which runs much faster.
ocr1 now accepts gzipped image files as well as raw files.

Comments

Send feature requests, questions, bug reports, or other comments.
Note that, as a rule, no reply will be sent. Answers to some common questions may appear on the website.

keywords: Optical Character Recognition, WeOCR, OCR Web, OCRWeb, Web OCR, WebOCR, online OCR, free OCR