Document Dewarping
Here we provide an online demo of our page dewarping algorithm. It is based
on the assumption that the original page contained only straight text lines
that were approximately equally spaced and of similar size.
This is often true for book pages.
For details, see the publication:
Adrian Ulges, Christoph H. Lampert, Thomas M. Breuel: Document Image Dewarping
using Robust Estimation of Curled Text Lines, International Conference on
Document Analysis and Recognition (ICDAR), pages 1001-1005, 2005.
Note that the algorithm has several limitations:
- it does not work if the assumption does not hold for (parts of) the
document image, e.g. for headlines, paragraphs separated by extra spacing,
figures, etc.
- we try to find the largest box of text within the image and use only that
- we assume that the image is given in the correct orientation
- the maximum angle by which any part of a text line can deviate from the
horizontal is currently set to 0.5 radians (about 29 degrees)
- if there are large gaps between words, the line tracker sometimes
splits one text line into two; this causes strong visible
distortions
You can either submit an image through the form interface, or you can
submit it programmatically through HTTP.
Form Interface
If you do not have an image at hand, or want to try some of our images, you can use one of our sample images. Results for these are cached, so they are returned faster than results for a new image.
Programmatic Interface
To submit your image programmatically, send an HTTP POST request
(multipart/form-data) to http://quito.informatik.uni-kl.de/dewarp/dewarp.php;
the image must be passed as a file parameter named "imagefile".
From the command line, you can do this with curl, which writes the response
headers to header.out and the dewarped image to output.jpg:
curl -D header.out -F 'imagefile=@input.jpg;type=image/jpeg' http://quito.informatik.uni-kl.de/dewarp/dewarp.php > output.jpg
You can do the same with the HTTP client library of your favorite
programming language (C#, Python, Java, Perl, etc.), as long as it supports
multipart/form-data file uploads.
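As an illustration, here is a minimal Python sketch of the same request using only the standard library. It assumes the endpoint URL and the "imagefile" parameter from the curl example above; the helper names encode_multipart and dewarp, the 120-second timeout, and the optional fixed boundary are hypothetical choices made for this sketch, not part of the service. The server may also no longer be reachable.

```python
import mimetypes
import os
import urllib.request
import uuid

# Endpoint taken from the curl example; the service may no longer be online.
DEWARP_URL = "http://quito.informatik.uni-kl.de/dewarp/dewarp.php"


def encode_multipart(field_name, filename, data, content_type, boundary=None):
    """Build a multipart/form-data body containing a single file field.

    Returns (body_bytes, value_for_the_Content-Type_header).
    """
    boundary = boundary or uuid.uuid4().hex  # random boundary unless one is given
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def dewarp(input_path, output_path, url=DEWARP_URL, timeout=120):
    """POST input_path as the "imagefile" parameter and save the response body.

    Returns the response's Content-Type so the caller can check that an image
    came back. urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    with open(input_path, "rb") as f:
        data = f.read()
    # Like curl's ";type=image/jpeg": guess the MIME type from the file extension.
    mime = mimetypes.guess_type(input_path)[0] or "application/octet-stream"
    body, content_type = encode_multipart(
        "imagefile", os.path.basename(input_path), data, mime
    )
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(output_path, "wb") as out:
            out.write(response.read())
        return response.headers.get("Content-Type")
```

Calling dewarp("input.jpg", "output.jpg") corresponds to the curl command above. Checking the returned Content-Type before using output.jpg is a cheap way to notice a reply that is not an image.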