
Scanning material for eBook?

Posted: Wed Mar 12, 2008 10:27 pm
by Tim Burton
I have some 100-year-old magazines (Railroaders and Trashers) and would like to put them up on my website, since few people actually have these magazines and most who do would never put them on the web.

Some of the articles (especially the fiction) are interesting and entertaining (and now in public domain). I'd like to scan them, but I have a few questions.

I was recommended to use SimpleOCR for text conversion.

http://www.snapfiles.com/get/SimpleOCR.html

Does anyone have any other recommendations for scanned image to text conversion?

Also, I'd like to make PDFs. Is there a way to transfer just the text and the ads to PDF without carrying over the paper color/background (it is dingy yellow, which is normal for pulp magazines)?

I'd really like them to be printable quality. Does anyone have any suggestions on how to do it? Any ideas about converting the text so it can be used on the Kindle or the Sony eReader?

Re: Scanning material for eBook?

Posted: Thu Mar 13, 2008 4:09 am
by DMB2000uk
Well, you'd have to do a fair bit of image tweaking to get them from yellow to white backgrounds. It's do-able: if you have Photoshop, you could even set up batch operations to change the colour of each page to white, provided that the page colour is consistent.
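For anyone without Photoshop, the idea behind that kind of batch correction can be sketched in a few lines of plain Python. This is only a rough illustration, not what Photoshop does internally: it rescales each colour channel so that a sampled paper colour becomes pure white. The PAPER value and the function names are made up for the example, and pixels are assumed to be plain (R, G, B) tuples; you would sample the real paper colour from your own scans.

```python
# Sketch: rescale each colour channel so the paper colour becomes pure white.
# PAPER is an assumed sample value; measure it from your own scan.
# Every channel of the paper colour must be non-zero.
PAPER = (226, 209, 160)  # hypothetical dingy-yellow page colour (R, G, B)

def whiten(pixel, paper=PAPER):
    """Map `paper` to white, keep black as black, scale everything in between."""
    return tuple(min(255, round(v * 255 / p)) for v, p in zip(pixel, paper))

def whiten_page(pixels, paper=PAPER):
    """Apply the same correction to every pixel of a page (list of RGB tuples)."""
    return [whiten(px, paper) for px in pixels]

# A tiny three-pixel "page": bare paper, black ink, and a mid-tone.
page = [(226, 209, 160), (0, 0, 0), (113, 105, 80)]
print(whiten_page(page))
```

Because the same scale factors are used for every page, this only works well when the page colour is consistent, as noted above.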

SimpleOCR seems to be the best bet; I think for anything better you'd have to pay. Be warned that even though the OCR is recognising printed text, it won't get everything right, so you will probably have to go through and correct things. Could be quite a project...

You should post a sample one so that we can see the quality that it has to be changed from to get it 'printable'. (Use imageshack or something)

Dan

Re: Scanning material for eBook?

Posted: Fri Mar 14, 2008 10:25 pm
by Tim Burton
Ok, when I get back to Phoenix, I will do that.

Re: Scanning material for eBook?

Posted: Sat Mar 15, 2008 1:45 pm
by stev
If the yellow page isn't all that bad, I would keep it since these are historical documents. It adds to the preservation and authentication of the publication.

As for making the scanned images into a PDF, there are a few good FREE PDF tools on the internet. I personally use the PDF tools from the PDF995 website. http://www.pdf995.com

First, you will need to place your scanned images into MS-Word or any other type of editor that will support graphics. Make sure you use the Page-Setup in the program to make the scanned images fit well. Once a magazine is all entered, just do a File > Print > PDF995 to a PDF file. It's that simple.

Also, you will need to make the scanned images clear and readable at the lowest resolution possible without sacrificing quality. This way, the PDF file will be practical to upload, download or view on-line.

I've done this very thing at work for documents to preserve them and to store them in Share-Point.

Many of the big-name companies in industry use this easy software tool too. They use the low-cost purchased version with tech support.

OCR scanning is hit or miss. If the pages have graphics on them, the OCR tries to work out where the text goes on the page, and in many instances the result is just a mess. Plus, even if the scan does work for the layout, the OCR text will need to be proofread and spelling corrections made.
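To take some of the pain out of the proofreading, one option is to have a small script flag the words worth a second look. Here is a minimal sketch in Python, assuming you have the OCR output as plain text and some word list to check against. The function name and the tiny word list are made up for the example; a real run would load a proper dictionary file.

```python
import re

def suspect_words(ocr_text, known_words):
    """Return words from the OCR text that are not in the word list,
    in order of first appearance and without repeats."""
    known = {w.lower() for w in known_words}
    seen, flagged = set(), []
    for word in re.findall(r"[A-Za-z']+", ocr_text):
        key = word.lower()
        if key not in known and key not in seen:
            seen.add(key)
            flagged.append(word)
    return flagged

# Tiny stand-in word list; "traln" and "statlon" mimic typical OCR slips.
known = ["the", "train", "left", "station", "at", "dawn"]
print(suspect_words("The traln left the statlon at dawn", known))
```

This only narrows down where to look. Names, period slang and real words that the OCR swapped for other real words still need a human read-through.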

If you scan in color, expect the file sizes per page to be large vs. B&W pages. If the magazines are printed in B&W and you scan them in B&W, the yellow background will come out as a light gray or disappear entirely. You will need to play with the scanning options. Some scanning software allows for background image correction too.
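To put rough numbers on the color vs. B&W difference, here is a back-of-the-envelope sketch in Python. It works out the uncompressed size of one scanned page from the page size, the resolution and the bits per pixel. The 7 x 10 inch page and 300 dpi are assumptions for the example, and saved files (JPEG, PDF) will be smaller because they are compressed, so treat the figures as relative rather than exact.

```python
# Bits stored per pixel for each scan mode.
BITS_PER_PIXEL = {"color": 24, "grayscale": 8, "bw": 1}

def raw_scan_bytes(width_in, height_in, dpi, mode):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8

# Assumed page: 7 x 10 inches scanned at 300 dpi.
for mode in BITS_PER_PIXEL:
    size = raw_scan_bytes(7, 10, 300, mode)
    print(f"{mode:9s} {size / 1_000_000:.1f} MB")
```

Halving the dpi cuts the pixel count to a quarter, which is why it pays to find the lowest resolution that still reads well.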

Hope this helps some!