originally posted by: DexterRiley
Now the problem with using something like that on the document set presented by the OP is that the vast number of different sources probably means a non-trivial number of fonts and typefaces are in use. Is it necessary to specify a single set of font training files for each OCR session? Or can you create a "catalog" of training file sets that Tesseract can choose from?
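For what it's worth, a rough sketch of the "catalog" idea: as far as I know, Tesseract's `-l` option accepts several traineddata names joined with `+`, so one session can load more than one set. The snippet below only builds the command line and does not run anything. The helper name `build_tesseract_command`, the file names, and the `typewriter` training set are made up for illustration and are not from this thread.

```python
import shlex
from typing import List, Optional, Sequence


def build_tesseract_command(
    image_path: str,
    output_base: str,
    languages: Sequence[str],
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Build a tesseract command that loads several traineddata sets at once.

    `languages` are traineddata names without the .traineddata suffix,
    e.g. "eng" plus a hypothetical custom set called "typewriter".
    """
    if not languages:
        raise ValueError("at least one traineddata name is required")

    # Several sets are passed to -l as a single "+"-joined string.
    cmd = ["tesseract", image_path, output_base, "-l", "+".join(languages)]

    # Optional: point Tesseract at a folder holding the custom sets.
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical file and training-set names, for illustration only.
    cmd = build_tesseract_command("page_001.png", "page_001", ["eng", "typewriter"])
    print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` once Tesseract and the named training files are actually installed. Whether mixing sets helps or hurts accuracy on a given batch of scans is something you would have to test.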
originally posted by: DexterRiley
When you are able to get some better scans, let me know. I'd like to work on this a bit more to see if we can establish a process to streamline and enhance the digitization effort.