Have come across a document that was transcribed with OCR; it’s a mess.
Walt Whitman
-
Supplementary File Part 3
-
Walt Whitman Papers in the Charles E. Feinberg Collection: Supplementary File, 1806-1981; Printed Matter, 1866-1969; Articles; Brown, Henry Harrison-Buehrer, Edwin T.
-
mss1863002201-248
When I have time, if it is still there, I will delete the entire thing and start over.
When is OCR ever correct? Obviously the transcriber is not proofing their work. It’s all gibberish. Why can’t the site get rid of the OCR option?
OCR can be a helpful starting place for standardized, modern text, but as you say, volunteers still need to review and edit the output. I think the table-of-contents formatting of this page tripped up the OCR feature. This page is still “in progress”, so volunteers can make changes - which means it is still making its way through the normal process!
We disable OCR on some campaigns where we know most of the materials are handwritten. So, for instance, you can’t use OCR in the Garfield or Schoolcraft campaigns. The feature is also restricted to registered users only. We do reach out and deactivate the accounts of volunteers when we see a pattern of misusing OCR, so if you see a lot of bad OCR in a row - let us know and we can look into it!
OK, thank you. I appreciate your explanation.
I too have found many gibberish-OCR transcripts that supposedly are ready for review. When I find one, I immediately click Edit, try OCR again (the result usually isn’t really better), and then delete it all. If I don’t have time right then to transcribe, I type in a few lines and send it back to In Progress. Just agreeing with you: it can be a pain. But when OCR works, it’s a great gift, isn’t it!
I wanted to mention also that if the original transcriber chooses the appropriate language in the OCR pulldown menu, it transcribes better. I’ve seen several other-language entries transcribed with the default English option. But when I re-OCR’d (is that a word, haha) in, say, German, it looked to me as though it was a cleaner transcription. This won’t help the gibberish pages, but may help with some others.
I use the OCR transcription all the time… in different languages, and I find it does an excellent first draft. It does not do well with numbers or indexes. Please do NOT get rid of it.
I’ve also run across pages that I suspect were transcribed with AI that are just awful. These were in the Schoolcraft campaign, where OCR is not turned on. The pages were clearly not reviewed because even some very readable words were wrong.