I've used Tesseract.js to recognise the https://** links from the camera input and to make them clickable.
The first issue I encountered was text recognition performance. Depending on the camera input (whether or not the image contained something that looked like text), I got 2-20+ seconds per 640x640px image for text recognition on an iPhone X. Not exactly fast, as you can see, but the recognition was pretty accurate.
As expected, performance improves as the image gets smaller and as the amount of text in it shrinks.
Since I didn't want to recognise all the text, only the links, I used a TensorFlow object detection model to quickly find the areas containing http://** text. Then, instead of recognising the whole image, I only had to do it for those smaller parts. This improved performance: instead of a variable 2-20 seconds per frame I got a more stable 0.5-1 seconds. Still not great, but several times faster.
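A minimal sketch of that pipeline on the Tesseract.js side, assuming a recent (v4+) API and treating the detector output as given (the `boxes` coordinates and the `#camera-canvas` element are placeholders, not the author's actual code):

```js
import { createWorker } from 'tesseract.js';

// Placeholder for the 640x640 camera frame (e.g. a canvas the video is drawn onto).
const frame = document.querySelector('#camera-canvas');

// Hypothetical output of the object-detection step: regions likely to contain a link.
const boxes = [
  { left: 40, top: 120, width: 380, height: 48 },
  { left: 60, top: 400, width: 300, height: 40 },
];

const worker = await createWorker('eng'); // v4+: resolves to an initialized worker

const links = [];
for (const rectangle of boxes) {
  // Recognise only the detected region instead of the whole frame.
  const { data } = await worker.recognize(frame, { rectangle });
  const match = data.text.match(/https?:\/\/\S+/);
  if (match) links.push(match[0]);
}

await worker.terminate();
console.log(links);
```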
Tesseract sucked for me. Had a simple use case where I was trying to read numbers (in a computer font) from .png files, at completely predictable locations in the image -- and Tesseract was getting it horribly wrong a huge percentage of the time. Went with AWS Rekognition and results were instantly 1000x better.
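For comparison, the Rekognition route usually comes down to a single DetectText call. A rough sketch with the AWS SDK for JavaScript v3, where the file name and region are placeholders:

```js
import { readFile } from 'node:fs/promises';
import { RekognitionClient, DetectTextCommand } from '@aws-sdk/client-rekognition';

const client = new RekognitionClient({ region: 'us-east-1' }); // placeholder region

// Placeholder image: a PNG with numbers at predictable positions.
const Bytes = await readFile('screenshot.png');

const { TextDetections = [] } = await client.send(
  new DetectTextCommand({ Image: { Bytes } })
);

// Rekognition returns LINE and WORD detections with confidence scores.
for (const d of TextDetections.filter((t) => t.Type === 'LINE')) {
  console.log(d.DetectedText, d.Confidence);
}
```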
Post-processing is absolutely essential with Tesseract. Not to self-promote, but I discussed this at some length in this blog post, if you're interested: https://kn100.me/taking-back-data-from-eufy/
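What that post-processing looks like depends entirely on the data; for numeric output it is often just a pass that undoes common character confusions and drops anything that cannot be part of a number. An illustrative sketch (not taken from the linked post):

```js
// Typical post-processing for numeric OCR output: fix common
// character confusions, then keep only the characters we expect.
function cleanNumericOcr(raw) {
  return raw
    .replace(/[Oo]/g, '0')       // letter O read instead of zero
    .replace(/[Il|]/g, '1')      // I, l and pipe read instead of one
    .replace(/[^\d.,\-\n]/g, '') // drop anything that cannot be part of a number
    .trim();
}

console.log(cleanNumericOcr('O12I.5 kg\n')); // -> "0121.5" (unit stripped)
```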
We really need better open source {OCR, TTS, dictation, ...}. All of the common FOSS tools for these tasks are so horribly behind the state of the art.
The sad thing is that most of the state-of-the-art models and algorithms are open research; they just usually aren't written by software engineers and need to be rewritten to be deployable. Usually you get some shell script like "run_eval.sh" that generates the figures in the paper through a bunch of spaghetti code, and most of the time it depends on a specific old version of TensorFlow that probably isn't available for your CUDA version and probably won't compile on your system without hours of Googling.
Had the _exact_ same situation! Was just trying to OCR values of screenshots, which were always of the same screen (app screenshots taken by users) and it was so bad. Ended up just using AWS Rekognition and it worked really well.
I think it's mostly for OCR'ing high-resolution scans of printed media. I scanned and OCR'd a several hundred page printed book (my grandfather's memoirs) with great results. The text needed very little processing. But I had to manually transcribe all of the image captions, because they were scans of photocopies of photocopies of typewriter labels stuck to photos by hand, and thus very poor quality, and Tesseract produced complete gibberish.
I spent two months, two years ago, building a passport data extractor for KYC (know your customer) purposes.
Unfortunately I never got to a point where the extracted data was really useful.
I just tried this JS version (I'm sure the native one is the same) and without changing anything (apart from the training dataset) I got much better results. Exciting.
I've used this library in the past to prototype a project for extracting Chinese subtitles from YouTube videos in a Chrome extension. It worked pretty well. The only problem was that the library couldn't really handle realtime video. Can't really fault it for that, though, since I was sending it every frame. The throughput was good but latency kept increasing, probably because I was giving it too much data.
There's a mode where you can increase the number of worker threads. Tesseract is also designed for text documents, and the preprocessing filter I made to convert the images to look more like a text document was pretty naive.
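That multi-worker mode is exposed through Tesseract.js's scheduler. A sketch assuming a recent (v4+) API where `createWorker(lang)` resolves to a ready worker (older versions need explicit load/initialize calls); the `frames` array is a placeholder:

```js
import { createScheduler, createWorker } from 'tesseract.js';

const scheduler = createScheduler();
// Spin up a few workers so frames can be recognised in parallel.
for (let i = 0; i < 4; i++) {
  scheduler.addWorker(await createWorker('chi_sim')); // simplified Chinese model
}

// Placeholder: canvases captured from a <video> element.
const frames = [];

const results = await Promise.all(
  frames.map((frame) => scheduler.addJob('recognize', frame))
);
results.forEach(({ data }) => console.log(data.text));

await scheduler.terminate();
```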
I'm taking an online computer vision class next semester and hope to pick the project back up after learning a bit more.
I wanted to use Tesseract for a project but found it a bit too slow for my needs. Doesn't it have options to speed up its recognition, or is there another OCR project out there that's made to be faster?
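The usual levers, besides restricting recognition to known regions as described further up the thread, are the faster integer "tessdata_fast" trained models and simply feeding Tesseract smaller images. A hedged browser sketch of the downscaling part; `#photo` and the scale factor are placeholders:

```js
import Tesseract from 'tesseract.js';

async function ocrDownscaled(sourceImage, scale = 0.5) {
  // Downscale the input before OCR: smaller images recognise much faster,
  // at some cost in accuracy on small text.
  const canvas = document.createElement('canvas');
  canvas.width = Math.round(sourceImage.width * scale);
  canvas.height = Math.round(sourceImage.height * scale);
  canvas.getContext('2d').drawImage(sourceImage, 0, 0, canvas.width, canvas.height);

  const { data } = await Tesseract.recognize(canvas, 'eng');
  return data.text;
}

// Usage: ocrDownscaled(document.querySelector('#photo')).then(console.log);
```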
I found an error in the Chinese demo, with the example you provided (the 4th character wasn't the same). I know no OCR is perfect, but IMHO at least your own demo should be free of errors.
You try to show how well it works, not that it works perfectly (which would be false). Edit: especially since we know that OCR is hardly perfect - we expect errors to be minimized, not absent, and the main interest is seeing where the engine fails.
There's one in the English demo too: "hail!" -> "haill". They're both pretty bad images though. In practice I've found (command line) Tesseract very accurate on 300dpi scans of printed documents, with colour/greyscale, not binary.
Yes, it's kind of weird, since there's no benefit to claiming false things like "Tesseract.js is a pure Javascript port [...]". Say it's WASM, since people associate that with speed and newness (and heavyweight dependencies, but there's no hiding that).
Skimming the download, this does indeed use wasm, but it's also possible to build to pure JS with emscripten (in WASM=0 mode, wasm2js compiles the wasm to JS). Perhaps that's what they used to do and the docs have not been updated or something like that.
It's mostly not JavaScript, since it uses the emscripten port of the Tesseract OCR Engine, and if you want to do things in the browser, JavaScript has to be involved.
Ease of deployment. Deploying a client-side JavaScript application remains far, far easier and less expensive than anything that runs server-side (or native compiled) code.
Also privacy: running OCR in someone's browser rather than sending the images back to the server keeps them fully in control of the data they are working with.
Client-side web apps. With today's smartphones, it does make sense to not do everything solely on the server side.
Theoretically, cross-platform support would be another possibility. But one could argue native C code could be bundled as well, albeit with separate integration work needed (Android and iOS do support such extensions).
I've described the challenges in more detail here: https://trekhleb.dev/blog/2020/printed-links-detection/. But to sum up, I had good recognition quality with arguable performance with Tesseract.js.