Scannio is a desktop app that converts PDFs to EPUB or TXT files. It intelligently handles both regular and scanned PDFs by using multiple OCR technologies—including on-device Tesseract and PaddleOCR, as well as cloud-based AI services—to accurately extract text.
I added persistent settings management using electron-store. The app now saves and loads your preferences, and every setting is saved automatically as you change it. I restructured the main form to be more logical and intuitive: you now select the OCR engine first, and the language selection sections appear only if you choose an engine that requires them (Tesseract). For better control, a Cancel button now appears during conversion, letting you stop the operation at any time and terminate the background task to free up system resources. I also added a Clear button that resets the PDF file input and any language selections, making it faster to start a new task.
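A minimal sketch of how the save-on-change settings and the conditional language section could fit together. The setting names and defaults here are hypothetical, not Scannio's actual ones; the only thing assumed about the store is that it exposes `get(key, fallback)` and `set(key, value)`, which electron-store does. An in-memory stand-in is included so the snippet runs outside Electron.

```javascript
// Hypothetical defaults; the real setting names in Scannio may differ.
const DEFAULTS = { ocrEngine: "tesseract", languages: ["eng"], outputFormat: "epub" };

// Engines that need a language choice. The devlog only names Tesseract.
const ENGINES_WITH_LANGUAGES = new Set(["tesseract"]);

// Read every known setting, falling back to the default when nothing is stored.
function loadSettings(store) {
  const settings = {};
  for (const [key, fallback] of Object.entries(DEFAULTS)) {
    settings[key] = store.get(key, fallback);
  }
  return settings;
}

// Persist a single change immediately, then return the fresh settings.
function updateSetting(store, key, value) {
  if (!(key in DEFAULTS)) throw new Error(`Unknown setting: ${key}`);
  store.set(key, value);
  return loadSettings(store);
}

// Decide whether the form should show the language section.
function showLanguageSection(settings) {
  return ENGINES_WITH_LANGUAGES.has(settings.ocrEngine);
}

// In-memory stand-in with the same get/set shape as the real store.
function createMemoryStore() {
  const data = new Map();
  return {
    get: (key, fallback) => (data.has(key) ? data.get(key) : fallback),
    set: (key, value) => void data.set(key, value),
  };
}
```

In the app itself, an electron-store instance would be passed in place of `createMemoryStore()`, and the form would call `updateSetting` from each input's change handler.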
The goal was to integrate new, more powerful engines to give users greater flexibility and improve text recognition quality.
The first new addition was a more experimental feature: AI Vision OCR. This engine connects to a locally running AI model (via LM Studio) with image analysis capabilities. In this mode, Scannio sends the rendered PDF pages directly to the model, which looks at them and interprets their content.
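As a rough illustration of what sending a rendered page to a local model can look like, here is a sketch that targets LM Studio's OpenAI-compatible chat endpoint. The server address is LM Studio's usual default and the model name is a placeholder; both are assumptions, as are the prompt wording and the function names. Scannio's actual request may be built differently.

```javascript
// Assumed values: LM Studio's default local server address and a placeholder model name.
const LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions";
const MODEL_NAME = "local-vision-model"; // hypothetical

// Build an OpenAI-style chat request carrying one page image as a data URL.
function buildVisionRequest(pngBytes, pageNumber) {
  const base64 = Buffer.from(pngBytes).toString("base64");
  return {
    model: MODEL_NAME,
    temperature: 0,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: `Transcribe all text on page ${pageNumber}. Return plain text only.` },
          { type: "image_url", image_url: { url: `data:image/png;base64,${base64}` } },
        ],
      },
    ],
  };
}

// Send one page to the model and return the transcription.
async function ocrPageWithVisionModel(pngBytes, pageNumber) {
  const response = await fetch(LM_STUDIO_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildVisionRequest(pngBytes, pageNumber)),
  });
  if (!response.ok) throw new Error(`Vision OCR failed: HTTP ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Keeping the request builder separate from the network call makes it easy to inspect the payload without a model running.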
Next, I integrated PaddleOCR. Instead of sticking purely to JavaScript-based solutions, I opted to run an external Python script. This allows the application to leverage PaddleOCR's advanced capabilities, which should significantly increase precision, especially with non-standard documents. The entire operation is handled by the dedicated worker thread, ensuring the user interface remains fully responsive.
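One detail that tends to matter when reading results from an external script is that stdout arrives in arbitrary chunks, not whole lines. The sketch below assumes the Python script prints one JSON object per page per line, with hypothetical `page` and `text` fields; the devlog does not describe the actual output format, so treat this purely as an illustration.

```javascript
// Collects stdout chunks and hands one parsed object per complete line to onResult.
// Calling setEncoding("utf8") on the stream first avoids split multi-byte characters.
function createLineParser(onResult) {
  let buffer = "";
  return {
    push(chunk) {
      buffer += chunk.toString();
      const lines = buffer.split("\n");
      buffer = lines.pop(); // keep the trailing partial line for the next chunk
      for (const line of lines) {
        if (line.trim() === "") continue;
        onResult(JSON.parse(line));
      }
    },
    // Flush whatever is left once the process closes its stdout.
    end() {
      if (buffer.trim() !== "") onResult(JSON.parse(buffer));
      buffer = "";
    },
  };
}

// Join page results in page order into the final text.
function assemblePages(results) {
  return [...results]
    .sort((a, b) => a.page - b.page)
    .map((r) => r.text)
    .join("\n\n");
}
```

Inside the worker, the parser would be fed from the child process started with Node's `child_process.spawn`, calling `push` on each stdout `data` event and `end` on `close`.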
My first major task was converting PDFs to images for OCR. I initially tried a Node.js wrapper for Poppler, but it was slow and inefficient at saving files. I then switched to pdf-to-png-converter, which was better, but PDFium, the engine from Google Chrome, proved to be the fastest and most effective solution.
With the conversion method decided, I started working on image extraction and OCR. Then I moved the PDF processing and OCR to a separate worker thread to keep the user interface responsive, even with large files.
Currently, I'm exploring some post-processing steps to automatically correct common errors, fix weird formatting, and remove other junk the OCR process leaves behind.
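To make the idea concrete, here is a sketch of the kind of rule-based cleanup pass this could start from. The rules are hypothetical examples, not the ones Scannio uses: normalize whitespace, drop lines with no letters or digits, drop lines that are only a page number, and rejoin words hyphenated across a line break.

```javascript
// Hypothetical cleanup rules; which ones actually help depends on the documents.
function cleanOcrText(raw) {
  const lines = raw.replace(/\r\n?/g, "\n").split("\n");

  const kept = lines
    // Collapse runs of spaces and tabs, and trim each line.
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    // Drop non-empty lines that contain no letters or digits (stray marks, rules).
    .filter((line) => line === "" || /[\p{L}\p{N}]/u.test(line))
    // Drop lines that are only a short number, which are usually page numbers.
    .filter((line) => !/^\d{1,4}$/.test(line));

  return kept
    .join("\n")
    // Rejoin words hyphenated across a line break: "recog-\nnition" -> "recognition".
    .replace(/(\p{L})-\n(\p{Ll})/gu, "$1$2")
    // Collapse runs of blank lines into a single paragraph break.
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

Each rule has a cost: the hyphen rule also merges genuine compounds that happen to break at a line end, and the page-number rule can remove a legitimate numeric line, so rules like these are best kept optional.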