EBookToPDF

12 devlogs
38h 44m
•  Ship certified

This Python script converts those annoying Digi4School books to PDFs.

The script launches Chrome in headless mode (so you need to have Chrome installed, but you won't see any windows open) and visits the Digi4School site. It then checks whether you are already logged in; if not, it asks you for your login credentials. After a successful login, you can choose which ebook you want to convert.
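
A headless launch along these lines could look roughly like the sketch below. It is not the script's actual code: the helper names `chrome_arguments` and `launch_chrome` and the window size are assumptions, and the `--headless=new` switch assumes a reasonably recent Chrome version.

```python
def chrome_arguments(headless=True, window_size=(1920, 1080)):
    """Build Chrome command-line switches (hypothetical helper)."""
    width, height = window_size
    args = [f"--window-size={width},{height}"]
    if headless:
        # "--headless=new" is the headless switch of recent Chrome versions.
        args.append("--headless=new")
    return args


def launch_chrome(headless=True):
    """Start Chrome through Selenium; requires `pip install selenium`."""
    # Imported lazily so the argument helper works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling `launch_chrome(headless=False)` instead shows the browser window, which can help while debugging the login step.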

Disclaimer:
This script does not constitute legal advice. Copyright laws and digital rights management (DRM) regulations vary by jurisdiction and are complex. You are solely responsible for ensuring your use of this script complies with all applicable local, national, and international laws, including copyright law and any terms of service or licensing agreements associated with Digi4School and its content.

Timeline

Ship 2

1 payout of 69.0 shells

Manuel Hofmarcher

about 2 months ago

Covers 1 devlog and 4h 11m

Added a "convert all ebooks" feature. I had to spend some time debugging because the script always expected the first subbook collection, which led to a NoSuchElementException in Selenium.

Ship 1

1 payout of 614.0 shells

Manuel Hofmarcher

3 months ago

Covers 11 devlogs and 34h 33m

Completely tested the script on Windows and Linux, fixed some loading times and added ASCII Art :)

Finally got all my book types working with the script (some ebooks redirect you from Digi4School to another site, and of course they all use different layouts).
I still found some bugs that I'll be working on over the next few days. I also added some docstrings. Sleep times are still not fully dynamic (despite what I said in a previous devlog), but at least they are now partially dynamic thanks to WebDriverWait.
Fixed the next-page pop-up issue.
Most of my time went into building the mechanism for the BiBox books, because I tried many ways of obtaining the crop coordinates (BiBox renders the book on a canvas, which makes this much harder).

Discoveries:
Books hosted by Scook use iframes
One of my math books cannot be found using XPath for some reason
DigiBox books use a canvas (which makes obtaining the crop coordinates harder)

What I spent my time on:
Made the script compatible with ebooks hosted by DigiBox and Scook. To do that, I split the savebookaspdf function into a book-specific part (selecting the first page, obtaining the crop coordinates, closing pop-ups and adjusting the zoom) and a general part, "savebooksaspdf_main", which now only saves the images and crops them using the values obtained earlier.
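
The split described above could be structured roughly like this. It is only a sketch of the idea, not the script's real code: `BookHandler`, `save_book` and the `capture_pages` callback are hypothetical names, and the real functions may take different arguments.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

CropBox = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


@dataclass
class BookHandler:
    """Book-specific steps for one hosting platform (hypothetical structure)."""
    name: str
    # Opens the first page, closes pop-ups, sets the zoom and
    # returns the crop coordinates for this kind of book.
    prepare: Callable[[object], CropBox]


def save_book(driver, handler, capture_pages):
    """General part: run the book-specific setup, then the shared capture loop."""
    crop_box = handler.prepare(driver)
    return capture_pages(driver, crop_box)
```

With this shape, supporting another host only means writing one more `prepare` function; the capture loop stays untouched.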

I also just noticed that I never mentioned when I switched to Chrome. The reason for the change was that Chrome supports the print-page feature. I no longer use this feature, but I stayed with Chrome anyway.

Fixed a bug with a math book (for some reason Selenium couldn't click the name of the book, so I changed the lookup from searching for the book name to searching for the nth entry heading). The script also closes other pop-ups that may occur.

normalized sleep times
moved exports to their own directory
started the "all" option in the book selection (not finished)
created a subbook selection (some ebooks contain subdirectories for resources, audio files and the actual ebook)
fixed the mechanism for zooming the window and cropping the pictures
changed how the savebookas_pdf function detects the end of the book

I found out that some books have a different width-to-height ratio. My previous approach used fixed window sizes and a fixed zoom that I found by trial and error. I have now spent a lot of time building functions and testing different approaches. I think I will keep using a fixed zoom, but crop the image dynamically according to the div on the site. I also fixed some issues that came up with other books, like specific pop-ups or weird page naming schemes. The rest were minor changes.
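
Cropping "according to the div" boils down to turning the element's rectangle into pixel coordinates of the screenshot. A minimal sketch, assuming the rectangle has the shape that Selenium's `WebElement.rect` returns and that the ratio between screenshot pixels and CSS pixels is known (the function name `crop_box` and the `scale` parameter are made up for illustration):

```python
def crop_box(rect, scale=1.0):
    """Convert an element rectangle in CSS pixels to a screenshot crop box.

    `rect` is a dict with x, y, width and height. `scale` is the ratio of
    screenshot pixels to CSS pixels (assumed to be known, for example from
    `window.devicePixelRatio`).
    """
    left = round(rect["x"] * scale)
    upper = round(rect["y"] * scale)
    right = round((rect["x"] + rect["width"]) * scale)
    lower = round((rect["y"] + rect["height"]) * scale)
    return (left, upper, right, lower)


print(crop_box({"x": 10, "y": 20, "width": 100, "height": 200}, scale=2.0))
# (20, 40, 220, 440)
```

The resulting tuple has the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so books with different aspect ratios need no hand-tuned values.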

Tried some other methods of capturing the pages (for better PDF quality). In the end I decided to stay with my previous approach. Sleep times are now dynamic (the script checks every 100 ms whether the page has finished loading). The user can now specify the amount of loading time between pages (so people with slower internet can set it higher). Plus some other minor changes.
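
The 100 ms check can be pictured as a small polling loop. This is a generic sketch rather than the script's code; `wait_until` and its parameters are hypothetical, with `timeout` standing in for the user-configurable loading time.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.1):
    """Poll `condition` every `poll_interval` seconds until it returns True.

    Returns True as soon as the condition holds, or False once `timeout`
    seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With Selenium, the condition could for example be `lambda: driver.execute_script("return document.readyState") == "complete"`, although a book viewer may need a more specific check than that.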

It took me a while to find the best way to capture the pages. My approach is to take a PNG of the whole page (the script adjusts the zoom before the capture) and crop the image; once all the images have been collected, the PDF is created. The script still lacks some features, such as deleting images that have already been used and dynamic sleep times (currently the script always waits a fixed number of seconds that I think the page needs, but it could instead check every 100 ms or so whether the page has already loaded). I also added timestamps to the print commands.
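
The crop-then-assemble step could be sketched with Pillow as follows. This assumes the screenshots were already saved as PNG files; the file naming scheme and the function names `page_path` and `build_pdf` are invented for the example and are not taken from the script.

```python
from pathlib import Path


def page_path(folder, index):
    """Zero-padded file name so pages sort in reading order (hypothetical scheme)."""
    return Path(folder) / f"page_{index:04d}.png"


def build_pdf(folder, crop_box, output):
    """Crop every saved screenshot and write them into one PDF.

    Requires Pillow (`pip install pillow`), imported lazily here.
    """
    from PIL import Image

    pages = []
    for png in sorted(Path(folder).glob("page_*.png")):
        with Image.open(png) as img:
            # Screenshots are often RGBA; converting to RGB avoids
            # problems when saving as PDF.
            pages.append(img.crop(crop_box).convert("RGB"))
    if not pages:
        raise ValueError("no page images found")
    first, rest = pages[0], pages[1:]
    first.save(output, "PDF", save_all=True, append_images=rest)
```

Zero-padding the page index matters because the files are sorted by name: without it, `page_10.png` would come before `page_2.png`.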

Spent a lot of time trying to create a good capture of the pages. Done for today.

Now all the books are listed and you can select one of them. After you have selected one, the chosen book is opened and the first page is shown.

Started this project a few months ago but didn't get very far. Now I first updated the existing script to match the new website (the website got a huge overhaul). Then I wanted to add a feature that lists the owned books (so the script can ask the user which of them should be converted). While writing this part, I found out about shadow roots and shadow DOMs, which made it a little more complex. I didn't get as much done as I had planned, but it's still something.
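
For readers who have not met shadow DOMs yet: elements inside a shadow root are not found by a normal page-wide search, so the lookup has to go through the host element first. A minimal sketch, assuming Selenium 4 and an open shadow root; the function name and both selectors are placeholders, not the ones the script uses.

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def find_in_shadow(driver, host_selector, inner_selector):
    """Return elements inside the shadow root of the first matching host.

    `driver` is a Selenium 4 WebDriver; both selectors are placeholders.
    """
    host = driver.find_element(CSS, host_selector)
    # Selenium 4 exposes an open shadow root through `WebElement.shadow_root`.
    return host.shadow_root.find_elements(CSS, inner_selector)
```

CSS selectors are used on purpose here, since XPath lookups inside a shadow root are generally not supported.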

The attachment shows the current features.