Rehatbir Singh
Finally fixed the bug!!
Basically, I got a panic (a fatal error) because I tried to move a value that I had previously put into a shared reference (a pointer in Rust that gives multiple places immutable access to some data) back out of it, and I did that in par_run() (the recursive multithreaded helper function) instead of in the actual Crawler::run() function...
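A minimal sketch of how that kind of panic can happen, assuming the shared reference was an Arc and the move-out went through Arc::try_unwrap (the actual code may differ):

```rust
use std::sync::Arc;

fn main() {
    let value = Arc::new(String::from("crawler context"));
    // A clone held elsewhere, e.g. by a recursive helper call:
    let held_elsewhere = Arc::clone(&value);

    // Moving the value back out only works if this is the last Arc;
    // otherwise it fails at runtime (a panic), not at compile time.
    let owned = Arc::try_unwrap(value).expect("value still shared"); // panics!
    println!("{owned}");
    drop(held_elsewhere);
}
```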
So now I will - again - test and benchmark and then ship eventually.
Still doing the context thingy; installing rustowl because of problems with the Arc the context is wrapped in. Will probably upload tomorrow
Doing stuff with the type system to expose Context in an ergonomic way...
Next big refactor incoming...
[Example code below]
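For illustration, one type-system route to expose a Context ergonomically is making the crawler generic over a user-supplied context behind an Arc; this is just a sketch with hypothetical names, not necessarily the design used here:

```rust
use std::path::Path;
use std::sync::Arc;

// Hypothetical sketch: the crawler is generic over a user context C,
// so the per-file closure gets typed access without manual Arc juggling.
struct Crawler<C> {
    context: Arc<C>,
}

impl<C: Send + Sync + 'static> Crawler<C> {
    fn with_context(context: C) -> Self {
        Self { context: Arc::new(context) }
    }

    // Each invocation hands out its own Arc clone plus the file path.
    fn run(&self, files: &[&Path], action: impl Fn(Arc<C>, &Path) + Sync) {
        for file in files {
            action(Arc::clone(&self.context), file);
        }
    }
}

fn main() {
    let crawler = Crawler::with_context(Vec::<String>::new());
    crawler.run(&[Path::new("a.txt")], |ctx, path| {
        // ctx is the shared context, typed as Arc<Vec<String>> here
        let _ = (ctx.len(), path);
    });
}
```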
Basically done I'd say!
Everything is working: a crawler builder pattern (which is also lazy until .run(...), so the crawler can be stored as a config) is available for both versions, Crawler::new() and Crawler::newasync(), along with configs like file/folder regex and search depth.
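A hedged sketch of what such a lazy builder can look like; apart from Crawler::new(), all names here (fields, setters) are made up for illustration:

```rust
use regex::Regex;
use std::fs;
use std::path::Path;

pub struct Crawler {
    max_depth: usize,
    file_filter: Option<Regex>,
}

impl Crawler {
    pub fn new() -> Self {
        Self { max_depth: usize::MAX, file_filter: None }
    }

    // Setters only store configuration; nothing touches the disk yet,
    // which is why a Crawler can be kept around as a reusable config.
    pub fn max_depth(mut self, depth: usize) -> Self {
        self.max_depth = depth;
        self
    }

    pub fn file_filter(mut self, re: Regex) -> Self {
        self.file_filter = Some(re);
        self
    }

    // Only .run(...) actually walks the tree.
    pub fn run(&self, root: &Path, action: &dyn Fn(&Path)) {
        self.walk(root, 0, action);
    }

    fn walk(&self, dir: &Path, depth: usize, action: &dyn Fn(&Path)) {
        if depth > self.max_depth {
            return;
        }
        let Ok(entries) = fs::read_dir(dir) else { return };
        for entry in entries.flatten() {
            let path = entry.path();
            if path.is_dir() {
                self.walk(&path, depth + 1, action);
            } else if self
                .file_filter
                .as_ref()
                .map_or(true, |re| re.is_match(&path.to_string_lossy()))
            {
                action(&path);
            }
        }
    }
}
```

Usage would then look something like Crawler::new().max_depth(3).file_filter(Regex::new(r"\.rs$").unwrap()).run(Path::new("."), &|p| println!("{}", p.display())).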
I also ran a small benchmark where the async (via tokio) and multi-threaded (via rayon) versions took the same time, while the earlier single-threaded recursive version was 2x slower (you can still access it via [crate name]::legacy::singlethreadedrecursive::foreach_file(...)).
Tomorrow I'll maybe add simple synchronisation primitives for some context or maybe something else, idk
(Also, repo is up to date)
[Committed the code]
Worked a lot on the async version to finally get it working. During this period it behaved very weirdly (dbg! statements affected termination, and sometimes it didn't terminate at all), which is solved now. So once I have a non-async multi-threaded version, I'll conduct some benchmarks.
(Spent way too much time 😭)
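A sketch of what such a non-async multi-threaded walk can look like with rayon; the par_run() name echoes the helper mentioned in the bugfix entry above, but the actual implementation may differ:

```rust
use rayon::prelude::*;
use std::fs;
use std::path::Path;

// Recursive parallel walk: every directory level is fanned out onto
// rayon's work-stealing thread pool.
fn par_run(dir: &Path, action: &(dyn Fn(&Path) + Sync)) {
    let Ok(entries) = fs::read_dir(dir) else { return };
    let paths: Vec<_> = entries.flatten().map(|e| e.path()).collect();
    paths.par_iter().for_each(|path| {
        if path.is_dir() {
            par_run(path, action); // nested parallelism is fine in rayon
        } else {
            action(path);
        }
    });
}
```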
Refactoring the async stuff to use a second internal function for the recursion (or not?), also thinking about design choices for passing the data in the async recursion thingy (Crawler vs Config struct)...
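One common reason for that second internal function: a recursive async fn in Rust would have an infinitely-sized future, so the recursion typically goes through a helper that returns a boxed future. A sketch with hypothetical names:

```rust
use std::future::Future;
use std::path::PathBuf;
use std::pin::Pin;

// The boxing breaks the infinite type: each recursive call lives
// behind a pointer instead of being inlined into the parent future.
fn walk_async(dir: PathBuf) -> Pin<Box<dyn Future<Output = ()> + Send>> {
    Box::pin(async move {
        let mut entries = match tokio::fs::read_dir(&dir).await {
            Ok(entries) => entries,
            Err(_) => return,
        };
        while let Ok(Some(entry)) = entries.next_entry().await {
            let path = entry.path();
            if path.is_dir() {
                walk_async(path).await; // recursion via the boxed future
            }
        }
    })
}
```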
Will upload today/tomorrow
PLEASE FIX THE FORMATTING
Making a builder pattern now. Added parameters (max search depth, file/folder filter regex; I used manual filters previously) and I'm getting the async version over to the builder pattern. I also think I now know why it was slower: I just .await-ed every task right after spawning it, which basically means it behaves like the single-threaded recursive version - but async...
So what I'm doing now is using a task pool, so I can still wait for all the tasks (the user-defined async actions for every specified file) to finish before terminating.
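One way to get that with tokio is a JoinSet: spawn everything up front, then drain the set, instead of awaiting each task right after spawning it. A hypothetical sketch, not necessarily the crate's actual code:

```rust
use std::path::PathBuf;
use tokio::task::JoinSet;

async fn run_all(paths: Vec<PathBuf>) {
    let mut tasks = JoinSet::new();
    for path in paths {
        // All tasks start immediately and run concurrently...
        tasks.spawn(async move {
            // ...each one standing in for a user-defined async action.
            let _ = tokio::fs::metadata(&path).await;
        });
    }
    // ...and we only block here, waiting for every task to finish.
    while let Some(result) = tasks.join_next().await {
        result.expect("task panicked");
    }
}
```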
Will commit now!
Finished the recursive async version; it isn't really faster than the recursive single-threaded one... I will also need to work on the ergonomics, since I used Box::leak :sob:
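For context on the Box::leak part: tokio::spawn wants 'static data, and leaking a Box is the quick-and-dirty way to get that. A hypothetical sketch:

```rust
struct Config {
    max_depth: usize,
}

// tokio::spawn requires captured data to be 'static; Box::leak turns a
// heap allocation into a &'static reference, at the cost of never
// freeing it. (Arc<Config> would be the cleaner alternative.)
async fn crawl(cfg: Config) {
    let cfg: &'static Config = Box::leak(Box::new(cfg));
    let handle = tokio::spawn(async move {
        let _ = cfg.max_depth; // freely usable from any spawned task
    });
    handle.await.unwrap();
}
```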
Since async isn't faster, I'll try a non-recursive version and also add more options and a file crawler builder!
Tried out different use cases (writing file paths to a file, counting files). Found that you need a mutex anyway if you want to get references etc. into the closure that's applied to every file in the specified directory.
Since the single-threaded recursive version is basically trivial, I will work on the multithreaded version and add some ergonomics later so ppl don't need to mess with mutexes.
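To make the mutex point concrete, a minimal counting sketch: the shared counter sits behind Arc<Mutex<_>> so a multithreaded crawl could bump it from any thread:

```rust
use std::path::Path;
use std::sync::{Arc, Mutex};

fn main() {
    // Shared state for the per-file closure; without the Mutex the
    // closure couldn't mutate anything it captures across threads.
    let count = Arc::new(Mutex::new(0u64));

    let per_file = {
        let count = Arc::clone(&count);
        move |path: &Path| {
            let _ = path; // a real action would inspect the file here
            *count.lock().unwrap() += 1;
        }
    };

    // Stand-in for the crawler calling the closure for every file:
    per_file(Path::new("some/file.txt"));
    println!("files seen: {}", count.lock().unwrap());
}
```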
I don't like OneDrive; I want my files to be local instead of in the cloud. So I built a simple recursive file crawler as a simple script, but directly tried a multithreaded approach... it didn't really work, and took some hours. After that I made a simple recursive version which at least works... This gave me an idea: why not make a customisable multithreaded (or async later?) file crawler that does whatever you want! Mine just opened and directly closed every file it encountered, making OneDrive download it, but what about modifications, counting size, whatever you want.
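A minimal sketch of that first working version: recurse through the tree and briefly open every file so OneDrive hydrates it (hypothetical code, but close to what's described above):

```rust
use std::fs::{self, File};
use std::path::Path;

// Open every file once so the sync client downloads ("hydrates") it;
// the handle goes out of scope immediately, closing the file again.
fn hydrate_all(dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            hydrate_all(&path)?;
        } else {
            let _opened = File::open(&path)?; // dropped at end of this block
        }
    }
    Ok(())
}
```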