Osmium -- Chess Engine

Osmium -- Chess Engine Used AI

57 devlogs
120h 36m
Created by cubit010

It's a chess engine, currently trying to make it fast boiiiii (but prob not near stockfish level)
UCI support!!
AI used for ideas/research on chess engines, minimizing code writing because AI sucks at writing actual technical code anyway and any time I do use it I have to spend 10x the time debugging than if I were to write the code myself :sob:
Only AI code is the very basic code that was used when the engine was first conceived, recent modifications and optimizations is all me
Started this project back in May because after AP Comp Sci A was finished, I was bored and decided to learn C# on my own, and make a chess engine partly because C# has structs while Java doesn't

Repository

Timeline

apparently i missed smth last time, i found where the code was actually outputting the thing from, uh mb lol
i used ctrl F on the whole project but it only directed me to the on i had in the previous screenshot but whatever
Fixed it and now it outputs what i want but still doesn't give much of a clue as to which why during white's turn it outputted a black move, one that doesn't even exist either because there had to be a king on e7 which didn't exist... what is going on man :sob:

Update attachment

what the hell is going on with the compiler
I've been trying to modify the outputs of this block so i can see what might be going wrong
it's never outputting properly, so i tried modifying it with other things to see if it works
i tried changing other variables like my uci toggle switch and those took effect as i rebuilt the program
however this block just doens't seem to be effected
i even went and commented out the whole thing, rebuilt it, and yet it still runs???????????????

Update attachment
cubit010 cubit010 about 2 hours ago

i used ctrlf on the whole project and didn’t find anything
but for some reason there was another bit i forgot about??

uh i’m gonna see if modifying it does anything

Earned sticker

there's maybe some bugs going on with movegen... not sure what, but for some reason there's sometimes bad moves, such as a move being ke7d6, when the king is on e8, and the king in question is black when the move is white... not sure how something can go this wrong, this is not good bc I don't want to spend more time debugging i need to get this done. But also during some small case testing, the movegens appear to work correctly???? I have no idea what's going on anymore

Update attachment
Earned sticker

debugged the move filtering, turns out there's a small bug in pawn atk checks for king's to-squares, also added an interface for tuning search parameters like reduction factor, LMR margin, etc, so I can make self-play gradient descent tuning possible, because the alternative that can help boost play strength is NNUE and that's gonna take too long to implement before the time I want to get this done by (end of SOM)

Update attachment
Earned sticker

.............why is it now that i'm finding out that in some niche cases movefiltering over-filters some legal moves..... bruhhhhhhhhhhhhhhhhhh

Update attachment
Earned sticker

finished up basic uci time controls
also connected it to cutechess (at last, ik it took forever) and played a few games with it
it's surprisingly decent against me even with really bad time odds (10s with 1s inc)
(I did throw bc I'm a bit busy w/ school and wanted to get a quick game done bc I needed a screenshot, didn't have a saved game from before unfortunately)

Update attachment
Earned sticker

played the engine with 10s per move for it
got destroyed lmao
not saying much tho bc me myself is like 700-800 elo depending on timings
https://www.chess.com/analysis/library/2oVu6hPkqL/analysis
link to game (I played white) if yall wanna check it out
I will get the UCI done soon so i can start doing some more serious tuning with cutechess or whatnot

Update attachment
Earned sticker

been busy with school so haven't got much chance to work on the engine
fixed the bug though at least, by passing a separate variable, so that the stage of the sort and the sort division is known. However I'll be starting to work on a sort top X to try and speed it up because although simd is decently fast, it's not the most significant speed increase, as well as making sure that, although rare, capture + promotions get scored with combined score rather than same values because of early returns
update: finished that, now working on estimating time for next depth, and if estimate is wrong and is cut off, use last depth because latest depth isn't trusted
Right now for some reason, even though the time limit is 10s, it cuts off at 8.3s and uses depth 9 (which is only half of what it needs to do )

Update attachment
Earned sticker

continued debugging
i think i found the bug
sortlarge still sorts smaller strides with <><> pattern instead of <<>> when it needs to
gonna fix that tmr too tired tn
might try out sorting top x moves instead soon because as it turns out simd sorting the whole thing (at least in the way i have implemented) doesn't boost speed that much to my dismay

Update attachment

fixed one of the bugs, i forgot that during bitonic sort, i still have to recursively sort in halves all the way down to each vector, so now it's implemented
I love it when out of nowhere when you're not coding you just think of the solution out of nowhere
but anyways, there's still a bug that makes it so that large vector counts just don't get sorted properly, no idea why, so uh debug time
there's no magical formula to predict when and where the code fails so here's a sample debug stream

cubit010 cubit010 13 days ago

tried a slightly modified version where it outputs fen every pvs node before movegen/ordering to cut down on the output so that i don’t have to run this every single time.. but it doens’t appear to work for some reason with the last fen given
there’s no ordering in quiesce so idk why that would be happeneing

continued more work on trying to get the padding version for simd sorting to work
currently it seems the sortlarge isn't working for 64 elements with 8 vectors?
weird...
maybe that's why there's a variance between searched moves between insertion sorting and bitonic, esp because both are stable

Update attachment

it's not letting me add a devlog at school using school wifi for some reason.. this is the 3rd rewrite.
Anyways, pushed a commit to github today, going to try out some diff methods for using the simd sorting method, current being sorting the largest power of 2 using simd, then sorting the rest, if required separately using bit insertion sort, then merge them like in merge sort
that currently i believe is causing quite a few inefficiencies because the tail sort takes 2% and merge also takes 2% of self cpu, which i'm looking to eliminate / merge the logic more efficiently by having everything done in simd, when the num of moves isn't exactly a power of 2 it'll just pad the rest with blank scores and indices so that the simd logic will still work

Update attachment

added a few conditionals and now moves with same scores don't get duped

Update attachment

small caveat, if the scores are the same, the mask gets set to 00 instead of 01 or 10 pairs, so moves can get duplicated, will need to address that so that they stay put if the scores are the same

Update attachment

it took so long but YESSSSSSSS IT FINALLY WORKS
SIMD SORTING
YESSSSSSSSS
now i gotta make sure they move the moves correctly to its needed spot

Update attachment
cubit010 cubit010 19 days ago

some terminologies aren’t correct like the intrasort isn’t actually intrasort i just used it to label the intra-vector bitonic sorting section

automated the spreadsheet, saw some patterns that apply to a lot of the cases, so i wondered if some parts can be removed, it took a while but found a case that if the groups of 4s are not bitonicly correct, it doesn't sort properly, so i'll stick with the original pattern, where it's stride 1 to sort pairs, stride 2, 1, to sort groups of 4 bitonicly, then 421, then 8421 and so on

Update attachment

did some tracing on paper, and did a pseudocode version in google sheets, and i think i know how to properly implement bitonic now

Update attachment
cubit010 cubit010 22 days ago

i just realized i did something wrong in that, the 3rd step should be a center-ed level 1 mask, but uh somehow tracing thru it it still worked? idk how that happened

added a tiny bit of an update, if due to timings, search for a depth was cut off, and if its iterative node count is less than 40% of last depth (which for now I think is a decent margin, considering nodes grow a lot per depth, but I think the move ordering is good enough that most good moves should be abled to be searched within that window, might increase it depending on later testing, but i still have to get simd move ordering done) it's not trusted and the last best move is used

(the move in the screenshot before I added the feature suggested g6 and -3 as eval, which is inaccurate, despite my search settings having the main branch (move ordered nxe3 branch) be searched first, i guess it doens't prevent wild nodes from faking a rly low eval because other branches weren't searched)

also might add an early abandonment feature to save time, where it estimates if it can get the next depth done within the given time, if not abandon and just save some time for the future

Update attachment

turned off the non-functional simd sorting and went back to scalar sorting
consistently around in profiling between 24% self-cpu time
it appears that scalar sorting is just expensive..? I tried many different sorting algs and this is one of the best performant ones

nps is only slightly impacted, decreased by 25% in exchange for sorting that actually works

still likely gonna work on simd (although over the course of the day having hand traced the sorting for a 16 element list it's gonna be very weirdly written)

Update attachment
cubit010 cubit010 23 days ago
the no inlining and the commented aggressive optimization is just so i can see the percentages of each line in the profiler otherwise it will attribute it to the method as a whole and can’t rly see the why behind it

not much coding got done, but did a lot of research on bitonic sorting, should be able to fully get the sorting fully functional within the week
(probably only gonna be done on the weekend cause uh classes r stressful)
Am starting to consider the overhead cost of loading vectors compared to the efficiency of simd
prob will do a test run with a bunch of test cases later once i get it working to see which one is better
I swear if simd end up being slower i'm gonna crash out bc i spent too long on this :sob:

Update attachment
cubit010 cubit010 23 days ago
i swapped to insertion that ik for sure works for a bit and nps dropped from 3mil max to 1.5-1.6m I did another profile using binary insertion only and uh with a 10s run from position 1.e4 Nf3 2.d4, it had 22% self cpu the sorting and overall moveordering being 29%, albeit i do currently have it in quiesce too, ima do another profiling and see if quiesce is causing the big amount of cpu
cubit010 cubit010 23 days ago
when i had the scalar sorting i did quite a few profiles and the ordering (scoring and sorting) was somehow consistently 10-17% depending on how optimized i made the other parts of the engine partly why i decided to try simd
nosrep nosrep 24 days ago
fwiw in my engine i just did a profile, the move selector is 1.5% of total runtime
cubit010 cubit010 24 days ago
bitonic is optimized for parallel CPU operations so it does all of the comparisons at once, with few iterations depending on data size, especially optimized with vectors and simd intrinsics. It only takes powers of 2’s amounts of inputs at one go (2,4,8,16 etc number of elems) and it’s faster bc of the parallelization. insertion sort is what i had before i believe, and is still what i use for the tail bit that doesn’t fit into the biggest pow2 there is possible for the moveset, and yeah i also looked into some stuff and was rly considering just not sorting the stuff after the top 5 or so moves, but bitonic seemed to promise performance while sorting the entire thing (although idk how much overhead it will have and impact actual gameplay), gonna do a bunch of testing probs
nosrep nosrep 24 days ago
there’s no way move ordering sorting is the holdup….. also i think the traditional way to do this is just selection sort (just getting the max score out of remaining moves) because with proper sorting/pruning techniques most of the time you’ll only be getting a handful of moves so sorting the rest is wasted work (is this what bitonic sort does? idk what it does.) if you want to look into more of thsi style the other main optimization i’ve heard of is staged movegen but its cursed imo
cubit010 cubit010 24 days ago
i implemented simd a while back while not understanding the whole thing so when it looked like it worked i didn’t check it but uh now that timings got added for uci as per previous devlogs i was wondering why the best move wasn’t getting searched, turns out it wasn’t ordered as the first move so hence it led me to discover the simd i had a few weeks back doesn’t actually work
cubit010 cubit010 24 days ago
the current idea is to split the move sets into a bitonic part and a regular linear tail, where the majority of the moves gets ordered more quickly, and merge the two sections later (to hopefully make moveordering faster)
cubit010 cubit010 24 days ago
move ordering was rly slow with linear sorting from before and i’m looking at other ways to do it faster it might not be worth the overhead so uh we’ll see once i implement it
nosrep nosrep 24 days ago
what are you bitonic sorting for????

let's not talk about how sometimes during refactoring i would just remove features on accident...
def didn't just realize today that move heuristics was off the entire time / however long it's been
well uh it's back now ig yippee

Update attachment
cubit010 cubit010 24 days ago
also school started and 5 APs is not gonna do wonders to my coding time :sob:

erm dug a bit deeper and turns out move ordering wasn't working this entire time............. UGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Update attachment
cubit010 cubit010 27 days ago
btw index 10 was the infamous nxe4… for some reason the sorter put it in the middle… YAY

figured out stopping early, with searching the first move, the principal variation, to its full depth so it can exit nicely with a good eval score and a hopefully good move, however, for some reason during testing after playing as white, e4 Nf6 d4, the engine finds the right move from depth 1-8 but abandons it at depth 9 for some reason, probably due to the timings, but the pv should be searched to full so it shouldn't return a eval of 0 as well as returning a6 as the best move
after some more testing with 20s time limit it manages to find nxe4 at depth 9 and even depth 10, even though depth 10 was the one that got cut off, something's off and ig it's a sign to do more digging

Update attachment

ok finally figured out 3fold detection in search, and also fixed a bug that didn't show up until now for some reason that involved integer overflow, now although the 3f is now in place, it rarely detects any (from starting pos playing e4 searching to depth 10 doesn't result in any hits) unless I force it to search a possibility like playing some knight shuffles. I was confused by the 0 detections for a while though because I was thinking no way that my pruning and stuff got rid of all 3f cases, but I guess it does since I know it does work (famous last words, i hope not)
time to go back to UCI and stuff

Update attachment

I'm legit dumb
so, I forgot to implement 3 fold detection for some reason up until a few days ago, so I decided to side track to work on that because that's a very essential part of an engine, it was 2 parallel arrays with 2000 elements, one stores zobrist keys used as a stack for board comparison, one stores irreversibility so the code can break out early when it could, when a position reached is no longer possible to be repeated by any position that came before it, (such as a pawn advancing forward, no position before that pawn being advanced can be the exact same as now, because pawns can't reverse in direction).
Anyways, the implementation involved checking for moves that are irreversible and setting the correct index to true, but for some reason, the logic i used for checking if a piece that moved is a pawn, is the following: (move.PieceMoved &(bitwise and) (Piece.WhitePawn | Piece.BlackPawn))
the thing is, this is a technique used for flags, that use bits, but Piece is an enum with integer values, so bitwise ands don't work because the white and black pawns are not clean power of 2 integers, so a reversible move (such as a knight jumping back and forth) would be flagged as irreversible, so uh... it took me forever to debug this one simple thing, (and caused me to basically leave for a few days before coming back today and finally seeing the issue)
Below is the code segment, after being debugged, so now it's correct

Update attachment

why is it so difficult to increase TT hit rateeeee :sob:
anyways, worked on more UCI, profiled the performance a bit and improved hotspots, and also tried to increase TT hitrate by loosening the pruning and reduction factors, which raised the hit rate from ~~10% to ~~20%, but it's still terrible, and I have no idea why

Update attachment

FOR THE ISLAND (and the balloons lol)
anyways, actually continued working on UCI class (NO WAY BROS ACTUALLY DOING IT)
added some commands that are able to be parsed properly, still need to implement the rest of the commands like go and ucinewgame whatnot,
Also learned that engines reset after each move, or rather the way UCI does it is stateless, meaning they give you the board state after each move, so if the engine crashes they don't need to depend on internally stored boards to continue playing.
It does slightly annoy me though that a typical UCI game makes the engine reset, and then play the moves over, which is only a tiny bit costly computationally, but I personally would've preferred storing internal states and a stack of moves made, because that's what I've already implemented

Update attachment

continued to try and fix TT for better hit rates, added a bunch of statistical stuff to try and figure out the problem, which is likely that the move pruning is too aggressive, albeit collisions are also part of it. This also for some reason killed the nodes per sec from ~~2M to ~~1M, which i'll try and fix in the next update
Need to continue working on UCI too because I paused work on that to work on TT and stuff
Also started working on Chess2, a side project which is why you may see more gaps between updates for this engine, once again a bit burnt out

Update attachment

worked more on the transposition table more, trying to improve hitrates, did get it up to around 17% at one point but it's still kinda bad for depth 11 even though the TT was allocated 256MB ram (albeit from testing changing the allocation size from 16~256 didn't make too much difference), much improvements needed

Update attachment

Did some performance profiling to try and do more optimization, and added a Transposition table lookup/hit counter, and it turns out the TT hit rate is only 10.9% which is kinda low
:sob:
gonna try and improve that now

Update attachment

The chess engine now has a name!
also continued to work on UCI

Update attachment

Created a new version of magic number generators in C++ -- unlike C# like the engine, because it's faster -- to try and solve the problem of magics, especially for rooks, being realllllly slow. Well, exaggerated, but during profiling rook magic calls alone take 5% of the processing time which is way more than I thought, and most of it is only during quiesce, which is concerning, even though quiesce should have less nodes than the major branching by a lot, most of the magic call are from quiesce, might look into capture move generation logic later
Time:
3 hours from this from the magic gen, 2 and a half from the chess engine

Update attachment
cubit010 cubit010 about 2 months ago
alright thanks will try it out
nosrep nosrep about 2 months ago
instead of the bit twiddling stuff chessprogramming does i like ctz/clz (count trailing/leading zeroes) ops on cpus that support it. faster and much cleaner
cubit010 cubit010 about 2 months ago
note the version in screenshot is a version that was on https://www.chessprogramming.org/Looking_for_Magics with some modifications to syntax, I then made some more modifications to it to try and optimize it but only for it to generate bad magics that give me illegal moves which is JUST GREAT so had to revert back

started work on UCI support, decided I wanted to make this version UCI, and do some testing, also found out out of the blue that my pruning algorithm was backwards, since the depths I use for the PVS (negamax) uses reduction in depth, so each iterative depth it starts by calling it with depth(n) and when it gets to depth(0) it calls the quiesce, but for pruning those depth values aren't helpful, so now I include a ply counter variable, but adding it has had some consequences and has broken some things, which will need to be debugged

Also added in time tracked from other project files that was clones of the engine to help me debug it at points during development

Update attachment

fixed filtermoves, did an overhaul on pawn move generation because when profiled pawn moves were a bit slow, now uses bit tables in movegen. Started debugging on special move gen for the quiesce, because currently there are some bugs with it that sometimes makes white pawns go to 8th rank without promoting, and then try to generate a move further, made the engine print out all the fen it encounters in quiesce before the crash, at the depth i knew the crash was going to happen at, so i can scan thru some of them and try and figure out why

Update attachment

added some debug methods in a new class to help debug the new move filtering, fixed one issue where the king was walking into an area controlled by opposing sliders (bishops, rooks, queens) because the method for checking if the square is attacked was using magics for slider attacks, using the current board state, meaning that the king acted as a blocker, when it shouldn't have, now trying to figure out why checkmate doesn't render the whole move list empty like it should

Update attachment

After a week of burnout trying to debug the engine and some new features, a progress update:
fixed many bugs in the new filtermoves system, but there are still bugs because the node count for each depth is different when using the old play move, check legality, undo than the new system
Added a new incrementally updated material delta tracker after a few iterations of prototyping, to speed up material eval
lots of debugging in general, and getting close to 1.5M NPS

Update attachment

turns out, the magic number generator that I coded is faulty, and the numbers it outputted actually allowed illegal moves, such as Be3 on starting position, I never realized this until I came across an unrelated bricking bug a few days also as mentioned in the devlogs that started when I added the new move filtering, I have reverted to the old magic numbers, and will continue to work on debugging the bricking problem. I have made a copy of my last working code using List before to compare, and that's how I found out that it was the magic numbers giving wrong slider movegen. I will try to figure out why the new movefiltering bricks the whole engine though

Update attachment

debugging the new filtermoves system, still working on it, got a bit of work done because am on a road trip, so lotsa time to try and figure this out on my laptop
there's also some illegal move being generated which is weird

Update attachment

started working on a new move-filtering class, using bitboard-based filtering instead of simulating each possible move and deleting the ones that aren't legal.

Update attachment

some refactoring, first 1M NPS!

Update attachment

honestly kinda burnt out from trying to refactor to maximize nodes per second, not much progress today, hopefully will be a lot more progress tmr

Update attachment

refactoring and rewriting methods, as well as removing unnecessary features to improve nodes per second, went from 400k->900k today, very good progress

Update attachment
cubit010 cubit010 3 months ago

I plan to make a separate version just for UCI, just copying the core code over and adding a UCI class, shouldn’t be too bad, but for this one I plan to make a GUI myself, and allow people to play with it directly

nosrep nosrep 3 months ago

plans for UCI support? makes it easier for chess guis and to put on lichess with lichess-bot

tuning tuning tuning... added some debug statements to tell me how effective the pruning is at doing their job, so depth is quite reduced for now. Will continue to work on tuning upgrades, maybe better move ordering, and then, focusing on squeezing out every ounce of raw nodes-per-second I can get, currently around 400-500kNPS which is ok but I would much rather be at least 1M NPS, which would allow for much better searching at high depths

Update attachment

FINALLY got the new system to work, and also made some optimizations, such as a complete overhaul to the flagCheckAndMate method that is run on all legal moves, as a significant portion of time is spent there before, FIRST DEPTH 15 SEARCH

Update attachment

Still debugging the new Span version, it's somehow pretty inefficient, and also, still not doing the same thing as before. Will continue to debug until it has the same behavior as before, but I think I'm almost there
Almost at 24 hour mark!

Update attachment

AGHHHHHHHHHHHH chess engine still broken from trying to refactor everything to use Span instead of List asjdfjsdfjsaldfjasl;dfjak nothing works anymoreeeeeeeeeeeee
To be fair the change is basically updating the architecture of the whole program... bugs are expected...
Current status: engine breaks after depth 3, and just bricks itself, there's something running, because it doesn't let me type, but it doesn't end whatever process it was doing... never ending...

Update attachment
nosrep nosrep 3 months ago

slay

Refactoring, refactoring, refactoring...
Struggling to refactor all of my method that uses List to use Span, which I've found out to be a lot more efficient as there's a not a large amount of repeated memory allocation, will continue and hopefully finish up tmr

Update attachment

Trying to refactor movegen to be more efficient, as for a bitboard engine, despite the progress I've made, it's still slow in nodes per second, currently got it up to 400k nodes per second, which is decent, but not good. Will continue refactoring for efficiency tmr.
Screenshot is of the newly refactored flagcheckandmate that improves efficiency

Update attachment

TUNING... A LOT OF TUNING.... Lots, and lots, and lots, and lots of tuning
mostly just tuning the pruning parameters, as well as the aspiration window size, and futility margin
AT LEAST IT SEARCHES TO DEPTH 13 PRETTY GOOD NOW WOOOOOOOOOOOOOOO
ALSO FIRST DEPTH 15!!!!!!!!!!111!!11!!!1!!1 (albeit a huge node explosion after 14, it appears that there's always going to be a node explosion at some point, just how long you can delay it, and it eventually might get to the point that too much actual accuracy will be sacrificed that it won't be worth pruning that much
There's a chance I could be sacrificing a lot of accuracy by pruning so much, will do more testing later

Update attachment

added some more advanced eval, such as king safety, pawn struct, and tuned and fiddled around with some pruning parameters, getting real good now, might do some live testing soon against stockfish and see how things go

Update attachment

added some phased piece evaluation, will be working on king safety and pawn structure and such later

Update attachment

tried to add some more features to make it go faster or more efficient, only slowed it down or broke the engine... enough coding for the day

Update attachment

Added evaluation output, including formatting for mate in x by either color

Update attachment

FIXED NULL MOVE PRUNING BUGS BOIIIIII and also added more features, like aspiration window, futility pruning, and late move reductions, it's now a lot faster, (by cutting a lot of positions searched)

Update attachment

trying to debug null move pruning, currently very broken, returns an invalid move after searching past depth 5

Update attachment

Did a bunch of refactoring, added a small method that allows for future multithreading support, currently working on null move pruning for more search efficiency

Update attachment