It's a chess engine, currently trying to make it fast boiiiii (but prob not near stockfish level)
UCI support!!
AI used for ideas/research on chess engines, minimizing code writing because AI sucks at writing actual technical code anyway and any time I do use it I have to spend 10x the time debugging than if I were to write the code myself :sob:
Only AI code is the very basic code that was used when the engine was first conceived, recent modifications and optimizations is all me
Started this project back in May because after AP Comp Sci A was finished, I was bored and decided to learn C# on my own, and make a chess engine partly because C# has structs while Java doesn't
luke
Check their projects out: agar.io clone, dict_to_colormap, Snake2
nosrep
Check their projects out: my website, trout, human benchy
vidit2345
Check their projects out: MihanSolo Chess Engine, Terminal Of Hanoi
Henrique
Check their projects out: Online chess game via the terminal., CLI todo list , Editor de texto escrito e go
Once you ship this you can't edit the description of the project, but you'll be able to add more devlogs and re-ship it as you add new features!
apparently i missed smth last time, i found where the code was actually outputting the thing from, uh mb lol
i used ctrl F on the whole project but it only directed me to the on i had in the previous screenshot but whatever
Fixed it and now it outputs what i want but still doesn't give much of a clue as to which why during white's turn it outputted a black move, one that doesn't even exist either because there had to be a king on e7 which didn't exist... what is going on man :sob:
what the hell is going on with the compiler
I've been trying to modify the outputs of this block so i can see what might be going wrong
it's never outputting properly, so i tried modifying it with other things to see if it works
i tried changing other variables like my uci toggle switch and those took effect as i rebuilt the program
however this block just doens't seem to be effected
i even went and commented out the whole thing, rebuilt it, and yet it still runs???????????????
there's maybe some bugs going on with movegen... not sure what, but for some reason there's sometimes bad moves, such as a move being ke7d6, when the king is on e8, and the king in question is black when the move is white... not sure how something can go this wrong, this is not good bc I don't want to spend more time debugging i need to get this done. But also during some small case testing, the movegens appear to work correctly???? I have no idea what's going on anymore
debugged the move filtering, turns out there's a small bug in pawn atk checks for king's to-squares, also added an interface for tuning search parameters like reduction factor, LMR margin, etc, so I can make self-play gradient descent tuning possible, because the alternative that can help boost play strength is NNUE and that's gonna take too long to implement before the time I want to get this done by (end of SOM)
.............why is it now that i'm finding out that in some niche cases movefiltering over-filters some legal moves..... bruhhhhhhhhhhhhhhhhhh
finished up basic uci time controls
also connected it to cutechess (at last, ik it took forever) and played a few games with it
it's surprisingly decent against me even with really bad time odds (10s with 1s inc)
(I did throw bc I'm a bit busy w/ school and wanted to get a quick game done bc I needed a screenshot, didn't have a saved game from before unfortunately)
played the engine with 10s per move for it
got destroyed lmao
not saying much tho bc me myself is like 700-800 elo depending on timings
https://www.chess.com/analysis/library/2oVu6hPkqL/analysis
link to game (I played white) if yall wanna check it out
I will get the UCI done soon so i can start doing some more serious tuning with cutechess or whatnot
been busy with school so haven't got much chance to work on the engine
fixed the bug though at least, by passing a separate variable, so that the stage of the sort and the sort division is known. However I'll be starting to work on a sort top X to try and speed it up because although simd is decently fast, it's not the most significant speed increase, as well as making sure that, although rare, capture + promotions get scored with combined score rather than same values because of early returns
update: finished that, now working on estimating time for next depth, and if estimate is wrong and is cut off, use last depth because latest depth isn't trusted
Right now for some reason, even though the time limit is 10s, it cuts off at 8.3s and uses depth 9 (which is only half of what it needs to do )
continued debugging
i think i found the bug
sortlarge still sorts smaller strides with <><> pattern instead of <<>> when it needs to
gonna fix that tmr too tired tn
might try out sorting top x moves instead soon because as it turns out simd sorting the whole thing (at least in the way i have implemented) doesn't boost speed that much to my dismay
fixed one of the bugs, i forgot that during bitonic sort, i still have to recursively sort in halves all the way down to each vector, so now it's implemented
I love it when out of nowhere when you're not coding you just think of the solution out of nowhere
but anyways, there's still a bug that makes it so that large vector counts just don't get sorted properly, no idea why, so uh debug time
there's no magical formula to predict when and where the code fails so here's a sample debug stream
tried a slightly modified version where it outputs fen every pvs node before movegen/ordering to cut down on the output so that i don’t have to run this every single time.. but it doens’t appear to work for some reason with the last fen given
there’s no ordering in quiesce so idk why that would be happeneing
continued more work on trying to get the padding version for simd sorting to work
currently it seems the sortlarge isn't working for 64 elements with 8 vectors?
weird...
maybe that's why there's a variance between searched moves between insertion sorting and bitonic, esp because both are stable
it's not letting me add a devlog at school using school wifi for some reason.. this is the 3rd rewrite.
Anyways, pushed a commit to github today, going to try out some diff methods for using the simd sorting method, current being sorting the largest power of 2 using simd, then sorting the rest, if required separately using bit insertion sort, then merge them like in merge sort
that currently i believe is causing quite a few inefficiencies because the tail sort takes 2% and merge also takes 2% of self cpu, which i'm looking to eliminate / merge the logic more efficiently by having everything done in simd, when the num of moves isn't exactly a power of 2 it'll just pad the rest with blank scores and indices so that the simd logic will still work
small caveat, if the scores are the same, the mask gets set to 00 instead of 01 or 10 pairs, so moves can get duplicated, will need to address that so that they stay put if the scores are the same
it took so long but YESSSSSSSS IT FINALLY WORKS
SIMD SORTING
YESSSSSSSSS
now i gotta make sure they move the moves correctly to its needed spot
some terminologies aren’t correct like the intrasort isn’t actually intrasort i just used it to label the intra-vector bitonic sorting section
automated the spreadsheet, saw some patterns that apply to a lot of the cases, so i wondered if some parts can be removed, it took a while but found a case that if the groups of 4s are not bitonicly correct, it doesn't sort properly, so i'll stick with the original pattern, where it's stride 1 to sort pairs, stride 2, 1, to sort groups of 4 bitonicly, then 421, then 8421 and so on
did some tracing on paper, and did a pseudocode version in google sheets, and i think i know how to properly implement bitonic now
i just realized i did something wrong in that, the 3rd step should be a center-ed level 1 mask, but uh somehow tracing thru it it still worked? idk how that happened
added a tiny bit of an update, if due to timings, search for a depth was cut off, and if its iterative node count is less than 40% of last depth (which for now I think is a decent margin, considering nodes grow a lot per depth, but I think the move ordering is good enough that most good moves should be abled to be searched within that window, might increase it depending on later testing, but i still have to get simd move ordering done) it's not trusted and the last best move is used
(the move in the screenshot before I added the feature suggested g6 and -3 as eval, which is inaccurate, despite my search settings having the main branch (move ordered nxe3 branch) be searched first, i guess it doens't prevent wild nodes from faking a rly low eval because other branches weren't searched)
also might add an early abandonment feature to save time, where it estimates if it can get the next depth done within the given time, if not abandon and just save some time for the future
turned off the non-functional simd sorting and went back to scalar sorting
consistently around in profiling between 24% self-cpu time
it appears that scalar sorting is just expensive..? I tried many different sorting algs and this is one of the best performant ones
nps is only slightly impacted, decreased by 25% in exchange for sorting that actually works
still likely gonna work on simd (although over the course of the day having hand traced the sorting for a 16 element list it's gonna be very weirdly written)
not much coding got done, but did a lot of research on bitonic sorting, should be able to fully get the sorting fully functional within the week
(probably only gonna be done on the weekend cause uh classes r stressful)
Am starting to consider the overhead cost of loading vectors compared to the efficiency of simd
prob will do a test run with a bunch of test cases later once i get it working to see which one is better
I swear if simd end up being slower i'm gonna crash out bc i spent too long on this :sob:
let's not talk about how sometimes during refactoring i would just remove features on accident...
def didn't just realize today that move heuristics was off the entire time / however long it's been
well uh it's back now ig yippee
erm dug a bit deeper and turns out move ordering wasn't working this entire time............. UGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
figured out stopping early, with searching the first move, the principal variation, to its full depth so it can exit nicely with a good eval score and a hopefully good move, however, for some reason during testing after playing as white, e4 Nf6 d4, the engine finds the right move from depth 1-8 but abandons it at depth 9 for some reason, probably due to the timings, but the pv should be searched to full so it shouldn't return a eval of 0 as well as returning a6 as the best move
after some more testing with 20s time limit it manages to find nxe4 at depth 9 and even depth 10, even though depth 10 was the one that got cut off, something's off and ig it's a sign to do more digging
ok finally figured out 3fold detection in search, and also fixed a bug that didn't show up until now for some reason that involved integer overflow, now although the 3f is now in place, it rarely detects any (from starting pos playing e4 searching to depth 10 doesn't result in any hits) unless I force it to search a possibility like playing some knight shuffles. I was confused by the 0 detections for a while though because I was thinking no way that my pruning and stuff got rid of all 3f cases, but I guess it does since I know it does work (famous last words, i hope not)
time to go back to UCI and stuff
I'm legit dumb
so, I forgot to implement 3 fold detection for some reason up until a few days ago, so I decided to side track to work on that because that's a very essential part of an engine, it was 2 parallel arrays with 2000 elements, one stores zobrist keys used as a stack for board comparison, one stores irreversibility so the code can break out early when it could, when a position reached is no longer possible to be repeated by any position that came before it, (such as a pawn advancing forward, no position before that pawn being advanced can be the exact same as now, because pawns can't reverse in direction).
Anyways, the implementation involved checking for moves that are irreversible and setting the correct index to true, but for some reason, the logic i used for checking if a piece that moved is a pawn, is the following: (move.PieceMoved &(bitwise and) (Piece.WhitePawn | Piece.BlackPawn))
the thing is, this is a technique used for flags, that use bits, but Piece is an enum with integer values, so bitwise ands don't work because the white and black pawns are not clean power of 2 integers, so a reversible move (such as a knight jumping back and forth) would be flagged as irreversible, so uh... it took me forever to debug this one simple thing, (and caused me to basically leave for a few days before coming back today and finally seeing the issue)
Below is the code segment, after being debugged, so now it's correct
why is it so difficult to increase TT hit rateeeee :sob:
anyways, worked on more UCI, profiled the performance a bit and improved hotspots, and also tried to increase TT hitrate by loosening the pruning and reduction factors, which raised the hit rate from ~~10% to ~~20%, but it's still terrible, and I have no idea why
FOR THE ISLAND (and the balloons lol)
anyways, actually continued working on UCI class (NO WAY BROS ACTUALLY DOING IT)
added some commands that are able to be parsed properly, still need to implement the rest of the commands like go and ucinewgame whatnot,
Also learned that engines reset after each move, or rather the way UCI does it is stateless, meaning they give you the board state after each move, so if the engine crashes they don't need to depend on internally stored boards to continue playing.
It does slightly annoy me though that a typical UCI game makes the engine reset, and then play the moves over, which is only a tiny bit costly computationally, but I personally would've preferred storing internal states and a stack of moves made, because that's what I've already implemented
continued to try and fix TT for better hit rates, added a bunch of statistical stuff to try and figure out the problem, which is likely that the move pruning is too aggressive, albeit collisions are also part of it. This also for some reason killed the nodes per sec from ~~2M to ~~1M, which i'll try and fix in the next update
Need to continue working on UCI too because I paused work on that to work on TT and stuff
Also started working on Chess2, a side project which is why you may see more gaps between updates for this engine, once again a bit burnt out
worked more on the transposition table more, trying to improve hitrates, did get it up to around 17% at one point but it's still kinda bad for depth 11 even though the TT was allocated 256MB ram (albeit from testing changing the allocation size from 16~256 didn't make too much difference), much improvements needed
Did some performance profiling to try and do more optimization, and added a Transposition table lookup/hit counter, and it turns out the TT hit rate is only 10.9% which is kinda low
:sob:
gonna try and improve that now
Created a new version of magic number generators in C++ -- unlike C# like the engine, because it's faster -- to try and solve the problem of magics, especially for rooks, being realllllly slow. Well, exaggerated, but during profiling rook magic calls alone take 5% of the processing time which is way more than I thought, and most of it is only during quiesce, which is concerning, even though quiesce should have less nodes than the major branching by a lot, most of the magic call are from quiesce, might look into capture move generation logic later
Time:
3 hours from this from the magic gen, 2 and a half from the chess engine
started work on UCI support, decided I wanted to make this version UCI, and do some testing, also found out out of the blue that my pruning algorithm was backwards, since the depths I use for the PVS (negamax) uses reduction in depth, so each iterative depth it starts by calling it with depth(n) and when it gets to depth(0) it calls the quiesce, but for pruning those depth values aren't helpful, so now I include a ply counter variable, but adding it has had some consequences and has broken some things, which will need to be debugged
Also added in time tracked from other project files that was clones of the engine to help me debug it at points during development
fixed filtermoves, did an overhaul on pawn move generation because when profiled pawn moves were a bit slow, now uses bit tables in movegen. Started debugging on special move gen for the quiesce, because currently there are some bugs with it that sometimes makes white pawns go to 8th rank without promoting, and then try to generate a move further, made the engine print out all the fen it encounters in quiesce before the crash, at the depth i knew the crash was going to happen at, so i can scan thru some of them and try and figure out why
added some debug methods in a new class to help debug the new move filtering, fixed one issue where the king was walking into an area controlled by opposing sliders (bishops, rooks, queens) because the method for checking if the square is attacked was using magics for slider attacks, using the current board state, meaning that the king acted as a blocker, when it shouldn't have, now trying to figure out why checkmate doesn't render the whole move list empty like it should
After a week of burnout trying to debug the engine and some new features, a progress update:
fixed many bugs in the new filtermoves system, but there are still bugs because the node count for each depth is different when using the old play move, check legality, undo than the new system
Added a new incrementally updated material delta tracker after a few iterations of prototyping, to speed up material eval
lots of debugging in general, and getting close to 1.5M NPS
turns out, the magic number generator that I coded is faulty, and the numbers it outputted actually allowed illegal moves, such as Be3 on starting position, I never realized this until I came across an unrelated bricking bug a few days also as mentioned in the devlogs that started when I added the new move filtering, I have reverted to the old magic numbers, and will continue to work on debugging the bricking problem. I have made a copy of my last working code using List before to compare, and that's how I found out that it was the magic numbers giving wrong slider movegen. I will try to figure out why the new movefiltering bricks the whole engine though
debugging the new filtermoves system, still working on it, got a bit of work done because am on a road trip, so lotsa time to try and figure this out on my laptop
there's also some illegal move being generated which is weird
started working on a new move-filtering class, using bitboard-based filtering instead of simulating each possible move and deleting the ones that aren't legal.
honestly kinda burnt out from trying to refactor to maximize nodes per second, not much progress today, hopefully will be a lot more progress tmr
refactoring and rewriting methods, as well as removing unnecessary features to improve nodes per second, went from 400k->900k today, very good progress
I plan to make a separate version just for UCI, just copying the core code over and adding a UCI class, shouldn’t be too bad, but for this one I plan to make a GUI myself, and allow people to play with it directly
plans for UCI support? makes it easier for chess guis and to put on lichess with lichess-bot
tuning tuning tuning... added some debug statements to tell me how effective the pruning is at doing their job, so depth is quite reduced for now. Will continue to work on tuning upgrades, maybe better move ordering, and then, focusing on squeezing out every ounce of raw nodes-per-second I can get, currently around 400-500kNPS which is ok but I would much rather be at least 1M NPS, which would allow for much better searching at high depths
FINALLY got the new system to work, and also made some optimizations, such as a complete overhaul to the flagCheckAndMate method that is run on all legal moves, as a significant portion of time is spent there before, FIRST DEPTH 15 SEARCH
Still debugging the new Span version, it's somehow pretty inefficient, and also, still not doing the same thing as before. Will continue to debug until it has the same behavior as before, but I think I'm almost there
Almost at 24 hour mark!
AGHHHHHHHHHHHH chess engine still broken from trying to refactor everything to use Span instead of List asjdfjsdfjsaldfjasl;dfjak nothing works anymoreeeeeeeeeeeee
To be fair the change is basically updating the architecture of the whole program... bugs are expected...
Current status: engine breaks after depth 3, and just bricks itself, there's something running, because it doesn't let me type, but it doesn't end whatever process it was doing... never ending...
slay
Refactoring, refactoring, refactoring...
Struggling to refactor all of my method that uses List to use Span, which I've found out to be a lot more efficient as there's a not a large amount of repeated memory allocation, will continue and hopefully finish up tmr
Trying to refactor movegen to be more efficient, as for a bitboard engine, despite the progress I've made, it's still slow in nodes per second, currently got it up to 400k nodes per second, which is decent, but not good. Will continue refactoring for efficiency tmr.
Screenshot is of the newly refactored flagcheckandmate that improves efficiency
TUNING... A LOT OF TUNING.... Lots, and lots, and lots, and lots of tuning
mostly just tuning the pruning parameters, as well as the aspiration window size, and futility margin
AT LEAST IT SEARCHES TO DEPTH 13 PRETTY GOOD NOW WOOOOOOOOOOOOOOO
ALSO FIRST DEPTH 15!!!!!!!!!!111!!11!!!1!!1 (albeit a huge node explosion after 14, it appears that there's always going to be a node explosion at some point, just how long you can delay it, and it eventually might get to the point that too much actual accuracy will be sacrificed that it won't be worth pruning that much
There's a chance I could be sacrificing a lot of accuracy by pruning so much, will do more testing later
added some more advanced eval, such as king safety, pawn struct, and tuned and fiddled around with some pruning parameters, getting real good now, might do some live testing soon against stockfish and see how things go
added some phased piece evaluation, will be working on king safety and pawn structure and such later
tried to add some more features to make it go faster or more efficient, only slowed it down or broke the engine... enough coding for the day
FIXED NULL MOVE PRUNING BUGS BOIIIIII and also added more features, like aspiration window, futility pruning, and late move reductions, it's now a lot faster, (by cutting a lot of positions searched)
trying to debug null move pruning, currently very broken, returns an invalid move after searching past depth 5
Did a bunch of refactoring, added a small method that allows for future multithreading support, currently working on null move pruning for more search efficiency
i used ctrlf on the whole project and didn’t find anything
but for some reason there was another bit i forgot about??
uh i’m gonna see if modifying it does anything