
SteveFish
Created by mattseq

A chess AI named after Stockfish and Steve. Main dependencies are PyTorch and python-chess. Uses approximate Q-learning with a neural net. Didn't really get too far on this project, but I'm throwing it up here anyway.


Timeline


Ok, I already implemented the basic Q-learning approach, kind of. Apparently I did it in a slightly different way. The normal way is to give the model two inputs, the game state (in this case the chess board) and the action/move, and have it evaluate the Q-value of that pair. Instead of encoding the moves and passing them into the model, I just evaluated the model at the state after that move. I'm not sure about the repercussions of this approach, but I think it should work nearly the same.

The model does seem to be learning! I only trained it on 200 games and it already has some basic idea of pushing pawns, backing up attacked pieces with other pieces, etc. It doesn't really seem motivated to capture pieces though, which is strange. Another problem is that the reward function doesn't take the OTHER player's move into account. I'm not sure how the model does either, to be honest. I'm going to try to solve that now. I also recently learned about batch normalization, so I'm going to add that too, along with probably a replay memory.

Here's some footage from a game. Keep in mind that it only trained on a couple hundred games. I wonder what its Elo would be?
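
For concreteness, here's a minimal sketch of the afterstate idea described above, using PyTorch and python-chess: rather than encoding (state, move) pairs, the network scores the position that results from each legal move. The board encoding, network size, and the `select_move` helper are placeholder assumptions for illustration, not the project's actual code.

```python
import chess
import torch
import torch.nn as nn

def encode_board(board: chess.Board) -> torch.Tensor:
    """Toy 12x64 piece-plane encoding (one plane per piece type per color)."""
    planes = torch.zeros(12, 64)
    for square, piece in board.piece_map().items():
        plane = piece.piece_type - 1 + (0 if piece.color == chess.WHITE else 6)
        planes[plane, square] = 1.0
    return planes.flatten()

class ValueNet(nn.Module):
    """Tiny MLP that scores a position; stands in for the real model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(12 * 64, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        return self.net(x)

def select_move(board: chess.Board, model: ValueNet) -> chess.Move:
    """Greedy afterstate selection: make each legal move, score the
    resulting position, undo, and keep the highest-scoring move."""
    best_move, best_value = None, float("-inf")
    with torch.no_grad():
        for move in board.legal_moves:
            board.push(move)                      # make the move
            value = model(encode_board(board)).item()
            board.pop()                           # undo it
            if value > best_value:
                best_move, best_value = move, value
    return best_move
```

This greedy version skips exploration (an epsilon-greedy wrapper around `select_move` would fix that) and doesn't flip the value sign for the side to move, which ties into the opponent-move problem mentioned above.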


Ok, so my friend Teo came to me with this idea a few days ago. I was busy with other projects so I didn't have time, but now I'm pretty much at a good stopping point with all my projects on SoM, so I'm just going to work on this until it's over. Who knows, I might even be able to ship this before the end of tomorrow.

It's actually going pretty well so far. We selected PyTorch and python-chess. Although I've worked on machine learning projects before, this is my first time using PyTorch, so there was a lot I didn't know. For now the model is pretty simple; the implementation of Q-learning for chess is the hard part. We've had some disagreements about how to do it, but it seems like Teo doesn't really have too much time on his hands, or just wants to learn more about PyTorch and Q-learning (and what I've already done) before contributing, so I've got the floor to myself for now. Here's a picture of the Q-learning training method.

[Attachment: picture of the Q-learning training method]
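
For readers without the screenshot, here's a rough sketch of what a Q-learning training step in this afterstate setup might look like, reusing the `encode_board` placeholder from the sketch above. The loss, reward handling, and hyperparameters are illustrative assumptions, not the project's actual code.

```python
import torch
import torch.nn as nn

GAMMA = 0.99  # discount factor, an assumed hyperparameter

def td_update(model, optimizer, afterstate_enc, reward, next_board):
    """One TD(0)-style update: pull the value of the chosen afterstate
    toward reward + GAMMA * (best afterstate value reachable from the
    next position). Note this naively maxes over next_board's legal
    moves; properly handling the opponent's reply is exactly the open
    issue noted in the devlog above."""
    with torch.no_grad():
        if next_board.is_game_over():
            target = torch.tensor([float(reward)])
        else:
            values = []
            for move in next_board.legal_moves:
                next_board.push(move)
                values.append(model(encode_board(next_board)))
                next_board.pop()
            target = (float(reward) + GAMMA * torch.stack(values).max()).reshape(1)
    prediction = model(afterstate_enc)
    loss = nn.functional.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A full training loop would play self-play games, assign terminal rewards (say +1 for a win, -1 for a loss), and call `td_update` once per move; the batch normalization and replay memory mentioned in the other devlog would slot in around this step.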