June 16, 2025
I don't entirely know what the secret sauce was, but I rebuilt the entire render pipeline from scratch, starting from just drawing individual points, and I've made it back here: I can draw cubes using instanced rendering! Huzzah!
Next, I'll make them look a bit prettier and scale/color them based on mass. Then I'll try to integrate the physics simulation with it.
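For anyone wondering what "instanced rendering" actually amounts to here: instead of one draw call per cube, you attach a small per-instance blob (a transform, in my case) and submit once. A rough sketch of the per-frame submission using bgfx's usual instancing API; the vertex/index buffers, program handle, and transform data are placeholders, not my actual code:

```cpp
// Minimal sketch: draw `count` cubes in one submit call by attaching a
// per-instance 4x4 transform. Handles (vbh, ibh, program) are assumed to exist.
#include <bgfx/bgfx.h>
#include <cstring>

void drawCubesInstanced(bgfx::VertexBufferHandle vbh, bgfx::IndexBufferHandle ibh,
                        bgfx::ProgramHandle program, const float* transforms, uint32_t count)
{
    const uint16_t stride = 64; // one 4x4 float matrix per instance

    bgfx::InstanceDataBuffer idb;
    bgfx::allocInstanceDataBuffer(&idb, count, stride);
    std::memcpy(idb.data, transforms, size_t(idb.num) * stride); // idb.num may be clamped

    bgfx::setVertexBuffer(0, vbh);
    bgfx::setIndexBuffer(ibh);
    bgfx::setInstanceDataBuffer(&idb);  // instance data rides alongside the vertex stream
    bgfx::submit(0, program);           // one draw call, many cubes
}
```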
See that little black dot in the middle of the screen? That means my vertex shader isn't properly seeing the entities' positions. Jeez, Louise! More news at 6:00.
Rendering In Progress!
Bit of a teaser here-- I'm in the trenches squashing bugs; the latest issue is that, while I can get the render pipeline to accept all the data buffers properly without error, I can't get the damn thing to actually put pixels on the screen how I want them. :(
More progress on the main sim loop!
Lower effort devlog. I've finished the first draft of the main sim loop and written tests for entity management and buffer resizing, as well as keyboard and mouse inputs for camera control. Tests are passing nicely so far.
I also drafted the renderer, built around a basic BGFX render pipeline whose details I painfully extracted from Claude. Thanks EnglishGarfield for suggesting Claude; it's actually pretty good and much better than ChatGPT for code.
I'm still procrastinating testing the rendering because I'm not excited to debug it :/
Drafting the Main Simulation Loop
This will take a lot of effort to complete, but I'm working on filling out the main simulation loop. This means I need to implement:
1. Camera and keyboard controls with SDL. Already drafted this, mostly complete as far as I can tell.
2. Logic for safely creating new entities and adding them to data buffers (rough sketch of the CPU-side bookkeeping after this list). This means:
1. Utilities for finding space in GPU buffers to slot new entities
2. Ways to track, from the CPU side, which entities exist on the GPU side without knowing their positions etc.
3. Performing buffer resizing whenever more space is needed, again using another specialized compute shader.
3. Graphical rendering! I'll have to figure out some way to draw objects when only the GPU can know where they are or that they exist at all.
4. GUIs for interacting with the app. I will probably use Dear ImGui for this; it's supposed to be pretty easy to use with BGFX.
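To make item 2 a bit more concrete, here's the rough shape of the CPU-side bookkeeping I have in mind: a free-list of buffer slots plus an alive table, so the CPU knows which slots are occupied without ever reading the position data that lives only on the GPU. All names here are hypothetical; this is a sketch of the idea, not the real code:

```cpp
// Sketch of CPU-side slot tracking for GPU-resident entities. The CPU never
// sees positions/velocities; it only remembers which buffer indices are in use.
#include <cstdint>
#include <vector>

class EntityTable {
public:
    explicit EntityTable(uint32_t capacity) : m_alive(capacity, false) {
        for (uint32_t i = capacity; i > 0; --i) m_free.push_back(i - 1);
    }

    // Returns the buffer slot a new entity should be written into (by a
    // GPU-side "spawn" compute pass), or UINT32_MAX if we need to resize.
    uint32_t allocate() {
        if (m_free.empty()) return UINT32_MAX; // caller triggers a buffer resize
        uint32_t slot = m_free.back();
        m_free.pop_back();
        m_alive[slot] = true;
        return slot;
    }

    void release(uint32_t slot) {   // entity killed; slot can be reused
        m_alive[slot] = false;
        m_free.push_back(slot);
    }

    bool isAlive(uint32_t slot) const { return m_alive[slot]; }

private:
    std::vector<bool>     m_alive;
    std::vector<uint32_t> m_free;
};
```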
I have written a little bit of this code, but I still haven't gotten around to unit tests yet. I think this will mean a lot of manual labor coding followed by extensive sessions in the bug mines. Fun times, but once this is through I should have a minimum functional v1.0.0 to show off. Hopefully, my devlogs won't just be IDE photos much longer ;)
More debugging, and I fixed the gravity calculations.
Yes, another IDE photo. This time, I got the shader for gravity calculations fixed. I switched to a 2D multithreading structure, making a sort of times table. For example:
. 1 2 3 4 5
1|
2|
3|
4|
5|
We run the shader once for each cell here. Each invocation calculates the force applied between the two entities it's run on and adds the acceleration delta to the entity on the X side. Pretty cool! I also had to use the atomicAdd() function to avoid race conditions when multiple invocations add to the same entity at once. Either way, the photo below features my test case working :)
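For anyone who wants the idea without reading shader code, here's a plain C++ reference of what that 2D grid computes. On the GPU, every (i, j) cell of the "times table" is its own invocation, which is why the accumulation into acc[i] has to be an atomicAdd there; serially it's just a +=. The softening constant here is made up for the sketch:

```cpp
// CPU reference of the pairwise gravity pass. On the GPU each (i, j) pair is a
// separate invocation, so the "+=" on acc[i] becomes an atomicAdd to avoid races.
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

void accumulateGravity(const std::vector<Vec3>& pos, const std::vector<float>& mass,
                       std::vector<Vec3>& acc)
{
    const float G = 6.674e-11f;
    const float softening = 1e-3f;  // keeps the force finite when entities overlap

    for (size_t i = 0; i < pos.size(); ++i) {      // "X side" entity receiving acceleration
        for (size_t j = 0; j < pos.size(); ++j) {  // "Y side" entity exerting the force
            if (i == j) continue;
            float dx = pos[j].x - pos[i].x;
            float dy = pos[j].y - pos[i].y;
            float dz = pos[j].z - pos[i].z;
            float distSq  = dx * dx + dy * dy + dz * dz + softening;
            float invDist = 1.0f / std::sqrt(distSq);
            // a = G * m_j / r^2, pointed from i toward j
            float s = G * mass[j] / distSq * invDist;
            acc[i].x += s * dx;   // atomicAdd on the GPU
            acc[i].y += s * dy;
            acc[i].z += s * dz;
        }
    }
}
```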
More BGFX Weirdness!!!
Okay, so I got the first test with adding, reading, and killing entities working just fine. Now, though, when I try to compute accelerations for gravity, I'm having issues. The BGFX frame call is hanging, but it goes through when I step through it with a debugger??? I'm still working through it, but I'll keep you posted. For now, I need to sleep.
Adding and reading single entities WORKS!!! I know it only says ~ 1 hour of time, but I forgot to get WakaTime set up on my Linux dev box for a while, so there's probably a lot of time here that didn't get recorded.
I tried so much stuff I honestly don't know what did the trick. The last thing I did was set the texture format to RGBA32F instead of RGBA8, which actually lets you store a full set of floats, and now reading data doesn't give back gibberish. Not sure why that didn't occur to me before, but here we are. Now you can add an entity to the data buffers and read it back from the CPU. Small steps. Oh yeah, and I got everything working on Linux... now to bring it to parity on macOS.
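For context, the difference is just one enum at texture-creation time: RGBA8 quantizes each channel down to a byte, while RGBA32F keeps full 32-bit floats, which is what positions and velocities actually need. A rough sketch of what I mean, using bgfx's usual createTexture2D call (the flags and sizes here are illustrative):

```cpp
// Staging texture a compute shader can write float data into.
// RGBA8 would silently squash each float into one byte, hence the gibberish.
#include <bgfx/bgfx.h>

bgfx::TextureHandle makeStagingTexture(uint16_t width, uint16_t height)
{
    return bgfx::createTexture2D(
        width, height,
        false,                          // no mips
        1,                              // single layer
        bgfx::TextureFormat::RGBA32F,   // 4 x 32-bit float per texel
        BGFX_TEXTURE_COMPUTE_WRITE);    // writable from a compute shader
}
```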
Big devlog coming soon, I think.
There is some WEIRD stuff happening in this poor project; I'm trying to test adding data to GPU buffers and reading them back to CPU with textures, but the data that's being spat up is pure nonsense. I'm now creating an Xcode project based off this CMake project so I can run the tests in Xcode and use the GPU frame capture debugger there to hopefully gain a bit of insight into what's going wrong.
Wish me luck!
Another IDE photo! I worked on basic scaffolding for the project and wrote drafts of the compute shaders for Velocity-Verlet integration and Newtonian gravitation.
It turns out data buffers for compute shaders are very unintuitive and a bit annoying to use. Basically, you create buffers on the CPU, and when doing so you flag them as readable and/or writable by the GPU. You can never read data back from a buffer on the CPU side, and you can only write data into a buffer from the CPU if it's NOT marked as GPU-writable. The only way, therefore, to write data into a GPU-writable buffer is to use a specialized compute shader that writes the data on the GPU side.
The only way to read data back from the GPU in your CPU code is to create a writable texture, write to it in your GPU shader, copy it on the GPU to a non-writable texture created with the readback flag using bgfx::blit(), then copy the data from that texture into an already-malloc'd chunk of memory. All of this takes multiple frames to complete. Very annoying.
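Here's roughly what that dance looks like in bgfx calls, heavily abridged. The handles and view id are placeholders and error handling is omitted; it's just meant to show the blit, readTexture, then wait-for-the-right-frame sequence, assuming bgfx's documented behavior that readTexture() returns the frame number at which the data becomes valid:

```cpp
// Abridged GPU -> CPU readback: compute writes into `gpuTex`, we blit it into a
// READ_BACK texture, schedule a read, then pump frames until the data is ready.
#include <bgfx/bgfx.h>
#include <cstdlib>

float* readbackRGBA32F(bgfx::TextureHandle gpuTex, uint16_t width, uint16_t height)
{
    // Destination: not GPU-writable, but a valid blit target we can read back from.
    bgfx::TextureHandle staging = bgfx::createTexture2D(
        width, height, false, 1, bgfx::TextureFormat::RGBA32F,
        BGFX_TEXTURE_BLIT_DST | BGFX_TEXTURE_READ_BACK);

    float* cpuData = static_cast<float*>(std::malloc(size_t(width) * height * 4 * sizeof(float)));

    bgfx::blit(0, staging, 0, 0, gpuTex);                       // GPU-side copy on view 0
    uint32_t readyFrame = bgfx::readTexture(staging, cpuData);  // data valid at this frame number

    while (bgfx::frame() < readyFrame) { /* keep kicking frames until the copy lands */ }

    bgfx::destroy(staging);
    return cpuData; // caller frees
}
```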
On the more algorithmic side of things, I've organized the procedure for calculating gravitation and motion. Basically, I will have six data buffers: old and new for each of position, velocity, and acceleration. According to the Velocity-Verlet algorithm, each frame, I will:
1. Calculate positions for each entity, using data from positions_old, velocities_old, and accelerations_old and depositing the results in positions_new
2. Calculate gravitation for each entity, with positions_old as input and accelerations_new as output (positions also store mass in the w component)
3. Calculate velocities for each entity, with velocities_old and accelerations_old as inputs and velocities_new as output
4. Swap the old and new buffers.
I won't go into too much detail but I'm sure you can work out how that would work. Anyway, this was long. Hopefully it all comes together nicely!
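As a sanity check for myself, here's a rough CPU-side sketch of how one frame of that buffer choreography could play out. The names mirror the buffer scheme above but are otherwise made up, the gravity pass is just a placeholder call, and on the GPU each pass is a compute dispatch rather than a loop:

```cpp
// One simulated frame with old/new buffer pairs. On the GPU each "pass" is a
// compute dispatch over all entities; here they're plain loops for clarity.
#include <utility>
#include <vector>

struct Float4 { float x, y, z, w; };  // positions pack mass into .w

struct Buffers {
    std::vector<Float4> pos_old, pos_new;
    std::vector<Float4> vel_old, vel_new;
    std::vector<Float4> acc_old, acc_new;
};

void stepFrame(Buffers& b, float dt)
{
    const size_t n = b.pos_old.size();

    // Pass 1: integrate positions from old velocity + old acceleration.
    for (size_t i = 0; i < n; ++i) {
        b.pos_new[i].x = b.pos_old[i].x + b.vel_old[i].x * dt + 0.5f * b.acc_old[i].x * dt * dt;
        b.pos_new[i].y = b.pos_old[i].y + b.vel_old[i].y * dt + 0.5f * b.acc_old[i].y * dt * dt;
        b.pos_new[i].z = b.pos_old[i].z + b.vel_old[i].z * dt + 0.5f * b.acc_old[i].z * dt * dt;
        b.pos_new[i].w = b.pos_old[i].w;  // carry mass along
    }

    // Pass 2: gravity pass fills acc_new from pos_old (see the pairwise sketch above).
    // computeGravity(b.pos_old, b.acc_new);  // hypothetical dispatch

    // Pass 3: integrate velocities from old velocity + old acceleration.
    for (size_t i = 0; i < n; ++i) {
        b.vel_new[i].x = b.vel_old[i].x + b.acc_old[i].x * dt;
        b.vel_new[i].y = b.vel_old[i].y + b.acc_old[i].y * dt;
        b.vel_new[i].z = b.vel_old[i].z + b.acc_old[i].z * dt;
    }

    // Pass 4: the "new" buffers become next frame's "old" buffers.
    std::swap(b.pos_old, b.pos_new);
    std::swap(b.vel_old, b.vel_new);
    std::swap(b.acc_old, b.acc_new);
}
```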
Sorry for all the IDE photos! I learned how to get GPU compute working and use data buffers to send data to the GPU, and designed a quick, simple compute example. Now I'll get to work designing the architecture for the universe simulator proper... no less than 12 hours into the project :/
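For reference, the "quick, simple compute example" boils down to three bgfx calls: create a compute-visible buffer, bind it, and dispatch. A heavily trimmed sketch, assuming a compiled compute shader handle already exists; the layout and counts are illustrative, and the group size of 64 is whatever the shader declares:

```cpp
// Minimal bgfx compute dispatch: a buffer the shader can read/write, bound to
// stage 0, with one thread group per 64 elements.
#include <bgfx/bgfx.h>

void runComputeExample(bgfx::ShaderHandle computeShader, uint32_t numElements)
{
    bgfx::VertexLayout layout;
    layout.begin().add(bgfx::Attrib::Position, 4, bgfx::AttribType::Float).end();

    bgfx::DynamicVertexBufferHandle buffer =
        bgfx::createDynamicVertexBuffer(numElements, layout, BGFX_BUFFER_COMPUTE_READ_WRITE);

    bgfx::ProgramHandle program = bgfx::createProgram(computeShader, true);

    bgfx::setBuffer(0, buffer, bgfx::Access::ReadWrite);  // binds to the shader's slot 0
    bgfx::dispatch(0, program, (numElements + 63) / 64, 1, 1);
    bgfx::frame();                                        // the work runs when the frame is kicked

    bgfx::destroy(program);
    bgfx::destroy(buffer);
}
```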
See above devlog for a nice description!
Fixed distribution. Added a release for Linux (you're welcome, reviewers! Please, see my project!) and made it... actually run on Linux. There was a wacky bug with SDL_ESCAPE being erroneously sent every frame. You should now be able to try it out if you're on Linux.
Shader compilation is FIXED!! Oh man, was this rough. I swear.
It turns out the bgfx.cmake library I was using to make bgfx work nicely with CMake has a strange collection of issues that meant I couldn't get the shader compiler to include the directories I wanted. BGFX shaders need to #include <bgfx_shader.sh> or #include <bgfx_compute.sh> in order to work, but these mysterious .sh files are hidden somewhere in the source directory.
I couldn't get that include path added with the utility functions provided by bgfx.cmake, so I wrote my own utility functions for interfacing with shaderc. I also had to deal with some weird errors that took me way too long to realize were happening because I was trying to compile shader types that aren't compatible with my toolchain. Regardless, I finally managed to compile a Metal-compatible compute shader on macOS, and I should (in theory) be able to do the same for Vulkan and OpenGL on other platforms.
Woah, BIG refactor. I had to start over from the very, very beginning. I tried and tried, but it became clear that metal-cpp would not work. It's too limited a library to do much more than GPU compute, so I would have had to interface with Swift or Objective-C, and at that point I might as well just write the whole thing in one of those languages, given how much of a pain the bridging would be.
I decided that, if I'm going to refactor what I'm doing, I might as well just build it from the ground up to be cross-platform. The point of locking into Apple was that Metal is supposed to be a walk in the park... but, at this point, it isn't. So, I decided to build it around bgfx (using bgfx.cmake to make it play nicer with CMake), with SDL3 as a backend. It was a huge pain to get started, but it all seems to basically be working now. Finally, I can get started working on GPU compute and the simulation architecture. Hopefully.
GPU Compute seems to be working! I got metal-cpp all set up (finally). This beautiful article on Apple Developer, translated into C++, is quite helpful for getting the basics down. Now, the more difficult task: how do I scaffold an actual freaking universe simulator??
jeez guys.... scaffolding a project that uses CMake, Metal, and metal-cpp is a surprisingly complex endeavor. I think I'm finally getting there, though.
Lightweight N-Body simulator that models entities being affected by gravity. Leverages GPU compute to simulate many entities over huge time scales. Still in the early prototype stage.
FINAL VERSION!!!
Lots of time between this devlog and the ones before; I went on vacation for a while and didn't have an opportunity to work. Came back and discovered that I barely had enough steam to wrap up the project. Don't worry, I'll work on some more projects later :)
So what is 3DRenderer? Simple: it's a small, basic 3D rendering pipeline written in C to run on the CPU. 3D rendering pipelines are usually done almost entirely in hardware on the GPU, with OpenGL, Direct3D, Vulkan, etc. abstracting most of the process away from the developer. In an effort to learn more about 3D rendering, I made a simple implementation of all those processes purely in software. No hardware GPU assistance! It's a very simple program that only draws bright green lines and renders single objects.
If you're one of the ~26% of the world that runs macOS, you can check out the GitHub repo to demo it for yourself. Use WASD to move the camera and IJKL to rotate it. Enjoy one of the two 3D models included in the program, or bring your own Wavefront .obj file.
It turns out that 3D rendering is WAY more complicated than I thought; it's quite a formulaic process, but wrapping your head around it is very challenging, and I doubt it's possible to write a rendering pipeline without understanding it.
I did use GitHub Copilot to generate some code snippets and functions that I would have preferred not to write by hand.
'nother low effort devlog. Working on transitioning to a vertex and face type to store normals, colors, etc.
Low effort devlog so I don't lose my minutes. I refactored the culling logic and implemented z-buffering; working towards implementing colors! You'll hear more about this in the near future, methinks.
-> Quaternion Rotation! <-
This is a polish improvement on something I had working before. I previously implemented camera controls, including rotation, which worked using traditional rotation matrix manipulation. The problem is that, over hundreds of frames of manipulation, floating-point error accumulates and the camera matrix gets... corrupted, for lack of a better term. The rotation and scale numbers that make up the first 3x3 of the matrix drift out of alignment, and it becomes impossible to properly reverse them in order to render the vertices.
The result is that, after rotating a bunch, the image gets really distorted and flattened in a weird way. The solution is to encode rotation with quaternions, which are much easier to keep well-formed: a quick re-normalization each frame keeps them representing a pure rotation. A quaternion is a 4D number written as a + bi + cj + dk, where a, b, c, and d are real numbers and i, j, and k are imaginary basis units. It's a bit wishy-washy what all of that actually means, but the point is that we can use quaternions to encode rotation in 3D space.
Well, I wrote a super simple quaternion library, and now store the position of the camera as a simple vector3, and the rotation as a quaternion. It all still ends up as transformation matrices in the end, but it's not prone to misalignment like this. Very cool!
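A minimal sketch of the kind of quaternion helpers I mean (illustrative, not my exact library): multiply to compose rotations, renormalize to fight floating-point drift, and only convert to a matrix at the very end.

```cpp
// Tiny quaternion sketch: w + xi + yj + zk. Compose with multiply, renormalize
// every so often, and only turn into a matrix for the final transform.
#include <cmath>

struct Quat { float w, x, y, z; };

// Hamilton product: applies rotation b, then a.
Quat qmul(Quat a, Quat b) {
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
    };
}

// Rotation of `angle` radians around a unit-length axis.
Quat qfromAxisAngle(float ax, float ay, float az, float angle) {
    float s = std::sin(angle * 0.5f);
    return { std::cos(angle * 0.5f), ax * s, ay * s, az * s };
}

// The anti-drift step: scale back to unit length so it stays a pure rotation.
Quat qnormalize(Quat q) {
    float len = std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
    return { q.w / len, q.x / len, q.y / len, q.z / len };
}
```

Per-frame camera rotation then becomes something like `cam = qnormalize(qmul(frameDelta, cam))` instead of stacking matrix multiplies on top of each other.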
(P.S. movement uses QWEASD and rotation uses UIOJKL; press SPACE to reset rotation)
-> Camera Controls <-
This is a smaller one, and there isn't much to talk about here. I basically just implemented a way to control the camera using WASD for movement and the arrow keys to look around. It's decently hacky and works through the transformation matrix system, so there's a decent chance of unintended behaviour. Just... don't try to do anything TOO fancy.
-> Vertex Culling! <-
The program no longer segfaults when there are vertices outside of the camera's view! Incredible! To do this, I implemented vertex culling, which checks for vertices outside of the bounds of the clip space cube.
If you saw my last devlog, you will know that clip space is an intermediate coordinate space in the graphics pipeline where the coordinates of all the vertices are represented with 3D coordinates that range between -1 and 1. Any vertices with coordinates outside of this range are outside of the camera's visual range. Pretty neat!
During the culling step, the program iterates through all the triangles in the scene, looking for vertices that are outside of the visual range.
1) If all three of a triangle's vertices are outside, it is simply discarded.
2) If two of the vertices are out of range, the program determines where the triangle's sides intersect the plane that represents the border of the clip space cube. Then it replaces the original triangle with a new one made of the one original vertex that's inside and the two points where the sides intersect the border.
3) Finally, if only one vertex is out of range, the program needs to create a trapezoid. That means two new triangles. It creates one triangle using the two points where the triangle's sides intersect the border and one of the original points that's in bounds, and a second triangle using the two points that are in bounds and one of the intersection points.
It's a bit hard to wrap your head around at first, but it's quite an elegant algorithm, I think. Next up, perhaps, I'll implement a way for the user to control the camera with the keyboard!
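If you want to see the shape of it in code, here's a compact sketch of the classification step, clipping against a single plane of the clip-space cube for simplicity (a full version repeats this for all six planes). Everything here is illustrative rather than my actual code; the intersection point comes from linearly interpolating along the edge to where it crosses the plane.

```cpp
// Count how many of a triangle's vertices fall outside one clip plane (x = +1
// in NDC here), and find where an edge crosses that plane by interpolation.
struct Vec3 { float x, y, z; };

bool outsideRightPlane(Vec3 v) { return v.x > 1.0f; }

// Point on segment a->b where x == 1 (assumes exactly one endpoint is outside).
Vec3 intersectRightPlane(Vec3 a, Vec3 b) {
    float t = (1.0f - a.x) / (b.x - a.x);  // fraction of the way from a to b
    return { 1.0f, a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
}

// 0 outside: keep as-is; 3 outside: discard; 2 outside: one shrunken triangle;
// 1 outside: the remaining quad gets split into two triangles.
int countOutside(Vec3 v0, Vec3 v1, Vec3 v2) {
    return int(outsideRightPlane(v0)) + int(outsideRightPlane(v1)) + int(outsideRightPlane(v2));
}
```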
-> 3D WIREFRAMES! <-
First: my last post was super wrong about what the graphics pipeline is like. It's way more complicated than what I thought it was. The real pipeline looks a little something like this:
1) Interpret 3D models into a local data structure for vertices and faces (which vertices connect to which others to make triangles)
2) Transform the models into world space. This means adding a fourth coordinate (called the homogeneous component and labeled w) on top of x, y, and z, and using matrix math to transform the vertices into absolute positions in the world.
3) Transform the models into camera space. This changes each vertex's coordinates to be relative to the camera by multiplying by the inverse of the matrix that describes the camera's position in the world. Basically: where is each vertex relative to the camera?
4) Transform the models into clip space. This stage deals with perspective. Basically, it uses the camera's near and far viewing planes to create a frustum that describes what the camera can see, then normalizes it into a rectangular prism. In a frustum, the cross sections get larger the further you are from the camera; reshaping it into a rectangular prism pulls points that are further from the camera closer together. That's the conceptual idea, anyway. This step significantly modifies the homogeneous component (w) from earlier.
5) Normalize into NDC (Normalized Device Coordinates). This simply divides each vertex's x, y, and z coordinates by w to put all the coordinates in a range between -1 and 1. Vertices not in this range should be removed, but mine just segfaults instead.
6) Project to screen space coordinates. This extrapolates the x and y coordinates on the viewing screen for each vertex from the NDC space numbers using math that I don't entirely understand.
7) Create triangles from the vertices and the list of face mappings.
8) Draw the triangles to the screen using the system I devised earlier.
whew, that was a lot. My code has reached the point of doing all this stuff! Pretty neat, right? I spent quite a lot of time researching this whole process. Most of the time, all of this processing happens inside the GPU, but this code implements the whole pipeline entirely on the CPU. Next, I'll extend it by culling out-of-frame vertices, filling in triangles, and adding support for color (maybe)!
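To make steps 2 through 6 a bit more concrete, here's a stripped-down sketch of what a single vertex goes through. The matrix helper and the screen-mapping math are illustrative stand-ins, not my actual code:

```cpp
// One vertex through the pipeline: model -> world -> camera -> clip -> NDC -> screen.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[16]; };  // column-major

// Illustrative helper: standard 4x4 matrix times column vector.
Vec4 mat4_mul_vec4(const Mat4& a, Vec4 v) {
    return {
        a.m[0] * v.x + a.m[4] * v.y + a.m[8]  * v.z + a.m[12] * v.w,
        a.m[1] * v.x + a.m[5] * v.y + a.m[9]  * v.z + a.m[13] * v.w,
        a.m[2] * v.x + a.m[6] * v.y + a.m[10] * v.z + a.m[14] * v.w,
        a.m[3] * v.x + a.m[7] * v.y + a.m[11] * v.z + a.m[15] * v.w
    };
}

Vec4 projectVertex(Vec4 local,
                   const Mat4& modelToWorld,   // step 2: place the model in the world
                   const Mat4& worldToCamera,  // step 3: inverse of the camera's transform
                   const Mat4& projection,     // step 4: frustum -> clip space, changes w
                   int screenW, int screenH,
                   float* outX, float* outY)
{
    Vec4 world  = mat4_mul_vec4(modelToWorld, local);
    Vec4 camera = mat4_mul_vec4(worldToCamera, world);
    Vec4 clip   = mat4_mul_vec4(projection, camera);

    // Step 5: perspective divide puts x, y, z into [-1, 1] (NDC).
    Vec4 ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w, clip.w };

    // Step 6: map NDC to pixel coordinates (y flipped so +y points up on screen).
    *outX = (ndc.x * 0.5f + 0.5f) * float(screenW);
    *outY = (1.0f - (ndc.y * 0.5f + 0.5f)) * float(screenH);
    return ndc;
}
```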
Once I got through the boring task of scaffolding everything and getting SDL working to output my pixel buffer, it came time to start drawing triangles.
The basic render pipeline will look like this:
1. Interpret 3D model
2. Rasterize into triangles
3. Print triangles onto pixel buffer
4. Copy pixel buffer to display window
Step 4 is easy, and this log is about step 3. The core of this is the simple screenspace_draw_line() function, which takes a pair of points and copies a line between them to the pixel buffer.
This function first determines the dy and dx of the line, then checks whether the slope is <= 1 (dx >= dy) or > 1 (dy > dx). If the slope is at most one, it iterates over every X-coordinate between the two points and computes the exact Y-coordinate of the line from the slope and the position of the first point, then simply rounds the result to the nearest whole pixel. If the slope is greater than one, it does the same thing but iterates over the Y-coordinates and estimates X-coordinates instead.
I had some issues along the way which aren't terribly interesting: lines had drastically wrong slopes because I didn't realize I was doing integer division (whoops...). I was also having an issue with high-slope lines appearing extremely jagged and sparse, which was what prompted me to implement the Y-iteration method for lines with dy > dx.
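For the curious, here's roughly what that looks like as code: a simple DDA-style line rasterizer. The pixel-buffer layout and put_pixel helper are made up for the sketch, and note the float casts, which are exactly the integer-division trap I fell into:

```cpp
// DDA-style line drawing: step along the major axis one pixel at a time and
// compute the minor axis from the slope. put_pixel is a stand-in helper.
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <utility>

void put_pixel(uint32_t* buf, int bufWidth, int x, int y, uint32_t color) {
    buf[y * bufWidth + x] = color;  // no bounds checking in this sketch
}

void draw_line_sketch(uint32_t* buf, int bufWidth,
                      int x0, int y0, int x1, int y1, uint32_t color)
{
    int dx = std::abs(x1 - x0);
    int dy = std::abs(y1 - y0);

    if (dx >= dy) {  // slope <= 1: iterate over X
        if (x0 > x1) { std::swap(x0, x1); std::swap(y0, y1); }
        float slope = (x1 == x0) ? 0.0f : float(y1 - y0) / float(x1 - x0);  // float, not int, division!
        for (int x = x0; x <= x1; ++x) {
            int y = int(std::lround(y0 + slope * float(x - x0)));
            put_pixel(buf, bufWidth, x, y, color);
        }
    } else {         // slope > 1: iterate over Y so steep lines don't come out sparse
        if (y0 > y1) { std::swap(y0, y1); std::swap(x0, x1); }
        float slope = float(x1 - x0) / float(y1 - y0);
        for (int y = y0; y <= y1; ++y) {
            int x = int(std::lround(x0 + slope * float(y - y0)));
            put_pixel(buf, bufWidth, x, y, color);
        }
    }
}
```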
Next up, RASTERIZATION!
CPU-based 3D renderer in C. Uses all scratch-made math and algos to implement a complete render pipeline without the help of a GPU. See second-to-last devlog for a better description. NOTICE TO REVIEWERS! Please follow the instructions in the release notes for macOS-- you must allow it with Privacy and Security settings and THEN run it AGAIN with terminal passing the object model as a command line parameter!!!
This was widely regarded as a great move by everyone.