Re: TurboSphere
Reply #456 –
Upgrading to Yosemite made some old ghosts show up again.
Every once in a while, I used to get HUGE latency when I started TurboSphere: a frame would take an entire second to render, even though we were getting solid framerates overall. After updating, I found that this was happening every time.
The issue, before sometimes and now always, was that the engine and the render queue were getting their timing wrong. This is because I was using a timeout system rather than a monitor system. So, no problem, I just wrote a monitor implementation that I rather like, and it's been tested to work well in Kashyyyk.
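For anyone unfamiliar, here's roughly what I mean by a monitor. This is just a minimal C++11 sketch of the pattern (a mutex plus a condition variable), not the actual TurboSphere or Kashyyyk code:

```cpp
#include <condition_variable>
#include <mutex>

// The render thread sleeps until the engine signals that a frame's worth
// of commands is ready, instead of polling on a timeout.
class FrameMonitor {
    std::mutex mtx;
    std::condition_variable cv;
    bool frame_ready = false;
public:
    // Engine thread: call after queuing a complete frame.
    void Signal(){
        std::lock_guard<std::mutex> lock(mtx);
        frame_ready = true;
        cv.notify_one();
    }
    // Render thread: blocks until a frame has been queued.
    void Wait(){
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this]{ return frame_ready; });
        frame_ready = false;
    }
};
```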
Using it made the delay much better, but now we were only getting about 100 FPS, no matter how little drawing we did. The more I tried to push through the pipeline, the slower it got. I've noted before that mutexes are slow on OS X for whatever reason, but it turns out that condition variables are slow like that, multiplied by how many monitors are waiting on them.
So, I went back to an older idea I had, which isn't as nice as just using monitors, but turns out to work very well.
We can consider a call to FlipScreen() not really as a call to clear the screen, but rather as a procedural marker that we are finished with a frame's worth of drawing, and that everything following it is in a new frame. So, what if we rendered our frames into not one queue, but multiple queues? We can cycle through them, skipping any that are being drawn to, one frame per queue. This can result in frames rendering out of order on extremely rare occasions, but that chance drops the more buffers we use; the likelihood falls off exponentially with the number of queues (I don't properly know why that is, I just did a bunch of profiling). With only 3 buffers, as I originally tried, it happens just about every other frame. Using 16 queues, it basically never happens.
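To make the scheme concrete, here's a rough sketch of the cycling idea. The names (FrameQueue, RenderOneFrame, kNumQueues, and so on) are made up for illustration, not the real TurboSphere pipeline; it just shows FlipScreen() acting as an end-of-frame marker and the render thread walking the queues in order:

```cpp
#include <array>
#include <atomic>
#include <vector>

struct RenderOp { /* one drawing command */ };

struct FrameQueue {
    std::vector<RenderOp> ops;
    std::atomic<bool> writing{false}; // the script thread is filling this queue
    std::atomic<bool> ready{false};   // a complete frame is stored here
};

constexpr unsigned kNumQueues = 16;    // more queues -> rarer out-of-order frames
std::array<FrameQueue, kNumQueues> queues;
unsigned script_index = 0;             // owned by the script thread
std::atomic<unsigned> render_index{0}; // advanced by the render thread

// Called once at startup: the script thread starts filling queue 0.
void InitQueues(){
    queues[0].writing.store(true);
}

// Script thread: FlipScreen() is just a marker that this frame is done.
void FlipScreen(){
    queues[script_index].writing.store(false);
    queues[script_index].ready.store(true);
    script_index = (script_index + 1) % kNumQueues;
    queues[script_index].writing.store(true);
    queues[script_index].ops.clear();  // overwriting here is where frames get dropped
}

// Render thread: draw at most one complete frame per queue, skipping any
// queue the script thread is still writing into.
void RenderOneFrame(){
    const unsigned i = render_index.load();
    FrameQueue &q = queues[i];
    if(q.ready.load() && !q.writing.load()){
        // Submit q.ops to OpenGL here.
        q.ready.store(false);
    }
    render_index.store((i + 1) % kNumQueues);
}
```

A real version also has to keep the script thread from clobbering a queue the render thread is mid-draw on; the sketch glosses over that.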
I do know that for it to happen, we have to go from rendering many frames quickly to suddenly taking a long time to render a frame. This rarely happens, and the difference needs to be more than the time it takes to draw a frame in OpenGL (which is usually an order of magnitude longer than how long it takes to draw a frame from script). Additionally, if we then keep rendering quickly, we will outrun the render thread again.
This leads to dropped frames, but there's no way to avoid that if you draw faster than the maximum screen refresh rate. Well, there are software ways, but in hardware you lose frames.
Fixing out-of-order frames is simple: we just need to flush all pipeline buffers except the one currently being drawn from when we cross over it. But, as noted, this will very rarely happen. Most often, we will just lose as many frames as we have buffers. That's disappointing, but it keeps the FPS smooth (whereas flushing all the render queues can be slow, especially making sure we don't flush a rendering-in-progress queue), and if we really are outrunning the render thread that quickly, we were going to lose a lot of frames anyway. Better to do it smoothly, I suppose.
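In sketch form, reusing the made-up types from above (again, not the real code), the flush amounts to something like this:

```cpp
// Script thread: called when we are about to cross over the queue the
// render thread is currently drawing from. Drop every other pending
// frame so nothing can be presented out of order.
void FlushPendingFrames(){
    const unsigned drawing = render_index.load();
    for(unsigned i = 0; i < kNumQueues; i++){
        if(i == drawing)
            continue;                  // leave the in-progress frame alone
        queues[i].ready.store(false);  // the render thread will skip this one
    }
}
```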
Using multiple render queues that we cycle through, we are back up to the huge theoretical framerates I saw on good runs before, and we are now consistently running quite smoothly.