Topic: Sphere v2 API discussion
  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #75
Yeah, Approach 2 seems pretty close to the optimization I have in mind.

Regarding that 4-vertex upload: Keep in mind that's 4 vertices per image every single frame.  In Galileo's case, I can optimize away some of the matrix uploads (the projection matrix rarely changes, e.g.), but you're always going to have to pay for those 4 vertices of a transformBlit.

I have a few optimization ideas that should give Galileo a performance advantage even for small workloads, but I'll refrain from mentioning them here until I've had time to test them in-house. :)
neoSphere 5.9.2 - neoSphere engine - Cell compiler - SSj debugger
forum thread | on GitHub

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #76

  • Rhuan
  • [*][*][*][*]
Re: Sphere v2 API discussion
Reply #77

Quote
Regarding that 4-vertex upload: Keep in mind that's 4 vertices per image every single frame.  In Galileo's case, I can optimize away some of the matrix uploads (the projection matrix rarely changes, e.g.), but you're always going to have to pay for those 4 vertices of a transformBlit.

Sure, transformBlit has to send 4 vertices per image every frame, but Galileo has to send a matrix and some number of uniforms per image every frame (in cases where the images move independently and can change colour). Thinking purely in terms of how many numbers are passed: each vertex is 8 numbers (x, y, u, v, r, g, b, a), so 4 vertices = 32 numbers, whilst a matrix for 2D drawing is 9 numbers and a matrix for 3D drawing is 16. So as long as there aren't too many uniforms, the Galileo approach sends fewer numbers. I'm not sure what extra overhead there is in sending each data type, though.
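
To make that count concrete, here is a rough sketch of the arithmetic. The function names are made up for illustration, and it only counts floats; it says nothing about per-call or per-data-type overhead, which is the open question.

```javascript
// Floats sent per image per frame under each approach (illustrative only).
const FLOATS_PER_VERTEX = 8; // x, y, u, v, r, g, b, a

// transformBlit-style drawing: every vertex is uploaded each frame.
function floatsForTransformBlit(vertexCount = 4) {
    return vertexCount * FLOATS_PER_VERTEX;
}

// Galileo-style drawing: one matrix plus any extra uniform floats.
// matrixSize is 3 for a 3x3 (2D) matrix or 4 for a 4x4 (3D) matrix.
function floatsForGalileo(matrixSize, extraUniformFloats = 0) {
    return matrixSize * matrixSize + extraUniformFloats;
}

// Number of extra uniform floats at which both approaches send the same amount.
function breakEvenUniformFloats(matrixSize) {
    return floatsForTransformBlit() - floatsForGalileo(matrixSize);
}
```

By this count a 4x4 matrix leaves room for 16 extra uniform floats (four RGBA colours, say) before the two approaches send the same amount of data.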


See also https://github.com/fatcerberus/minisphere/issues/184#issuecomment-313927121
Thanks.

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #78
An interesting observation I've made is that there is no fixed-function pipeline: Allegro emulates it using a shader.  Even something like setting a matrix amounts to uploading a uniform.  That's why the default shader has uniforms like `al_projview_matrix`.

This is good to keep in mind going forward, as it could have performance ramifications.  All this time I've been assuming the GPU was doing the transformations like in the old days, but it's all handled by the shader now.  That's... rather elegant, actually. :D

  • Rhuan
  • [*][*][*][*]
Re: Sphere v2 API discussion
Reply #79
Quick question: did you test the effect of turning the speed limit on? I think there's a serious problem of some kind with the implementation of frame rate limits.

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #80
No, but I can test it out.  If your graphics driver enforces vsync, that might be a cause of issues, though.  The framelimiter works by timing frames and then compensating, either by adding delays (for short frames) or by skipping subsequent frames (if a frame took too long).  If vsync is on, the vsync delay can get "out of phase" with the framelimiter and drive the frame rate down even further.  However, that should be self-correcting in the long run.

If vsync is turned off then I have no idea what's wrong.  But I'll give it a go tonight and see if I can find out.
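
For reference, the delay-or-skip logic described above can be sketched roughly like this. This is not the engine's actual code; `createFrameLimiter`, `endFrame` and the field names are hypothetical, and times are in seconds.

```javascript
// Minimal delay-or-skip frame limiter sketch (hypothetical, not engine code).
function createFrameLimiter(targetFps, startTime) {
    const frameTime = 1 / targetFps;
    let deadline = startTime + frameTime; // when the current frame is due

    // Call once per frame, after drawing, with the current time.
    return function endFrame(now) {
        const late = now > deadline;
        const result = {
            // Short frame: wait out the remainder of the frame time.
            delay: late ? 0 : deadline - now,
            // Long frame: skip rendering the next frame to catch up.
            skipNextRender: late,
        };
        deadline += frameTime; // advance the schedule by exactly one frame
        return result;
    };
}
```

With a scheme like this, anything that makes `now` land just past the deadline on every drawn frame would cause every other frame to be skipped, which is one way a limiter could end up at half the target rate.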

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #81
Okay, you're right, there's definitely something wrong: With vsync disabled in my NVIDIA settings, I set speed_limit = true, and modes 0 and 3 get 60/60 FPS; however, modes 1 and 2 get ~30/60 FPS.  This indicates to me that every other frame is being skipped--the engine thinks every frame it draws is late, which doesn't make sense since both modes exceed 60 FPS with the limiter off.  Unfortunately I don't know what's causing it; it's not related to vsync like I initially thought.

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #82
@Rhuan: With my experimental changes to avoid render target changes I managed to get the Galileo mode 0 to run faster (200 FPS) than the Sv1 mode (160 FPS).  Modes 1 and 2 are still slower and I can't figure out why (100 FPS).

  • Rhuan
  • [*][*][*][*]
Re: Sphere v2 API discussion
Reply #83

Quote
@Rhuan: With my experimental changes to avoid render target changes I managed to get the Galileo mode 0 to run faster (200 FPS) than the Sv1 mode (160 FPS).  Modes 1 and 2 are still slower and I can't figure out why (100 FPS).
I would think that mode 2 should be the slowest, as it transforms and draws 1,000,000 vertices every frame. (The other 3 modes each draw just 4,000 vertices.)

I can't understand why mode 1 would be slow, though, unless the slowdown comes from fiddling with the transformation matrix.

Modes 1 and 2 both perform the transformation steps below for every ring every frame, i.e. 1000 times per frame, which mode 0 doesn't have to do:

Code: [Select]
foo.matrix[1][0] = 3 * (Math.sin(rings[i].t * Math.PI/45) - Math.cos(rings[i].t * Math.PI/45)) / 8;
foo.matrix[1][1] = 0.5 - Math.sin(rings[i].t * Math.PI/45) - Math.cos(rings[i].t * Math.PI/45);
foo.matrix[1][2] = 3 * Math.cos(rings[i].t * Math.PI/45);
//...a few lines later
foo.translate(0, 50 * Math.cos(rings[i].t * Math.PI/100));
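
If the matrix math itself turns out to matter (which nothing here has established yet), one cheap thing to try would be computing the sine and cosine once per ring instead of several times. A sketch, with `ringMatrixRow` being a made-up helper name:

```javascript
// Returns the three values assigned to foo.matrix[1][0..2] above,
// evaluating sin and cos only once per ring. Hypothetical helper.
function ringMatrixRow(t) {
    const angle = t * Math.PI / 45;
    const s = Math.sin(angle);
    const c = Math.cos(angle);
    return [
        3 * (s - c) / 8, // matrix[1][0]
        0.5 - s - c,     // matrix[1][1]
        3 * c,           // matrix[1][2]
    ];
}
```

The results would be assigned to `foo.matrix[1][0]`, `foo.matrix[1][1]` and `foo.matrix[1][2]` exactly as before; only the number of trig calls changes.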


  • Rhuan
  • [*][*][*][*]
Re: Sphere v2 API discussion
Reply #84
I forgot to say above: great work fixing the rendering speed, though. Going from 90 FPS to 200 FPS is brilliant.

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #85
One issue I'm having is that rendering to Sv1 surfaces doesn't always seem to work.  I can't reproduce it in isolation, though; it only shows up with the textboxes in Spectacles (which use v1 surfaces), where the text doesn't show up.  I'm guessing it has something to do with the transformation matrix not being set, but I haven't been able to pin it down yet.  It's always something, isn't it...

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #86
Alright, the surface bug seems to be fixed and I've committed my changes.  Sphere v1 performance remains the same, but Galileo is much faster now and even beats the v1 primitives in performance.  I touched a lot of code to implement the fix, but ultimately it was a simple 3-pronged approach:


  • Avoid setting the shader or render target if it's the same as the previous one

  • Avoid resetting/undoing a render target change after drawing something

  • Call galileo_reset() before drawing a Sphere v1 primitive
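
The first two points amount to caching the last-set state and leaving it in place after a draw. A rough JavaScript illustration of the idea (the engine itself is not written this way; `RenderState` and `backend` are hypothetical names):

```javascript
// Skips redundant state changes by remembering what was set last.
// `backend` is a stand-in for whatever actually changes GPU state.
class RenderState {
    constructor(backend) {
        this.backend = backend;
        this.currentTarget = null;
        this.currentShader = null;
    }

    // Returns true if the backend was actually touched.
    setTarget(target) {
        if (target === this.currentTarget) return false;
        this.backend.setTarget(target);
        this.currentTarget = target;
        return true;
    }

    setShader(shader) {
        if (shader === this.currentShader) return false;
        this.backend.setShader(shader);
        this.currentShader = shader;
        return true;
    }
}
```

Nothing is restored after a draw; the next draw call simply asks for the state it needs, and the cache decides whether any work is required.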



I have a feeling there might be a bug with the shader management because Allegro tracks them per-bitmap whereas my code tracks the current shader globally.  But other than that, everything seems to work properly now. :D

edit: As for that test code, I get about 195 FPS for mode 0 when running under SSJ, and ~215 FPS for the non-debug engine.  That's a good sign: It means rendering performance is good enough now that JS execution is starting to become the bottleneck instead.  I still get no output though.  I thought maybe that might be something to do with my graphics driver, so I tested it on my laptop with its Core i7 iGPU and... same thing.  Modes 0 and 1 produce no output, only modes 2 and 3 show any rings.
  • Last Edit: July 11, 2017, 01:33:54 am by Fat Cerberus

  • Rhuan
  • [*][*][*][*]
Re: Sphere v2 API discussion
Reply #87

Quote
edit: As for that test code, I get about 195 FPS for mode 0 when running under SSJ, and ~215 FPS for the non-debug engine.  That's a good sign: It means rendering performance is good enough now that JS execution is starting to become the bottleneck instead.  I still get no output though.  I thought maybe that might be something to do with my graphics driver, so I tested it on my laptop with its Core i7 iGPU and... same thing.  Modes 0 and 1 produce no output, only modes 2 and 3 show any rings.
Good news on the speed. The lack of output is really weird, though, as they all draw for me.

The common feature of modes 0 and 1 is that they both use a textured shape (mode 2 has no texture).
  • Last Edit: July 11, 2017, 04:14:23 am by Rhuan

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #88
It seems like something must be going wrong with the texture generation.  I added this code:
Code: [Select]

var ringtex = output.toTexture();
while(true) {
    require('prim').blit(screen, 0, 0, ringtex);
    screen.flip();
}


And nothing shows up on the screen.

edit: It turns out that render-to-texture is broken in my latest builds.  I checked out v4.6.0 and rings show up in all modes.  Then I made a little test program to draw a line loop to a surface and that's broken in the latest build too (it fills the surface instead of drawing lines).  So at least now I have a lead.  More bugs, what fun... :P
  • Last Edit: July 11, 2017, 11:09:35 am by Fat Cerberus

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • Sphere Developer
Re: Sphere v2 API discussion
Reply #89
The bug looks like something screwy with the transformations.  The following code fills the surface with red when it should just draw a single pixel in the corner:

Code: (javascript) [Select]

const Prim = require('prim');
var s = new Surface(100, 100);
Prim.drawPoint(s, 0, 0, Color.Red);
var i = s.toTexture();
while (true) {
    Prim.blit(screen, 5, 5, i);
    screen.flip();
}


I haven't been able to figure out what's wrong.  git bisect says the bug was introduced here:
https://github.com/fatcerberus/minisphere/commit/910ad38a131ed85d372b8fb5080d76f034379ffe
but I have no idea what part of that change is causing it.

edit: Dammit, that shader bug bit me after all.  Galileo was using the wrong shader when rendering to a surface because of this:
Quote
I have a feeling there might be a bug with the shader management because Allegro tracks them per-bitmap whereas my code tracks the current shader globally.
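
A toy model of that mismatch, for anyone following along. It does not call Allegro at all; `FakeBackend` just assumes each render target remembers its own shader, and all class names are made up for the example.

```javascript
// Toy backend: shader state is stored per render target (assumed behaviour).
class FakeBackend {
    constructor() {
        this.target = null;
        this.shaderOf = new Map();
    }
    setTarget(target) { this.target = target; }
    useShader(shader) { this.shaderOf.set(this.target, shader); }
    activeShader() {
        return this.shaderOf.has(this.target) ? this.shaderOf.get(this.target) : 'default';
    }
}

// Buggy: remembers one "current shader" regardless of target.
class GlobalShaderCache {
    constructor(backend) { this.backend = backend; this.current = null; }
    use(shader) {
        if (shader === this.current) return; // wrongly skipped after a target switch
        this.backend.useShader(shader);
        this.current = shader;
    }
}

// Fixed: remembers the current shader separately for each target.
class PerTargetShaderCache {
    constructor(backend) { this.backend = backend; this.current = new Map(); }
    use(shader) {
        const target = this.backend.target;
        if (this.current.get(target) === shader) return;
        this.backend.useShader(shader);
        this.current.set(target, shader);
    }
}
```

With the global cache, switching from the screen to a surface and asking for the same shader again is treated as a no-op, so the surface keeps whatever shader it had before. That matches the symptom of Galileo drawing to a surface with the wrong shader.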
  • Last Edit: July 11, 2017, 01:11:41 pm by Fat Cerberus