Topic: TurboSphere

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: TurboSphere
Reply #555
I do remember you bringing this up a while back, and I think Radnen implemented the same in SSFML.  I should probably allow this in minisphere as well.  I think that would be the first change that might warrant a bump to v1.1. :P
neoSphere 5.9.2 - neoSphere engine - Cell compiler - SSj debugger
forum thread | on GitHub

  • Radnen
  • Senior Staff
  • Wise Warrior
Re: TurboSphere
Reply #556

Yep. In fact, passing a function was the only fast method until I added the ability to store pre-compiled scripts in Jurassic; after that, either approach worked. Though I wonder if the JIT compiler actually optimizes passed-in functions better than code compiled from what is essentially a new source using the same context.

I need to benchmark it, but I wonder if:
Code: (javascript)

//Is this faster
SetUpdateScript(func);

// than this?
SetUpdateScript("func()");
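
One rough way to put numbers on this outside of any particular engine is a small harness like the one below. It is only a sketch: `runAsFunction` and `runAsString` are hypothetical stand-ins for what an engine might do with each form of update script, and `new Function` is assumed as an approximation of compiling the string on every frame. That may not match how Jurassic, Duktape, or a SpiderMonkey embedding actually handles it.

```javascript
// Hypothetical update script; it only counts how often it was called.
let calls = 0;
function func() { calls++; }

// Case 1: the engine keeps a function reference and calls it each frame.
function runAsFunction(fn, frames) {
    for (let i = 0; i < frames; ++i) {
        fn();
    }
}

// Case 2 (assumed worst case): the engine keeps source text and
// compiles it again on every frame before running it.
function runAsString(source, fn, frames) {
    for (let i = 0; i < frames; ++i) {
        const compiled = new Function("func", source);
        compiled(fn);
    }
}

// Returns elapsed wall-clock milliseconds for one run.
function time(run) {
    const start = Date.now();
    run();
    return Date.now() - start;
}

const frames = 100000;
const msFunction = time(() => runAsFunction(func, frames));
const msString = time(() => runAsString("func()", func, frames));
console.log("function: " + msFunction + " ms, string: " + msString + " ms");
```

The absolute numbers will vary a lot between engines, so only the ratio between the two cases is worth reading.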
If you use code to help you code you can use less code to code. Also, I have approximate knowledge of many things.

Sphere-sfml here
Sphere Studio editor here

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: TurboSphere
Reply #557
In minisphere I can guarantee the former would be faster.  The engine tends to stutter in the lithonite demo because lithonite abuses delay scripts and Duktape takes a bit to compile, even if it's only a single line of code.  Don't know about Jurassic though--the JIT may even things out.
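
If string scripts have to stay supported, the compile cost only needs to be paid once per distinct string. A minimal sketch of that idea follows, assuming a hypothetical `toCallable` helper and a plain `Map` as the cache; neither is taken from minisphere or SSFML.

```javascript
// Cache of already-compiled script strings (hypothetical engine internals).
const compiledScripts = new Map();

// Accepts either a function or a source string and always returns a function.
function toCallable(script) {
    if (typeof script === "function") {
        return script; // nothing to compile
    }
    let compiled = compiledScripts.get(script);
    if (compiled === undefined) {
        compiled = new Function(script); // pay the compile cost only once
        compiledScripts.set(script, compiled);
    }
    return compiled;
}
```

Note that a body compiled with `new Function` can only see global names, so a script string such as "func()" works only if `func` is a global.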
  • Last Edit: April 09, 2015, 06:38:12 pm by Lord English

Re: TurboSphere
Reply #558

It's really close-as-makes-no-difference in SM. In the second case, the parser can generate the bytecode so fast it doesn't really matter, and such a simple call still mostly relies on the functions it calls to be optimized, which would be necessary in the first case anyway.

You could theoretically get a tiny boost the first way, since the parser in SM doesn't kick in (nor does the low-level optimizer in V8, which is a little slower in this instance), but small parses like that are lightning fast in both SM and V8 since they are very common on web pages. I would be surprised if, in any non-contrived usage, performance was notably different in SM, V8, JSC, or any other web-oriented JS engine. I believe that V8 even caches small string literals that are compiled, and attempts to reuse the parse info when doing things like this.

I mainly advocate the first case because it is so much nicer, and encourages better practices.

It's good that it's also faster in non-web-oriented engines!

Re: TurboSphere
Reply #559
I've almost got the OpenGL 2.1 renderer working in Sapphire. This will let it work on almost all machines I own, most importantly my Windows laptop and my Solaris machines that have no Mesa drivers (which is usually slower, but bah). I've only merged in the changes that affect the implementation of the default OpenGL 3.3/4.1 renderer, though.

Now I need to read up on a couple ancient OpenGL 2.1 functions...I've basically forgotten all that I knew about it! I also need to remember how exactly GLSL 1.1 worked ;_; But it'll be worth it.

EDIT:
So I got the OpenGL 2.1 renderer fully working, sans shaders. It's seriously vertex-limited compared to the 4.4 renderer. It can manage a full 60 FPS on a map, but it begins to spend most of its time pushing vertices.
In general, with the 4.4 renderer TurboSphere uses an almost constant 7% CPU to draw a map at 60 FPS. With the 2.1 renderer, we need 20% to draw the same map since it has so many vertices. It's still usable, and allows Sapphire to run on older hardware. And it still uses the same asynchronous renderer that the 4.4 renderer uses, so even when I started pushing way too many vertices to the renderer to drag down the FPS (4 64x64 maps at once saturates the CPU, 8 brings my new MacBook down to 20 FPS), the frame rate is still smooth, although low.

I would suspect that the 2.1 renderer actually makes better use of the asynchronous rendering system, since it's so much more CPU-intensive. The game can run regardless of the hundreds, sometimes thousands of GL calls that 2.1 generates per frame.
  • Last Edit: April 23, 2015, 09:49:51 pm by Flying Jester

  • DaVince
  • Administrator
  • Used Sphere for, like, half my life
Re: TurboSphere
Reply #560
So it supports both GL versions now? Kickass job, FJ. :)

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: TurboSphere
Reply #561
So I was perusing the TurboSphere source, and I have to say, you make pretty creative use of preprocessor macros.

Code: (cpp)
    #define ENSURE_PROP(NAME, VALUATOR, WRAPPER)\
    if(has_##NAME){\
        JS::RootedValue out_value(ctx);\
        JS_GetProperty(ctx, that, #NAME, &out_value);\
        if(!VALUATOR){\
            return false;\
        }\
        WRAPPER\
    }
   
    ENSURE_PROP(x, true, vertex.x = out_value.toNumber();)
    ENSURE_PROP(y, true, vertex.y = out_value.toNumber();)
    ENSURE_PROP(u, out_value.isNumber(), vertex.u = out_value.toNumber();)
    ENSURE_PROP(v, out_value.isNumber(), vertex.v = out_value.toNumber();)
    ENSURE_PROP(color, out_value.isObject() && color_proto.instanceOf(ctx, out_value, nullptr),
        {TS_Color *color = color_proto.unsafeUnwrap(out_value.toObjectOrNull());
        vertex.color = TS_Color(color->red, color->green, color->blue, color->alpha);}
    )
   
    #undef ENSURE_PROP


I think somebody misses their closures. ;)

Re: TurboSphere
Reply #562
It's a simple way to avoid a ton of typing. It is also less error prone than copypasta. The naive alternative would be to type out:

Code: (c++)

    if(has_x){
        JS::RootedValue out_value(ctx);
        JS_GetProperty(ctx, that, "x", &out_value);
        vertex.x = out_value.toNumber();
    }
    if(has_y){
        JS::RootedValue out_value(ctx);
        JS_GetProperty(ctx, that, "y", &out_value);
        vertex.y = out_value.toNumber();
    }
    if(has_u){
        JS::RootedValue out_value(ctx);
        JS_GetProperty(ctx, that, "u", &out_value);
        if(!out_value.isNumber()){
            return false;
        }
        vertex.u = out_value.toNumber();
    }
    if(has_v){
        JS::RootedValue out_value(ctx);
        JS_GetProperty(ctx, that, "v", &out_value);
        if(!out_value.isNumber()){
            return false;
        }
        vertex.v = out_value.toNumber();
    }

    if(has_color){
        JS::RootedValue out_value(ctx);
        JS_GetProperty(ctx, that, "color", &out_value);
        if(!(out_value.isObject() && color_proto.instanceOf(ctx, out_value, nullptr))){
            return false;
        }
        TS_Color *color = color_proto.unsafeUnwrap(out_value.toObjectOrNull());
        vertex.color = TS_Color(color->red, color->green, color->blue, color->alpha);
    }


I saw this kind of thing a lot in the Sphere source, and just sort of ignored it. It's something I kind of learned to appreciate at Mozilla.
I mean, I could do something with templates, but sometimes that just ends up being really complex. Sometimes the most readable, simplest way to do it is just a macro. I don't think this warrants anything more complex.

Anyways...
I've updated TurboSphere to use SpiderMonkey 40, which meant a total of about 20 lines of changes, mostly to Sapphire and how it handles Group and Shape assignments. Compared to how often V8's API changes, SpiderMonkey is extremely stable.

I also modified the Delay function to perform GC'ing, if more than 10 milliseconds of delay is requested, and if we have not GC'ed in the last second.
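
In outline, the rule looks something like the sketch below. The `now`, `sleep`, and `collectGarbage` hooks are hypothetical placeholders for the engine's clock, sleep call, and collector entry point, and counting the time spent collecting against the requested delay is an assumption of this sketch rather than something taken from the TurboSphere source.

```javascript
const GC_MIN_DELAY_MS = 10;      // only collect during delays longer than this
const GC_MIN_INTERVAL_MS = 1000; // and at most once per second

// Builds a Delay function from hypothetical engine hooks.
function makeDelay({ now, sleep, collectGarbage }) {
    let lastCollection = -Infinity; // no collection has happened yet

    return function Delay(ms) {
        const start = now();
        if (ms > GC_MIN_DELAY_MS && start - lastCollection >= GC_MIN_INTERVAL_MS) {
            collectGarbage();
            lastCollection = now();
        }
        // Assumption: time spent collecting counts toward the delay.
        const remaining = ms - (now() - start);
        if (remaining > 0) {
            sleep(remaining);
        }
    };
}
```

Passing the hooks in keeps the rule itself testable with a fake clock, independent of any real engine.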

SM 40 also uses much less memory...we went from 40 MB running the test map down to 23 MB! I'm really impressed, that's much less than I thought. It's nice to have a JS engine that can actually manage its own memory for once.
  • Last Edit: April 29, 2015, 07:07:10 pm by Flying Jester

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: TurboSphere
Reply #563
GC'ing on Delay() is a good idea, I'll have to implement that now. :D

Agreed on the rationale behind the macros, although I will generally try to factor it into a function if possible when boilerplate starts to pile up too much.  I've had too many instances where a syntax error in a commonly used macro causes the compiler to spew out a hundred useless errors that don't help me find the problem at all (a missing close brace in a macro can be a nightmare).  You'd think things like macros would be mostly write-once... But I'm a refactoring fiend, so those are about as stable as the rest of my code. :P

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: TurboSphere
Reply #564
So in Sapphire, is there a reason you use a triangle strip instead of a fan for Shapes with >4 vertices?  I've debated switching minisphere to always use fans (which would of course then be incompatible with TS) because with a tri-strip, the following:

Code: (javascript)
// Assumes `image` was loaded earlier.
var numPoints = 8;
var vertices = [];
for (var i = 0; i < numPoints; ++i) {
    var phi = 2 * Math.PI * i / numPoints;
    var x = Math.cos(phi) * 32;
    var y = Math.sin(phi) * 32;
    vertices.push(new Vertex(x, y));
}
var shape = new Shape(vertices, image);


...produces an odd crescent shape instead of an octagon. ???
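
The crescent follows from how the two primitives pick their triangles, which can be checked without touching the GPU. The sketch below uses plain `{x, y}` objects rather than engine `Vertex` objects, and every helper name in it is made up for illustration.

```javascript
// Triangles a strip builds from n vertices: (0,1,2), (1,2,3), (2,3,4), ...
// (A real strip alternates winding order; that is ignored here.)
function stripTriangles(n) {
    const tris = [];
    for (let i = 0; i + 2 < n; ++i) tris.push([i, i + 1, i + 2]);
    return tris;
}

// Triangles a fan builds from n vertices: (0,1,2), (0,2,3), (0,3,4), ...
function fanTriangles(n) {
    const tris = [];
    for (let i = 1; i + 1 < n; ++i) tris.push([0, i, i + 1]);
    return tris;
}

// Evenly spaced points on a circle, in the same order as the loop above.
function circlePoints(n, radius) {
    const points = [];
    for (let i = 0; i < n; ++i) {
        const phi = 2 * Math.PI * i / n;
        points.push({ x: Math.cos(phi) * radius, y: Math.sin(phi) * radius });
    }
    return points;
}

function triangleArea(a, b, c) {
    return Math.abs((b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y)) / 2;
}

// Sum of triangle areas: an upper bound on the area actually covered.
function totalArea(points, tris) {
    return tris.reduce(
        (sum, [a, b, c]) => sum + triangleArea(points[a], points[b], points[c]), 0);
}

const points = circlePoints(8, 32);
const fanArea = totalArea(points, fanTriangles(8));
const stripArea = totalArea(points, stripTriangles(8));
console.log("fan: " + fanArea.toFixed(1) + ", strip: " + stripArea.toFixed(1));
```

Every fan triangle shares vertex 0, so together they tile the octagon. Every strip triangle joins three consecutive rim points, so the strip only draws thin slivers along the edge and leaves the middle empty.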

Re: TurboSphere
Reply #565
I used strips because it's easier to make a composite shape with them, as shown in the map engine. That's much more difficult/impossible with a triangle fan. Using triangle strip, you can still emulate the triangle fan behaviour:

Code: (JavaScript)

    let numCorners = 8;
    let vertices = new Array(numCorners*2);
    for (let i = 0; i < numCorners*2; ++i) {
        if(!(i%2)){
            // Even slots repeat the pivot vertex, as a fan would.
            vertices[i] = {x:32, y:0};
        }
        else{
            // Odd slots walk around the rim.
            let phi = 2 * Math.PI * (i>>1) / numCorners;
            vertices[i] = {x:Math.cos(phi) * 32, y:Math.sin(phi) * 32};
        }
    }


I tend to prefer strips because you can still emulate fans, and OpenGL 3/4 is more VAO-limited than vertex-limited (so pushing more vertices is better than pushing multiple VAOs, which a Shape represents).
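
The same interleaving works for any vertex list already in fan order. Here is a sketch with two hypothetical helpers: `fanToStrip` does the reordering, and `visibleStripTriangles` exists only to check which triangles survive.

```javascript
// Reorders fan-style vertices [hub, p1, p2, ...] into strip order
// [hub, p1, hub, p2, hub, p3, ...].
function fanToStrip(fanVertices) {
    const [hub, ...rim] = fanVertices;
    const strip = [];
    for (const point of rim) {
        strip.push(hub, point);
    }
    return strip;
}

// Lists the triangles a strip would draw, skipping degenerate ones
// (those that use the same vertex twice and so have zero area).
function visibleStripTriangles(strip) {
    const tris = [];
    for (let i = 0; i + 2 < strip.length; ++i) {
        const tri = [strip[i], strip[i + 1], strip[i + 2]];
        if (new Set(tri).size === 3) tris.push(tri);
    }
    return tris;
}

const strip = fanToStrip(["hub", "a", "b", "c"]);
console.log(visibleStripTriangles(strip));
```

The price is roughly twice as many vertices, with every other triangle degenerate.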

Perhaps a way to switch between the two behaviours? They are both useful in different situations. It would be very easy to make this changeable in Sapphire.

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: TurboSphere
Reply #566
Maybe leave Shape with its current behavior, and then have specialized constructors that always use one method or the other?  Or do the same thing, but rename Shape to AutoShape.  That would violate the current Galileo spec though...

It's funny you bring up VAOs though, because in the last hour I implemented vertex buffer support for Shapes in minisphere.  It really improves performance by a lot for shapes with lots of vertices.  To test it out I had it make a fan with 1,000,000(!) vertices and it still ran at 60fps on my i3 (took a while to upload all the vertices though...).  Drawing the same shape from a software buffer dropped me to about 20fps.

See, this is the kind of thing that happens when you leave me to my own devices for long: Shit gets done, but not without a ton of goofing off in the process. :P

Re: TurboSphere
Reply #567

VAO!=VBO.

A VAO (Vertex Array Object) requires OpenGL >= 3.0 (or extensions in older versions), and encapsulates an entire set of vertices and bound buffers. A Vertex Buffer/VBO is just a way to upload data more quickly. You still need to do all the binding manually, which is where a VAO is faster. A VAO allows these binds to be done by the driver/GPU, and allows binding with implicit knowledge of the current state.

Assuming you are using the same divisions I do in Sapphire, VAOs make Groups with many Shapes faster, while VBOs just make things faster for the total number of vertices, regardless of organization.
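
To make the difference concrete, here is a sketch that counts API calls per draw against a mock context. The method names mirror WebGL 2, used here as a stand-in for desktop OpenGL. The `shape` layout (position, texture coordinate, color) and both draw helpers are assumptions for illustration, not Sapphire's or Allegro's actual code, and call counts are only a proxy for real driver cost.

```javascript
// Mock context that only counts calls; no GPU is involved.
function makeCountingGL() {
    const gl = { calls: 0, ARRAY_BUFFER: 0x8892, FLOAT: 0x1406, TRIANGLE_STRIP: 0x0005 };
    for (const name of ["bindBuffer", "enableVertexAttribArray",
                        "vertexAttribPointer", "bindVertexArray", "drawArrays"]) {
        gl[name] = () => { gl.calls++; };
    }
    return gl;
}

// Without a VAO: rebind the buffer and re-describe every attribute per Shape.
function drawWithoutVAO(gl, shape) {
    gl.bindBuffer(gl.ARRAY_BUFFER, shape.buffer);
    shape.attributes.forEach((attr, index) => {
        gl.enableVertexAttribArray(index);
        gl.vertexAttribPointer(index, attr.size, gl.FLOAT, false, shape.stride, attr.offset);
    });
    gl.drawArrays(gl.TRIANGLE_STRIP, 0, shape.vertexCount);
}

// With a VAO: the bindings were recorded once at setup; one bind restores them.
function drawWithVAO(gl, shape) {
    gl.bindVertexArray(shape.vao);
    gl.drawArrays(gl.TRIANGLE_STRIP, 0, shape.vertexCount);
}

// Hypothetical Shape: x,y + u,v + rgba as 8 floats (32 bytes) per vertex.
const shape = {
    buffer: {}, vao: {}, vertexCount: 4, stride: 32,
    attributes: [{ size: 2, offset: 0 }, { size: 2, offset: 8 }, { size: 4, offset: 16 }]
};

const plain = makeCountingGL();
const withVao = makeCountingGL();
for (let i = 0; i < 1000; ++i) {
    drawWithoutVAO(plain, shape);
    drawWithVAO(withVao, shape);
}
console.log("calls without VAO: " + plain.calls + ", with VAO: " + withVao.calls);
```

The per-Shape overhead in the first path grows with the number of attributes, which is why it matters most for Groups made of many small Shapes.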

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: TurboSphere
Reply #568
Well, I'm actually not sure what Allegro is doing internally.  I just know I called al_create_vertex_buffer(). :)  Low-level stuff isn't really my scene; I'll let the middleware deal with the intricacies. :P  Either way, I was smart enough to fall back on al_draw_prim() if the vbuf can't be created.

That said, does D3D even have an equivalent to VAOs?  Way back when I used to do DX programming I remember reading about "vertex buffers" in the docs, but I don't really know how those map to the OpenGL world.

Re: TurboSphere
Reply #569
I really don't know. I've never actually dealt with any of the Direct* APIs. I've always managed with SDL+OpenGL.
I would assume that a D3D Vertex Buffer is either the analogue of an OpenGL VBO or VAO.

What you could do is try making 100000 shapes with 4 vertices, and 4 shapes with 100000 vertices (or similarly different numbers). With Vertex Arrays, I can push many more Shapes, but a similar total number of vertices as without them.

It's also important to note that most GPUs will optimize for shapes that have duplicate vertices, mostly through occlusion and culling. You've got to push unique or semi-unique vertices through to see real performance.
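
A sketch of how the two workloads could be generated with the same total vertex count and no duplicate vertices. `buildWorkload` is a hypothetical helper that returns plain vertex lists; turning each list into an engine Shape and timing the draw is left to whichever engine is being tested.

```javascript
// Builds `shapeCount` vertex lists of `verticesPerShape` unique vertices each.
function buildWorkload(shapeCount, verticesPerShape) {
    const shapes = [];
    let serial = 0; // running index, so no two vertices share a position
    for (let s = 0; s < shapeCount; ++s) {
        const vertices = [];
        for (let v = 0; v < verticesPerShape; ++v, ++serial) {
            // Lay vertices out on a 1000-wide grid, one cell per vertex.
            vertices.push({ x: serial % 1000, y: Math.floor(serial / 1000) });
        }
        shapes.push(vertices);
    }
    return shapes;
}

const manySmall = buildWorkload(100000, 4); // stresses per-Shape overhead
const fewLarge = buildWorkload(4, 100000);  // stresses raw vertex throughput
```

Both workloads contain 400000 vertices, so any difference in frame rate should come from how they are grouped rather than from how many there are.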
  • Last Edit: April 30, 2015, 03:41:12 am by Flying Jester