Spherical forums Community for the Sphere game engine
New? Contact Us to register an account!

January 20, 2014, 07:09:47 am

Ah, that's a shame.
I never really thought about how you would go about using OpenGL from C#, I guess I assumed that OpenGL would be about as well exposed in C# as it is in C++ (on the MS side of things).

January 21, 2014, 05:03:13 am

So, I'm working on a smarter sprite batcher and I realize I can more or less do some immediate blitting as textures change. That means images can be drawn in order as they appear in code, but they may not always draw when things like fonts and windowstyles draw. This leads me to believe that I should render all sprites in the sprite batcher, not just images.
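The idea of flushing the batch whenever the texture changes can be sketched roughly as below. This is only an illustration, not SSFML's actual code: `SpriteBatch`, `add`, `flush` and the `drawBatch` callback are hypothetical names, and the callback stands in for whatever call really sends vertices to the GPU.

```javascript
// Minimal sprite batcher sketch. "texture" can be any value that identifies a
// texture; drawBatch is a hypothetical callback standing in for the GPU call.
class SpriteBatch {
    constructor(drawBatch) {
        this.drawBatch = drawBatch; // called once per flushed batch
        this.texture = null;        // texture shared by the queued sprites
        this.sprites = [];          // queued {x, y} positions
    }

    // Queue a sprite; flush first if it uses a different texture, so that
    // sprites still reach the screen in the order the script drew them.
    add(texture, x, y) {
        if (this.sprites.length > 0 && texture !== this.texture) this.flush();
        this.texture = texture;
        this.sprites.push({ x: x, y: y });
    }

    // Send everything queued so far as a single draw call.
    flush() {
        if (this.sprites.length === 0) return;
        this.drawBatch(this.texture, this.sprites);
        this.sprites = [];
    }
}

// Usage: six sprites, but only three draw calls.
var calls = [];
var batch = new SpriteBatch(function (texture, sprites) {
    calls.push(texture + ":" + sprites.length);
});
batch.add("grass", 0, 0);
batch.add("grass", 16, 0);
batch.add("grass", 32, 0);
batch.add("font", 0, 16);
batch.add("grass", 0, 32);
batch.add("grass", 16, 32);
batch.flush(); // a real engine would do this before FlipScreen()
console.log(calls.join(", ")); // grass:3, font:1, grass:2
```

Fonts and windowstyles that bypass the batch would draw out of order relative to queued images, which is the reason given above for routing every sprite through the batcher.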

Anyway, I made a test for a 320*240 window. It doesn't use external resources, so anyone can run it:

Code: (javascript)


function TestManyImages() {
    var image = CreateSurface(48, 48, CreateColor(255, 0, 255)).createImage(),
        done = false,
        factor = 2,
        w = GetScreenWidth() * factor,
        h = GetScreenHeight() * factor;

    while (!done) {
        for (var y = 0; y < h; y += image.height) {
            for (var x = 0; x < w; x += image.width) {
                image.blit(x, y);
            }
        }

        FlipScreen();

        while (AreKeysLeft()) {
            if (GetKey() == KEY_ENTER) done = true;
        }
    }
}

Speeds:
SSFML-culling: 10000 fps (which seems to be my computer's limit, I cannot go beyond this in performance).
SSFML: 7600 fps
SSFML - culling - nobatch: 8300 fps
SSFML - nobatch: 5400 fps

1.5 SphereGL: 3400 fps
1.5 Sphere32: 2000 fps
1.6 Sphere32: 2700 fps

Which means the sprite batcher gives a bigger gain than the 15% I had been seeing before.

Moving from immediate to batched drawing may not seem like a lot, but it stays stable under load: even at 10x the images the speed is still the same, while Sphere grinds to a sudden halt. That's thanks to culling, though. The next test negates any culling by drawing everything within the same screen area:

Replace the inner drawing function:

Code: (javascript)


                image.blit(x % (w / factor), y % (h / factor));

factor of 2:
SSFML fastest: 5900 fps
1.5 fastest: 3400 fps
1.6 fastest: 2000 fps

factor of 4:
SSFML: 1600 fps
Sphere 1.5: 450 fps

So, as you can see keeping things on the same screen hurts performance by a lot. Thankfully people don't usually draw 200+ images in the same screen bounds. Even if you made your own tile-map engine, it'd only draw the amount on screen.
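To put numbers on what culling skips in the first test, here is a small sketch that walks the same grid as TestManyImages() and counts how many blits touch the screen. `isOnScreen` and `countBlits` are hypothetical helpers written for this illustration; the 320*240 window and 48*48 image come from the test above.

```javascript
// True when a w*h image blitted at (x, y) overlaps the screen at all.
function isOnScreen(x, y, w, h, screenW, screenH) {
    return x + w > 0 && y + h > 0 && x < screenW && y < screenH;
}

// Walk the same grid as TestManyImages() and count the blits.
function countBlits(screenW, screenH, imageSize, factor) {
    var total = 0, visible = 0;
    for (var y = 0; y < screenH * factor; y += imageSize) {
        for (var x = 0; x < screenW * factor; x += imageSize) {
            total++;
            if (isOnScreen(x, y, imageSize, imageSize, screenW, screenH)) visible++;
        }
    }
    return { total: total, visible: visible };
}

var counts = countBlits(320, 240, 48, 2);
console.log(counts.total, counts.visible); // 140 35
```

With a factor of 2, only 35 of the 140 blits land on screen, so a culling renderer skips three quarters of the work. The modulo version of the test puts all 140 back inside the screen bounds, so nothing can be culled.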

January 21, 2014, 06:28:31 am

Out of curiosity, how are you drawing primitives?

January 21, 2014, 03:20:05 pm

By passing vertices to the GL window:

Code: (csharp)


            Color sfml_color = color.GetColor();
            _verts[0] = new Vertex(new Vector2f((float)x, (float)y), sfml_color);
            _verts[1] = new Vertex(new Vector2f((float)(x + width), (float)y), sfml_color);
            _verts[2] = new Vertex(new Vector2f((float)(x + width), (float)(y + height)), sfml_color);
            _verts[3] = new Vertex(new Vector2f((float)x, (float)(y + height)), sfml_color);
            Target.Draw(_verts, PrimitiveType.Quads);

_verts is a static array that gets overwritten by new vertices each time you ask to draw a primitive. It is then sent to the GPU with Target.Draw. It's similar to doing:

Code: (cpp)


glBegin(GL_QUADS);

glColor4f(r, g, b, a); // rgba components of sfml_color
glVertex2f(x0, y0);

glColor4f(r, g, b, a);
glVertex2f(x1, y1);

glColor4f(r, g, b, a);
glVertex2f(x2, y2);

glColor4f(r, g, b, a);
glVertex2f(x3, y3);

glEnd();

Except it's a bit more flexible since I use an array: I can pass just 2 points for a line, or 3 for a triangle. A GL display list could be used to cache these at a lower level (which I can't do in SFML). It might make it go faster... or not; display lists seem to be useful only for static objects. They can still be scaled and transformed by manipulating the matrix stack, but then again, that's primitives. A GL display list doesn't help sprite batching all that much.

Come to think of it... I should add primitives to the sprite batch too (making sure I draw those in order with textures in code, too).

January 21, 2014, 03:52:51 pm

Are you texturing primitives with a full-color texture, or disabling texturing?

I've done many tests, and in C++, if you have to change the color and vertex values pretty much every call, it's just as fast to use immediate mode, buffers with rewritten subdata, fully rewriting buffers, or using vertex arrays. I haven't seen primitives be especially slow, but I haven't found any way that is really any faster than any other when it comes to sending vertex or color data.

But I thought up a simple way to make interleaved primitive and texture drawing faster. Just give a single pixel of every texture full color, set the texture coordinates of the image appropriately (so the image does not overlap the single pixel), and never unbind textures completely or disable texturing. That way, you just use the single pixel of full color to texture the primitive, you only need a single shader (and don't have to add any branches to the shader or swap shaders), and you reduce the number of state changes related to texturing since you never need to bind a texture or disable texturing to draw a primitive. If you put the pixel of pure white in the corner, you can even use texture coordinates of all zeroes for primitives and keep the all-zero texture coordinates in a very static buffer.

January 21, 2014, 04:02:48 pm

Hmm, that seems too low-level of a solution for SFML, again I'd have to use OpenTK for that. But, I could draw primitive rectangles faster perhaps if I used a single white pixel and stretched it to fit? I noticed in DirectX that was a really fast solution.

January 21, 2014, 04:16:50 pm

All you'd need to do is add an extra row or column to any image to be uploaded to a texture, and make one of the pixels white with full alpha. You do need to worry about texture coordinates of the images then, though.

Essentially, each texture becomes a texture atlas, holding at least a normal image and 1x1 image of 0xFFFFFFFF.

It's faster in OpenGL for a couple of reasons. First, never disabling texturing lets you use a single shader with no branches in it, which is faster than a branching shader and (by far) faster than switching between shaders, especially multiple times a frame. That's why you'd want to texture primitives, at least. Second, you never need to change the bound texture to do a primitive draw. Changing textures is the slowest part of TurboSphere; that's part of why I came up with this (I doubt I was the first to think of it, however).

It slightly complicates image drawing, but actually simplifies writing shaders and primitive drawing.
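A rough sketch of the white-pixel trick on the CPU side, assuming RGBA pixel data in a Uint8Array. `addWhitePixel` and the shape of its return value are made up for this illustration; it pads the image with one extra column on the left, makes the corner texel 0xFFFFFFFF, and reports the texture coordinates to use for the image and for primitives.

```javascript
// Pad an RGBA image (Uint8Array, 4 bytes per pixel) with one extra column on
// the left and make the top-left texel opaque white.
function addWhitePixel(pixels, width, height) {
    var paddedWidth = width + 1;
    var padded = new Uint8Array(paddedWidth * height * 4); // zero-filled

    for (var y = 0; y < height; y++) {
        // Copy one source row, shifted right by one texel.
        var row = pixels.subarray(y * width * 4, (y + 1) * width * 4);
        padded.set(row, (y * paddedWidth + 1) * 4);
    }
    padded.fill(255, 0, 4); // texel (0, 0) = 0xFFFFFFFF

    return {
        pixels: padded,
        width: paddedWidth,
        height: height,
        // Texture coordinates that cover only the original image.
        imageUV: { u0: 1 / paddedWidth, v0: 0, u1: 1, v1: 1 },
        // Every vertex of a primitive samples the white corner texel.
        primitiveUV: { u0: 0, v0: 0, u1: 0, v1: 0 }
    };
}

// Usage: a 2x1 image with one red and one blue pixel.
var source = new Uint8Array([255, 0, 0, 255, 0, 0, 255, 255]);
var atlas = addWhitePixel(source, 2, 1);
console.log(Array.from(atlas.pixels).join(","));
// 255,255,255,255,255,0,0,255,0,0,255,255
```

Whether coordinates of exactly (0, 0) sample the white texel cleanly depends on the texture's filtering and wrap mode, which this sketch does not model.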

January 21, 2014, 05:21:03 pm

Hmm, I'll do that then. I was going to add a feature to dynamically add loaded images into a texture atlas. It won't hurt having a single extra pixel in it.

Edit:
I moved text rendering to the sprite batcher and saw the fps of a screen with a text-heavy menu skyrocket from 4200 fps to 8300. That's not bad at all! Next is to move windowstyles over, and see what that adds.

Jest:
I have an interesting problem to solve (I haven't yet run into it, but I presume I will soon enough). When someone creates a surface and then casts that to an image (a surface-created image), should you leave it as its own texture or put it into a texture atlas? There isn't an easy answer.

What if you want to create a lot of surfaces, do a lot of little edits, cast those to images to do some drawing techniques for a while, and then throw them away? If that all gets put into an atlas, would you clear the atlas? Creating dynamic atlases like that isn't easy since texture-packing is not a two-way thing. Conversely, if you leave surface-created images outside of texture atlases, then while you'll suffer some performance loss drawing surface-created images, you at least don't run the risk of a lot of unnecessary atlas management that could slow surface generation down.

Or perhaps, better yet, surface-created images don't share an atlas with loaded images. But then having this special kind of atlas would still incur surface generation penalties as you muck with it (by adding and removing). You don't want to bloat up atlases for no real reason.

I might have to benchmark both when the time comes. It's important to find a fast method here since some surface-created images *do* have fundamental uses (besides just being 'middlemen'). Say you have a game where the main character's sprite is paper-dolled. To do this you create a bunch of surfaces, then compile them into one surface in the end. For the surfaces you used to build the one image, you don't need them to persist in an atlas. But the single image you finish with should go into an atlas for fast blitting. Now, of course by calling .createImage() you are intending to keep it for longer, so I guess that's the best hint to say the intended image gets "atlased". But it still can't stop someone from abusing the feature (using many surface-created images).
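To illustrate why atlas packing is easy in one direction only, here is a minimal shelf packer. It is one simple packing strategy chosen for this sketch, not necessarily what SSFML uses, and `ShelfAtlas` is a hypothetical name.

```javascript
// Shelf packer sketch: images are placed left to right on a "shelf"; when a
// shelf is full, a new one starts below it.
class ShelfAtlas {
    constructor(width, height) {
        this.width = width;
        this.height = height;
        this.x = 0;           // next free x on the current shelf
        this.y = 0;           // top of the current shelf
        this.shelfHeight = 0; // tallest image on the current shelf
    }

    // Returns the {x, y} slot for a w*h image, or null if it does not fit.
    add(w, h) {
        if (w > this.width) return null;
        if (this.x + w > this.width) { // start a new shelf
            this.y += this.shelfHeight;
            this.x = 0;
            this.shelfHeight = 0;
        }
        if (this.y + h > this.height) return null;
        var slot = { x: this.x, y: this.y };
        this.x += w;
        this.shelfHeight = Math.max(this.shelfHeight, h);
        return slot;
    }
}

var packer = new ShelfAtlas(128, 128);
var a = packer.add(48, 48); // {x: 0, y: 0}
var b = packer.add(48, 32); // {x: 48, y: 0}
var c = packer.add(48, 48); // no room at x = 96, so a new shelf: {x: 0, y: 48}
var d = packer.add(64, 64); // {x: 48, y: 48}
var e = packer.add(64, 64); // null, the atlas is full
```

There is deliberately no remove(): freeing a slot would leave a hole that only an image of the same size or smaller could reuse, which is the management cost weighed above for short-lived surface-created images.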

Edit:
I've been thinking: in SSFML, I could add a new blit method, "WrapBlit", which could come in handy for images that tile. The API would be like:

Code: (javascript)

image.wrapBlit(x, y, u, v)

where x and y are where it goes and u, v are the texture-wrapping part. I made a space game demo in XNA, and for the nebula backgrounds I had huge, beautifully tiled images that could wrap easily because the u, v can repeat (it's far, far, far more efficient than drawing 9 images and looping them around). But I can't do this at all with texture atlases. :/

Since it is not a feature Sphere has, though, I might propose this API instead:

Code: (javascript)


var wrapped_image = new WrappedImage();
wrapped_image.blit(x, y, u, v);

Then it doesn't have to get sent to an atlas and can be ignored if you target vanilla Sphere.

(Also useful for parallax backgrounds)
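As a sketch of what the u, v arguments could turn into, the function below computes texture coordinates for a scrolled, repeating image. `wrapTexCoords` and its parameters are hypothetical, and it assumes the texture's wrap mode is set to repeat so that coordinates above 1 tile the image.

```javascript
// Texture coordinates for drawing a destW*destH area from a repeating image,
// scrolled by (u, v) pixels. Values above 1 rely on a repeating wrap mode.
function wrapTexCoords(imageW, imageH, u, v, destW, destH) {
    return {
        u0: u / imageW,
        v0: v / imageH,
        u1: (u + destW) / imageW,
        v1: (v + destH) / imageH
    };
}

// What a repeating sampler does with an out-of-range coordinate.
function repeat(coord) {
    return coord - Math.floor(coord);
}

// A 256x256 nebula tile filling a 320x240 screen, scrolled 64 px to the right.
var uv = wrapTexCoords(256, 256, 64, 0, 320, 240);
console.log(uv.u0, uv.u1, uv.v1); // 0.25 1.5 0.9375
console.log(repeat(uv.u1));       // 0.5
```

Inside an atlas the image only occupies a sub-rectangle of the texture, so a coordinate past its edge would run into the neighbouring images instead of wrapping around. That is why a wrapped image needs a texture of its own.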

Edit:
I moved windowstyles over to a partial system. The corners are drawn fast, but the sides and background image are not. They must use un-atlased textures, and cannot be batched since their u,v's must be repeated. The hybrid system only gains another hundred fps, which is good but not the best.

I think there is a way to take the representation of a windowstyle and forgo the u,v approach altogether and generate an actual windowstyle mesh, physically repeating the side images (yes I'm talking about looping and repeating the texture). I just hope this is faster than the current method (which is certainly not bad). I mean, I get to use a full atlas rather than a partial atlas, and vertices are fast to generate.

Edit: (again)
I did the benchmarking and it turns out the hybrid approach is the best on windowstyles. The u,v texturing on the sides and bg image gives a huge speed boost and accurately clips the images. The vertex mesh is good, but starts failing if there are a lot of images, say, in the background.

January 22, 2014, 03:15:32 am

So my last post was getting kinda long, it's just me logging my progress and asking questions.

So, in Sphere 1.5 GL drawing a windowstyle that filled a 320*240 screen that had a 1px background image put the fps to 13. In SSFML the fps was unchanged at 8800 fps. That is 676 times faster.

This is important since games use windowstyles. My old trick to speed up games in Sphere GL was to use gigantic tiling sprites in a windowstyle. In SSFML it's good to know I don't have to. I want to know how FJ_GL handles windowstyles, because I know SphereGL implements them wrong.

In Sphere 1.5 Standard32, drawing that insane windowstyle ran at 770 fps, which means SSFML is only about 11.4 times faster.
In Sphere 1.6 Standard32, drawing that insane windowstyle ran at 850 fps, which means SSFML is only about 10.4 times faster.

But I think the takeaway here is SSFML draws windowstyles at constant speed, while the underlying image sizes vary the performance of vanilla Sphere.

Edit:
FPS of drawing 100 Rectangle Primitives (320*240, 100 32*24 rects) in each version:
SSFML: 8800 - fixed 10000
Sphere 1.5 GL: 7900
Sphere 1.5 32: 1700
Sphere 1.6 32: 2800

Gradiented:
SSFML: 5900 (why?) - fixed: 9900
Sphere 1.5 GL: 6400
Sphere 1.6 32: 500
Sphere 1.6 32: 430

So that means SphereGL has pretty fast primitive rendering (better than SSFML). SSFML's faster now due to sprite-batching primitives, but it used to be much slower (at 6800 fps). It only batches rectangles right now since it was set up for rectangle-textures.

~~SSFML should have gradients as fast as non-gradients so something is weird...~~

edit:
Ok, I have to toss out all of my results, everything is much faster now. Everything. I was storing values in the color object wrong.

See, colors take bytes, and a JS Number is sometimes a double and sometimes an int, so I was doing all kinds of crazy conversions to make sure it was an int and then make sure that int was a byte. Now I've streamlined it without conversions, and most everything renders at 10000 fps. I solved it by adding getters/setters to the color object on the C# side of things. But now there is a strong performance hit if all 4 fields are modified every second. Thankfully, that doesn't happen often.

Edit:
Scratch that. There is a way to have efficient property getters in Jurassic, but it makes color objects extremely expensive to create. Extremely expensive. I have a pixel-perfect collision detector that must check all kinds of colors; in all Sphere versions it was fast enough, but in SSFML it ground the computer to a halt. If I remove the "efficient" property getters, the speed returns, at the expense of some slowdown to reconstruct colors.

What I need to solve is this:

Code: (idea)


JS Object Property -> Byte

It's definitely not easy, since the JS object property might not be a number, in which case you get an error. It might be a double, or it might be an int. Problem is, Jurassic only tells you the property is of type 'object'. :/ It boxes the value, true enough, but the check code is always yucky to write:

Code: (Csharp)


// obj is the Jurassic object that holds the color properties
if (obj["red"] is int) _color.R = (byte)((int)obj["red"]);
else _color.R = (byte)((double)obj["red"]);

There is a faster way of doing this in Jurassic: it makes the color object 10 times faster to use, but 1000 times slower to create.

Code: (Csharp)


// ... in declaration
[JSProperty(Name = "red")]
public int Red { get; set; }

//... later on:

_color.R = (byte)red; // red is a Jurassic JSField
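For comparison, the "JS Object Property -> Byte" step can also be sketched on the script side. This is an illustration only: `toByte` is a hypothetical helper, and clamping to 0..255 is an assumption made for the sketch (an unchecked C# (byte) cast wraps around instead).

```javascript
// Convert a script-supplied value to an integer in 0..255.
// Clamping is an assumption of this sketch, not what the C# casts above do.
function toByte(value) {
    if (typeof value !== "number" || isNaN(value)) {
        throw new TypeError("color component must be a number");
    }
    var n = Math.trunc(value); // drop the fraction, like (int) on a double
    if (n < 0) return 0;
    if (n > 255) return 255;
    return n;
}

console.log(toByte(128));  // 128
console.log(toByte(12.9)); // 12
console.log(toByte(300));  // 255
console.log(toByte(-5));   // 0
```

Doing the check once, when the color is created or a field is set, keeps the type tests out of the drawing path, which is the trade-off between creation cost and usage cost discussed above.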

January 23, 2014, 02:27:26 am

Quote from: Radnen on January 22, 2014, 03:15:32 am

So my last post was getting kinda long, it's just me logging my progress and asking questions.

So, in Sphere 1.5 GL drawing a windowstyle that filled a 320*240 screen that had a 1px background image put the fps to 13. In SSFML the fps was unchanged at 8800 fps. That is 676 times faster. This is important since games use windowstyles. My old trick to speed up games in Sphere GL was to use gigantic tiling sprites in a windowstyle. In SSFML it's good to know I don't have to. I want to know how FJ_GL handles windowstyles, because I know SphereGL implements them wrong.

FJ-GL is about 10-20% better than Sphere-GL (which is just generally true of the two). And it's not the video driver's fault, it's Sphere's. The way it really should be rendering windowstyles is to use greater-than-one texture coordinates to tile the texture with a single API call. But it can't, since Sphere itself has no concept of texture coordinates; that's up to the video driver. So, using the video driver API, you cannot draw windowstyles in any way other than calling BlitImage once for every time the background and each edge component is tiled. What it really should do, in that case, is have video drivers expose a BlitImageTiled procedure, and if there is no way to optimize tiled drawing in that backend, the video driver would just tile it with a bunch of BlitImage calls (as Sphere forces it to do anyway). Perhaps there is some optimization for DirectX I'm not seeing (probably because I know almost no DirectX), but it's the worst way to call an OpenGL-based backend. Windowstyles perfectly show the limitation of Sphere's video driver API. Tiled image blits could speed up the map engine, as well.
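A quick sketch of the arithmetic behind this, using the 320*240 windowstyle with a 1px background from the quoted post. `tiledBlitCalls` and `tiledQuadUV` are hypothetical helpers; the second one assumes a backend where texture coordinates above 1 repeat the texture.

```javascript
// Number of BlitImage-style calls needed to tile an area with a tileW*tileH image.
function tiledBlitCalls(areaW, areaH, tileW, tileH) {
    return Math.ceil(areaW / tileW) * Math.ceil(areaH / tileH);
}

// Texture coordinates that let a single quad cover the same area when the
// texture repeats.
function tiledQuadUV(areaW, areaH, tileW, tileH) {
    return { u0: 0, v0: 0, u1: areaW / tileW, v1: areaH / tileH };
}

console.log(tiledBlitCalls(320, 240, 1, 1));   // 76800
console.log(tiledBlitCalls(320, 240, 16, 16)); // 300
var quad = tiledQuadUV(320, 240, 16, 16);
console.log(quad.u1, quad.v1);                 // 20 15
```

With a 1px background the driver receives 76800 blit calls per frame, where a tiled blit would need one quad. That is consistent with the old trick of using gigantic tiles in a windowstyle to cut the call count.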

Of course, a stronger relationship between the video backend and the video procedures would help in every way as well. Sphere's 'plugin' API is only just high level enough to let it use OpenGL, DirectX, and SDL. A higher level separation, with more procedures in the video backend would let many, many more optimizations and simplifications be possible. It wouldn't even have to be anywhere near as much as in TurboSphere--the 'engine' there does almost nothing but load plugins and manage V8. The advantage is that every plugin has full control of the script functions, from script to the GPU or filesystem or audio API, the plugin knows what is happening and can have any optimization possible in it. It comes close to being as powerful, in that respect, as a purely monolithic architecture.

January 23, 2014, 02:35:53 am

I'm running tun's startup game with a fair list of games to see how well it handles. It's quite badly coded, which makes it a good worst-case candidate for speed tests.

In SSFML 0.7.5 I get 64 fps
In SSFML 0.8.0 I get 265 fps
In SSFML 0.8.1 I get 300 fps
In Sphere 1.6 32: I get 750 fps
In Sphere 1.5 GL: I get 640 fps
In Sphere 1.5 32: I get 700 fps

I still have a ways to go, but my next step is to try and move to hardware textures.

January 23, 2014, 02:52:18 am

On Linux:
FJ-GL gets 262 FPS
Sphere-GL gets 213 FPS

It's spamming DirectBlits (aka surface.blit) to the video driver. I have no idea at what point in time that was a fast thing to do. I would imagine the best thing to improve the performance of Tung's startup game would be to have surfaces always also exist as textures, and then reupload the texture to GL when surface.blit is called, but only if the texture has changed.
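The suggestion of keeping a texture alongside every surface and re-uploading only on change could look roughly like this. `CachedSurface` and the `uploadTexture` callback are hypothetical; the callback stands in for the real GL upload.

```javascript
// Surface that keeps a texture alive and re-uploads it only when its pixels
// have changed since the last blit.
class CachedSurface {
    constructor(width, height, uploadTexture) {
        this.width = width;
        this.height = height;
        this.pixels = new Uint8Array(width * height * 4); // RGBA
        this.uploadTexture = uploadTexture;
        this.dirty = true; // nothing has been uploaded yet
    }

    setPixel(x, y, r, g, b, a) {
        var off = (y * this.width + x) * 4;
        this.pixels[off] = r;
        this.pixels[off + 1] = g;
        this.pixels[off + 2] = b;
        this.pixels[off + 3] = a;
        this.dirty = true; // the texture is now out of date
    }

    blit(x, y) {
        if (this.dirty) {
            this.uploadTexture(this.pixels, this.width, this.height);
            this.dirty = false;
        }
        // ...the cached texture would be drawn at (x, y) here...
    }
}

var uploads = 0;
var surface = new CachedSurface(48, 48, function () { uploads++; });
surface.blit(0, 0);
surface.blit(0, 0);
surface.blit(0, 0);   // unchanged, so still one upload
surface.setPixel(1, 1, 255, 0, 255, 255);
surface.blit(0, 0);   // changed, so a second upload
console.log(uploads); // 2
```

The saving only appears when a surface is blitted several times between edits; a script that redraws the surface every frame marks it dirty every frame and uploads every time anyway.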

January 23, 2014, 04:00:15 am

Quote from: Flying Jester on January 23, 2014, 02:52:18 am

I would imagine the best thing to improve the performance of Tung's startup game would be to have surfaces always also exist as textures, and then reupload the texture to GL when surface.blit is called, but only if the texture has changed.

I had been doing that since SSFML 0.7.5, it doesn't help. Tun's startup game is redrawing the texture each time which is even worse. In the Standard32 drivers the speed of writing a texture and drawing it to screen is not that bad. But in hardware accelerated drivers, it's really bad. This necessitates having render textures. Do you know how Sphere-GL does it so fast? Because I was hoping SSFML would have been the same there since my current surface code is not doing anything fancy.

In fact I was surprised to see SphereGL at 640 fps, I swear I saw it much lower (around mid 70's to early hundreds) before...

January 23, 2014, 06:38:16 am

Can you at least do a call to glTexSubImage2D, just to limit bandwidth?

Sphere_GL's surface.blit is identical to doing surface.createImage().blit(). From the hallowed source:

Code: (c++)


    IMAGE i = CreateImage(w, h, pixels);

    if (i)
    {
        BlitImage(i, x, y, CImage32::BLEND);
        DestroyImage(i);

    }

It uploads the surface anew to the GPU every single call. The only OpenGL-based speed-ups possible here would have to be in the OpenGL library itself.

The best reason I can give is that it is so simple that the compiler can optimize it really well, and it has little to do in the first place.

January 24, 2014, 12:19:36 am

I don't know how the above is so fast. I went to many graphics forums and looked around a lot, and that should not work quite so fast. Every frame an image is created and then destroyed; that's like doing this in your Sphere game:

Code: (javascript)


while (true) {
    var image = LoadImage("file.png");
    image.blit(0, 0);
    FlipScreen();
}

Now, I've been caching the surfaces as a texture, and not losing the texture pointer. Whenever I update the bytes I update the texture. I just think C++ has more raw speed than C#. To test this, I decided to rewrite large portions of code to use direct pointer access into memory, and noticed a considerable speedup.

I went from this (replace blend mode):

Code: (csharp)


    int off = (y * _width + x) << 2;
    _pixels[off + 0] = c.R;
    _pixels[off + 1] = c.G;
    _pixels[off + 2] = c.B;
    _pixels[off + 3] = c.A;

to this:

Code: (csharp)


    Color* color = (Color*)(buffer + (y * _width + x) * sizeof(Color));
    color->R = c.R;
    color->G = c.G;
    color->B = c.B;
    color->A = c.A;

Went from this (Blend blend mode):

Code: (csharp)


    int off = (y * _width + x) << 2;
    Color source = GetColorAt(x, y);
    float w0 = (float)dest.A / 255, w1 = 1 - w0;
    _pixels[off + 0] = (byte)(source.R * w1 + dest.R * w0);
    _pixels[off + 1] = (byte)(source.G * w1 + dest.G * w0);
    _pixels[off + 2] = (byte)(source.B * w1 + dest.B * w0);
    _pixels[off + 3] = (byte)(source.A * w1 + dest.A * w0);

To this:

Code: (csharp)


    Color* color = (Color*)(buffer + (y * _width + x) * sizeof(Color));
    float w0 = (float)c.A / 255, w1 = 1.0f - w0;
    color->R = (byte)(color->R * w1 + c.R * w0);
    color->G = (byte)(color->G * w1 + c.G * w0);
    color->B = (byte)(color->B * w1 + c.B * w0);
    color->A = (byte)(color->A * w1 + c.A * w0);

It's likely faster since I take a full color object at a time and modify it, rather than just one component at a time. Furthermore, there are a lot more lower-level optimizations being made, such as no bounds checking. This is simply called "unsafe" code in C#, but it's technically just like C++.

The FPS of the main screen went from 300 to 500, but I've seen it as high as 650 while screwing around (but it could have just been a fluke).
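The same two pixel operations can be mirrored in script with typed arrays, as a sketch of the "whole pixel at a time" idea. `createSurface`, `setPixel` and `blendPixel` are names made up for this illustration, and nothing here is measured; it only shows the arithmetic of the replace and blend modes above.

```javascript
// A surface backed by one ArrayBuffer, viewed both per byte and per pixel.
function createSurface(width, height) {
    var buffer = new ArrayBuffer(width * height * 4);
    return {
        width: width,
        bytes: new Uint8Array(buffer), // R, G, B, A per pixel
        view: new DataView(buffer)
    };
}

// Replace mode: pack the color and store the whole pixel in one write.
function setPixel(surface, x, y, c) {
    var packed = ((c.r << 24) | (c.g << 16) | (c.b << 8) | c.a) >>> 0;
    // Big-endian keeps R in the first byte regardless of the platform.
    surface.view.setUint32((y * surface.width + x) * 4, packed, false);
}

// Blend mode: same weights as the C# above (w0 = source alpha / 255).
function blendPixel(surface, x, y, c) {
    var off = (y * surface.width + x) * 4;
    var w0 = c.a / 255, w1 = 1 - w0;
    var p = surface.bytes;
    // Storing into a Uint8Array drops the fraction, like the (byte) casts.
    p[off] = p[off] * w1 + c.r * w0;
    p[off + 1] = p[off + 1] * w1 + c.g * w0;
    p[off + 2] = p[off + 2] * w1 + c.b * w0;
    p[off + 3] = p[off + 3] * w1 + c.a * w0;
}

var s = createSurface(2, 1);
setPixel(s, 0, 0, { r: 10, g: 20, b: 30, a: 255 });
blendPixel(s, 0, 0, { r: 200, g: 100, b: 50, a: 0 });     // fully transparent
console.log(Array.from(s.bytes.subarray(0, 4)).join(",")); // 10,20,30,255
blendPixel(s, 0, 0, { r: 200, g: 100, b: 50, a: 255 });   // fully opaque
console.log(Array.from(s.bytes.subarray(0, 4)).join(",")); // 200,100,50,255
```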


Topic: Sphere SFML v0.90 (Read 109783 times)

Reply #210 – January 20, 2014, 07:09:47 am

Reply #211 – January 21, 2014, 05:03:13 am

Reply #212 – January 21, 2014, 06:28:31 am

Reply #213 – January 21, 2014, 03:20:05 pm

Reply #214 – January 21, 2014, 03:52:51 pm

Reply #215 – January 21, 2014, 04:02:48 pm

Reply #216 – January 21, 2014, 04:16:50 pm

Reply #217 – January 21, 2014, 05:21:03 pm

Reply #218 – January 22, 2014, 03:15:32 am

Reply #219 – January 23, 2014, 02:27:26 am

Reply #220 – January 23, 2014, 02:35:53 am

Reply #221 – January 23, 2014, 02:52:18 am

Reply #222 – January 23, 2014, 04:00:15 am

Reply #223 – January 23, 2014, 06:38:16 am

Reply #224 – January 24, 2014, 12:19:36 am
