Topic: neoSphere 5.9.2
  • Radnen
  • Senior Staff
  • Wise Warrior
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #450

Edit: Yep, same issue with the slashes--simply slashing at thin air creates and destroys nearly 100 bitmaps.  Sorry to say, this game is VERY poorly optimized.


I'm a bit unnerved, really. I honestly don't know what to say. But I'll get to that in a sec, first an explanation. I'll be gentle, don't worry.

So I have no idea how thin air slashing does that. I mean it's just COMMAND_ANIMATE frames. I mean, I did nothing outside the bounds of the map engine for the action system (except the pixel collision detector). I use Pythagorean distance first to check if there are frames to collide with before this whole hundred-sprite-creation thing begins to happen.

Are you saying that this, on thin air, creates and destroys 100 bitmaps?
Code: (javascript)

        // search and return a list of enemies in a 16 pixel radius at the front of the character. This is Attack()
        var list = this.getNearest(16, player.xv*16, player.yv*16);
        var i = list.length; // walk the list backwards
        while (i--) {
            enemy = list[i];
            if (this.check(this.input, enemy.name)) { // call the check method
                //...
            }
        }


Thin air slashing ought to put the sword well away from the nearest enemy sprite. And how on Earth could it create a hundred to a thousand frames? Look at the code and think about it. I call the attack script only once on frame change.

Code: (javascript)

            // search for and add functions to certain frames.
            dir = "attack" + dirs[i];
            for (var n = 0; n < 7; ++n) {
                if (n == 4)
                    this.actionmanager.addFrameCall(dir, n, Cut);    // frame 4: cut
                else
                    this.actionmanager.addFrameCall(dir, n, Attack); // all others: check collision on Attack (see above)
            }

// later on in actionmanager:
// LOOK at it. It only does actions on frame switch: (because a sprite frame can last up to 8 screen frames)
var frame = GetPersonFrame(this.attached);
if (this.frame != frame) {
    var index = GetPersonDirection(this.attached) + ":" + frame;
    if (this.framecalls[index]) this.framecalls[index]();
    this.frame = frame;
}
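
For readers without the game source, here is a minimal standalone sketch of the frame-switch dispatch described above. `FrameCalls` and `update()` are hypothetical names, and the direction and frame are passed in by hand instead of coming from GetPersonDirection()/GetPersonFrame(), so treat it as an illustration rather than the real action manager.

```javascript
// Sketch of a frame-call table that fires callbacks only on frame change.
// FrameCalls and update() are hypothetical names, not the real actionmanager.
function FrameCalls() {
    this.framecalls = {};   // "direction:frame" -> callback
    this.frame = -1;        // last sprite frame seen
}

// Register a callback for one frame of one direction.
FrameCalls.prototype.addFrameCall = function (direction, frame, fn) {
    this.framecalls[direction + ":" + frame] = fn;
};

// Call once per screen frame with the current direction and sprite frame
// (in Sphere these would come from GetPersonDirection/GetPersonFrame).
FrameCalls.prototype.update = function (direction, frame) {
    if (this.frame === frame) return;   // same sprite frame: do nothing
    var fn = this.framecalls[direction + ":" + frame];
    if (fn) fn();
    this.frame = frame;
};

// Usage: a sprite frame held for 8 screen frames triggers Attack only once.
var calls = 0;
var manager = new FrameCalls();
manager.addFrameCall("attacknorth", 1, function Attack() { calls++; });
for (var tick = 0; tick < 8; ++tick) manager.update("attacknorth", 1);
console.log(calls); // 1
```

The point of the guard is that the callback count follows sprite frames, not screen frames, which is why an attack animation should only run the collision check a handful of times.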


Technically my code only has to create bitmaps 2 times per frame (your source and the opponent source). I mean, I'm sure it'll create no more than 10 bitmaps max. I think your engine needs some work here. I mean, there are at least 100 to 1000 pixels to zip through on a dead collision check call, if it occurs. Are you telling me that each time I get a pixel your engine must create and destroy a bitmap? You have power over that my friend, not me. Especially if blank calls to COMMAND_ANIMATE do it. Well, technically running in your engine I can try to mitigate that, but at this point I'm not sure what's possible. I can cache the 2 bitmaps, but then I'm only saving these 10 creations each time, and the enemy constantly changes, mind you. So I could cache the player. But if it's really each time a pixel is returned... I'm not sure. I don't want to have to lug around huge caches of enemies and players, or heck, maintain a cache pool if I can help it.

The "unoptimized section"
Code: (javascript)

        var ss = GetPersonSpriteset(player);
        var dir = ss.directions[GetDirectionNum(ss, GetPersonDirection(player))];
        var frame = dir.frames[GetPersonFrame(player)];
        var image = ss.images[frame.index].createSurface();
       
        // person B:
        ss = GetPersonSpriteset(who);
        dir = ss.directions[GetDirectionNum(ss, GetPersonDirection(who))];
        frame = dir.frames[GetPersonFrame(who)];
        var image2 = ss.images[frame.index].createSurface();


The surfaces above are created, but only 2 of them. Only 2 new bitmaps are created per call of check, and only on the frames it's called on, which is precisely frames 1 through 6 except 4. So 5 frames, which is no more than 10 bitmaps being created and thrown away. I checked in SSFML, and yep, only 10 bitmaps. There is no lag. I usually don't try to optimize if there's nothing too crazy to optimize.  The code I use here is also industry standard (using Pythagoras first, then making sure you only check the intersection of the bounding boxes. In fact I doubt more than 256 pixels are ever checked!!). It's really industry leading. It's quite good, my friend of friends. Assuming of course bitmap creation is relatively cheap. If it's not, I can fix this, and you're right about it being unoptimized, but I'd say only slightly so. Because it's either this or cache pooling, and do I really want to do that if this works well enough? I mean each frame the enemy and player change, and then each direction, for each enemy. There are so many states each bitmap can be in at any given time I'd rather reconstruct than cache, especially if reconstructing, like it is in SSFML and Sphere1.5, is as dang fast as it is.
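
As a rough illustration of the three stages mentioned here (distance first, then bounding-box intersection, then pixels inside the overlap only), a minimal sketch follows. It is not the Blockman code: the sprite shape `{ x, y, w, h, alpha }`, `makeSprite()`, and `collides()` are assumptions made up for the example, and coordinates are assumed to be integers.

```javascript
// Sketch of a three-stage collision test. Sprites are plain objects here:
// { x, y, w, h, alpha } where alpha is a Uint8Array of w*h opacity values.
// These shapes are assumptions for illustration, not Sphere Surface objects.
function makeSprite(x, y, w, h, isOpaque) {
    var alpha = new Uint8Array(w * h);
    for (var i = 0; i < alpha.length; ++i)
        alpha[i] = isOpaque(i % w, Math.floor(i / w)) ? 255 : 0;
    return { x: x, y: y, w: w, h: h, alpha: alpha };
}

function collides(a, b, radius) {
    // 1. Pythagorean distance between centers (squared, so no sqrt needed).
    var dx = (a.x + a.w / 2) - (b.x + b.w / 2);
    var dy = (a.y + a.h / 2) - (b.y + b.h / 2);
    if (dx * dx + dy * dy > radius * radius) return false;

    // 2. Intersection of the two bounding boxes.
    var left = Math.max(a.x, b.x), right = Math.min(a.x + a.w, b.x + b.w);
    var top = Math.max(a.y, b.y), bottom = Math.min(a.y + a.h, b.y + b.h);
    if (left >= right || top >= bottom) return false;

    // 3. Pixel test, restricted to the overlapping rectangle.
    for (var y = top; y < bottom; ++y) {
        for (var x = left; x < right; ++x) {
            var pa = a.alpha[(y - a.y) * a.w + (x - a.x)];
            var pb = b.alpha[(y - b.y) * b.w + (x - b.x)];
            if (pa > 0 && pb > 0) return true;
        }
    }
    return false;
}

var solid = function () { return true; };
var player = makeSprite(0, 0, 16, 16, solid);
var near = makeSprite(12, 0, 16, 16, solid);
var far = makeSprite(100, 0, 16, 16, solid);
console.log(collides(player, near, 24)); // true
console.log(collides(player, far, 24));  // false
```

With 16x16 sprites the pixel loop can never visit more than 256 positions, and a swing at empty air is rejected at stage 1 before any pixel is read.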

Because...

I only optimize code I've profiled for, looking for bottlenecks. ;) It's really what any sensible programmer would do. You saying the game is not very optimized kinda stings since I strive for that, but only in areas where I needed it. I say your engine is what's unoptimized here buddy since my SSFML handles this even better than Sphere's. The difference being I've used SW bitmaps. Now, I've moved on to HW sprites and noticed compositing was really fast but simple things like updating and creating new bitmaps as well as getting pixels was really slow. So I'm moving back to SW for good even if it means it's "slower".

Edit:
I profiled getPixel in SSFML and between 2000 and 6000 pixels are checked. Which is fine. Worst case scenario is 48*48*10 which is 23000ish pixels. Swinging empty air does nothing. No pixels, no bitmaps, nada.
  • Last Edit: May 04, 2015, 03:29:31 am by Radnen
If you use code to help you code you can use less code to code. Also, I have approximate knowledge of many things.

Sphere-sfml here
Sphere Studio editor here

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #451
Sorry, I didn't mean to offend you--the game itself really is awesome, and the thousand bitmaps shocked me, too.  I'm only able to go by what the few logging calls I've added tell me, but I honestly have no idea why it's creating so many.  Hell, I can just walk around aimlessly and get about 5-10 "created image via clone" log entries per second.  It makes no sense, because this doesn't happen in Specs or any other Sphere game I've tested.  Just yours.  Blockman is doing something differently but I don't know what yet.  I don't doubt the issue is on my end, though.

What's weird is that it's NOT generating multiple images for the actual pixel check--a successful attack generates a long string of pixelcache hits on the same two images.  It's just that it seems to be creating a lot of them in-between each check.  Weird.

Totally agreed on not making premature optimizations, don't take that the wrong way.  I myself advocate the "don't do it" school of optimization--it's the reason minisphere's codebase is still so clean. :D  I admit I jumped to conclusions when I said the game was unoptimized, but as I said, this is literally the ONLY game I've tested that causes so many images to be generated in such a short timeframe.  Out of probably a hundred games, both big and small.  "Engine bug" was naturally the last thing I would have considered.

So yeah, I thank you for clarifying the under the hood operation--I will now run the MSVC profiler to figure out where all the images are being generated.
neoSphere 5.9.2 - neoSphere engine - Cell compiler - SSj debugger
forum thread | on GitHub

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #452
So, I found part of the problem.  Take a look at this line in clone_spriteset():
Code: (c)
if ((clone->images[i] = clone_image(spriteset->images[i])) == NULL) goto on_error;


Spriteset bitmaps are never written to (only replaced wholesale), so the image cloning was completely unnecessary.  I changed the offending clone_image() call to ref_image() and that sped things up a ton.  My bad! :-[

Now, if you're wondering why I'm cloning spritesets at all, well, this calls for some background: As you know, I strive for compatibility, and the semantics of Get/SetPersonSpriteset() in Sphere are... odd, to say the least.  If you call GetPersonSpriteset() and change stuff around (replacing images, directions, etc.), the changes aren't reflected immediately--you have to call SetPersonSpriteset() to commit the changes.  And then, if you make further changes after that, you have to set it again.  This is ridiculous and completely counterintuitive, but I have to emulate it to remain compatible in odd corner cases (Aquatis was a wonderful source of these), and doing so means cloning the spriteset every time someone calls either function.  And since you use GetPersonSpriteset() a lot... yeah.
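
To make the "changes only take effect after SetPersonSpriteset()" behavior concrete, here is a minimal sketch of copy-on-get semantics in which the containers are copied but the image objects are shared by reference (the ref_image() idea). All names are lowercase hypothetical stand-ins, not the engine's functions, and the real bookkeeping in minisphere may well differ.

```javascript
// Sketch of "copy on get, commit on set" spriteset semantics.
// Image objects are shared by reference; only the containing arrays and
// frame records are copied. All names here are hypothetical stand-ins.
var imagesCreated = 0;
function makeImage(name) { imagesCreated++; return { name: name }; }

function cloneSpriteset(ss) {
    return {
        images: ss.images.slice(),   // new array, same image objects
        directions: ss.directions.map(function (d) {
            return {
                name: d.name,
                frames: d.frames.map(function (f) {
                    return { index: f.index, delay: f.delay };
                })
            };
        })
    };
}

var people = {};
function createPerson(name, ss) { people[name] = { spriteset: cloneSpriteset(ss) }; }
function getPersonSpriteset(name) { return cloneSpriteset(people[name].spriteset); }
function setPersonSpriteset(name, ss) { people[name].spriteset = cloneSpriteset(ss); }

createPerson("hero", {
    images: [makeImage("a"), makeImage("b")],
    directions: [{ name: "north", frames: [{ index: 0, delay: 8 }] }]
});
var ss = getPersonSpriteset("hero");
ss.images[0] = makeImage("c");                            // not visible yet
console.log(getPersonSpriteset("hero").images[0].name);   // a
setPersonSpriteset("hero", ss);                           // commit the change
console.log(getPersonSpriteset("hero").images[0].name);   // c
console.log(imagesCreated);                               // 3
```

Only the three explicit makeImage() calls create images; every get and set copies small arrays and nothing else, which is the saving described above.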

One thing I still notice, even with this change, is that I get a lot of this when just walking around aimlessly:
Code: (text)
[image 5416] Read 16 x 16 image from file
[image 5417] Read 16 x 16 image from file
[image 5418] Read 16 x 16 image from file
[image 5419] Read 16 x 16 image from file
[image 5420] Read 16 x 16 image from file
[image 5421] Read 16 x 16 image from file
[image 5422] Read 16 x 16 image from file
[image 5423] Read 16 x 16 image from file
[image 5424] Read 16 x 16 image from file
[image 5425] Read 16 x 16 image from file
[image 5426] Read 16 x 16 image from file
[image 5427] Read 16 x 16 image from file


To clarify, those are direct reads of an embedded image from an open file (such as happens when loading a tileset or spriteset).  Not sure what's up with that, probably something else stupid I'm doing somewhere.

There is a silver lining, of course... I guess it's a sign of a mature project when you get to the stage of having to deal with low-level issues like this. :)
  • Last Edit: May 04, 2015, 07:20:51 pm by Lord English

Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #453

Moving on... I added pixel caching for Surface:getPixel().  It definitely helped, but it's still ungodly slow.  It might help if you stopped creating a surface anew with each collision check.  Allegro stores all bitmaps on the GPU, so each time you call Image:createSurface()


This is why Surfaces are meant to be totally in software. They are supposed to be relatively quick to create, destroy, and manipulate, much faster than Images.

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #454


Moving on... I added pixel caching for Surface:getPixel().  It definitely helped, but it's still ungodly slow.  It might help if you stopped creating a surface anew with each collision check.  Allegro stores all bitmaps on the GPU, so each time you call Image:createSurface()


This is why Surfaces are meant to be totally in software. They are supposed to be relatively quick to create, destroy, and manipulate, much faster than Images.


It would be literally trivial for me to have Allegro create surface bitmaps in memory... but wouldn't blitting Images to Surfaces then be SLOW since you're crossing the hardware boundary?  I guess I should do some tests, thanks for the tip.

Edit: Yep, that did the trick!  No pixel caching needed! ;D (Scratch that, the pixel caching is still needed even with s/w surfaces--apparently al_get_pixel() is slow no matter what.)
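
How the pixel cache in minisphere is actually built is not shown in this thread, so the following is only a generic sketch of the idea: pay for the slow per-pixel read once per bitmap, then answer later reads from a flat array. `slowGetPixel()` stands in for an expensive call such as al_get_pixel(), and the bitmap layout is an assumption.

```javascript
// Sketch of a per-bitmap pixel cache. slowGetPixel() stands in for an
// expensive per-pixel read such as al_get_pixel(); the bitmap layout and
// all function names here are assumptions for illustration.
var slowReads = 0;
function slowGetPixel(bitmap, x, y) {
    slowReads++;
    return bitmap.data[y * bitmap.width + x];
}

function getPixelCached(bitmap, x, y) {
    if (!bitmap.cache) {
        // First access: read every pixel once into a flat array.
        bitmap.cache = new Uint32Array(bitmap.width * bitmap.height);
        for (var py = 0; py < bitmap.height; ++py)
            for (var px = 0; px < bitmap.width; ++px)
                bitmap.cache[py * bitmap.width + px] = slowGetPixel(bitmap, px, py);
    }
    return bitmap.cache[y * bitmap.width + x];
}

// Any write must drop the cache so stale pixels are never returned.
function setPixel(bitmap, x, y, color) {
    bitmap.data[y * bitmap.width + x] = color;
    bitmap.cache = null;
}

var bitmap = { width: 4, height: 4, data: new Array(16).fill(7), cache: null };
for (var n = 0; n < 100; ++n) getPixelCached(bitmap, n % 4, 0);
console.log(slowReads); // 16
```

The trade-off is that a cache like this only pays off when the same bitmap is read many times between writes, which matches the "long string of pixelcache hits on the same two images" seen during an attack.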

@Radnen: In case it wasn't clear, your getPixel() woes are fixed in the latest build.  Now I just have to tackle that windowstyle issue...
  • Last Edit: May 04, 2015, 02:00:59 pm by Lord English

Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #455

It would be literally trivial for me to have Allegro create surface bitmaps in memory... but wouldn't blitting Images to Surfaces then be SLOW since you're crossing the hardware boundary?  I guess I should do some tests, thanks for the tip.


Drawing Surfaces is OK to be slow. It's notably slower drawing Surfaces than Images in Sphere 1.5.
That is one of the tradeoffs of Surfaces, and why they are different than Images. Surfaces should be quick to read and write and cheaper to manage, Images should be fast to draw. Other performance aspects are generally irrelevant.

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #456
@Flying Jester
I knew there had to be a fundamental difference between the two (surface and image) beyond the superficial "you can write to one but not the other".  I just couldn't figure out what until you specifically pointed it out.  Luckily the way Allegro is set up, it was trivial to have it allocate surfaces in system memory instead of on the GPU.  Internally it's still an image_t structure, just that the underlying bitmap is created with different flags.

Honestly I'm pretty dumb for not figuring this out sooner.  There's a reason there are methods to convert between the two types!

  • Radnen
  • Senior Staff
  • Wise Warrior
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #457

but as I said, this is literally the ONLY game I've tested that causes so many images to be generated in such a short timeframe.  Out of probably a hundred games, both big and small.


I'm glad it was a good test case then. :) My game is really the only mature Sphere ARPG. Other games, even Aquatis and T&E, use separate views for their battles (they take place in turn-based arenas), so spriteset-related, action-based things might not happen in the same manner. And I doubt they use pixel-perfect collision to the degree that I do.

BTW Lord English, I do understand, and am willing to accept, that in Sphere drawing surfaces and converting to surfaces is expected to be slow, and that it's on the user's end to mitigate those calls with their own crafty solutions. But creating a few surfaces here and there ought not to be slow. The next test in Blockman is entering a cave and seeing if the fullscreen darkness "shader" plummets the fps. I'm not the only game to do this since I used SDHawk's (blast from the past) advice on it.

Edit: BTW, running around in Blockman does create dust plumes under his feet, and these could be what triggers those surface creations when walking around. Though I don't know why surfaces are created... It does load a new spriteset each time rather than reuse one. It's a stupid Sphere thing since CreatePerson forces you to load a spriteset per entity. I cache spritesets in SSFML, but that has the issue that entities may end up sharing modifications made to them. Though I haven't seen that in the games I've tested; most games usually copy or reload a new spriteset from file via LoadSpriteset() whenever they modify a user's sprite.
  • Last Edit: May 04, 2015, 08:17:44 pm by Radnen

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #458
They aren't surfaces created, they are images, read as raw RGBA data from an open file.  Three things do this: load_tileset(), load_spriteset() and load_windowstyle().  So yeah, thanks for the tip on the dust plumes, I hadn't noticed those.  :)  There is a way you can avoid recreating new entities every time, just use SetPersonVisible and reuse them... But I agree it's pretty dumb that it doesn't let you pass in a Spriteset object to CreatePerson().  I think I'll go implement that now.
  • Last Edit: May 04, 2015, 09:31:38 pm by Lord English

  • Radnen
  • Senior Staff
  • Wise Warrior
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #459

There is a way you can avoid recreating new entities every time, just use SetPersonVisible and reuse them...


Neat idea, but you can have more than 2 dust plumes on screen, so I'd have to craftily cycle anywhere between 2 and 4 of them, and that's a kind of complexity I didn't want to deal with. It seems that there is one pervasive issue with Sphere's API. Without client-side pooling and caching, most calls to the Sphere API copy and clone data with reckless abandon. Either you trust that the underlying engine does a good job handling that for you, or you write lots of boilerplate whenever you need some simple effect.
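
For a sense of how much boilerplate the client-side pooling would take, here is a minimal round-robin pool sketch. `Pool`, `spawn()`, and `release()` are hypothetical; in a Sphere game the factory and the visibility flag would presumably map onto CreatePerson() and SetPersonVisible(), which is an assumption and not shown here.

```javascript
// Sketch of a fixed-size pool that recycles entities round-robin.
// Pool, spawn and release are hypothetical names; the entity objects are
// plain stand-ins for whatever the engine would create.
function Pool(size, createEntity) {
    this.items = [];
    this.next = 0;
    for (var i = 0; i < size; ++i) this.items.push(createEntity(i));
}

// Hand out the oldest slot, overwriting whatever it was showing.
Pool.prototype.spawn = function (x, y) {
    var e = this.items[this.next];
    this.next = (this.next + 1) % this.items.length;
    e.x = x;
    e.y = y;
    e.visible = true;
    return e;
};

// Hide an entity without destroying it.
Pool.prototype.release = function (e) { e.visible = false; };

var created = 0;
var plumes = new Pool(4, function (i) {
    created++;
    return { name: "plume" + i, x: 0, y: 0, visible: false };
});
for (var step = 0; step < 10; ++step) plumes.spawn(step * 8, 0);
console.log(created); // 4
```

Ten plumes are shown but only four entities are ever created; the cost is that a fifth plume silently replaces the oldest one.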


... it's pretty dumb that it doesn't let you pass in a Spriteset object to CreatePerson().  I think I'll go implement that now.


That's a good step in the right direction. Oftentimes and in most other game engines many duplicates of objects like bushes, grass, footsteps in sand or snow, coins, etc., reuse the same asset to reduce the memory footprint. So, being able to directly tell Sphere that we intend to reuse the same spriteset in any number of entities can really aid in speeding things up. Creating and destroying entities is usually a cheap thing to do, I mean it really only encodes a few properties (mainly positional, nothing a matrix can't solve, really).

Also, imagine drawing to screen 30 entities with the same sprite atlas. You'll have sped drawing up considerably since you aren't changing the source texture each time. Games like DaVince's Sir Boingers can be sped up with this technique.

Now, I'm not sure here but Sphere has to cache Spritesets like my SSFML does. I can't imagine loading the same spriteset 300 times in a giant room with lots of coins in it. But who knows? It might do that? I'm not sure, I'll have to profile a demo.
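
A filename-keyed cache of the kind described for SSFML could look roughly like the sketch below. It is not taken from either engine: `SpritesetCache` and the `loadFromDisk` callback are made-up names, and the loader here just fabricates an object instead of reading a file.

```javascript
// Sketch of a filename-keyed spriteset cache. loadFromDisk is a hypothetical
// stand-in for the expensive file read; it is not a Sphere function.
function SpritesetCache(loadFromDisk) {
    this.load = loadFromDisk;
    this.entries = {};
}

SpritesetCache.prototype.get = function (filename) {
    if (!Object.prototype.hasOwnProperty.call(this.entries, filename))
        this.entries[filename] = this.load(filename);   // miss: load once
    return this.entries[filename];                      // hit: shared object
};

var loads = 0;
var cache = new SpritesetCache(function (filename) {
    loads++;
    return { filename: filename, images: [] };
});
for (var c = 0; c < 300; ++c) cache.get("coin.rss");
console.log(loads); // 1
```

Because every caller gets the same object back, a modification made through one entity shows up in all of them, so a cache like this would need a copy step wherever per-entity edits have to stay private.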
  • Last Edit: May 05, 2015, 12:17:53 am by Radnen

  • Radnen
  • Senior Staff
  • Wise Warrior
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #460
Results of my test:

Loading a map with 50 non-trivial entities. Same spriteset!
1. SSFML (30ms, then 5ms repeatedly)
2. Sphere1.5 (6ms, then 4ms repeatedly)
3. Minisphere (300ms, then 190ms repeatedly)

With 1 entity:
1. SSFML (30ms, then 5ms repeatedly)
2. Sphere1.5 (6ms, then 4ms repeatedly)
3. Minisphere (10ms, then 5ms repeatedly)

I say the hit with our engines is any sprite batching, vertex mapping, or texture appropriations due to HW acceleration methods.

Edit
With 2 different spritesets the numbers went up linearly initially. Only when adding different spritesets did Sphere 1.5 take longer to load. Repeating the same spriteset added no time.
  • Last Edit: May 05, 2015, 01:06:38 am by Radnen

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #461


... it's pretty dumb that it doesn't let you pass in a Spriteset object to CreatePerson().  I think I'll go implement that now.


That's a good step in the right direction. Oftentimes and in most other game engines many duplicates of objects like bushes, grass, footsteps in sand or snow, coins, etc., reuse the same asset to reduce the memory footprint. So, being able to directly tell Sphere that we intend to reuse the same spriteset in any number of entities can really aid in speeding things up. Creating and destroying entities is usually a cheap thing to do, I mean it really only encodes a few properties (mainly positional, nothing a matrix can't solve, really).


Done.  It was a rather simple change, actually.  Just some additional bookkeeping, and even that wasn't really an issue either as spritesets are already refcounted internally.

While we're on the subject of profiling and performance, you know how Blockman takes a coon's age to load fonts under minisphere on startup?  I fixed it.  Turns out the reason it was so painfully slow was because I was locking and unlocking each individual subimage (fonts are atlased) when loading characters.  That's 256 lock/unlock cycles, meaning data has to be uploaded to the GPU every single time.  Now I lock the entire atlas ONCE, load all the glyphs, and then unlock it.  That's only one GPU upload.  The "Loading Fonts" text goes by so fast now you can barely even see it. ;D

Now those slow map loads have me wondering...

Edit: Okay, map loads are much faster now too, after an edit to the tileset loader.  It was the same issue as with fonts, just less pronounced, as most tilesets don't have 200+ tiles in them and even then, you're only loading one at a time.  I should make a similar change to the spriteset loading code, but it will be more difficult as spritesets aren't atlased.  At the time, it didn't seem worth it to atlas them since you're usually only drawing one image from a spriteset at a time.  But I might just do it anyway now to speed up map loading...
  • Last Edit: May 05, 2015, 01:58:58 am by Lord English

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #462
I just refactored minisphere's image locking.  It's safe to nest locks, any lock requests past the first just reuse the existing lock, which is refcounted.  Allegro would just outright refuse to lock the bitmap a second time, which is how my atlased loaders ended up so slow.  Now I can lock the atlas at the beginning of the loader, and the individual read_subimage() calls will just reuse the existing lock.
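
The refcounted lock described here can be sketched in a few lines. This is an illustration only, written in JavaScript to match the other snippets in the thread: the bitmap object, the `uploads` counter standing in for a GPU upload, and all function names are assumptions, not minisphere's C code.

```javascript
// Sketch of a refcounted bitmap lock. Only the outermost lock/unlock pair
// touches the (simulated) GPU; nested requests reuse the existing lock.
// The bitmap object and counters are assumptions for illustration.
function makeBitmap() { return { lockCount: 0, uploads: 0 }; }

function lockBitmap(bmp) {
    bmp.lockCount++;   // nested requests only bump the count
}

function unlockBitmap(bmp) {
    if (bmp.lockCount === 0) throw new Error("unlock without lock");
    if (--bmp.lockCount === 0) bmp.uploads++;   // last unlock: one upload
}

// A subimage reader that locks defensively, as a standalone call would.
function readSubimage(bmp) {
    lockBitmap(bmp);
    /* ...copy glyph or tile pixels here... */
    unlockBitmap(bmp);
}

// Load `count` subimages, optionally holding one lock around the whole loop.
function loadSubimages(bmp, count, lockWholeAtlas) {
    if (lockWholeAtlas) lockBitmap(bmp);
    for (var i = 0; i < count; ++i) readSubimage(bmp);
    if (lockWholeAtlas) unlockBitmap(bmp);
    return bmp.uploads;
}

console.log(loadSubimages(makeBitmap(), 256, false)); // 256
console.log(loadSubimages(makeBitmap(), 256, true));  // 1
```

The reader keeps its own lock/unlock pair, so it still works when called on its own, while a loader that wraps the loop in one outer lock pays for a single upload.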

This is awesome, as now I can easily implement spriteset atlasing. :D

Edit: Not only have I implemented sprite atlasing, but I also went ahead and added a spriteset caching feature.  Radnen, could you run your map load tests again on the latest source?  I'd be curious to see the results.
  • Last Edit: May 05, 2015, 04:19:14 pm by Lord English

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #463
@Radnen
Figured out your windowstyle issue.  It's an Allegro limitation, and unfortunately, it's not a bug.

From Allegro 5.1.9 ogl_bitmap.c:
Code: (c,7,8)
   /* This used to be an iOS/Android only workaround - but
    * Intel is making GPUs with the same chips now. Very
    * small textures can have garbage pixels and FBOs don't
    * work with them on some of these chips. This is a
    * workaround.
    */
   if (true_w < 16) true_w = 16;
   if (true_h < 16) true_h = 16;


Unfortunately this means windowstyles can't be tiled in hardware--I have to do it manually and take the performance hit.

Alternatively, I could comment out those two lines and see what happens.  I have an Intel GPU laptop (Core i3), so I'll know if anything's amiss.
  • Last Edit: May 05, 2015, 08:17:49 pm by Lord English

  • Radnen
  • Senior Staff
  • Wise Warrior
Re: minisphere 1.1b3 (stable: 1.0.10)
Reply #464

Unfortunately this means windowstyles can't be tiled in hardware--I have to do it manually and take the performance hit.


Tile it in hardware anyways. If you witness sizes less than 16, switch to manual. But anyways 16 was the sphere windowstyle default. I usually use that size and larger, but sometimes things slip through.
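
The suggestion amounts to a per-windowstyle size check. A minimal sketch, assuming the 16-pixel floor from the Allegro snippet quoted above is the only constraint; `chooseTiling()` and `manualTileCount()` are made-up helper names, not engine functions.

```javascript
// Sketch of choosing a tiling strategy per windowstyle piece. The 16-pixel
// floor comes from the Allegro snippet quoted above; the function names and
// return values are made up for illustration.
var MIN_HW_TEXTURE = 16;

function chooseTiling(width, height) {
    return (width >= MIN_HW_TEXTURE && height >= MIN_HW_TEXTURE) ? "hardware" : "manual";
}

// Manual fallback: how many draws it takes to cover an area tile by tile.
function manualTileCount(areaW, areaH, tileW, tileH) {
    return Math.ceil(areaW / tileW) * Math.ceil(areaH / tileH);
}

console.log(chooseTiling(16, 16));           // hardware
console.log(chooseTiling(8, 16));            // manual
console.log(manualTileCount(100, 40, 8, 8)); // 65
```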