Skip to main content

News

Topic: miniSphere 5.0.1 (Read 200030 times) previous topic - next topic

  • Rhuan
  • [*][*][*][*]
Re: miniSphere 4.8.6
Reply #2115
@Rhuan

Color constructor:
https://github.com/fatcerberus/minisphere/blob/master/src/minisphere/pegasus.c#L1483-L1508

CreateColor:
https://github.com/fatcerberus/minisphere/blob/master/src/minisphere/vanilla.c#L1905-L1926

Notice anything that might cause the latter to be significantly slower?
i'm not really a c programmer at all though would like to learn.

My only guesses from just looking at it right now are:

Either;
a) I set up the test wrongly somehow (will recheck shortly)
Or
b) there's something expensive about getting an integer from JS - possible as natural JS numbers are double floats.
Or
c) it's the fact that create color caps the inputs to 255 hence uses several conditionals whereas new color doesn't apply checks - those checks in create color won't take long but they won't take no time

I think the key next step is to run minisphere with some kind of c profiler and find out which jsal functions it's spending time in - not quite sure how to set that up as can't loop them easily and they all take <1 millisecond.
  • Last Edit: September 13, 2017, 01:34:11 pm by Rhuan

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • miniSphere Developer
Re: miniSphere 4.8.6
Reply #2116
You might be onto something here.  `jsal_get_int()` does clamping to [INT_MIN,INT_MAX] followed by a double-to-int cast:
https://github.com/fatcerberus/minisphere/blob/834a2ce24b2db064e7768cb48805b2093da17273/src/shared/jsal.c#L553-L557

Chakra has a function `JsNumberToInt()`, I should see if using that performs better than doing the cast myself.  It's possible that Chakra is storing integers as actual ints internally, so that calling `JsNumberToDouble()` followed by an int cast involves a round-trip int->double->int, which is going to be expensive especially if you do it for every single function parameter.
miniSphere 5.0.1 - Cell compiler - SSj debugger - thread | on GitHub
For the sake of our continued health I very much hope that Fat Cerberus does not become skilled enough at whatever arcane art it would require to cause computers to spawn enourmous man eating pigs ~Rhuan

  • Rhuan
  • [*][*][*][*]
Re: miniSphere 4.8.6
Reply #2117
I've checked over my test and re-run it; I'm confident that it's right and no change to the above results.

  • Rhuan
  • [*][*][*][*]
Re: miniSphere 4.8.6
Reply #2118
Adding in JSNumbertoInt and JSINttoNumber where possible resulted in somewhere between a 2% and 5% speed increase - probably worth doing but not the major issue.

All profiling points to JSCopyString being the most significant performance bottleneck unfortunately it seems like the majority of api calls rely on it heavily for one thing or another.

  • Rhuan
  • [*][*][*][*]
Re: miniSphere 4.8.6
Reply #2119
So I tried implementing this: https://github.com/Microsoft/ChakraCore/issues/3623

It was a surprisingly simple addition to the Chakra source, the below in JSRT.cpp + a stub in ChakraCore.h
Code: [Select]
//proposed new api to skip string type conversion
CHAKRA_API JSCreatePropertyIDfromJSString(
            _In_ JsValueRef stringValue,
            _Out_ JsPropertyIdRef *propertyId)
{
  const char16* str = nullptr;
  size_t strLength = 0;
  JsErrorCode errorCode = JsStringToPointer(stringValue, &str, &strLength);
  if (errorCode != JsNoError)
  {
    return errorCode;
  }
  return JsGetPropertyIdFromName(str, propertyId);
}

Then using that in make_property_id instead of the JSCopyString call; the result was a 20-25% performance improvement on common api calls; not enough to catch up with the old Duktape yet but it got rid of the majority of the slowdown in the Specs battle system - to the point that I would consider it playable.

I think we either should suggest this addition to ChakraCore OR we should just use it within the miniSPhere code base by using JSStringtoPointer - technically JSStringtoPointer is a windows only api BUT CC code uses it internally more than once so it is clearly functional on other platforms just as long as one is careful with sizes.

To optimise further I think it's worth tidying up uses of int and double; my brief testing earlier suggested that letting CC do the type conversion saves a few CPU cycles over doing type conversion in the miniSphere code.

I also think that all uses of JSCopyString should be examined, that function is just anti-speed.

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • miniSphere Developer
Re: miniSphere 4.8.6
Reply #2120
20-25% is nothing to sneeze at, maybe it's time for me to send another pull request upstream ;)
miniSphere 5.0.1 - Cell compiler - SSj debugger - thread | on GitHub
For the sake of our continued health I very much hope that Fat Cerberus does not become skilled enough at whatever arcane art it would require to cause computers to spawn enourmous man eating pigs ~Rhuan

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • miniSphere Developer
Re: miniSphere 4.8.6
Reply #2121
Then using that in make_property_id instead of the JSCopyString call; the result was a 20-25% performance improvement on common api calls; not enough to catch up with the old Duktape yet but it got rid of the majority of the slowdown in the Specs battle system - to the point that I would consider it playable.

I think make_property_id() is only half of the puzzle - by removing the string copy from there, you've solved the problem for strings used as property keys (and 20-25% improvement indicates that happens a lot), but the problem for jsal_get_string() remains.  That comes into play whenever an API takes a string parameter, and is a much more difficult problem to solve.  Essentially the entire engine would need to be refactored to use UTF-16 internally.
miniSphere 5.0.1 - Cell compiler - SSj debugger - thread | on GitHub
For the sake of our continued health I very much hope that Fat Cerberus does not become skilled enough at whatever arcane art it would require to cause computers to spawn enourmous man eating pigs ~Rhuan

  • Rhuan
  • [*][*][*][*]
Re: miniSphere 4.8.6
Reply #2122
Then using that in make_property_id instead of the JSCopyString call; the result was a 20-25% performance improvement on common api calls; not enough to catch up with the old Duktape yet but it got rid of the majority of the slowdown in the Specs battle system - to the point that I would consider it playable.

I think make_property_id() is only half of the puzzle - by removing the string copy from there, you've solved the problem for strings used as property keys (and 20-25% improvement indicates that happens a lot), but the problem for jsal_get_string() remains.  That comes into play whenever an API takes a string parameter, and is a much more difficult problem to solve.  Essentially the entire engine would need to be refactored to use UTF-16 internally.

To slightly improve speed could you use: JsGetStringLength(_In_ JsValueRef value, _Out_ int *length) to size the buffer rather than calling JsCopyString twice?

I think it will be tricky to take it much beyond that. The next slowest function looks like JsAddRef now - this is clearly much much slower than the Duktape equivalent.

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • miniSphere Developer
Re: miniSphere 4.8.6
Reply #2123
Hm, that might work.  JsGetStringLength() returns the length of the string in utf16 characters (the same as doing str.length in JS), in order to guarantee that will fit when transcoded to utf8 (1-4 bytes per character) one would need to allocate an oversized buffer at 4x the length.  Not very space-efficient, but I guess it would avoid doing a utf16 decode twice so it might improve performance.

My own results re: make_property_id concur with yours: Changing the function to use JsStringToPointer instead of JsCopyString took CPU usage in Specs from 16% with the console open down to ~12%, and the framerate drops almost completely went away.  So I think if we could avoid ever calling JsCopyString at all, we'd probably get pretty close the Duktape results for performance of API calls.

I don't understand why JsAddRef() is expensive though: In theory it should just be incrementing a reference counter which would be practically free.  I guess it'd be interesting to compile a debug build of ChakraCore and step into that function to see what it's actually doing under the hood...
miniSphere 5.0.1 - Cell compiler - SSj debugger - thread | on GitHub
For the sake of our continued health I very much hope that Fat Cerberus does not become skilled enough at whatever arcane art it would require to cause computers to spawn enourmous man eating pigs ~Rhuan

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • miniSphere Developer
Re: miniSphere 4.8.6
Reply #2124
I figured out why VertexList ctor got slower despite not dealing with strings at all: JSAL handled array indices by doing sprintf("%d", index) before calling jsal_get_prop.  That's unnecessary overhead since CC has JsGetIndexedProperty for exactly this purpose.  Growing pains, as always when learning a new library... :-[

It seems there's a lot of opportunity for optimization in JSAL itself, this is the price I pay for slapping it all together in a week while at the same time being a complete newcomer to Chakra's API :P
  • Last Edit: September 14, 2017, 02:14:32 am by Fat Cerberus
miniSphere 5.0.1 - Cell compiler - SSj debugger - thread | on GitHub
For the sake of our continued health I very much hope that Fat Cerberus does not become skilled enough at whatever arcane art it would require to cause computers to spawn enourmous man eating pigs ~Rhuan

  • Rhuan
  • [*][*][*][*]
Re: miniSphere 4.8.6
Reply #2125
More to be done but signifcant progress being made, see attached latest results of the same speed tests for various ChakraCore miniSphere functions, whilst these are still slower than the duktape equivalents they're a lot better than they were and I believe Fat Cerberus is still working on another significant optimisation.

I would also like to highlight a different point, sure the sphere api functions are a currently a little slower in the CC version BUT the pure javascript execution is faster to a crazy degree; basically with the chakracore version of miniSphere the only measurable time taken to execute your game code with almost any test we've managed to give it is the api functions, any logic you use and any JS built ins are close enough to instant that measuring them is almost pointless (unless you have a loop run 1,000,000 times, though even then it won't exactly take long).

Who'd have thought that a change to string handling mechanisms would have been so significant. (Duktape's internals use UTF8 strings whereas ChakraCore's use UTF16 strings - a key aspect of miniSphere's API was built around sending messages back and forth as javascript strings this meant that with the CC implementation there are now lots of utf8->utf16->utf8 conversions which were the key cause of the slowdown)
  • Last Edit: September 14, 2017, 02:58:24 pm by Rhuan

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • miniSphere Developer
Re: miniSphere 4.8.6
Reply #2126
Thanks to @Rhuan's help, we've managed to majorly optimize miniSphere's API layer; on my laptop (6th gen i7 quad), API call speed has improved fourfold.  See image attached.
miniSphere 5.0.1 - Cell compiler - SSj debugger - thread | on GitHub
For the sake of our continued health I very much hope that Fat Cerberus does not become skilled enough at whatever arcane art it would require to cause computers to spawn enourmous man eating pigs ~Rhuan

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • miniSphere Developer
Re: miniSphere 4.8.7
Reply #2127
4.8.7 fixes a bad bug where Sphere 1.x APIs won't accept a path beginning with ~/ (which unlike v2, is relative to the game manifest) and will instead throw "no save ID defined".
miniSphere 5.0.1 - Cell compiler - SSj debugger - thread | on GitHub
For the sake of our continued health I very much hope that Fat Cerberus does not become skilled enough at whatever arcane art it would require to cause computers to spawn enourmous man eating pigs ~Rhuan

  • Fat Cerberus
  • [*][*][*][*][*]
  • Global Moderator
  • miniSphere Developer
Re: miniSphere 5.0b1 (stable: 4.8.7)
Reply #2128
The first beta version of miniSphere 5.0 is available now to allow people to test out the native ES6 support and vastly improved JavaScript performance! :D
miniSphere 5.0.1 - Cell compiler - SSj debugger - thread | on GitHub
For the sake of our continued health I very much hope that Fat Cerberus does not become skilled enough at whatever arcane art it would require to cause computers to spawn enourmous man eating pigs ~Rhuan

  • Rhuan
  • [*][*][*][*]
Re: miniSphere 5.0b1 (stable: 4.8.7)
Reply #2129
Thanks to @Rhuan's help, we've managed to majorly optimize miniSphere's API layer; on my laptop (6th gen i7 quad), API call speed has improved fourfold.  See image attached.
In case anyone's worried those are not the times per call but rather the times to call each function 200,000 times (except for new VertexBuffer which is called only 20,000 times as it's so much slower than the others).