Topic: neoSphere 5.9.2

  • Rhuan
Re: miniSphere 4.8.4
Reply #2025
Unless we're dealing with download delays or actually have multiple threads, I don't see much benefit, as we can't do two things at once and generally need everything to happen sequentially anyway.

I'm afraid I don't know where to begin in debugging the crashes I've posted about above. Any ideas?

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: miniSphere 4.8.4
Reply #2026
Quote
Unless we're dealing with download delays or actually have multiple threads, I don't see much benefit, as we can't do two things at once and generally need everything to happen sequentially anyway.

Not necessarily.  Dispatch.now() and its siblings exist for a reason.  await is basically that, except an entire function is returning early ("yielding"), and the thing being dispatched is "continue running the paused function".  Promises are the mechanism JS uses under the hood to achieve that.

Quote
I'm afraid I don't know where to begin in debugging the crashes I've posted about above. Any ideas?

Hard to say.  Going by your crash log above, the crash was ultimately in strlen, which is suspicious and suggests that either:
  • A null pointer was passed into strlen()
  • Some kind of memory corruption (buffer overflow?) happened

Is there any chance you could run the engine under Valgrind?  It will be excruciatingly slow, but if there is any memory corruption Valgrind will tell you exactly where it happens, complete with stack traces.  Make sure to compile the engine with debug symbols.
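As a minimal sketch of the first possibility: calling strlen() on a null pointer is undefined behavior in C, so wrapping suspect call sites is one way to rule it out while debugging.  Both helpers below are hypothetical names, not functions from the engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treats a null pointer as an empty string instead of
   passing it to strlen(), which would be undefined behavior. */
static size_t safe_strlen(const char *text)
{
    return text != NULL ? strlen(text) : 0;
}

/* Hypothetical debug-build variant: aborts with a file/line message at the
   call site instead of crashing somewhere inside strlen(). */
static size_t checked_strlen(const char *text)
{
    assert(text != NULL);
    return strlen(text);
}
```

Swapping checked_strlen() in for strlen() at the suspect call sites would turn a crash inside libc into an assertion failure that names the caller.  It does nothing for the second possibility (memory corruption), which is where Valgrind comes in.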
neoSphere 5.9.2 - neoSphere engine - Cell compiler - SSj debugger
forum thread | on GitHub

  • Rhuan
Re: miniSphere 4.8.4
Reply #2027
I tried Valgrind but it didn't tell me anything useful.

I've tried sticking printfs everywhere. It dies in the function script_eval within script.c: it will reliably run a printf on the line before:
if (!jsal_try_call(0))

But oddly it will not run a printf placed at the top of jsal_try_call, so it appears to crash as that function is called. If I comment out that function call it still crashes at about the same time, so it seems to me that there's a second thread that's crashing. Could it be that CC is receiving the script in the wrong format, its compiler is multi-threaded, and one of those threads is crashing?

I recall the CC documentation talking about different Unicode handling functions for cross-platform vs. Windows. Could this be the issue?

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: miniSphere 4.8.4
Reply #2028
This is what I get out of valgrind when trying to run Cell on Ubuntu:
Code: [Select]
==852== Memcheck, a memory error detector
==852== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==852== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==852== Command: cell
==852==
Out of Memory
==852==
==852== Process terminating with default action of signal 6 (SIGABRT)
==852==    at 0x5E80428: raise (raise.c:54)
==852==    by 0x5E82029: abort (abort.c:89)
==852==    by 0x51F2525: Memory::X64WriteBarrierCardTableManager::Initialize() (in /usr/lib/libChakraCore.so)
==852==    by 0x4F7B63A: _GLOBAL__sub_I_RecyclerWriteBarrierManager.cpp (in /usr/lib/libChakraCore.so)
==852==    by 0x40106B9: call_init.part.0 (dl-init.c:72)
==852==    by 0x40107CA: call_init (dl-init.c:30)
==852==    by 0x40107CA: _dl_init (dl-init.c:120)
==852==    by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==852==
==852== HEAP SUMMARY:
==852==     in use at exit: 110,188 bytes in 24 blocks
==852==   total heap usage: 26 allocs, 2 frees, 110,723 bytes allocated
==852==
==852== LEAK SUMMARY:
==852==    definitely lost: 0 bytes in 0 blocks
==852==    indirectly lost: 0 bytes in 0 blocks
==852==      possibly lost: 304 bytes in 1 blocks
==852==    still reachable: 109,884 bytes in 23 blocks
==852==         suppressed: 0 bytes in 0 blocks
==852== Rerun with --leak-check=full to see details of leaked memory
==852==
==852== For counts of detected and suppressed errors, rerun with: -v
==852== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Killed

Out of Memory, before the process even gets going.  No idea what's up with that, as I have 4GB of RAM allocated to the virtual machine.  If I run Cell manually, I get this:
Code: [Select]
fatcerberus@pigcult-vm:~/src/spectacles-i$ cell
Cell X.X.X Sphere packaging compiler (x64)
the JavaScript-powered build engine for Sphere
(c) 2015-2017 Fat Cerberus

setting up Cellscript environment...
evaluating '$/Cellscript.mjs'...
   E:
SCRIPT CRASH: uncaught JavaScript exception.
   at 1:0
1 error(s), 0 warning(s).

miniSphere, on the other hand, crashes with a floating point exception in the same environment (and also reports out of memory under Valgrind).

You're sure Valgrind didn't report any corruption/buffer overruns?

  • Rhuan
Re: miniSphere 4.8.4
Reply #2029
It did tell me stuff, it just didn't seem useful, and it didn't give me proper symbols whatever compiler options I changed.

Code: [Select]
==42230== Memcheck, a memory error detector
==42230== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==42230== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==42230== Command: ./ccminisphere.app/contents/macos/minisphere
==42230==
--42230-- run: /usr/bin/dsymutil "./ccminisphere.app/contents/macos/minisphere"
==42230== Syscall param msg->desc.port.name points to uninitialised byte(s)
==42230==    at 0x106B8A34A: mach_msg_trap (in /usr/lib/system/libsystem_kernel.dylib)
==42230==    by 0x106B89796: mach_msg (in /usr/lib/system/libsystem_kernel.dylib)
==42230==    by 0x106B83485: task_set_special_port (in /usr/lib/system/libsystem_kernel.dylib)
==42230==    by 0x106D1F10E: _os_trace_create_debug_control_port (in /usr/lib/system/libsystem_trace.dylib)
==42230==    by 0x106D1F458: _libtrace_init (in /usr/lib/system/libsystem_trace.dylib)
==42230==    by 0x1031C69DF: libSystem_initializer (in /usr/lib/libSystem.B.dylib)
==42230==    by 0x103082A1A: ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) (in /usr/lib/dyld)
==42230==    by 0x103082C1D: ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) (in /usr/lib/dyld)
==42230==    by 0x10307E4A9: ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) (in /usr/lib/dyld)
==42230==    by 0x10307E440: ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) (in /usr/lib/dyld)
==42230==    by 0x10307D523: ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) (in /usr/lib/dyld)
==42230==    by 0x10307D5B8: ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) (in /usr/lib/dyld)
==42230==  Address 0x1078f465c is on thread 1's stack
==42230==  in frame #2, created by task_set_special_port (???:)
==42230==
--42230-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--42230-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--42230-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
==42230== Thread 2:
==42230== Invalid read of size 4
==42230==    at 0x106CE7899: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==42230==    by 0x106CE7886: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==42230==    by 0x106CE708C: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==42230==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==42230==
==42230==
==42230== Process terminating with default action of signal 11 (SIGSEGV)
==42230==  Access not within mapped region at address 0x18
==42230==    at 0x106CE7899: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==42230==    by 0x106CE7886: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==42230==    by 0x106CE708C: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==42230==  If you believe this happened as a result of a stack
==42230==  overflow in your program's main thread (unlikely but
==42230==  possible), you can try to increase the size of the
==42230==  main thread stack using the --main-stacksize= flag.
==42230==  The main thread stack size used in this run was 8388608.
==42230==
==42230== HEAP SUMMARY:
==42230==     in use at exit: 1,301,583 bytes in 807 blocks
==42230==   total heap usage: 1,751 allocs, 944 frees, 2,460,469 bytes allocated
==42230==
==42230== LEAK SUMMARY:
==42230==    definitely lost: 7,112 bytes in 96 blocks
==42230==    indirectly lost: 1,074,264 bytes in 86 blocks
==42230==      possibly lost: 3,784 bytes in 108 blocks
==42230==    still reachable: 62,761 bytes in 219 blocks
==42230==         suppressed: 153,662 bytes in 298 blocks
==42230== Rerun with --leak-check=full to see details of leaked memory
==42230==
==42230== For counts of detected and suppressed errors, rerun with: -v
==42230== Use --track-origins=yes to see where uninitialised values come from
==42230== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 28 from 4)
Segmentation fault: 11

By putting my printfs in the right place, I was able to get it to print the whole of the startup game's main.js to the terminal, so it is finding and opening the file. It's only when it tries to evaluate it that everything goes wrong.

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: miniSphere 4.8.4
Reply #2030
This is the cause of the segfault:
Code: [Select]
Invalid read of size 4
==42230==    at 0x106CE7899: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==42230==    by 0x106CE7886: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==42230==    by 0x106CE708C: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==42230==  Address 0x18 is not stack'd, malloc'd or (recently) free'd

Note that 0x18 is a very low address, which usually indicates a null pointer dereference (a read at a small offset from a null base pointer).  It's also happening off the main thread.  Hmm...
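To illustrate why a fault address like 0x18 points at a null pointer: reading a struct member through a null base pointer touches address 0 plus the member's offset.  The struct below is made up for illustration (the real layout inside libsystem_pthread is not visible in the log); it simply places a 4-byte field at offset 0x18, matching the "Invalid read of size 4".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
struct example_thread {
    uint64_t id;         /* offset 0x00 */
    uint64_t stack_base; /* offset 0x08 */
    uint64_t stack_size; /* offset 0x10 */
    uint32_t flags;      /* offset 0x18, 4 bytes wide */
};

static_assert(offsetof(struct example_thread, flags) == 0x18,
              "flags is expected at offset 0x18");

/* Address that a read of `flags` would touch for a given base address.
   With base == 0 (a null pointer) the result is 0x18. */
static uintptr_t flags_address(uintptr_t base)
{
    return base + offsetof(struct example_thread, flags);
}
```

So the question to chase is which pointer was null when the new thread started, not what lives at 0x18.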

  • Rhuan
Re: miniSphere 4.8.4
Reply #2031
I tried a few other things:

1. replacing main.js with a blank file -> no segfault, minisphere opens and closes

2. replacing main.js with:
Code: [Select]
function game() {}
-> minisphere opens, the console tells me it evals the script, then it tries to call function game, then it closes down with no segfault.


  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: miniSphere 4.8.4
Reply #2032
I think there's a buffer overflow somewhere.  After screwing around with valgrind some more (you need to build CC with --valgrind on Linux apparently to get it to work), I noticed this:

Code: [Select]
Invalid read of size 4
==6756==    at 0x412079: lstr_from_wide (lstring.c:398)
==6756==    by 0x40E9A5: jsal_get_lstring (jsal.c:537)
==6756==    by 0x40EC40: jsal_get_string (jsal.c:613)
==6756==    by 0x408E27: main (main.c:314)
==6756==  Address 0x17310000 is in a --- anonymous segment

"Anonymous segment", from my cursory research, basically means "JITted code".

I'm pretty sure my code is buggy somewhere; it's just that Windows is a lot more forgiving of bad memory accesses.  The engine does segfault for me on occasion while calling JS functions, for what it's worth.

  • Rhuan
Re: miniSphere 4.8.4
Reply #2033
I was using the test build of CC; I'll see if the release build performs differently.

Re: miniSphere 4.8.4
Reply #2034
Quote
This is a 4.8.4 build or...?

Yes, this happens whether I directly pull it or download the source code of the 4.8.4 release.

  • Rhuan
Re: miniSphere 4.8.4
Reply #2035
No improvement from using the release build.

One thing I've noted, though: I've checked back through a lot of my errors, and normally any segfault follows a longjmp, i.e. the error seems to be in the built-in error handling. Obviously the error handling shouldn't be triggering in the first place, as I'm giving it legit JS to read.
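For anyone following along who hasn't met it: C code often builds try/catch-style error handling on setjmp()/longjmp().  Below is a minimal sketch of that pattern.  Whether jsal_try_call actually works this way is an assumption, and try_call, throw_error and catch_point are made-up names, not engine functions.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-level catch target; real code would keep a stack. */
static jmp_buf catch_point;

/* "Throw": unwind straight back to the most recent setjmp(). */
static void throw_error(void)
{
    longjmp(catch_point, 1);
}

/* "Try": returns true if the callback finished, false if it threw. */
static bool try_call(void (*callback)(void))
{
    assert(callback != NULL);
    if (setjmp(catch_point) == 0) {
        callback();      /* normal path */
        return true;
    }
    return false;        /* reached via longjmp() from throw_error() */
}

/* Two sample callbacks for exercising try_call(). */
static void ok_callback(void) {}
static void failing_callback(void) { throw_error(); }
```

With this sketch, try_call(ok_callback) returns true and try_call(failing_callback) returns false.  A longjmp into a jmp_buf that is stale or has been overwritten is undefined behavior, which is one way a crash could show up right after a longjmp.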

I've tested with adding and removing lines from the JS, and it seems that our crashes are triggered by calling font#drawText().

  • Last Edit: August 31, 2017, 04:42:34 pm by Rhuan

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: miniSphere 4.8.4
Reply #2036
What is sizeof(wchar_t) on macOS?  On Windows it's 2, on Linux it's 4.  I'm pretty sure that's why I can't get it to work on Linux, at least: CC uses UTF-16, where a wide character is 2 bytes.
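A rough sketch of the idea, assuming the strings coming back from CC are zero-terminated UTF-16: walk them with a fixed 16-bit type so the stride is 2 bytes no matter what sizeof(wchar_t) is.  The function names are hypothetical and this is not the actual lstr_from_wide code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count UTF-16 code units up to the terminating zero.  The pointer type is
   uint16_t, so each step is exactly 2 bytes whatever sizeof(wchar_t) is. */
static size_t utf16_length(const uint16_t *text)
{
    size_t length = 0;

    assert(text != NULL);
    while (text[length] != 0)
        ++length;
    return length;
}

/* Copy a UTF-16 string into a char buffer, replacing anything outside ASCII
   with '?'.  Real code would emit UTF-8; this only shows the 2-byte stride. */
static void utf16_to_ascii(const uint16_t *text, char *out, size_t out_size)
{
    size_t i;

    assert(text != NULL && out != NULL && out_size > 0);
    for (i = 0; i + 1 < out_size && text[i] != 0; ++i)
        out[i] = text[i] < 0x80 ? (char)text[i] : '?';
    out[i] = '\0';
}
```

Reading the same buffer through a 4-byte wchar_t pointer would merge pairs of code units and can run past the terminator, which would fit the invalid read Valgrind flagged in lstr_from_wide.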

  • Rhuan
Re: miniSphere 4.8.4
Reply #2037
Quote
What is sizeof(wchar_t) on macOS?  On Windows it's 2, on Linux it's 4.  I'm pretty sure that's why I can't get it to work on Linux, at least: CC uses UTF-16, where a wide character is 2 bytes.
It's 4 on macOS as well.

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: miniSphere 4.8.4
Reply #2038
Try checking out the latest build from the chakra-js branch.  I got it up on Linux.

  • Rhuan
Re: miniSphere 4.8.4
Reply #2039
Have we both just done the same thing?

I added:
typedef __CHAR16_TYPE__ char16_t;
to the top of one of your headers and then replaced all uses of wchar_t with char16_t. (This definition is apparently meant to be in a standard header called uchar.h, but that header doesn't appear to come with macOS.)
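A slightly more defensive version of that typedef, as a sketch: only define char16_t when uchar.h is missing.  This assumes a compiler that supports __has_include (recent Clang and GCC do), and HAVE_UCHAR_H is a made-up macro name.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Use the standard header where it exists... */
#if defined(__has_include)
#  if __has_include(<uchar.h>)
#    include <uchar.h>
#    define HAVE_UCHAR_H 1
#  endif
#endif

/* ...and fall back to a local typedef where it does not. */
#ifndef HAVE_UCHAR_H
#  ifdef __CHAR16_TYPE__
typedef __CHAR16_TYPE__ char16_t;   /* compiler-provided 16-bit type */
#  else
typedef uint_least16_t char16_t;    /* what C11 defines char16_t as */
#  endif
#endif

/* Fail the build early if the type cannot hold a UTF-16 code unit. */
static_assert(sizeof(char16_t) * CHAR_BIT >= 16,
              "char16_t must hold a UTF-16 code unit");
```

This is for C only; in C++ char16_t is a built-in type, so the fallback typedef would not compile there.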

Having done this, the startup game would then load with CC miniSphere. Unfortunately the ExecuteGame function doesn't work: running a different game by putting it in the startup folder gets somewhere, whereas trying to call it from the startup game gives a segfault.

Found a different issue... :( ChakraCore doesn't seem to implement the TextDecoder object, which causes a bit of a problem for the "standard" v2 way of reading binary data. Apparently it's a forthcoming feature: https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/6558040-support-the-encoding-api#{toggle_previous_statuses}
  • Last Edit: August 31, 2017, 05:47:10 pm by Rhuan