Topic: Future of Sphere: text parsing? (Read 7395 times)

  • N E O
  • [*][*][*][*][*]
  • Administrator
  • Senior Administrator
Future of Sphere: text parsing?
Sphere currently has a few script-based text parsers that allow marked-up text to be rendered in real time in a project; these include the not-yet-standalone parser by Metallix used in Aquatis and its demos, and NTML by myself, based on MML by Paragon. Currently, all of these scripts parse some dialect of markup similar in syntax to HTML, where the marked-up text is surrounded by tags enclosed in bracket characters (usually < > in HTML-style parsers, [ ] in bbcode-style parsers, and { } in Mustache-style parsers).

Recently, Markdown has seen an increase in popularity (possibly due to the increase in popularity of GitHub over Sourceforge) and there are parsers in a multitude of languages, including a few in JavaScript. While they primarily convert Markdown to HTML, I assume that, given the open-source nature of most of them, they can be edited to convert Markdown to, say, JS statements to execute.

The question I present is three-fold: do we have a generalized need for text parsing, is the parsing going to be built into the engine, and is Markdown going to be the syntax?

There are arguments for and against each part of the question, but particularly WRT the syntax. For most purposes Markdown would be enough for the source material but wouldn't be immediately extensible later on, while an HTML or XML-like syntax would be extremely flexible but the parser is more complicated to create and maintain.

So! Do we/will we need to consider including parsing in future distribution of Sphere? If so, is it going to be built into the engine or is it going to be provided as a (system?) script? Is said parser going to be Markdown, HTML/XML-like, or some other syntax?

  • Radnen
  • [*][*][*][*][*]
  • Senior Staff
  • Wise Warrior
Re: Future of Sphere: text parsing?
Reply #1
XML parsing should be built into the engine at some point, definitely. With that we can use XML to build whatever we need.

I can think of 4 areas in games that would benefit from a markup language, and I think XML can be the most adaptable:

  • Cutscene markup

  • Textbox markup

  • GUI markup

  • Item/Player stat markup

Markdown is not adaptable to games, I don't think. Or is it? Could Sphere extend Markdown with functions that add syntax to it? If we use Markdown, I'd like it to be extended to do something like this:

Code: (Markdown)

Blockman
========
Hi, what are you doing?

General
=======
I'm getting my troops in order.

Move Blockman 5 N
Fork
    Move General 4 E
End
Wait Blockman


How would that work out?
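One way a script like the one above could be read, sketched under assumptions: a line underlined with a row of `=` names the speaker, lines starting with one of the command words from the example (Move, Fork, End, Wait) are commands, and everything else is dialogue. The function name `parseCutscene` and the shape of the returned steps are hypothetical, not part of Sphere or of any existing Markdown parser.

```javascript
// Hypothetical sketch: turn the heading-style script above into a list of steps.
// The command words come from the example; the output format is an assumption.
function parseCutscene(source) {
  var lines = source.split(/\r?\n/);
  var commands = /^(Move|Wait|Fork|End)\b/;
  var root = [];
  var stack = [root];   // Fork pushes a nested step list, End pops it
  var speaker = null;

  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].trim();
    if (line === "") continue;
    var next = (lines[i + 1] || "").trim();
    var current = stack[stack.length - 1];

    if (/^=+$/.test(next)) {
      // "Name" underlined with ==== starts a new speaker.
      speaker = line;
      i++; // skip the underline itself
    } else if (commands.test(line)) {
      var parts = line.split(/\s+/);
      if (parts[0] === "Fork") {
        var fork = { type: "fork", steps: [] };
        current.push(fork);
        stack.push(fork.steps);
      } else if (parts[0] === "End") {
        if (stack.length > 1) stack.pop();
      } else {
        current.push({ type: parts[0].toLowerCase(), args: parts.slice(1) });
      }
    } else {
      current.push({ type: "say", speaker: speaker, text: line });
    }
  }
  return root;
}
```

The sketch also shows the main weakness of this syntax: a dialogue line that happens to begin with "Move" or "Wait" would be read as a command, so a real format would need some way to tell the two apart.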
  • Last Edit: March 29, 2013, 06:55:45 pm by Radnen
If you use code to help you code you can use less code to code. Also, I have approximate knowledge of many things.

Sphere-sfml here
Sphere Studio editor here

Re: Future of Sphere: text parsing?
Reply #2
A Markdown (or modified Markdown) plugin that works tightly with the MapEngine plugin of TurboSphere would be quite possible. It would also allow for other syntaxes to be used, if desired (and if a plugin existed for them).

This is exactly what plugins are nice for: something that not everyone will want, but that could still benefit from close integration when it is present.

Re: Future of Sphere: text parsing?
Reply #3

Code: (Markdown)

Blockman
========
Hi, what are you doing?

General
=======
I'm getting my troops in order.

Move Blockman 5 N
Fork
    Move General 4 E
End
Wait Blockman


Oddly enough, that looks very much like the simple scripting language (ArqScript) I created for doing cutscenes with my RPG framework Arq. It doesn't have any formatted text stuff yet though.

  • N E O
  • [*][*][*][*][*]
  • Administrator
  • Senior Administrator
Re: Future of Sphere: text parsing?
Reply #4
@Radnen - you're getting more into scenario or cutscene parsing territory. In your example, you're better off doing some line-by-line routine or flat out JS. I'm referring to markup in the "text formatting" sense (eg, "turn this word red, make it bold, italicize, and jitter its letters independently 5px for 3 seconds" to simulate something being scary within dialogue text or something) and mainly wondering if we should leave such markup systems to scripts or build it into font.drawText(Box)/extend to font.drawText(Box)Formatted (or could be named font.drawRichText(Box) or something).
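To make the "text formatting" sense concrete, here is a minimal sketch of the script-side approach, assuming bbcode-style tags such as `[red]` and `[b]`. The function name `parseRichText` and the tag names are hypothetical; the sketch only splits the text into runs with their active tags and leaves the drawing to whatever font routine the project uses.

```javascript
// Hypothetical sketch: split bbcode-style text into runs, each carrying the
// list of tags that are open for that run. Tag names are not validated.
function parseRichText(text) {
  var runs = [];
  var active = [];                       // stack of currently open tags
  var tagPattern = /\[(\/?)([a-z]+)\]/g; // matches [tag] and [/tag]
  var last = 0;
  var match;

  function pushRun(str) {
    if (str.length > 0) runs.push({ text: str, tags: active.slice() });
  }

  while ((match = tagPattern.exec(text)) !== null) {
    pushRun(text.slice(last, match.index));
    last = tagPattern.lastIndex;
    if (match[1] === "/") {
      // Close the most recent matching tag; stray closers are ignored.
      var idx = active.lastIndexOf(match[2]);
      if (idx !== -1) active.splice(idx, 1);
    } else {
      active.push(match[2]);
    }
  }
  pushRun(text.slice(last));
  return runs;
}

// Example: three runs, the middle one tagged both "red" and "b".
var runs = parseRichText("Do not [red][b]open[/b][/red] that door.");
```

A renderer would then walk the runs and pick a color, style, or per-letter effect from each run's tags before drawing it.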

@Jester - once TS is up to par, a Markdown text parsing plugin as a proof of concept would be a great trial for us!

  • Radnen
  • [*][*][*][*][*]
  • Senior Staff
  • Wise Warrior
Re: Future of Sphere: text parsing?
Reply #5

@Radnen - you're getting more into scenario or cutscene parsing territory. In your example, you're better off doing some line-by-line routine or flat out JS.


Oh I know, textbox stuff is easy, and I was hoping for something more generalized. Flat-out JS can look ugly to read and long to type. I know, it's what I've been doing.

@Alpha123: Could I see your Arq script? Perhaps I'll incorporate it into my cutscene manager, if you will. I wonder if it's adaptable or if you've designed it around your own static functions (if so, that's okay).

Re: Future of Sphere: text parsing?
Reply #6

Could I see your Arq script? Perhaps I'll incorporate it into my cutscene manager, if you will. I wonder if it's adaptable or if you've designed it around your own static functions (if so, that's okay).

It's unfortunately very much a work-in-progress (you probably can't run actual cutscenes with it yet). The language works fine except for built-in dialogue features which have yet to be implemented. Mostly it's the actual cutscene functions that are missing; I'm using a wrapper around Scenario to implement those. They are not entirely independent of the language (fork and sync are language constructs -- like in your example -- and rely on Scenario, and the built-in dialogue features will too) but should be adaptable.

I checked and the lexer and parser don't have any external dependencies, but the generated code does require MooTools to run and I'm not sure about the compiler. You can see the code here, but that is currently pretty out-of-date compared to my working copy. The parser and the lexer should be fairly easy to understand and are pretty well-written, but the compiler is a bit of a mess. I've never taken a formal course in compiler construction, and it shows.

But anyway, I'll do a little more work on it and if you do decide to use it I'll see what I can do to make that easier for you. It's a very friendly domain-specific language for cutscenes and cuts out a lot of cruft compared to JavaScript. Here's how your example would look (this would almost run in the current ArqScript; the syntax for the move function depends on the cutscene engine you're using).

Code: (arqscript)

Blockman: "Hi, what are you doing?"
General: "I'm getting my troops in order."

move('Blockman, "5N");
fork
    move('General, "4E");
end
sync;


I'd like to cut out more cruft (semicolons and parens) eventually. But the point is that the equivalent JS for this would be much uglier.

Rad-Edit: adding a value in code where it says 'here' [ code: here ] makes line numbers appear.
  • Last Edit: March 31, 2013, 03:31:06 am by Radnen

  • Radnen
  • [*][*][*][*][*]
  • Senior Staff
  • Wise Warrior
Re: Future of Sphere: text parsing?
Reply #7
Compiler? You'd likely want an interpreter, that would be much easier. And I'm surprised you mention you haven't taken a "formal" compilers class when I have (just now finished) and your code would look no different than what we would do in class if not better, far better (even if it looks a bit like chicken scratch). You have the concept of it down, from lexical analysis to "code generation" (technically it's JS here and not say, IA32). I doubt a formal class could teach you much more. Besides that you shouldn't make a token for comments, they'd be stripped during lexical analysis.

Tell me, if you never took such a class, how did you learn how to build a recursive descent parser? Learn about "token"s? That Lexical analysis came before parsing, etc. It's really quite remarkable. Or you wrote it but still not certain as to how or why?

Anyways, if I was going to make my own language I would make a "byte-code" (well, they wouldn't be bytes) interpreter utilizing a stack rather than a full-blown compiler.
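A minimal sketch of what such a stack-based interpreter could look like. The instruction set here (push, add, sub, mul, div, print) and the function name `runProgram` are invented for illustration; they are not taken from ArqScript or radlib.

```javascript
// Hypothetical stack machine: each instruction is an object with an "op" field.
function runProgram(program) {
  var stack = [];
  var output = [];

  // Pop two operands, apply fn, push the result.
  function binary(fn) {
    if (stack.length < 2) throw new Error("Stack underflow");
    var right = stack.pop();
    var left = stack.pop();
    stack.push(fn(left, right));
  }

  for (var pc = 0; pc < program.length; pc++) {
    var ins = program[pc];
    switch (ins.op) {
      case "push": stack.push(ins.value); break;
      case "add":  binary(function (a, b) { return a + b; }); break;
      case "sub":  binary(function (a, b) { return a - b; }); break;
      case "mul":  binary(function (a, b) { return a * b; }); break;
      case "div":  binary(function (a, b) { return a / b; }); break;
      case "print":
        if (stack.length < 1) throw new Error("Stack underflow");
        output.push(stack.pop());
        break;
      default:
        throw new Error("Unknown op: " + ins.op);
    }
  }
  return output;
}

// "print 2 + 3 * 4" compiled by hand into instructions:
var result = runProgram([
  { op: "push", value: 2 },
  { op: "push", value: 3 },
  { op: "push", value: 4 },
  { op: "mul" },
  { op: "add" },
  { op: "print" }
]);
```

Because every instruction passes through one loop, this is also a natural place to report errors with the position of the failing instruction.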
  • Last Edit: March 31, 2013, 01:48:27 am by Radnen

Re: Future of Sphere: text parsing?
Reply #8

Compiler? You'd likely want an interpreter, that would be much easier.

Transpiling to JS is very easy if your language has the same semantics, and it's easy enough to fake slightly different semantics.
An interpreter would probably be a better solution though, especially if I did want different semantics (you can see from the last few commits that I disabled case insensitivity because it didn't really work out). Given that I basically generate the code and then eval() it immediately, an interpreter would be nice. :P
I guess I'll write an interpreter for it. I'm not at all happy with the current compiler anyway; it's loaded with ugly corner cases, among other problems. Do you have any recommended articles for writing an interpreter?

Quote

And I'm surprised you mention you haven't taken a "formal" compilers class when I have (just now finished) and your code would look no different than what we would do in class if not better, far better (even if it looks a bit like chicken scratch). You have the concept of it down, from lexical analysis to "code generation" (technically it's JS here and not say, IA32). I doubt a formal class could teach you much more. Besides that you shouldn't make a token for comments, they'd be stripped during lexical analysis.

I read stuff on the internet. :D
I've read a lot about parsing and I'm actually pretty proud of the parser. It works very well and the code is pretty clean. Particularly these excellent articles were very very helpful.

I figured out the compiler from trial and error (and well, if you think about it, once you have the AST a compiler isn't very complicated, especially if you can compile in one pass to a high-level language like JS). The ArqScript compiler is descended from quite a few other simple transpile-to-JS languages that I've toyed with.
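As an illustration of why a one-pass compile to JS is short once the AST exists, here is a minimal sketch. The node shapes (`num`, `str`, `binary`, `call`) and the function name `compileNode` are assumptions made for this example and are not ArqScript's actual AST.

```javascript
// Hypothetical AST shapes:
//   { type: "num", value }            { type: "str", value }
//   { type: "binary", op, left, right }
//   { type: "call", name, args }
function compileNode(node) {
  switch (node.type) {
    case "num":
      return String(node.value);
    case "str":
      return JSON.stringify(node.value); // JSON quoting is valid JS string syntax
    case "binary":
      // Always parenthesize so precedence in the output matches the tree.
      return "(" + compileNode(node.left) + " " + node.op + " " +
             compileNode(node.right) + ")";
    case "call":
      return node.name + "(" + node.args.map(compileNode).join(", ") + ")";
    default:
      throw new Error("Unknown node type: " + node.type);
  }
}

// A call node compiles to a plain JS call expression.
var js = compileNode({
  type: "call",
  name: "move",
  args: [
    { type: "str", value: "Blockman" },
    { type: "binary", op: "+",
      left: { type: "num", value: 2 }, right: { type: "num", value: 3 } }
  ]
});
```

Each node type maps to one string template, which is why the hard part tends to be the corner cases where the source language's semantics differ from JavaScript's.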

The reason I have a token for comments was so that I could compile comments to JS as well, but I later decided I had more important features to work on. I should probably remove that.

Quote

Tell me, if you never took such a class, how did you learn how to build a recursive descent parser? Learn about "token"s? That Lexical analysis came before parsing, etc. It's really quite remarkable. Or you wrote it but still not certain as to how or why?

I'm fully certain as to how and why it works. All the stuff you mentioned I got from the articles linked above and various other sources. Really all I didn't know was how to actually build a compiler. Which is why the current one looks like chicken scratch. xP

Quote

Anyways, if I was going to make my own language I would make a "byte-code" (well, they wouldn't be bytes) interpreter utilizing a stack rather than a full-blown compiler.

I'll look into that. It would certainly have the advantages of being more maintainable, the freedom to diverge semantics (although this is a simple DSL, it's not like it needs complicated semantics :P), and better error messages.


Thanks a bunch for the review. It's nice to have someone who actually knows what they're doing look at my code. :)

  • Radnen
  • [*][*][*][*][*]
  • Senior Staff
  • Wise Warrior
Re: Future of Sphere: text parsing?
Reply #9
Well, one thing I can tell you is that in my compilers class we didn't write the parser or lexical analyzer by hand; we used generators for those. You use regular expressions to define the tokens, and then grammars to define the syntax.

You can see the lexer generator I made here: https://github.com/Radnen/radlib/blob/master/scripts/radscript/radlexer.js

To define a language, such as a language of expressions, we can use regexes like so:
Code: (javascript)

RadLexer.register(/^[0-9]+$/, NUM);
RadLexer.register(/^print$/, PRINT); // quick way of grabbing 'print', not ideal though.
RadLexer.register(/^#[^\n\r]*$/, COMMENT); // my comments start with a single #
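RadLexer.register(/^\+$/, '+'); // + is a regex quantifier, so it must be escaped
RadLexer.register(/^-$/, '-');
RadLexer.register(/^\*$/, '*'); // * is a regex quantifier, so it must be escaped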
RadLexer.register(/^\/$/, '/');
RadLexer.register(/^;$/, ';');
RadLexer.register(/^[\s]+$/, WHITESPACE);
RadLexer.register(/^[^ \n\r]+$/, ERROR); // all other characters not caught by previous regex steps are errors


RadLexer.tokens then holds an array of the tokens, which will soon carry debug data as well.
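For readers who want to see the idea behind a rule-registration lexer, here is a minimal sketch. It is not RadLexer's implementation: `createLexer` is a hypothetical name, the patterns are unanchored (unlike the `^...$` patterns above), and it relies on the sticky `y` regex flag from ES2015 to match at an exact position.

```javascript
// Hypothetical rule-based lexer: rules are tried in registration order and the
// first one that matches at the current position wins.
function createLexer() {
  var rules = [];
  return {
    register: function (pattern, type) {
      // Rebuild the pattern with the sticky flag so it only matches at lastIndex.
      // Note that this drops any flags the original pattern had.
      rules.push({ regex: new RegExp(pattern.source, "y"), type: type });
    },
    tokenize: function (input) {
      var tokens = [];
      var pos = 0;
      while (pos < input.length) {
        var matched = false;
        for (var i = 0; i < rules.length; i++) {
          var rule = rules[i];
          rule.regex.lastIndex = pos;
          var m = rule.regex.exec(input);
          if (m && m[0].length > 0) {
            tokens.push({ type: rule.type, text: m[0], pos: pos });
            pos += m[0].length;
            matched = true;
            break;
          }
        }
        if (!matched) {
          throw new Error("Unexpected character at " + pos + ": " + input[pos]);
        }
      }
      return tokens;
    }
  };
}

var lexer = createLexer();
lexer.register(/[0-9]+/, "NUM");
lexer.register(/print\b/, "PRINT");
lexer.register(/[+\-*\/]/, "OP");
lexer.register(/;/, ";");
lexer.register(/\s+/, "WHITESPACE");
var tokens = lexer.tokenize("print 1+22;");
```

Recording `pos` on each token is one cheap form of debug data: a later parser can use it to point at the offending spot in the source.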

I have not yet made the parser generator, but once that is done, creating a new language would be fairly easy. I won't go as far as a compiler, but it's not hard to build one once the parser has created the abstract syntax tree.
  • Last Edit: March 31, 2013, 04:17:58 pm by Radnen

Re: Future of Sphere: text parsing?
Reply #10
A top-down operator precedence parser was easier for me to make (they're trivial once you understand the concept; it's really a fantastic algorithm) than it was to learn how to use a parser generator and make it conform to my will. :P Also it was a nice learning experience.
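For anyone unfamiliar with the technique, here is a minimal sketch of a top-down operator precedence (Pratt) parser for arithmetic. It is not the ArqScript parser; the function name `parseExpression`, the binding-power values, and the node shapes are all assumptions chosen for the example.

```javascript
// Binding powers for infix operators; a higher number binds tighter.
var BINDING_POWER = { "+": 10, "-": 10, "*": 20, "/": 20 };

function parseExpression(source) {
  // Crude tokenizer for the sketch: anything not matched is silently skipped.
  var tokens = source.match(/\d+|[+\-*\/()]/g) || [];
  var pos = 0;

  function peek() { return tokens[pos]; }
  function next() { return tokens[pos++]; }
  function powerOf(token) {
    return Object.prototype.hasOwnProperty.call(BINDING_POWER, token)
      ? BINDING_POWER[token] : 0;
  }

  // How a token behaves at the start of an expression (the "nud").
  function parsePrefix() {
    var token = next();
    if (token === undefined) throw new Error("Unexpected end of input");
    if (/^\d+$/.test(token)) return { type: "num", value: Number(token) };
    if (token === "(") {
      var inner = parse(0);
      if (next() !== ")") throw new Error("Expected )");
      return inner;
    }
    if (token === "-") return { type: "neg", operand: parse(30) };
    throw new Error("Unexpected token: " + token);
  }

  // Core loop: keep absorbing infix operators that bind tighter than minPower.
  function parse(minPower) {
    var left = parsePrefix();
    while (powerOf(peek()) > minPower) {
      var op = next();
      // Passing the operator's own power makes equal operators left-associative.
      var right = parse(powerOf(op));
      left = { type: "binary", op: op, left: left, right: right };
    }
    return left;
  }

  var tree = parse(0);
  if (pos < tokens.length) throw new Error("Unexpected token: " + peek());
  return tree;
}

var tree = parseExpression("2 + 3 * 4"); // the * node ends up under the + node
```

Adding an operator means adding one entry to the table, which is a large part of why the approach feels flexible compared with editing a grammar file.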

If you like generated parsers (ugly and inflexible, IMO) see Jison (LALR and some others) and PEG.js (I have no idea what a PEG parser is, but it looks interesting).

  • Radnen
  • [*][*][*][*][*]
  • Senior Staff
  • Wise Warrior
Re: Future of Sphere: text parsing?
Reply #11
I guess you can say we've shown that text parsing is something one can do in JS/Sphere. :)

I guess that is a partial answer to NEO's question: we have been making our own. But he's right that a generalized one would be nice. Markdown is great, and if Jester were going to add that as a plugin I'd vote for it. But XML is not bad at all either, and I can see many uses for it as well, especially for data storage. So both would be great.

Re: Future of Sphere: text parsing?
Reply #12
I think it can help to have a predefined text parser somewhere in the Sphere package you download... but I would implement it as a simple system script. Plugins are harder to maintain because they are precompiled and you have to upload the source somewhere, while anyone can modify a script if they need to. I think that approach has worked well in the history of Sphere.

For event scripts I do not see the purpose of a separate script format... If you write a cutscene.js in a good way, it can provide a nice interface for writing cutscenes in JavaScript, where every line is a new step in a sequence of events.
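A minimal sketch of what such an interface might look like, with one step per line through method chaining. The `Cutscene` object, its method names, and the handler table are hypothetical; a real version would wait for each step to finish across frames rather than running everything at once.

```javascript
// Hypothetical cutscene queue: each call appends one step, run() plays them in order.
function Cutscene() {
  this.steps = [];
}

Cutscene.prototype.say = function (speaker, text) {
  this.steps.push({ type: "say", speaker: speaker, text: text });
  return this; // returning this allows one chained call per line
};

Cutscene.prototype.move = function (who, distance, direction) {
  this.steps.push({ type: "move", who: who, distance: distance, direction: direction });
  return this;
};

// handlers maps a step type to the function that performs it in the game.
Cutscene.prototype.run = function (handlers) {
  this.steps.forEach(function (step) {
    var handler = handlers[step.type];
    if (!handler) throw new Error("No handler for step: " + step.type);
    handler(step);
  });
};

// Each line of the scene is one step in the sequence.
var scene = new Cutscene()
  .say("Blockman", "Hi, what are you doing?")
  .say("General", "I'm getting my troops in order.")
  .move("Blockman", 5, "N");
```

Keeping the steps as plain data means the same scene could later be produced by a parser instead of hand-written calls.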

Probably, if the number of scripts grows, a download manager for system scripts in the editor would make sense...

  • Radnen
  • [*][*][*][*][*]
  • Senior Staff
  • Wise Warrior
Re: Future of Sphere: text parsing?
Reply #13

Probably, if the number of scripts grows, a download manager for system scripts in the editor would make sense...


Now that, that would be nice. If we get a repository up on these forums, I'll try adding that to my editor. It'll download files directly into the /scripts system folder. I still need to create a general game packager though - it'll not use the .spk format, but instead bundle up Sphere with your game in /startup (with an option for the config.exe).

Re: Future of Sphere: text parsing?
Reply #14
I don't know what the best practice is here, but I would require downloadable system scripts to have a "description/how to use" comment section at the top of the file. Only if that section is present could the script be installed by the manager. I think a bad situation would be a bunch of system scripts where you don't know which script provides which feature...
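A minimal sketch of how a manager could enforce that rule, assuming a convention where the file opens with a `/* ... */` comment containing `Description:` and `Usage:` lines. The field names and the function name `readScriptHeader` are hypothetical; no such convention exists in Sphere as far as this thread shows.

```javascript
// Hypothetical header check: returns the parsed fields, or null when the
// leading comment block or either required field is missing.
function readScriptHeader(source) {
  var match = /^\s*\/\*([\s\S]*?)\*\//.exec(source);
  if (!match) return null;

  var header = {};
  match[1].split(/\r?\n/).forEach(function (line) {
    // Accept lines like " * Description: some text".
    var field = /^\s*\*?\s*(Description|Usage):\s*(.+?)\s*$/.exec(line);
    if (field) header[field[1].toLowerCase()] = field[2];
  });

  return (header.description && header.usage) ? header : null;
}

var header = readScriptHeader(
  "/*\n" +
  " * Description: Rich text parser\n" +
  " * Usage: RequireSystemScript('richtext.js');\n" +
  " */\n" +
  "function parse() {}\n"
);
```

The same parsed fields could be shown in the manager's script list, which addresses the worry about not knowing what each script provides.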