Topic: Function to parse a CSV file

  • Rhuan
Function to parse a CSV file
I want to be able to use spreadsheets as data files in my current project, and obviously want Sphere to be able to read them. The easiest method seemed to be to save as CSV and then parse it, so I wrote a simple CSV parser. It has only been tested with UTF-8 character encoding and may need tweaking for other encodings. Anyway, I thought I'd share it in case anyone else wants this functionality.

Notes:
- written for the Sphere v1 API (sorry Fat Cerberus)
- has to use a length variable for the current field byte array (out_Rdata) as miniSphere doesn't let you dynamically resize byte arrays
- only supports 200 bytes per field, though you can change this by increasing the number passed to CreateByteArray on line 5
- returns the data as a 2D array where output[1][2] would be the value from the 2nd row and 3rd column. output[0][0] = value from 1st row and 1st column etc. It converts numbers into JS numbers and returns anything else as strings.

Code: [Select]
function parseCSV(input)
{
  var file = OpenRawFile(input);
  var in_data = file.read(file.getSize());
  var out_Rdata = CreateByteArray(200);
  var R_length = 0;
  var out_data  = [[]];
  var in_quotes = false;
  function convert_data(Rdata)
  {
    //keep empty or non-numeric text as a string; "0" must still become a number
    if(Rdata === "" || isNaN(Rdata * 1))
    {
      out_data[out_data.length-1].push(Rdata);
    }
    else
    {
      out_data[out_data.length-1].push(Rdata*1);
    }
  }
  for(var i = 0; i<in_data.length; ++i)
  {
    if(in_quotes)
    {
      if(in_data[i] == 0x22)
      {
        in_quotes = false;
      }
      else
      {
        out_Rdata[R_length]=in_data[i];
        ++R_length;
      }
    }
    else
    {
      switch(in_data[i])
      {
        case(0x2C)://comma
        {
          if(R_length > 0)
          {
            convert_data(CreateStringFromByteArray(out_Rdata.slice(0,R_length)));
          }
          else
          {
            out_data[out_data.length-1].push("");
          }
          R_length =0;
          break;
        }
        case(0x22)://quotes
        {
          in_quotes = true;
          break;
        }
        case(0x0D)://Carriage Return -ignored as followed by LineFeed
        {
          break;
        }
        case(0x0A)://Line Feed - push the data and then move down a line
        {
          if(R_length > 0)
          {
            convert_data(CreateStringFromByteArray(out_Rdata.slice(0,R_length)));
          }
          else
          {
            out_data[out_data.length-1].push("");
          }
          R_length = 0;
          out_data.push([]);
          break;
        }
        default://anything else
        {
          out_Rdata[R_length]=in_data[i];
          ++R_length;
        }
      }
    }
  }
  if(R_length > 0)//push the last piece of data as there's no terminator character
  {
    convert_data(CreateStringFromByteArray(out_Rdata.slice(0,R_length)));
  }
  else
  {
   out_data[out_data.length-1].push("");
  }
  file.close();//release the file handle now that everything has been read
  return out_data;
}
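
The conversion rule from the notes (numbers become JS numbers, anything else stays a string) can be tried on its own outside Sphere. This is only a minimal sketch: convertField is a hypothetical name that is not part of the function above, and treating an empty field as a string is an assumption, made because "" * 1 evaluates to 0.

```javascript
// Hypothetical helper: turn one CSV field (already a string) into a JS value.
function convertField(text)
{
  var n = text * 1; // numeric coercion; NaN when the text is not a number
  if (text === "" || isNaN(n))
  {
    return text; // assumption: empty fields stay as empty strings
  }
  return n;
}

var row = ["12", "0", "3.5", "abc", ""];
var converted = [];
for (var i = 0; i < row.length; ++i)
{
  converted.push(convertField(row[i]));
}
```

Checking with isNaN rather than truthiness matters for a field containing "0", which is a valid number but falsy.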

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: Function to parse a CSV file
Reply #1
Quote
miniSphere doesn't let you dynamically resize byte arrays


To be fair, Sphere 1.5 doesn't either ;)  It's actually harder than it sounds because you might pass the byte array pointer to something that works asynchronously, and resizing the buffer can change its location in memory, leading to a crash.  This is why ArrayBuffers in ES6+ aren't resizable either.

Anyway, you'd probably be better off doing something like this (still Sphere v1):

Code: (javascript) [Select]

var file = OpenRawFile(filename);
var input = file.read(file.getSize());
var lines = CreateStringFromByteArray(input).split(/\r?\n/);
var output = [];
for (var i = 0; i < lines.length; ++i) {
    output[i] = [];
    // parse a CSV line here (String#split won't work because you might have quoted commas)
}
file.close();


Much easier since you're dealing directly with string data instead of bytes.  At least I think so.
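
For the "parse a CSV line here" step, one possible sketch is a small character walk over the line that tracks whether it is inside quotes. parseCSVLine is a hypothetical name, and treating a doubled quote inside a quoted field as a literal quote is an assumption borrowed from the usual CSV convention (RFC 4180), not something specified in this thread. It runs in plain JavaScript with no Sphere calls.

```javascript
// Hypothetical helper: split one CSV line into fields, honouring quotes.
function parseCSVLine(line) {
    var fields = [];
    var field = "";
    var inQuotes = false;
    for (var i = 0; i < line.length; ++i) {
        var c = line.charAt(i);
        if (inQuotes) {
            if (c == '"') {
                if (line.charAt(i + 1) == '"') {
                    field += '"'; // assumption: "" inside quotes is a literal quote
                    ++i;
                } else {
                    inQuotes = false; // closing quote
                }
            } else {
                field += c; // commas inside quotes are ordinary text
            }
        } else if (c == '"') {
            inQuotes = true;
        } else if (c == ',') {
            fields.push(field);
            field = "";
        } else {
            field += c;
        }
    }
    fields.push(field); // the last field has no trailing comma
    return fields;
}

var example = parseCSVLine('1,"Smith, John","say ""hi""",');
```

Note that splitting the file on newlines first still breaks on a quoted field that itself contains a newline; this sketch only covers the quoted-comma case.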
neoSphere 5.9.2 - neoSphere engine - Cell compiler - SSj debugger
forum thread | on GitHub

  • Rhuan
Re: Function to parse a CSV file
Reply #2
I think you have to do one character at a time in case you have a quoted new line (which doesn't count as a new field). I won't use that myself, but I wanted to support it.

I could call CreateStringFromByteArray on each character rather than doing the switch on the bytes, but that seemed like an unnecessary step, though my method would obviously need to change for different character encodings.

EDIT: I guess I could create a string as the starting point and then walk down the string instead of walking down the byte array. Not sure why I didn't do it this way.

EDIT2: corrected a bizarre typo ("as and to a field" now reads "as a new field")
  • Last Edit: May 21, 2017, 11:51:18 am by Rhuan
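
The string-walking idea from the first EDIT could look roughly like the sketch below. It is plain JavaScript so it runs outside Sphere; in Sphere v1 the input text would presumably come from CreateStringFromByteArray as in the earlier posts. parseCSVText is a hypothetical name, and like the original function this sketch does not convert numbers or handle doubled quotes.

```javascript
// Hypothetical sketch: parse a whole CSV string one character at a time,
// so that newlines inside quotes stay part of the field.
function parseCSVText(text) {
    var rows = [[]];
    var field = "";
    var inQuotes = false;
    for (var i = 0; i < text.length; ++i) {
        var c = text.charAt(i);
        if (inQuotes) {
            if (c == '"') {
                inQuotes = false;
            } else {
                field += c; // commas and newlines are kept as field text
            }
        } else if (c == '"') {
            inQuotes = true;
        } else if (c == ',') {
            rows[rows.length - 1].push(field);
            field = "";
        } else if (c == '\n') {
            rows[rows.length - 1].push(field);
            field = "";
            rows.push([]); // start the next row
        } else if (c != '\r') {
            field += c; // carriage returns outside quotes are dropped
        }
    }
    rows[rows.length - 1].push(field); // final field has no terminator
    return rows;
}

var parsed = parseCSVText('a,"line one\nline two"\r\nb,c');
```

Building the field with string concatenation also sidesteps the fixed 200-byte buffer, since strings grow as needed.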

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: Function to parse a CSV file
Reply #3

Quote
I think you have to do one character at a time in case you have a quoted new line (which doesn't count as and to a field) - I won't but I wanted to support it.


Huh, CSV is a weird format.  I've never seen any language where putting a newline between quotes wasn't a syntax error (usually they are escaped as, e.g. \n).  Learn something new every day!

Re: Function to parse a CSV file
Reply #4

Quote
Huh, CSV is a weird format.

CSV is a "format" in only the loosest sense. Parsers have to be extremely lax, and encoders should be extremely simple, since it's so loose.
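
As a rough illustration of how simple the encoding side can be, here is a minimal sketch. The names encodeCSVField and encodeCSVRow are hypothetical, and the quoting rule (wrap a field in quotes only if it contains a comma, quote, or line break, and double any embedded quotes) is an assumption based on common CSV practice rather than anything fixed by the format.

```javascript
// Hypothetical helper: encode one value as a CSV field.
function encodeCSVField(value) {
    var text = String(value);
    if (/[",\r\n]/.test(text)) {
        // assumption: quote the field and double any embedded quotes
        return '"' + text.replace(/"/g, '""') + '"';
    }
    return text;
}

// Hypothetical helper: encode an array of values as one CSV line.
function encodeCSVRow(values) {
    var out = [];
    for (var i = 0; i < values.length; ++i) {
        out.push(encodeCSVField(values[i]));
    }
    return out.join(",");
}

var line = encodeCSVRow([1, "Smith, John", 'say "hi"']);
```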


Quote
I've never seen any language where putting a newline between quotes wasn't a syntax error (usually they are escaped as, e.g. \n).

Python lets you do it if you use three double quotes. Mercury works this way by default. It's actually extremely convenient, and makes me wish more languages had some way to do this.

  • Fat Cerberus
  • Global Moderator
  • Sphere Developer
Re: Function to parse a CSV file
Reply #5
Oh, haha.  I was referring more to interchange formats, "language" was the wrong word.  Generally quoted strings don't span lines in text-based interchange formats (e.g. JSON).  At least in my experience, anyway.

But yeah, multiline strings are sometimes convenient, although I admit to avoiding them because they play havoc with my OCD.  I want to indent the string to the same level as the surrounding code but can't because that adds extraneous spaces to the text. :(