XML to CSV convertion

UncleBen · Post by **UncleBen** » Wed Dec 10, 2014 10:53 am

Hello everyone!

At the start I've to say that I've got Macro Scheduler 11 and i can't install newer version.

I've got a problem with XML files. I would like to convert them to CSV files but I haven't got any ideas how to change every phrase between < and > to , . or nothing (whatever).

In example I've got a file:

Code: Select all

<name>John</name>
<surname>Jerremy</surname>
<age>15</age>
<weight>63</weight>
<height>165</height>

and i want to change it to:

Code: Select all

John,Jerremy,15,63,165

Have you got any ideas how can I do that?

Thanks in advance!

Jerry Thomas · Post by **Jerry Thomas** » Wed Dec 10, 2014 5:11 pm

Do all of the records have the same fields?
(This looks like a database type application where they would...)

START LOOP (Repeat..Until, or While..Endwhile)
Search for <name>,
Find position of "<" following <name>
Get the data in between
Write to new file

Search for <surname>
Find position of "<" following <surname>
Get the data in between
Write to new file

etc.
END LOOP

This could be a very tight loop by using a common subroutine for the 4 line groups above.
LOOP
GoSub>GetField,name
GoSub>GetField,surname
etc

One possible extra validation would depend on missing data.
For example, a <name> that does not have an age, is that field blank? Or omitted!
You may have to find the next <name> and then verify that each of the other fields occurs before that

JRL · Post by **JRL** » Wed Dec 10, 2014 8:36 pm

Here's sample script that works with the data you've provided. Similar to what Jerry described.

Code: Select all

//A method to set data to a variable
//You'll probably use ReadFile>
LabelToVar>Text,vData

//Break the data up into individual lines
Separate>vData,crlf,vLine
//Initialise a variable to be the comma delimited line.
Let>vNewLine=
//Repeat loop for parsing each line
Let>kk=0
Repeat>kk
  Add>kk,1
  //Set the line to a variable
  Let>value=vLine_%kk%
  //If the line has no data parsing will cause an error
  If>Value=
    Goto>SkipParse
  EndIf
  //Get the position of the open Tag opening bracket
  Position><,Value,1,vPos<
  //Get the position of the open Tag closing bracket
  Position>>,Value,1,vPos>
  //Use the bracket positions to get the value of the tag.
  MidStr>Value,{%vPos<%+1},{%vPos>%-2},Tag
  //Remove the open and close tags from the line
  StringReplace>Value,Tag,,Value
  StringReplace>Value,<>,,Value
  StringReplace>Value,</>,,Value
  Label>SkipParse
  //Append the result to the comma delimited line.
  ConCat>vNewLine,%Value%%comma%
Until>kk=vLine_Count


MDL>vNewLine


/*
Text:
<name>John</name>
<surname>Jerremy</surname>
<age>15</age>
<weight>63</weight>
<height>165</height>
*/

hagchr · Post by **hagchr** » Thu Dec 11, 2014 2:33 pm

An alternative approach for anyone interested in RegEx. It is probably an over-kill for smaller data sets but could well be worth it for larger ones (seconds instead of minutes in terms of runtime). It is based on the info provided in the post so would possibly have to be adjusted if the actual file differs (I assumed the 5 tag items keep repeating).

Code: Select all

//Read the XML data into the variable strXML
LabelToVar>XML,strXML,1,0,{"*/"}

//Remove any lines without tags, ie only keep lines with tags
RegEx>(?m)^[^<]*\n,strXML,0,M,NM,1,,strXML

//From the remaining lines, remove all tags, ie gives you the items line by line
RegEx></?.+?>,strXML,0,M,NM,1,,strXML

//Put everything on one line with items comma separated by replacing each carriage return/line feed with comma
RegEx>\r\n,strXML,0,M,NM,1,{","},strXML

//If all the records are the same size (here 5) then the following can be used to 
//put one record per line by adding a carriage return/line feed after each group
//Just remove if not applicable
CodeBlock
    Let>RecordSize=5
    Let>tmp0=((.*?,){%RecordSize%})
    RegEx>tmp0,strXML,0,M,NM,1,$1%CRLF%,strXML
EndCodeBlock

//Remove the trailing comma on each line 
Let>tmp0=(?m),$
RegEx>tmp0,strXML,0,M,NM,1,,strXML

MDL>strXML

/*
XML:
<name>John1</name>
<surname>Jerremy</surname>
<age>15</age>
<weight>63</weight>
<height>165</height>
Garbage
Garbage
<name>John2</name>
<surname>Jerremy</surname>
<age>15</age>
<weight>63</weight>
<height>165</height>
Garbage
Garbage
<name>John3</name>
<surname>Jerremy</surname>
<age></age>
<weight></weight>
<height>165</height>
*/

Post by **Marcus Tettmar** » Thu Dec 11, 2014 5:06 pm

The latest version of Macro Scheduler has a built in XML parsing function called XMLParse. Makes it much easier to pull data out of XML and there's no need to convert to CSV.

https://www.mjtnet.com/manual/xmlparse.htm

UncleBen · Post by **UncleBen** » Mon Dec 22, 2014 8:05 am

Thank you everyone! You helped me a lot!

XML to CSV convertion

XML to CSV convertion

Re: XML to CSV convertion

Re: XML to CSV convertion

Re: XML to CSV convertion

Re: XML to CSV convertion

Re: XML to CSV convertion