Trim CRLF from a certain CSV column

Technical support and scripting issues

Moderators: Dorian (MJT support), JRL

Post Reply
Vilmondes
Newbie
Posts: 15
Joined: Sun May 11, 2014 3:37 pm

Trim CRLF from a certain CSV column

Post by Vilmondes » Sun May 11, 2014 3:46 pm

Hello,

I guess the best way to trim CRLF from a given CSV column would to use SQL statements, however, as per the http://www.mjtnet.com/blog/2009/10/05/m ... csv-files/ article, it's not possible to UPDATE (not sure whether it's already possible).

So the question is: what's the best way to do that? I need to do it before looping through the CSV file using CSVFileToArray.

Thanks,
Vilmondes

User avatar
JRL
Automation Wizard
Posts: 3526
Joined: Mon Jan 10, 2005 6:22 pm
Location: Iowa

Re: Trim CRLF from a certain CSV column

Post by JRL » Sun May 11, 2014 5:03 pm

Hi Vilmondes,
Sorry but I'm confused. What do you want to do?

A CSV file is a text file so a crlf will create a new line in the file. Any column containing crlf's will therefore automatically be the last column. For a CSV file to function properly the last column needs to end with crlf. CSVFileToArray will automatically remove the crlf from the value in the last column.

So... what is it you're trying to accomplish?

Vilmondes
Newbie
Posts: 15
Joined: Sun May 11, 2014 3:37 pm

Re: Trim CRLF from a certain CSV column

Post by Vilmondes » Sun May 11, 2014 5:15 pm

Hi JRL,

Thanks for replying.

The thing is that one of the columns (which should not be the last column) contains CRLF or something like that which is making it to be the last column, so the row is being broken into 3 rows, i.e: the full row should have 30 columns, but now I have 3 rows, one with 12, another one with 10 and another with 8.

I need to somehow remove the CRLF from this column which is in the middle of the row, so that it doesn't become the last column.

It happens because this is data inserted by users, so they sometimes generate carriage returns.

Does it make sense?

Thanks,
Vilmondes

User avatar
JRL
Automation Wizard
Posts: 3526
Joined: Mon Jan 10, 2005 6:22 pm
Location: Iowa

Re: Trim CRLF from a certain CSV column

Post by JRL » Sun May 11, 2014 5:42 pm

SQL isn't going to fix this because the format of the CSV file is destroyed. The best thing would be to fix the csv input so it doesn't allow the crlf to be added. If that can't be done then my best guess would be you'll need to reconstruct the files one line at a time removing the crlf's that are not wanted. Assuming there are supposed to be 30 columns you should be able to rebuild the file by doing a ReadFile> then do a StringReplace> on the entire file removing all crlf's. After that Separate> the entire file by "comma". Then rewrite each line using groups of 30 from the Separate>. Something like this: The code is untested so step through it to make sure.

Code: Select all

Let>vFile=C:\CSFFile.CSV
Let>vNewFile=C:\newCSFFile.CSV

ReadFile>vFile,vData
StringReplace>vData,crlf,,vData
Separate>vData,comma,item

Let>Counter=0
Let>kk=0

Repeat>Counter
  Add>Counter,30
  Let>NewLine=
  Repeat>kk
    Add>kk,1
    Let>value=Item_%kk%
    If>kk=Counter
      ConCat>%NewLine%,%Value%
      WriteLn>vNewFile,wres,newline
    Else
      ConCat>%NewLine%,%Value%%comma%
    EndIf
  Until>kk=Counter
Until>Counter=Item_Count

Vilmondes
Newbie
Posts: 15
Joined: Sun May 11, 2014 3:37 pm

Re: Trim CRLF from a certain CSV column

Post by Vilmondes » Sun May 11, 2014 5:49 pm

Humm, I got it. I had already contacted the tool's tech support hoping that they can fix that.

In the meantime I'll test this code you suggested and let you know the result.

Thanks a lot for your help =]

Vilmondes
Newbie
Posts: 15
Joined: Sun May 11, 2014 3:37 pm

Re: Trim CRLF from a certain CSV column

Post by Vilmondes » Sun May 11, 2014 5:56 pm

Ah, one more question, please:

Do you think it would be possible to call a PHP file that would fix the csv and then Macro Scheduler goes through the rest of the script? Just wanted to know if it's possible to call another script and then get back to the .scp.

So the steps would be:

- Macro Scheduler downloads the .csv;
- A .php file is called which would fix the .csv;
- Then MacroScheduler carries on doing what it has to do.

User avatar
JRL
Automation Wizard
Posts: 3526
Joined: Mon Jan 10, 2005 6:22 pm
Location: Iowa

Re: Trim CRLF from a certain CSV column

Post by JRL » Sun May 11, 2014 7:41 pm

I don't know PHP so I can't tell you if PHP can fix the file or not.

Can Macro Scheduler call a PHP file? Again I don't know the answer. Isn't PHP server side processing?

I am fairly certain Macro Scheduler can fix the file in a manner similar to what I've outlined.

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Re: Trim CRLF from a certain CSV column

Post by Marcus Tettmar » Mon May 12, 2014 6:04 am

Yes, you can call PHP, but as JRL says, this seems overkill just to strip out line breaks. Why PHP for this? Anyway, if PHP is on the same machine then just execute php.exe. If elsewhere use HTTPRequest or a browser. Also CRLFs shouldn't break CSV files anyway, assuming the CSV is properly formatted and the strings are quotes. A line break inside quotes is fine. But if the CSV does not quote strings then, sure, you'll have a problem. But that wouldn't be properly formatted CSV. A CSV file with unquoted strings with line breaks is going to confuse Excel, let alone anything else.....


Sent from my iPad using Tapatalk
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

Vilmondes
Newbie
Posts: 15
Joined: Sun May 11, 2014 3:37 pm

Re: Trim CRLF from a certain CSV column

Post by Vilmondes » Mon May 12, 2014 11:25 pm

I opened up the .csv file using Notepad++ and found that the culprit are LFs, however, if I remove all LFs using the below code, it will also combine the last item of a row with the first item of the next row because there are no comma separating them:

Code: Select all

ReadFile>vFile,vData
StringReplace>vData,LF,,vData
Separate>vData,comma,item
So if I have:
First,Secondo,Third
One,Two,Three

I'll get "First,Secondo,Third One,Two,Three" and that's not what I want.

I tried using ReadLn, but I got an error. Something like:
Line: (Line No) 2 not appropriate"

This was the code:

Code: Select all

Let>ln=1
While>line<>##EOF##
  ReadLn>vFile,ln,line
  If>line<>##EOF##
  Endif
  Let>ln=ln+1
EndWhile
Does it make sense?

Any suggestion to solve that?

User avatar
JRL
Automation Wizard
Posts: 3526
Joined: Mon Jan 10, 2005 6:22 pm
Location: Iowa

Re: Trim CRLF from a certain CSV column

Post by JRL » Tue May 13, 2014 2:13 am

What do you get if you ReadFile the Stringreplace LF+Comma That should leave the LFs at the end of rows alone since they're not followed by commas

Code: Select all

ReadFile>vFile,vData
StringReplace>vData,%LF%%Comma%,Comma,vData

Vilmondes
Newbie
Posts: 15
Joined: Sun May 11, 2014 3:37 pm

Re: Trim CRLF from a certain CSV column

Post by Vilmondes » Tue May 13, 2014 11:30 am

Thanks JRF, but that pattern does not exist =/

These LFs are in a column which contains user generated content, so the LFs can be anywhere within whose quotes. Sometimes I have one LF followed by another one.

I can't think of any RegEx to solve that because when MS reads the file, it becomes a single line making it harder to find patterns as there's only one beginning and one end.

Any other suggestion? It seems I'll need to go for a back-end language to deal with that =/. Or maybe I could solve it with VB?

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Re: Trim CRLF from a certain CSV column

Post by Marcus Tettmar » Tue May 13, 2014 12:09 pm

Hi,

I see now that CSVFileToArray has problems with these LFs inside quotes. I've made a note in our issue tracker to try and address that.

In the mean time Excel copes fine. I made a file with only LFs as end of line characters and also within some quoted strings. I've attached the file here. If you open it in Notepad++ and show end of line chars you'll see the LFs.
testcsv.csv
And if you open it in Excel it displays correctly. You'll see two rows of data. Make the rows taller to see all the data - cell 1,1 has two lines and cell 2,3 also has two lines.

Knowing this, and assuming you have Excel installed, we can use Excel to read in the data using XLOpen:

XLOpen>c:\file\test.csv,1,xlH

And then you can loop through the data using XLGetSheetDims, XLGetCell etc ... see XLGetCell example in help file, and see this article:

http://help.mjtnet.com/article/138-tran ... or-web-app

However, if you want the convenience of CSVFileToArray how about we use Excel just to remove these LFs for us to reformat and save it back to a cleaner CSV file. All we need to do is open it in Excel and save it again. That will replace the LFs at the end of each row with CRLFs. Now all we need to do is replace all occurrences of LF *on their own* with nothing.

Like this:

Code: Select all

//open the CSV file in Excel, then save it.  That's all!
XLOpen>c:\temp\testcsv.csv,1,xlH
XLSave>xlH,c:\temp\testcsv_fixed.csv
XLQuit>xlH

//now remove all occurences of LF on their own
ReadFile>c:\temp\testcsv_fixed.csv,fileData
RegEx>(?<!\r)\n,fileData,0,matches,nm,1, ,fileData

//delete file and save back
DeleteFile>c:\temp\testcsv_fixed.csv
Let>WLN_NOCRLF=1
WriteLn>c:\temp\testcsv_fixed.csv,wres,fileData

//now we should be ok to read the CSV to an array using CSVFileToArray
CSVFileToArray>c:\temp\testcsv_fixed.csv,arrayData
That opens the CSV in Excel, saves it to a new CSV file (which in turn causes the end of row markers to become CRLF pairs), then we read the file in, remove all LFs that do NOT appear before a CR, save the file back and then finally we are able to use CSVFileToArray.

However, if the file is large it might be best to work directly against Excel in the first place as suggested above using XLOpen, XLGetSheeDims and XLGetCell in a loop ....
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

Vilmondes
Newbie
Posts: 15
Joined: Sun May 11, 2014 3:37 pm

Re: Trim CRLF from a certain CSV column

Post by Vilmondes » Tue May 13, 2014 1:19 pm

Many thanks for looking into this, Marcus.

The .csv file is still broken when I open it with Excel 2013. Therefore, the Open+Save trick won't work for me (neither the XLGetSheetDims, XLGetCell etc).

Are you using another version of Excel?

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Re: Trim CRLF from a certain CSV column

Post by Marcus Tettmar » Tue May 13, 2014 1:21 pm

I'm using Excel 2010 here. Did you try with the file I uploaded?

Are you able to send me your CSV file? If it is confidential you can email it to support. Be interesting to see what's different about it and maybe if we look at it we can find a solution.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

Post Reply
cron
Sign up to our newsletter for free automation tips, tricks & discounts