Trim CRLF from a certain CSV column
Moderators: Dorian (MJT support), JRL
Trim CRLF from a certain CSV column
Hello,
I guess the best way to trim CRLF from a given CSV column would to use SQL statements, however, as per the http://www.mjtnet.com/blog/2009/10/05/m ... csv-files/ article, it's not possible to UPDATE (not sure whether it's already possible).
So the question is: what's the best way to do that? I need to do it before looping through the CSV file using CSVFileToArray.
Thanks,
Vilmondes
I guess the best way to trim CRLF from a given CSV column would to use SQL statements, however, as per the http://www.mjtnet.com/blog/2009/10/05/m ... csv-files/ article, it's not possible to UPDATE (not sure whether it's already possible).
So the question is: what's the best way to do that? I need to do it before looping through the CSV file using CSVFileToArray.
Thanks,
Vilmondes
Re: Trim CRLF from a certain CSV column
Hi Vilmondes,
Sorry but I'm confused. What do you want to do?
A CSV file is a text file so a crlf will create a new line in the file. Any column containing crlf's will therefore automatically be the last column. For a CSV file to function properly the last column needs to end with crlf. CSVFileToArray will automatically remove the crlf from the value in the last column.
So... what is it you're trying to accomplish?
Sorry but I'm confused. What do you want to do?
A CSV file is a text file so a crlf will create a new line in the file. Any column containing crlf's will therefore automatically be the last column. For a CSV file to function properly the last column needs to end with crlf. CSVFileToArray will automatically remove the crlf from the value in the last column.
So... what is it you're trying to accomplish?
Re: Trim CRLF from a certain CSV column
Hi JRL,
Thanks for replying.
The thing is that one of the columns (which should not be the last column) contains CRLF or something like that which is making it to be the last column, so the row is being broken into 3 rows, i.e: the full row should have 30 columns, but now I have 3 rows, one with 12, another one with 10 and another with 8.
I need to somehow remove the CRLF from this column which is in the middle of the row, so that it doesn't become the last column.
It happens because this is data inserted by users, so they sometimes generate carriage returns.
Does it make sense?
Thanks,
Vilmondes
Thanks for replying.
The thing is that one of the columns (which should not be the last column) contains CRLF or something like that which is making it to be the last column, so the row is being broken into 3 rows, i.e: the full row should have 30 columns, but now I have 3 rows, one with 12, another one with 10 and another with 8.
I need to somehow remove the CRLF from this column which is in the middle of the row, so that it doesn't become the last column.
It happens because this is data inserted by users, so they sometimes generate carriage returns.
Does it make sense?
Thanks,
Vilmondes
Re: Trim CRLF from a certain CSV column
SQL isn't going to fix this because the format of the CSV file is destroyed. The best thing would be to fix the csv input so it doesn't allow the crlf to be added. If that can't be done then my best guess would be you'll need to reconstruct the files one line at a time removing the crlf's that are not wanted. Assuming there are supposed to be 30 columns you should be able to rebuild the file by doing a ReadFile> then do a StringReplace> on the entire file removing all crlf's. After that Separate> the entire file by "comma". Then rewrite each line using groups of 30 from the Separate>. Something like this: The code is untested so step through it to make sure.
Code: Select all
Let>vFile=C:\CSFFile.CSV
Let>vNewFile=C:\newCSFFile.CSV
ReadFile>vFile,vData
StringReplace>vData,crlf,,vData
Separate>vData,comma,item
Let>Counter=0
Let>kk=0
Repeat>Counter
Add>Counter,30
Let>NewLine=
Repeat>kk
Add>kk,1
Let>value=Item_%kk%
If>kk=Counter
ConCat>%NewLine%,%Value%
WriteLn>vNewFile,wres,newline
Else
ConCat>%NewLine%,%Value%%comma%
EndIf
Until>kk=Counter
Until>Counter=Item_Count
Re: Trim CRLF from a certain CSV column
Humm, I got it. I had already contacted the tool's tech support hoping that they can fix that.
In the meantime I'll test this code you suggested and let you know the result.
Thanks a lot for your help =]
In the meantime I'll test this code you suggested and let you know the result.
Thanks a lot for your help =]
Re: Trim CRLF from a certain CSV column
Ah, one more question, please:
Do you think it would be possible to call a PHP file that would fix the csv and then Macro Scheduler goes through the rest of the script? Just wanted to know if it's possible to call another script and then get back to the .scp.
So the steps would be:
- Macro Scheduler downloads the .csv;
- A .php file is called which would fix the .csv;
- Then MacroScheduler carries on doing what it has to do.
Do you think it would be possible to call a PHP file that would fix the csv and then Macro Scheduler goes through the rest of the script? Just wanted to know if it's possible to call another script and then get back to the .scp.
So the steps would be:
- Macro Scheduler downloads the .csv;
- A .php file is called which would fix the .csv;
- Then MacroScheduler carries on doing what it has to do.
Re: Trim CRLF from a certain CSV column
I don't know PHP so I can't tell you if PHP can fix the file or not.
Can Macro Scheduler call a PHP file? Again I don't know the answer. Isn't PHP server side processing?
I am fairly certain Macro Scheduler can fix the file in a manner similar to what I've outlined.
Can Macro Scheduler call a PHP file? Again I don't know the answer. Isn't PHP server side processing?
I am fairly certain Macro Scheduler can fix the file in a manner similar to what I've outlined.
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
- Contact:
Re: Trim CRLF from a certain CSV column
Yes, you can call PHP, but as JRL says, this seems overkill just to strip out line breaks. Why PHP for this? Anyway, if PHP is on the same machine then just execute php.exe. If elsewhere use HTTPRequest or a browser. Also CRLFs shouldn't break CSV files anyway, assuming the CSV is properly formatted and the strings are quotes. A line break inside quotes is fine. But if the CSV does not quote strings then, sure, you'll have a problem. But that wouldn't be properly formatted CSV. A CSV file with unquoted strings with line breaks is going to confuse Excel, let alone anything else.....
Sent from my iPad using Tapatalk
Sent from my iPad using Tapatalk
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
Re: Trim CRLF from a certain CSV column
I opened up the .csv file using Notepad++ and found that the culprit are LFs, however, if I remove all LFs using the below code, it will also combine the last item of a row with the first item of the next row because there are no comma separating them:
So if I have:
First,Secondo,Third
One,Two,Three
I'll get "First,Secondo,Third One,Two,Three" and that's not what I want.
I tried using ReadLn, but I got an error. Something like:
Line: (Line No) 2 not appropriate"
This was the code:
Does it make sense?
Any suggestion to solve that?
Code: Select all
ReadFile>vFile,vData
StringReplace>vData,LF,,vData
Separate>vData,comma,item
First,Secondo,Third
One,Two,Three
I'll get "First,Secondo,Third One,Two,Three" and that's not what I want.
I tried using ReadLn, but I got an error. Something like:
Line: (Line No) 2 not appropriate"
This was the code:
Code: Select all
Let>ln=1
While>line<>##EOF##
ReadLn>vFile,ln,line
If>line<>##EOF##
Endif
Let>ln=ln+1
EndWhile
Any suggestion to solve that?
Re: Trim CRLF from a certain CSV column
What do you get if you ReadFile the Stringreplace LF+Comma That should leave the LFs at the end of rows alone since they're not followed by commas
Code: Select all
ReadFile>vFile,vData
StringReplace>vData,%LF%%Comma%,Comma,vData
Re: Trim CRLF from a certain CSV column
Thanks JRF, but that pattern does not exist =/
These LFs are in a column which contains user generated content, so the LFs can be anywhere within whose quotes. Sometimes I have one LF followed by another one.
I can't think of any RegEx to solve that because when MS reads the file, it becomes a single line making it harder to find patterns as there's only one beginning and one end.
Any other suggestion? It seems I'll need to go for a back-end language to deal with that =/. Or maybe I could solve it with VB?
These LFs are in a column which contains user generated content, so the LFs can be anywhere within whose quotes. Sometimes I have one LF followed by another one.
I can't think of any RegEx to solve that because when MS reads the file, it becomes a single line making it harder to find patterns as there's only one beginning and one end.
Any other suggestion? It seems I'll need to go for a back-end language to deal with that =/. Or maybe I could solve it with VB?
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
- Contact:
Re: Trim CRLF from a certain CSV column
Hi,
I see now that CSVFileToArray has problems with these LFs inside quotes. I've made a note in our issue tracker to try and address that.
In the mean time Excel copes fine. I made a file with only LFs as end of line characters and also within some quoted strings. I've attached the file here. If you open it in Notepad++ and show end of line chars you'll see the LFs.
And if you open it in Excel it displays correctly. You'll see two rows of data. Make the rows taller to see all the data - cell 1,1 has two lines and cell 2,3 also has two lines.
Knowing this, and assuming you have Excel installed, we can use Excel to read in the data using XLOpen:
XLOpen>c:\file\test.csv,1,xlH
And then you can loop through the data using XLGetSheetDims, XLGetCell etc ... see XLGetCell example in help file, and see this article:
http://help.mjtnet.com/article/138-tran ... or-web-app
However, if you want the convenience of CSVFileToArray how about we use Excel just to remove these LFs for us to reformat and save it back to a cleaner CSV file. All we need to do is open it in Excel and save it again. That will replace the LFs at the end of each row with CRLFs. Now all we need to do is replace all occurrences of LF *on their own* with nothing.
Like this:
That opens the CSV in Excel, saves it to a new CSV file (which in turn causes the end of row markers to become CRLF pairs), then we read the file in, remove all LFs that do NOT appear before a CR, save the file back and then finally we are able to use CSVFileToArray.
However, if the file is large it might be best to work directly against Excel in the first place as suggested above using XLOpen, XLGetSheeDims and XLGetCell in a loop ....
I see now that CSVFileToArray has problems with these LFs inside quotes. I've made a note in our issue tracker to try and address that.
In the mean time Excel copes fine. I made a file with only LFs as end of line characters and also within some quoted strings. I've attached the file here. If you open it in Notepad++ and show end of line chars you'll see the LFs.
And if you open it in Excel it displays correctly. You'll see two rows of data. Make the rows taller to see all the data - cell 1,1 has two lines and cell 2,3 also has two lines.
Knowing this, and assuming you have Excel installed, we can use Excel to read in the data using XLOpen:
XLOpen>c:\file\test.csv,1,xlH
And then you can loop through the data using XLGetSheetDims, XLGetCell etc ... see XLGetCell example in help file, and see this article:
http://help.mjtnet.com/article/138-tran ... or-web-app
However, if you want the convenience of CSVFileToArray how about we use Excel just to remove these LFs for us to reformat and save it back to a cleaner CSV file. All we need to do is open it in Excel and save it again. That will replace the LFs at the end of each row with CRLFs. Now all we need to do is replace all occurrences of LF *on their own* with nothing.
Like this:
Code: Select all
//open the CSV file in Excel, then save it. That's all!
XLOpen>c:\temp\testcsv.csv,1,xlH
XLSave>xlH,c:\temp\testcsv_fixed.csv
XLQuit>xlH
//now remove all occurences of LF on their own
ReadFile>c:\temp\testcsv_fixed.csv,fileData
RegEx>(?<!\r)\n,fileData,0,matches,nm,1, ,fileData
//delete file and save back
DeleteFile>c:\temp\testcsv_fixed.csv
Let>WLN_NOCRLF=1
WriteLn>c:\temp\testcsv_fixed.csv,wres,fileData
//now we should be ok to read the CSV to an array using CSVFileToArray
CSVFileToArray>c:\temp\testcsv_fixed.csv,arrayData
However, if the file is large it might be best to work directly against Excel in the first place as suggested above using XLOpen, XLGetSheeDims and XLGetCell in a loop ....
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
Re: Trim CRLF from a certain CSV column
Many thanks for looking into this, Marcus.
The .csv file is still broken when I open it with Excel 2013. Therefore, the Open+Save trick won't work for me (neither the XLGetSheetDims, XLGetCell etc).
Are you using another version of Excel?
The .csv file is still broken when I open it with Excel 2013. Therefore, the Open+Save trick won't work for me (neither the XLGetSheetDims, XLGetCell etc).
Are you using another version of Excel?
- Marcus Tettmar
- Site Admin
- Posts: 7395
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
- Contact:
Re: Trim CRLF from a certain CSV column
I'm using Excel 2010 here. Did you try with the file I uploaded?
Are you able to send me your CSV file? If it is confidential you can email it to support. Be interesting to see what's different about it and maybe if we look at it we can find a solution.
Are you able to send me your CSV file? If it is confidential you can email it to support. Be interesting to see what's different about it and maybe if we look at it we can find a solution.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?