RegEx replace match [UNDOCUMENTED FEATURE?]

Grovkillen
Automation Wizard
Posts: 1131
Joined: Fri Aug 10, 2012 2:38 pm
Location: Bräcke, Sweden
Contact:

Post by Grovkillen » Mon Jul 12, 2021 3:36 pm

I have a script that searches through all my source code and compiles the documentation using LaTeX syntax. The comments in my source code follow a syntax that makes them fairly straightforward to parse. BUT, if I want a function that makes sure I haven't got any UN-escaped LaTeX code/characters, I cannot do this with the commands I have at hand in MS. Take the example below:

Code: Select all

LabelToVar>test_string,test_string,1,1,*/
RegEx>(?<!\\)[&%$#\{\}~^],test_string,0,,,1,\<match>,test_string_2

/*
test_string:
this number: #1
this ampersand: &
this percent: %
*/
I want to escape the characters & % $ # { } ~ ^ by adding a backslash in front of them. But the replace part of the RegEx command will not let me do this. I can only replace each match with a fixed string, i.e.:

Code: Select all

this number: #1
this ampersand: &
this percent: %
will become:

Code: Select all

this number: \1
this ampersand: \
this percent: \
And the StringReplace command will not work either, since it does not care whether the character has already been escaped, i.e.:

Code: Select all

this number: \#1
this ampersand: &
this percent: \%
will become:

Code: Select all

this number: \\#1
this ampersand: \&
this percent: \\%
Last edited by Grovkillen on Tue Jul 13, 2021 3:57 pm, edited 1 time in total.
Let>ME=%Script%

Running: 15.0.27
version history


Re: RegEx replace match

Post by Grovkillen » Mon Jul 12, 2021 5:27 pm

I see now that the replace part of the RegEx command does in fact allow regex syntax:

Code: Select all

LabelToVar>test_string,test_string,1,1,*/
RegEx>(?<!\\)[&%$#\{\}~^],test_string,0,,,1,\test,test_string_2
RegEx>(?<!\\)[&%$#\{\}~^],test_string,0,,,1,\\test,test_string_3
RegEx>(?<!\\)[&%$#\{\}~^],test_string,0,,,1,\r,test_string_4
RegEx>(?<!\\)[&%$#\{\}~^],test_string,0,,,1,{$110},test_string_5
/*
test_string:
this number: #1
this ampersand: &
this percent: %
*/
test_string_2:

Code: Select all

this number: 	est1
this ampersand: 	est
this percent: 	est
test_string_3:

Code: Select all

this number: \test1
this ampersand: \test
this percent: \test
test_string_4:

Code: Select all

this number: 
1
this ampersand: 

this percent: 

test_string_5:

Code: Select all

this number: 2721
this ampersand: 272
this percent: 272
Maybe someone could shed some light here?

Re: RegEx replace match

Post by Grovkillen » Tue Jul 13, 2021 3:55 pm

I have now read up on the regex syntax: my previous pattern didn't include a capturing group, so the replacement had nothing to refer back to.
I.e. this
(?<!\\)[&%$#\{\}~^] needed to be this
(?<!\\)([&%$#\{\}~^])

Code: Select all

LabelToVar>test_string,test_string,1,1,*/
RegEx>(?<!\\)([&%$#\{\}~^]),test_string,0,,,1,\\$1,test_string_2
MDL>test_string_2
/*
test_string:
this number: #1
this ampersand: &
this percent: %
*/
It's working!

Code: Select all

this number: \#1
this ampersand: \&
this percent: \%
This is a really powerful feature which isn't documented as far as I can tell.
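For anyone who wants to try the pattern outside Macro Scheduler, here is a minimal sketch in Python (not Macro Scheduler code). It assumes Python's built-in re module, where the back-reference in the replacement is written \1 instead of $1. The function name escape_latex is made up for illustration.

```python
import re

# Same idea as the RegEx> call above: the negative lookbehind skips characters
# that already have a backslash in front of them, and the capturing group lets
# the replacement re-insert whichever character was matched.
UNESCAPED_SPECIAL = re.compile(r'(?<!\\)([&%$#{}~^])')

def escape_latex(text: str) -> str:
    """Prefix each not-yet-escaped LaTeX special character with a backslash."""
    # In the replacement template, \\ is a literal backslash and \1 is group 1.
    return UNESCAPED_SPECIAL.sub(r'\\\1', text)

sample = "this number: #1\nthis ampersand: &\nthis percent: \\%"
print(escape_latex(sample))
```

Because of the lookbehind, running the function twice gives the same result as running it once. One limitation of this sketch: a character that follows a double backslash is also treated as already escaped.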

Re: RegEx replace match [UNDOCUMENTED FEATURE?]

Post by Grovkillen » Tue Jul 13, 2021 4:05 pm

Here is an example of how powerful this is: adding a string IN BETWEEN two capturing groups:

Code: Select all

LabelToVar>text_string,text_string,1,1,*/
RegEx>(Grovkillen|Allan)(!),text_string,0,,,1,$1..ABC123..$2,text_string_2
MDL>text_string_2
/*
text_string:
Are you Allan?
Allan!? My name is Grovkillen!
Aha, Grovkillen... got ya!
*/
Output:

Code: Select all

Are you Allan?
Allan..ABC123..!? My name is Grovkillen..ABC123..!
Aha, Grovkillen... got ya!
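The same two-group replacement can be sketched in Python for comparison (an illustration only, not Macro Scheduler code). Python's re module writes the back-references as \1 and \2 where the RegEx> command uses $1 and $2.

```python
import re

text = (
    "Are you Allan?\n"
    "Allan!? My name is Grovkillen!\n"
    "Aha, Grovkillen... got ya!"
)

# Group 1 is the name, group 2 is the exclamation mark that must follow it.
# The replacement puts a fixed string between the two captured pieces.
result = re.sub(r'(Grovkillen|Allan)(!)', r'\1..ABC123..\2', text)
print(result)
```

Names that are not directly followed by "!" are left alone, because the whole pattern, including group 2, has to match before anything is replaced.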

Re: RegEx replace match [UNDOCUMENTED FEATURE?]

Post by Grovkillen » Wed May 10, 2023 12:57 pm

The day has come when I found a use for my discovery :)

Let's say I have this HTML file:

Code: Select all

<HTML><HEAD>
	<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=windows-1252">
	<TITLE>How to Write a Great Help File</TITLE>
	<OBJECT TYPE="application/x-oleobject" CLASSID="clsid:1e2a7bd0-dab9-11d0-b93a-00c04fc99f9e">
		<PARAM NAME="Keyword" VALUE="how to">
	</OBJECT>
	<META NAME="AUTHOR" CONTENT="Copyright © 1996-2003 Jan Goyvaerts">
	<META NAME="GENERATOR" CONTENT="HelpScribble 7.9.0">
	<STYLE> span { display: inline-block; }</STYLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" COLOR="#0000FF" SIZE="5"><B>How to Write a Great Help File</B></FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" COLOR="#800000" SIZE="2"><HR></FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">A lot of software these days ships with help files of very questionable quality.&nbsp; Even help files included with expensive software from big companies often leave a lot to be desired.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">Just what is a great help file?&nbsp; It's simple: a great help file is a help file that provides the user with exactly that information which the user needs at any time while requiring a minimum amount of effort to locate that information.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">An obvious requirement that follows from our definition of a great help file is that every aspect of the software must be documented, because you cannot know in advance which aspect of the software a particular user may want help with.&nbsp; However, unless your software is very limited in functionality, documenting every little detail will make it more difficult for novice users to see the forest through the trees.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">Therefore, a great help file consists of two main sections.&nbsp; The first section is a "how to" section that explains the basic tasks of the software.&nbsp; It focuses on how to get something done with the software,&nbsp; rather than trying to explain everything.&nbsp; Information is presented in a logical order with respect to the user's work flow, possibly using step by step instructions.&nbsp; The second section is the reference section that explains each and every detail of the software.&nbsp; It focuses on explaining what each part of the software can be used for.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="3">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">People new to the software will mostly read the "how to" section, while expert users will look up the information they want in the reference section.&nbsp; However, most people using the software will actually be neither beginner nor expert, but something in between.&nbsp; Therefore, both sections should </FONT><FONT FACE="Arial" SIZE="2"><A HREF="hs1000.htm#howtolink">link</A></FONT><FONT FACE="Arial" SIZE="2"> to each other.&nbsp; Whenever the "how to" section mentions a feature, that mention should be a link to the reference section.&nbsp; As the user gains more experience, she can easily find information in the reference section by using the "how to" topic she is already familiar with as a starting point.&nbsp; In the reference section, topics explaining the details of a certain part of the software should link to the relevant "how to" topics.&nbsp; Somebody new to the software may wonder what a certain command is for, and arrive in the reference section via the </FONT><FONT FACE="Arial" SIZE="2"><A HREF="hs1100.htm">index</A></FONT><FONT FACE="Arial" SIZE="2">, while a basic "how to" text would be more interesting for that person at that time.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="1">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">Speaking of links: a great help file can never have enough links.&nbsp; Whenever you mention a feature or concept that is explained elsewhere, make that mention a link so the user can easily get more information.&nbsp; Remember that help files are not read cover to cover.&nbsp; They don't have covers.&nbsp; Rather, people will read a few </FONT><FONT FACE="Arial" SIZE="2"><A HREF="hs1000.htm">topics</A></FONT><FONT FACE="Arial" SIZE="2"> when they get stuck with the software.&nbsp; So you cannot assume anything about what the customer already knows and what not.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">Another important and related fact is that most readers will arrive at a certain topic "out of nowhere".&nbsp; Maybe they found the topic through the </FONT><FONT FACE="Arial" SIZE="2"><A HREF="hs1100.htm">index</A></FONT><FONT FACE="Arial" SIZE="2"> or they clicked on a link without really knowing what to expect.&nbsp; Therefore, every topic must establish a context.&nbsp; If a topic explains a certain menu item, like </FONT><FONT FACE="Arial" SIZE="2">Project|Save</FONT><FONT FACE="Arial" SIZE="2">, that topic must first mention that it is talking about a menu item, and where that menu can be found.&nbsp; If I look up "save project" in the index, and I get a topic that only says "click this button to save your project", then I have learned absolutely nothing.&nbsp; I want to know <I>where</I> that button is.&nbsp; It seems that a lot of technical writers did not fully grasp this yet.&nbsp; A help file is not a book.&nbsp; I cannot flip back a few pages to establish the context.&nbsp; In WinHelp, browse sequences can simulate "page flipping".&nbsp; But a link to the </FONT><FONT FACE="Arial" SIZE="2">Project menu</FONT><FONT FACE="Arial" SIZE="2"> from the </FONT><FONT FACE="Arial" SIZE="2">Save</FONT><FONT FACE="Arial" SIZE="2"> topic is just so much easier than clicking the </FONT><FONT FACE="Courier" SIZE="2">&lt;&lt;&lt;</FONT><FONT FACE="Arial" SIZE="2"> button three times and then guessing I may find the Save item in the Project menu.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">I have already mentioned the use of the index several times.&nbsp; A great help file must have a comprehensive index.&nbsp; Make sure that each topic is included in the index, with every synonym you can think of, an with every possible permutation of multi-word concepts.&nbsp; See </FONT><FONT FACE="Arial" SIZE="2"><A HREF="hs1100.htm">How to Add an Index</A></FONT><FONT FACE="Arial" SIZE="2"> for more information.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" COLOR="#008000" SIZE="2">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2"><A HREF="hs1300.htm">Screen shots</A></FONT><FONT FACE="Arial" SIZE="2"> are also an important part of a help file.&nbsp; Usually the software will be hidden behind the help window.&nbsp; Looking at the screen shot in the help file while reading the help text is easier than task-switching back and forth between the application and the help file.&nbsp; Seeing a screen shot that is identical to the actual software also assures the user that the help file is accurate.&nbsp; So when you add screen shots, it is very important to update them as the software gets updated.&nbsp; When a screen shot differs from the actual software, the user will not trust the help text.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">If your help file is distributed with a free evaluation version of the software, the quality of your help file will definitely influence people's decision whether to buy this software or a competing product.&nbsp; Though many people will hardly use your help file at all, most will take a look at it to check if solid documentation is available, should they ever get stuck with the software.&nbsp; It is also far easier to verify the quality of a help file than the quality of an application.&nbsp; Anybody can read a help file.&nbsp; Compare this to taking an airplane.&nbsp; Most people cannot even estimate the mechanical reliability of an airplane, and the skills of its pilots.&nbsp; But old coffee stains on the seats and overdue paint jobs, while they have no effect on the airworthiness of the plane, are easy to spot and will determine what people think about the plane and the airline.</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT SIZE="2" FACE="Arial">&nbsp;</FONT></P>
<P STYLE="margin-top:0;margin-bottom:0;"><FONT FACE="Arial" SIZE="2">So be sure to write a great help file.&nbsp; HelpScribble takes care of all the technical details so you can focus on providing solid content.
Here I want to clean up the FONT elements based on their SIZE attribute. I don't give a damn about whatever font or color is being used. I just want to replace the whole element with new attributes, keeping only the numerical "size". Look at this (not a full working script, just to keep it clean; you need to load the HTML code above into a string and run the RegEx command with replace):

Code: Select all

Let>CHM_FONT=Segoe UI
Let>CHM_BASE_COLOR=#BFBC00

Let>REGEX_PATTERN=<FONT[^>]*?SIZE="(5|6)[^>]*?>(.*?/P>)
Let>REGEX_REPLACE_PATTERN=<FONT FACE="%CHM_FONT%" COLOR="%CHM_BASE_COLOR%" SIZE="$1">$2%CRLF%<P><HR></P>
RegEx>REGEX_PATTERN,HTML_CODE,0,,,1,REGEX_REPLACE_PATTERN,HTML_CODE
//this <FONT FACE="Arial" COLOR="#0000FF" SIZE="5"><B>How to Write a Great Help File</B></FONT></P> becomes:
//this <FONT FACE="Segoe UI" COLOR="#BFBC00" SIZE="5"><B>How to Write a Great Help File</B></FONT></P>
//     <P><HR></P>

Let>REGEX_PATTERN=<FONT[^>]*?SIZE="(1|2|3|4)".*?>
Let>REGEX_REPLACE_PATTERN=<FONT FACE="%CHM_FONT%" SIZE="$1">
RegEx>REGEX_PATTERN,HTML_CODE,0,,,1,REGEX_REPLACE_PATTERN,HTML_CODE
//this <FONT FACE="Arial" COLOR="#008000" SIZE="2"> becomes:
//this <FONT FACE="Segoe UI" SIZE="2">
//and
//this <FONT FACE="Arial" SIZE="1"> becomes:
//this <FONT FACE="Segoe UI" SIZE="1">
All in one go, no need for extra parsing and search and replace. The update of the code is done on the spot. Sweeet!
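A rough Python sketch of the second replacement (sizes 1 to 4), for anyone who wants to test the idea outside Macro Scheduler. It is an illustration under assumptions, not the script above: it uses [^>]* where the post uses .*? so that a match can never run past the end of one tag, and the helper name replace_font is made up.

```python
import re

CHM_FONT = "Segoe UI"  # same value as the CHM_FONT variable in the post

html = '<P><FONT FACE="Arial" COLOR="#008000" SIZE="2">&nbsp;</FONT></P>'

# Capture only the SIZE digit; every other attribute of the tag is discarded.
pattern = re.compile(r'<FONT[^>]*?SIZE="([1-4])"[^>]*>')

def replace_font(match: re.Match) -> str:
    """Build the new opening tag from the captured size."""
    return f'<FONT FACE="{CHM_FONT}" SIZE="{match.group(1)}">'

cleaned = pattern.sub(replace_font, html)
print(cleaned)
```

A replacement function is used instead of a template string so the font name never has to be escaped for the regex engine. Tags with SIZE="5" or "6" do not match this pattern and are left untouched.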

Re: RegEx replace match [UNDOCUMENTED FEATURE?]

Post by Grovkillen » Mon May 15, 2023 3:32 pm

Would be great if we could get the capturing groups back just like the "global match". I.e.

Code: Select all

  RegEx>REGEX_PATTERN,TEXT_TO_PARSE,0,MATCHES,,0,,
  //MATCHES_1="full match like today"
  //MATCHES_1_1="contents of capturing group 1 ($1)"
  //MATCHES_1_2="contents of capturing group 2 ($2)"
  //...
  //MATCHES_1_N="contents of capturing group N ($N)"
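To make the request concrete, here is a sketch in Python of the kind of result being asked for. This is not an existing Macro Scheduler feature; the key names simply follow the MATCHES_1_1 naming proposed above, and the function name matches_with_groups is hypothetical.

```python
import re

def matches_with_groups(pattern: str, text: str) -> dict:
    """Return every full match plus its capturing groups, keyed as proposed."""
    result = {}
    for i, m in enumerate(re.finditer(pattern, text), start=1):
        result[f"MATCHES_{i}"] = m.group(0)  # the full match, as today
        # m.groups() holds group 1..N; a group that did not take part is None.
        for g, value in enumerate(m.groups(), start=1):
            result[f"MATCHES_{i}_{g}"] = value
    return result

found = matches_with_groups(r'(\w+)=(\d+)', 'a=1, b=22')
print(found)
```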

Re: RegEx replace match [UNDOCUMENTED FEATURE?]

Post by Grovkillen » Tue Aug 01, 2023 9:56 am

I found a use case for this today. The UUID wasn't given to me with the dashes. This simple trick fixed it for me.

Code: Select all

Let>STRING_raw=12345678abcd1234abcd12345678
Let>REGEX_PATTERN=(.{8})(.{4})(.{4})(.{4})(.*)
RegEx>REGEX_PATTERN,STRING_raw,0,,,1,$1-$2-$3-$4-$5,STRING_fixed
MDL>STRING_fixed
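The same trick sketched in Python, as an illustration only. Two things differ from the script above and are my own assumptions: the pattern is anchored and the last group is fixed at 12 characters, so only a full 32-character hex string is rewritten, and the function name add_uuid_dashes is made up.

```python
import re

# Five groups of 8-4-4-4-12 hex digits, anchored to the whole string.
UUID_NO_DASHES = re.compile(
    r'^([0-9a-fA-F]{8})([0-9a-fA-F]{4})([0-9a-fA-F]{4})'
    r'([0-9a-fA-F]{4})([0-9a-fA-F]{12})$'
)

def add_uuid_dashes(raw: str) -> str:
    """Insert dashes into a 32-digit hex string; return other input unchanged."""
    return UUID_NO_DASHES.sub(r'\1-\2-\3-\4-\5', raw)

print(add_uuid_dashes("12345678abcd1234abcd1234567890ab"))
```

With the looser (.*) as the last group, as in the script above, shorter or longer input is still split after the first 20 characters, which may or may not be what you want.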

Re: RegEx replace match [UNDOCUMENTED FEATURE?]

Post by Grovkillen » Fri Mar 29, 2024 7:54 pm

Grovkillen wrote:
Mon May 15, 2023 3:32 pm
Would be great if we could get the capturing groups back just like the "global match". I.e.

Code: Select all

  RegEx>REGEX_PATTERN,TEXT_TO_PARSE,0,MATCHES,,0,,
  //MATCHES_1="full match like today"
  //MATCHES_1_1="contents of capturing group 1 ($1)"
  //MATCHES_1_2="contents of capturing group 2 ($2)"
  //...
  //MATCHES_1_N="contents of capturing group N ($N)"
If this could squeeze into the next (v16?) release it would be a great addition.

robsmith
Newbie
Posts: 17
Joined: Mon Oct 30, 2023 2:12 pm
Location: USA

Re: RegEx replace match [UNDOCUMENTED FEATURE?]

Post by robsmith » Thu Apr 25, 2024 9:59 pm

What do you mean by 'global match'? Do you mean the number of groups captured? Do you mean have each capture group's content stored in an array?

I think I originally used the perl documentation to figure out the syntax for referencing capture groups ( \1 \2 etc) https://perldoc.perl.org/perlre#Capture-groups

It would be nice to have an example in the documentation that explains how to use capture groups, but I'm not sure it's accurate to call it an undocumented feature.

Knowing that the RegEx command is based on the PCRE library for Perl 5.10 is enough to use an online regex builder tool, or to find the appropriate documentation.

It is very interesting to see you have edited LaTeX this way, very cool use case!


Re: RegEx replace match [UNDOCUMENTED FEATURE?]

Post by Grovkillen » Fri Apr 26, 2024 5:00 am

Global as in the full match of the RegEx. Undocumented as in the documentation not showing any examples with capturing groups.

The feature I'm asking for is to have the capturing groups put into an array (optional setting most likely).