Remove blank lines when there are more than one of them

Technical support and scripting issues

ari
Junior Coder
Posts: 23
Joined: Tue Jul 15, 2014 4:12 pm

Remove blank lines when there are more than one of them

Post by ari » Tue Jan 21, 2020 10:37 pm

Hello,
I would like a way to remove multiple blank lines between text so that only one blank line is left. The number of blank lines varies: sometimes there is only 1 and no change is needed, sometimes there are several and I need to reduce them to 1. I have found many ways to remove all blank lines, but I'm not astute enough to figure out how to spare 1.

For example, turn this:

Paragraph 1
blank line 1
blank line 2
blank line 3
Paragraph 2

into this:

Paragraph 1
blank line 1
Paragraph 2

without affecting this:

Paragraph 3
blank line 1
Paragraph 4

Thanks for any advice! MS saves my life.

DreamTheater
Newbie
Posts: 19
Joined: Mon Oct 14, 2019 6:23 am

Re: Remove blank lines when there are more than one of them

Post by DreamTheater » Wed Jan 22, 2020 3:47 am

This might not be the best way, but it should do what you're after:

Code:

//Variable to remove extra lines from
Let>TEST=Paragraph1%CRLF%%CRLF%%CRLF%%CRLF%Paragraph2%CRLF%Paragraph3

//Separate into an array of lines, using CRLF as the delimiter
Separate>TEST,%CRLF%,TEST_ARR

//Result will be the new variable without extra breaks - start with the first row and its break
Let>Result=TEST_ARR_1
ConCat>Result,%CRLF%

//Loop through the rest of the array and concat everything together without extra breaks (x=1 instead of 0 as we've already set the first row)
Let>x=1
Repeat>x
    //Increment the loop counter
    Add>x,1
    
    //If the current array value is blank (an empty row between two CRLFs), take no action and don't concat this row
    If>TEST_ARR_%x%={""}
        Goto>Endx
    ELSE
        //If there is some sort of value, concat that row and add a break afterwards
        ConCat>Result,TEST_ARR_%x%
        ConCat>Result,%CRLF%
    EndIf>

    Label>Endx
Until>x=TEST_ARR_COUNT

//Check that it worked
MDL>Result
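As posted, the loop above skips every blank row, so the result has no blank lines at all rather than one. Below is a minimal sketch of the same line-by-line idea in Python (Python only so it can be run and checked outside Macro Scheduler; it is not MacroScript), with one addition: a flag that remembers whether the previous line was blank, so the first blank line of each run is kept. It assumes CRLF line endings and, like the original, treats a line containing only spaces as non-blank.

```python
def collapse_blank_lines(text: str) -> str:
    """Keep at most one blank line between non-blank lines (line-by-line approach)."""
    result = []
    prev_blank = False
    # Split on CRLF, the same delimiter used with Separate> above
    for line in text.split("\r\n"):
        if line == "":
            # Keep only the first blank line of a run, drop the rest
            if not prev_blank:
                result.append(line)
            prev_blank = True
        else:
            result.append(line)
            prev_blank = False
    return "\r\n".join(result)


sample = "Paragraph1\r\n\r\n\r\n\r\nParagraph2\r\nParagraph3"
print(repr(collapse_blank_lines(sample)))
```

Tracking a single boolean keeps this to one pass over the lines with no need to look ahead.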

hagchr
Automation Wizard
Posts: 327
Joined: Mon Jul 05, 2010 7:53 am
Location: Stockholm, Sweden

Re: Remove blank lines when there are more than one of them

Post by hagchr » Wed Jan 22, 2020 9:48 am

Hi, here's an example of how to solve it using RegEx>.

Treat the text as a single string and look for two or more consecutive line breaks (\R in regex syntax matches any line break, including a CRLF pair). Replace each run with exactly two of them, %CRLF%%CRLF%, which leaves a single blank line.

Code:

LabelToVar>Text,strText

Let>tmp0=(?s)(\R){2,}
RegEx>tmp0,strText,0,m,nm,1,%CRLF%%CRLF%,strRes

MDL>strText
MDL>strRes

/*
Text:
Paragraph 1


Paragraph 2
Paragraph 3




Paragraph4

Paragraph5

Paragraph6
*/
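For anyone who wants to try the same idea outside MS, here is a sketch in Python's re module (an illustration only, not Macro Scheduler code). Python's re has no \R, so the sketch spells out CRLF, LF and a lone CR by hand and ignores the rarer breaks that PCRE's \R also covers. The (?s) in the pattern above only changes what . matches, so it has no effect on a pattern with no dot in it and is left out here.

```python
import re

# Python's re has no \R, so spell out CRLF, LF and a lone CR. A lone \r only
# counts when it is not followed by \n, so backtracking can never split one
# CRLF into two breaks (PCRE's \R avoids this by being atomic).
LINE_BREAK = r"(?:\r?\n|\r(?!\n))"


def collapse_blank_lines_regex(text: str) -> str:
    # Replace every run of two or more line breaks with exactly two (CRLF CRLF),
    # which leaves one blank line; single line breaks are left untouched
    return re.sub(LINE_BREAK + "{2,}", "\r\n\r\n", text)


sample = (
    "Paragraph 1\r\n\r\n\r\nParagraph 2\r\nParagraph 3"
    "\r\n\r\n\r\n\r\n\r\nParagraph4\r\n\r\nParagraph5"
)
print(repr(collapse_blank_lines_regex(sample)))
```

As with the %CRLF%%CRLF% replacement string above, each collapsed run is rewritten as CRLF CRLF whatever line endings it originally used.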

ari
Junior Coder
Posts: 23
Joined: Tue Jul 15, 2014 4:12 pm

Re: Remove blank lines when there are more than one of them

Post by ari » Thu Jan 23, 2020 4:59 am

Thanks, these did the trick. I have started learning regex and was wondering: which exact flavor does MS use? The syntax seems to vary across different tutorials.

Dorian (MJT support)
Automation Wizard
Posts: 1348
Joined: Sun Nov 03, 2002 3:19 am

Re: Remove blank lines when there are more than one of them

Post by Dorian (MJT support) » Thu Jan 23, 2020 10:34 am

Hi Ari,

What Syntax does Regex use?
Macro Scheduler uses the PCRE syntax.
Yes, we have a Custom Scripting Service. Message me or go here

ari
Junior Coder
Posts: 23
Joined: Tue Jul 15, 2014 4:12 pm

Re: Remove blank lines when there are more than one of them

Post by ari » Fri Jan 24, 2020 4:16 am

One last semi-related question:
I am having serious problems with simple regexes that work in RegexBuddy and in various online regex checkers, yet fail or give different results in MS.

For example, with the text below I just want to match the blocks of text starting with POSITIVE. In MS, ^POS.*$ matches all of the text, whereas RegexBuddy matches only the lines I want (everything from POSITIVE to Willow and from POSITIVE to Fumigatus), excluding the POLLENS and MOLDS headings and the NEGATIVE...Alternaria line.

The text example:

POLLENS

POSITIVE to: Ragweed, Burweed Marsh Elder, Cocklebur, Golden Rod, Kochia, Lambs Quarter, Mugwort, Nettle, Pigweed, Plantain, Russian Thistle, Sheep Sorrel, Yellow Dock, Brome Grass, Grass, Johnson Grass, June Grass, Meadow Fescue, Red Top, Orchard Grass, Rye Grass, Sweet Vernal, Ash, Timothy Grass, Birch, Black Walnut (pollen), Cottonwood, Elm, Hickory, Mulberry, Maple, Poplar, Oak, Privet, Red Cedar, Sycamore, White Pine, Other, Willow


MOLDS

POSITIVE to: Stemphylium, Aureobasidium, Bipolaris, Gibberella, Epicoccum, Sarocladium, Penicillium, Mucor, Cladosporium , Botrytis, Alternaria, Aspergillus Fumigatus

NEGATIVE to: Stemphylium, Aureobasidium, Bipolaris, Gibberella, Sarocladium, Penicillium, Epicoccum, Aspergillus Fumigatus, Mucor, Cladosporium , Botrytis, Alternaria

====

Is there a Regex guide for use in MS other than the command reference page?

Thanks!

DreamTheater
Newbie
Posts: 19
Joined: Mon Oct 14, 2019 6:23 am

Re: Remove blank lines when there are more than one of them

Post by DreamTheater » Fri Jan 24, 2020 6:51 am

In my experience you need to do a lot of testing with regex; what returns results in some engines doesn't always seem to in Macro Scheduler.

I use Regex101 for all of my testing, and that engine has multiline and global mode enabled by default.
I've had issues where I forgot to enable multiline mode (?m) in MS, which gave mismatched results when using ^ and $.

Greedy quantifiers have also given me mixed results; sometimes adding a ? after the quantifier (making it non-greedy) finds the result. That's generally a good idea after something like .* that is followed by \n (.*?\n rather than .*\n).

Regex can be tricky, so when I encounter something weird I usually assume I've made a mistake and try another method to achieve the same result.
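The .*?\n tip mostly matters when the dot is allowed to match line breaks (the s flag): with s off, .* cannot cross a \n anyway, so .*\n and .*?\n give the same result. A small sketch in Python's re module (used only for illustration; it behaves like PCRE for these simple patterns), with made-up, shortened sample text:

```python
import re

# Made-up, shortened sample with \n line endings
text = "POSITIVE to: Ragweed, Willow\nMOLDS\nPOSITIVE to: Mucor, Alternaria\n"

# s off (Python's default): . stops at \n, so greedy and lazy agree here
no_s = re.search(r"POSITIVE.*\n", text).group()

# s on: . also matches \n, so the greedy .* runs on to the LAST \n in the text
greedy = re.search(r"(?s)POSITIVE.*\n", text).group()

# s on, lazy .*?: stops at the FIRST \n after "POSITIVE"
lazy = re.search(r"(?s)POSITIVE.*?\n", text).group()

print(repr(no_s))
print(repr(greedy))
print(repr(lazy))
```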

Dorian (MJT support)
Automation Wizard
Posts: 1348
Joined: Sun Nov 03, 2002 3:19 am

Re: Remove blank lines when there are more than one of them

Post by Dorian (MJT support) » Fri Jan 24, 2020 9:09 am

Just in case you don't manage to master Regex, here's a way to do it using only Macro Scheduler.

This method reads an entire text file, uses Separate to split it on line breaks, and then uses Pos to check each paragraph to see if it contains "POSITIVE to".

Of course you could use ReadLn and read it line-by-line, negating the need to use Separate, but this may be slower on very large files.

As long as all your text is in the variable TheText, it doesn't matter how it gets there. So if it were all in the clipboard, for instance, you could replace ReadFile or ReadLn with GetClipBoard>TheText.

Code:

ReadFile>d:\pollens.txt,TheText
Separate>TheText,CRLF,TheParagraphs

let>k=0
Repeat>k
  Let>k=k+1
  Pos>POSITIVE to,TheParagraphs_%k%,1,PosPOSTO,
  If>posPOSTO>0
    MessageModal>TheParagraphs_%k%
  Endif
Until>k,TheParagraphs_count
Label>end
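For comparison, here is the same split-and-check approach as a Python sketch (Python only for illustration). File reading is left out: the text is assumed to already be in a string, the way it is in TheText. One difference to watch for: the script above tests whether Pos> returned a value greater than 0, while Python's str.find returns -1 when the substring is absent, so the check becomes >= 0.

```python
def positive_lines(text: str) -> list[str]:
    """Return the lines that contain "POSITIVE to" (sketch of the Separate/Pos loop)."""
    found = []
    # Like Separate>TheText,CRLF,TheParagraphs: split the text on CRLF
    for line in text.split("\r\n"):
        # Like the Pos> check: str.find returns -1 when "POSITIVE to" is absent
        if line.find("POSITIVE to") >= 0:
            found.append(line)
    return found


sample = (
    "POLLENS\r\n\r\nPOSITIVE to: Ragweed, Willow\r\n\r\n"
    "MOLDS\r\n\r\nPOSITIVE to: Mucor\r\n\r\nNEGATIVE to: Alternaria"
)
for line in positive_lines(sample):
    print(line)
```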

hagchr
Automation Wizard
Posts: 327
Joined: Mon Jul 05, 2010 7:53 am
Location: Stockholm, Sweden

Re: Remove blank lines when there are more than one of them

Post by hagchr » Fri Jan 24, 2020 12:52 pm

From the manual:
RegEx is compatible with the Perl 5.10 regular expression syntax using the PCRE library.

hagchr
Automation Wizard
Posts: 327
Joined: Mon Jul 05, 2010 7:53 am
Location: Stockholm, Sweden

Re: Remove blank lines when there are more than one of them

Post by hagchr » Fri Jan 24, 2020 1:27 pm

Sometimes I get different results in MS and RegexBuddy, but usually it comes down to differences in modifier settings. At the top left of RegexBuddy you have options for, e.g., case sensitivity, free-spacing, dot matches line breaks, etc.

In MS, if unsure, it can help to include the modifier in the search pattern so you know how it will behave.

In the given example, you can add the modifier at the beginning:
(?m-s)^POS.*$
m (m turned on) means match ^ and $ on every line
-s (s turned off) means the DOT does not match line break

Then ^ and $ will match on every line, but the match will never run past a line break.

And, as mentioned, greedy vs. non-greedy quantifiers can cause problems if you're not careful.
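To see the effect of these modifiers, here is a small sketch in Python's re module (used only for illustration; it follows PCRE for these basic anchors and flags) on a shortened copy of ari's text. Python has s off by default, so (?m) alone gives the same combination as (?m-s). Turning s on produces a run-on match from the first POSITIVE line to the end of the text, which is one plausible cause of what ari saw; the sketch says nothing about which flags, if any, MS has on by default.

```python
import re

# Shortened copy of ari's sample text (\n line endings assumed)
text = (
    "POLLENS\n\n"
    "POSITIVE to: Ragweed, Birch, Willow\n\n\n"
    "MOLDS\n\n"
    "POSITIVE to: Stemphylium, Aspergillus Fumigatus\n\n"
    "NEGATIVE to: Stemphylium, Alternaria\n"
)

# No flags: ^ only matches at the very start of the text, which is "POLLENS"
no_flags = re.findall(r"^POS.*$", text)

# m on, s off: ^ and $ match at every line, and . never crosses a line break
per_line = re.findall(r"(?m)^POS.*$", text)

# m and s on: . also matches \n, so the greedy .* runs from the first POSITIVE to the end
dotall = re.findall(r"(?ms)^POS.*$", text)

print(no_flags)
print(per_line)
print(len(dotall))
```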

PepsiHog
Automation Wizard
Posts: 511
Joined: Wed Apr 08, 2009 4:19 pm
Location: Florida

Re: Remove blank lines when there are more than one of them

Post by PepsiHog » Sat Jan 25, 2020 9:25 pm

@ hagchr,
I agree. I find this gives consistent results between RegexBuddy and Macro Scheduler.
[edit]- I just noticed you turn off 's'. I'll have to try that.

regex>(?Usmi)pattern,text,,match,nom,0

i= ignore case
m= ^ and $ match start and end of line
s= . matches newline as well
x= Allow spaces and comments
J= Duplicate group names allowed
U= Ungreedy quantifiers

?=set flags.

@ari,
The question mark is used in multiple ways in regex. In this case it indicates that you are setting flags.

I included some extras, but I find (?Usmi) works well.

"pattern" is whatever pattern you've found that works in RegexBuddy and so on. Just prefix it with (?Usmi).

Also, check these out. I captured these and use them when I need to.
https://www.cheatography.com/davechild/ ... pressions/

Ok. Well, there's one. I can't find the other I have. But just search "regex cheat sheet". There's plenty out there.

I use MS capture and capture the cheats and then place them all in one image, similar to how they are online.
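A note on why (?Usmi) can work for ari's example even though s lets the dot cross line breaks: U makes quantifiers lazy, so .* stops at the first point where $ can match, which in m mode is the end of the current line. Python's re module (used here only for illustration) has the i, m and s inline flags but no U flag, so this sketch writes the lazy .*? explicitly to stand in for .* under (?U).

```python
import re

# Shortened sample text (\n line endings assumed)
text = (
    "POLLENS\n\n"
    "POSITIVE to: Ragweed, Willow\n\n"
    "MOLDS\n\n"
    "POSITIVE to: Mucor\n\n"
    "NEGATIVE to: Alternaria\n"
)

# Python's re has no (?U) flag, so the lazy .*? is written out; it stands in
# for .* under (?Usmi). With s on the dot could cross line breaks, but a lazy
# .*? stops at the first place $ can match, i.e. the end of the current line.
lazy_dotall = re.findall(r"(?smi)^pos.*?$", text)

print(lazy_dotall)
```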


PepsiHog
Windows 7

PepsiHog. Yep! I drink LOTS of Pepsi (still..in 2021) AND enjoy programming. (That's my little piece of heaven!)

The immensity of the scope of possibilities within Macro Scheduler pushes the user beyond just macros!
