I’ve just been writing some basic text layout clean-up code, i.e. getting rid of empty lines, leading/trailing whitespace, etc. You may think a regex like this will get rid of all empty lines:
$text =~ s/ \n [\ ]* \n /\n/sxmg;
However, because the second \n in the regex is consumed by the match, strings like
$text = "\n \n \n";
become:
$text = "\n \n";
Only one line is removed. Since the regex matcher has already matched up to the second \n in the original string, all it sees after that is “ \n”, which has no leading \n to match against, so it doesn’t remove the second empty line.
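To see where the matcher stops, here is a minimal sketch that records the offsets of every match found in one left-to-right scan. The helper name `match_spans` is made up for illustration; `@-` and `@+` are Perl’s built-in match offset arrays.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the start and end offsets of every match
# that a single left-to-right /g scan finds.
sub match_spans {
    my ($text) = @_;
    my @spans;
    while ( $text =~ / \n [\ ]* \n /sxmg ) {
        push @spans, [ $-[0], $+[0] ];    # @- and @+ hold the match offsets
    }
    return @spans;
}

my $sample = "\n \n \n";
printf "matched offsets %d to %d\n", @{$_} for match_spans($sample);
# prints "matched offsets 0 to 3"
```

Only one match is reported: it swallows the second \n, so the scan resumes at the space that follows it and never finds another leading \n.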
You may think about solving it by running the regex a second time, e.g.
$text =~ s/ \n [\ ]* \n /\n/sxmg;
$text =~ s/ \n [\ ]* \n /\n/sxmg;
Not great, as this still wouldn’t catch all the empty lines if there were a longer run of them, so what about:
while ( $text =~ s/ \n [\ ]* \n /\n/sxmg )
{
    1;
}
Yes, that would do it, but it isn’t very nice. A better way is to use a “zero-width positive lookahead assertion”, written (?=...), which checks that the next \n is there without consuming it, e.g.
$text =~ s/ \n [\ ]* (?=\n) /\n/sxmg;
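It is easy to mix up the non-capturing group `(?:...)` with the lookahead `(?=...)`, so here is a small sketch comparing the two on the sample string from earlier. The sub names are hypothetical and only there for the comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: the non-capturing group still consumes the
# trailing newline, so this behaves like the very first regex.
sub strip_with_group {
    my ($text) = @_;
    $text =~ s/ \n [\ ]* (?:\n) /\n/sxmg;
    return $text;
}

# Hypothetical helper: the lookahead only checks that a newline follows,
# leaving it available as the start of the next match.
sub strip_with_lookahead {
    my ($text) = @_;
    $text =~ s/ \n [\ ]* (?=\n) /\n/sxmg;
    return $text;
}

# Make newlines visible when printing.
sub visible {
    my ($text) = @_;
    $text =~ s/\n/\\n/g;
    return $text;
}

my $input = "\n \n \n";
print visible( strip_with_group($input) ),     "\n";    # prints \n \n
print visible( strip_with_lookahead($input) ), "\n";    # prints \n\n\n
```

With the lookahead, every space-only line loses its spaces in a single pass, but the newlines themselves all stay.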
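Nice, but still not quite there: it only strips the spaces from lines that contain nothing else and leaves their newlines behind, so we’ll need an extra regex to collapse the leftover runs of newlines (down to a single blank line here; use \n as the replacement if you want none at all). The end code will look something like: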
$text =~ s/ \n [\ ]* (?=\n) /\n/sxmg;
$text =~ s/ \n\n+ /\n\n/sxmg;
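As a rough sketch, the two substitutions can be wrapped in a sub and tried on some messier text. The name `tidy_blank_lines` and the sample input are made up for illustration, and the first substitution uses the lookahead form `(?=\n)`.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions.
sub tidy_blank_lines {
    my ($text) = @_;
    $text =~ s/ \n [\ ]* (?=\n) /\n/sxmg;    # empty out space-only lines
    $text =~ s/ \n\n+ /\n\n/sxmg;            # collapse runs of newlines
    return $text;
}

my $messy = "first\n   \n \n\nsecond\n\n\n\nthird\n";
print tidy_blank_lines($messy);
# prints "first", "second" and "third" with one blank line between each
```

Like the regexes above, this only treats spaces as whitespace; lines containing tabs would need the character class extending.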