In the world of RegEx (Regular Expressions), not all engines support non-greedy (lazy) matching. Lazy quantifiers were introduced in Perl, so any Regex engine that implements PCRE (Perl Compatible Regular Expressions) supports lazy matching out of the box.
If you're on an engine that does not support non-greedy matching, you can use a simple trick to achieve the same result.
- I will be using GNU `grep` (version 2.25) in a shell; you can use any tool or any Regex library in your language of choice. They should all behave similarly, except for some engine-specific tokens (which I won't be covering here).
In the example below, we have the string `foo_bar_spam` in the variable `var`, and our target is to get `foo_` out of it using Regex.
Now, let's see what we get with the usual greedy Regex pattern `.*`:
% grep -o '^.*_' <<<"$var"
foo_bar_
We get `foo_bar_`, as expected: the greedy `.*` consumes as much as it can and only backs off as far as the last `_`.
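To make the greedy behaviour concrete, here is a small sketch using Python's `re` module (the variable names are just for illustration). It checks that the greedy match stops right after the last underscore in the string, not the first one:

```python
import re

var = "foo_bar_spam"

# Greedy: `.*` grabs as much as possible, then backtracks to the LAST underscore.
m = re.search(r"^.*_", var)
greedy = m.group()

# True when the match ends immediately after the final "_" in the string.
ends_at_last = m.end() == var.rindex("_") + 1

print(greedy, ends_at_last)
```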
Note, for GNU/Linux/shell users:
`grep` option used: `-o` prints only the matched portion of each line.
The shell token `<<<` is known as a here-string, a special form of here-document. Here, using `<<<"$var"`, the expansion of the variable `var` is passed to the standard input of `grep`. It is similar to doing:
% echo "$var" | grep -o '^.*_'
except with one less process (`echo`) and no pipe (`|`), which is created in kernel space.
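If you are driving `grep` from a script instead of an interactive shell, the same "pass the string on standard input" idea can be sketched in Python with `subprocess`. This assumes a `grep` that supports `-o` is on your `PATH`; the helper name `grep_only_matching` is made up for this example:

```python
import shutil
import subprocess


def grep_only_matching(pattern, text):
    """Run `grep -o -e PATTERN` with `text` on stdin.

    Returns the list of matched portions, or None if grep is not installed.
    """
    if shutil.which("grep") is None:
        return None
    result = subprocess.run(
        ["grep", "-o", "-e", pattern],
        input=text + "\n",      # plays the role of <<<"$var"
        capture_output=True,
        text=True,
        check=False,            # grep exits with status 1 when nothing matches
    )
    return result.stdout.splitlines()


print(grep_only_matching("^.*_", "foo_bar_spam"))
```

No shell is involved here, so there is no `echo` process and no here-string syntax to worry about; the string goes straight to the standard input of `grep`.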
Now, how can we get our desired portion?
One way would be to use the non-greedy quantifier `.*?`, provided by the `-P` option of `grep` (`-P` enables the PCRE engine in `grep`):
% grep -Po '^.*?_' <<<"$var"
foo_
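The difference between the two quantifiers is easiest to see when the pattern is allowed to match repeatedly. A rough illustration with Python's `re` module, which also supports Perl-style lazy quantifiers (the variable names are only for this sketch):

```python
import re

var = "foo_bar_spam"

# Greedy: one match that runs up to the last underscore.
greedy_fields = re.findall(r".*_", var)

# Lazy: each match stops at the nearest underscore, so we get one per field.
lazy_fields = re.findall(r".*?_", var)

print(greedy_fields)
print(lazy_fields)
```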
But we are assuming an engine that does not have this support.
The way to do the same thing with any basic Regex engine is to use a negated character class, `[^_]`, which matches any character except `_`:
% grep -o '^[^_]\+_' <<<"$var"
foo_
Note: Here, we needed to escape `+` because the unescaped `+` is an ERE (Extended RegEx) token; alternatively, we can use `-E` to enable ERE:
% grep -Eo '^[^_]+_' <<<"$var"
foo_
ERE enables unescaped tokens such as `+` and `()`, which BRE (Basic RegEx), the syntax `grep` uses by default, treats as literal characters unless they are backslash-escaped.
This is specific to `grep`; your engine should support `+` out of the box, without escaping.
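Breaking the pattern `^[^_]+_` down:
- `^` matches the start of the line/string
- `[^_]+` matches one or more (`+`) characters that are not `_`, i.e. everything up to the next `_`
- `_` matches a literal `_`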
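One caveat worth checking before you rely on the trick: `[^_]+` needs at least one non-underscore character, while `.*?` is happy with zero. The sketch below compares the lazy pattern with the `+` and `*` variants of the negated class on a few inputs (the helper `match_or_none` and the sample strings are made up for this illustration):

```python
import re

lazy = re.compile(r"^.*?_")
negated_plus = re.compile(r"^[^_]+_")   # the pattern used in this article
negated_star = re.compile(r"^[^_]*_")   # allows an empty prefix, like `.*?`


def match_or_none(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group() if m else None


samples = ("foo_bar_spam", "_leading", "no-underscore")
results = {
    text: (
        match_or_none(lazy, text),
        match_or_none(negated_plus, text),
        match_or_none(negated_star, text),
    )
    for text in samples
}

for text, row in results.items():
    print(repr(text), row)
```

For the article's input all three agree. They only diverge when the string starts with the delimiter, where the `*` variant behaves like `.*?` and the `+` variant does not match at all.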
There you go! This trick can be used in any similar scenario where the match should stop at the first occurrence of a single delimiter character.
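As a sketch of how the trick generalizes, here is a hypothetical helper (the name `up_to_first` is not from any library) that builds the negated-class pattern for an arbitrary single-character delimiter. It uses `re.escape` so that delimiters such as `.` or `^` are treated literally, and it mirrors the `+` form used above:

```python
import re


def up_to_first(text, delim):
    """Return the prefix of `text` through the first `delim`, or None.

    Only single-character delimiters are supported, because a negated
    character class excludes individual characters, not substrings.
    """
    if len(delim) != 1:
        raise ValueError("delim must be a single character")
    d = re.escape(delim)
    # Same shape as ^[^_]+_ with the delimiter substituted in.
    pattern = f"^[^{d}]+{d}"
    m = re.search(pattern, text)
    return m.group() if m else None


print(up_to_first("foo_bar_spam", "_"))
print(up_to_first("a,b,c", ","))
print(up_to_first("1.2.3", "."))
```

The single-character restriction is the main limit of this approach: for a multi-character delimiter, a negated class is not a drop-in replacement for a lazy quantifier.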
As mentioned earlier, the Regex pattern is generic and should be reproducible on any Regex engine.
Here it is with Python's built-in `re` (RegEx) module:
>>> var = 'foo_bar_spam'
>>> import re
>>> re.search(r'^.*_', var).group()    # Greedy
'foo_bar_'
>>> re.search(r'^.*?_', var).group()   # Non-greedy with `.*?`
'foo_'
>>> re.search(r'^[^_]+_', var).group() # Non-greedy with `[^_]`
'foo_'
Happy coding! Thanks!