This document is free text: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see
Well, I know regex is (almost) the same on every system, but this site is for Debian and Ubuntu, so it is named accordingly.
Book: 978-0-13-475706-3 Learning Regular Expressions by Ben Forta
Book: 978-1-4842-3875-2 Regex Quick Syntax Reference by Zsolt Nagy
Book: 978-1-449-31943-4 Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan
Obviously every string is a match for itself - go: go - dog: dog
| or []
Any operator or quantifier can be escaped with \ to match itself literally.
When used inside brackets, most operators are already literal, so \ is usually not necessary there.
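A small sketch of escaping, using Python's re module (the sample text is made up for illustration):

```python
import re

text = "abc a.c"

# An unescaped dot is an operator: it matches any character.
any_char = re.findall(r"a.c", text)       # ['abc', 'a.c']

# Escaped with \, the dot only matches a literal dot.
escaped = re.findall(r"a\.c", text)       # ['a.c']

# Inside brackets the dot is already literal, so no \ is needed.
bracketed = re.findall(r"a[.]c", text)    # ['a.c']

print(any_char, escaped, bracketed)
```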
A * quantifier after a character or group means 0 or more occurrences of it.
A + quantifier after a character or group means 1 or more occurrences of it.
A ? quantifier after a character or group means 0 or 1 occurrences of it.
A quantifier in braces can take the form {m}, {n,} or {p,r}, where m, n, p and r are all whole numbers.
These come after a character or group and mean: 1. Exactly m occurrences 2. n or more occurrences 3. p to r occurrences
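The quantifiers above can be tried out with a short Python sketch. The word list and the helper name matching are only assumptions made for this example:

```python
import re

words = ["ct", "cat", "caat", "caaat"]

def matching(pattern):
    """Return the words that the pattern matches completely."""
    return [w for w in words if re.fullmatch(pattern, w)]

star = matching(r"ca*t")         # 0 or more a: all four words
plus = matching(r"ca+t")         # 1 or more a: cat, caat, caaat
optional = matching(r"ca?t")     # 0 or 1 a: ct, cat
exact = matching(r"ca{2}t")      # exactly 2 a: caat
at_least = matching(r"ca{2,}t")  # 2 or more a: caat, caaat
between = matching(r"ca{1,2}t")  # 1 to 2 a: cat, caat

for name, result in [("*", star), ("+", plus), ("?", optional),
                     ("{2}", exact), ("{2,}", at_least), ("{1,2}", between)]:
    print(name, result)
```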
By default, a quantifier matches as many characters as possible.
When we try the regex:
\(.*\)
(find anything in parentheses) on the following
abc(def)ghi(jkl)mno
instead of matching (def) and (jkl), it matches (def)ghi(jkl). This is called greedy matching. So quantifiers are greedy by default.
To change this behaviour so that a quantifier matches as little as possible, we can use its lazy version by adding a ? to the end. Like:
\(.*?\)
This regex matches (def) and (jkl), and this is called lazy matching.
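The same greedy and lazy comparison, as a minimal Python sketch using re.findall on the sample string from above:

```python
import re

text = "abc(def)ghi(jkl)mno"

# Greedy: .* grabs as much as it can, up to the last closing parenthesis.
greedy = re.findall(r"\(.*\)", text)

# Lazy: .*? stops at the first closing parenthesis it can reach.
lazy = re.findall(r"\(.*?\)", text)

print(greedy)  # ['(def)ghi(jkl)']
print(lazy)    # ['(def)', '(jkl)']
```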
The lazy versions of the quantifiers are as follows: *? +? ?? {n,}? {p,r}?
\b denotes a word boundary, that is the beginning or end of a word: a position where a word character (\w) meets a non-word character or the edge of the text.
^ matches the start of a line
$ matches the end of a line
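A short sketch of \b, ^ and $ in Python. Note that in Python's re module ^ and $ only match at every line when the re.MULTILINE flag is given; without it they match at the start and end of the whole string. The sample text is made up:

```python
import re

text = "cat concat\ncatalog cat"

# \b keeps "cat" from matching inside "concat" or "catalog".
whole_words = re.findall(r"\bcat\b", text)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# Without the flag, ^ only matches at the start of the whole string.
string_start = re.findall(r"^\w+", text)

print(whole_words)   # ['cat', 'cat']
print(line_starts)   # ['cat', 'catalog']
print(line_ends)     # ['concat', 'cat']
print(string_start)  # ['cat']
```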
A subexpression is a group of characters or operators in parentheses. Subexpressions are used to apply quantifiers to whole expressions.
Subexpressions can be nested
A backreference is a backslash followed by a digit, like \1 \2 \3. It refers to whatever the subexpression at that position matched, counting the subexpressions from left to right.
For example, the following regex matches the repeating words:
[\s]+(\w+)[\s]+\1[\s]+
[\s]+(\w+)[\s]+ matches a word, that is 1 or more whitespace characters, followed by 1 or more word characters, followed by 1 or more whitespace characters.
As the part (\w+) is the first subexpression in the regex, \1 matches whatever that part matched. So the regex matches a repeated word.
Another example would be matching repeating word couples:
[\s]+(\w+)[\s]+(\w+)[\s]+\1[\s]+\2[\s]+
The first (\w+) will be the first word as \1, and the second one will be the second word as \2.
Backreferences help a lot in find and replace operations. In the repeating word example, if we want to replace the repeated words with a single one, we would write \1 in the replacement part (together with any surrounding whitespace we want to keep).
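Here is a sketch of both backreference regexes in a find and replace, using Python's re.sub. The sample sentences are made up, and the replacement strings add a space on each side because the regexes also consume the surrounding whitespace:

```python
import re

# Repeating word: " the the " is replaced by " the ".
single = r"[\s]+(\w+)[\s]+\1[\s]+"
text1 = "Paris in the the spring"
fixed1 = re.sub(single, r" \1 ", text1)

# Repeating word couple: " on and on and " is replaced by " on and ".
couple = r"[\s]+(\w+)[\s]+(\w+)[\s]+\1[\s]+\2[\s]+"
text2 = "it goes on and on and so forth"
match2 = re.search(couple, text2)
fixed2 = re.sub(couple, r" \1 \2 ", text2)

print(fixed1)                            # Paris in the spring
print(match2.group(1), match2.group(2))  # on and
print(fixed2)                            # it goes on and so forth
```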
Please note that these examples are not perfect. You or someone else can definitely find or write better versions.
\w+[\w\.]*@\w+[\w\.]*\.\w+
Name Part:
it can only start with a letter or a digit (\w also allows an underscore) \w+, then any number of letters, digits and dots may follow [\w.]*, then comes @
Domain Part:
it can only start with a letter or a digit \w+, then any number of letters, digits and dots may follow [\w.]*, then comes a literal dot \. and then comes the TLD part \w+
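A quick way to try the email regex, sketched in Python. The addresses are invented for the test, and fullmatch is used so that the whole string has to match:

```python
import re

email = re.compile(r"\w+[\w\.]*@\w+[\w\.]*\.\w+")

candidates = [
    "john.doe@example.com",
    "jane@mail.example.org",
    ".bad@example.com",        # starts with a dot
    "no-at-sign.example.com",  # no @ at all
    "user@localhost",          # no dot and TLD part
]
accepted = [c for c in candidates if email.fullmatch(c)]
print(accepted)  # ['john.doe@example.com', 'jane@mail.example.org']

# The regex is not perfect: consecutive dots are accepted too.
sloppy = email.fullmatch("a..b@example..com") is not None
print(sloppy)  # True
```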
\d{1,2}[-\/.]\d{1,2}[-\/.]\d{2,4}
1 or 2 digit day field → \d{1,2}
Separator -, / or . → [-\/.]
1 or 2 digit month field → \d{1,2}
Separator -, / or . → [-\/.]
2 to 4 digit year field → \d{2,4}
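The date regex can be tried like this in Python. The sample sentence is made up; the last check shows that the regex only looks at the shape of the date, not at whether the numbers make sense:

```python
import re

date = re.compile(r"\d{1,2}[-\/.]\d{1,2}[-\/.]\d{2,4}")

text = "Released 24.12.2019, patched 3/1/20 and again on 07-02-2021."
found = date.findall(text)
print(found)  # ['24.12.2019', '3/1/20', '07-02-2021']

# Day 99, month 99 and a 3 digit year still match.
nonsense = date.fullmatch("99/99/999") is not None
print(nonsense)  # True
```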
(\d{1,3}\.){3}\d{1,3}
3 of (1 to 3 digit numbers, followed by a dot) → (\d{1,3}\.){3}
1 (1 to 3 digit numbers) → \d{1,3}
Actually, this regex matches invalid IP addresses too, like 300.288.11.11. A stricter version limits each number to the range 0 to 255:
(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2}))\.){3}(((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2})))
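A sketch comparing the simple and the stricter IP regex in Python. It assumes the stricter regex repeats its first group three times with {3}, the same way the simple one does. The test addresses and the variable names are made up, and fullmatch is used so that the whole string has to be an address:

```python
import re

# One number from 0 to 255: 250-255, 200-249, 100-199, or 1 to 2 digits.
octet = r"((25[0-5])|(2[0-4]\d)|(1\d{2})|(\d{1,2}))"

simple = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
strict = re.compile(r"(" + octet + r"\.){3}(" + octet + r")")

addresses = ["192.168.1.10", "255.255.255.255", "300.288.11.11", "10.0.0.256"]

simple_ok = [a for a in addresses if simple.fullmatch(a)]
strict_ok = [a for a in addresses if strict.fullmatch(a)]

print(simple_ok)  # all four addresses
print(strict_ok)  # ['192.168.1.10', '255.255.255.255']
```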