EasyPatterns™ are a revolutionary new way to describe text patterns: they are
easy to understand, easy to use, and still just as powerful as the older,
hard-to-understand Perl-style or grep-style pattern matching languages.
A pattern describes text in a general way. Searching based on literal text
yields only one match, whereas searching with a pattern can match a whole class
of text that shares the specified characteristics. Phone numbers (in North
America) are a perfect example: three groups of digits separated by various
punctuation characters. Here's a simple Text Machine pattern that will find most
phone numbers:
[3 digits, punctuation, 3 digits, punctuation, 4 digits]
Let's suppose that you have a list of addresses and phone numbers gathered from
different sources. The phone numbers are likely to be in many formats: some with
/ and -, some with periods, etc. To convert them to a consistent format you need
to keep the three groups of digits and insert new punctuation in between. With
TextPipe's EasyPatterns, this is as easy as
Search for: "[punctuation, capture(3 digits), punctuation, capture(3 digits), punctuation, capture(4 digits)]"
Replace with: "$1-$2-$3"
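For readers who know regular expressions, the search and replace above can be sketched in Python's re module. This is only an approximation: it assumes "punctuation" means a single ASCII punctuation character and "digit" means 0-9, and the names PUNCT, PHONE and normalize_phones are hypothetical, not part of TextPipe.

```python
import re
import string

# Assumption: "punctuation" is one ASCII punctuation character.
PUNCT = "[" + re.escape(string.punctuation) + "]"

# Rough regex counterpart of
# [punctuation, capture(3 digits), punctuation, capture(3 digits),
#  punctuation, capture(4 digits)]
PHONE = re.compile(
    PUNCT + r"([0-9]{3})" + PUNCT + r"([0-9]{3})" + PUNCT + r"([0-9]{4})"
)


def normalize_phones(text: str) -> str:
    """Rewrite every matched phone number as 3 digits-3 digits-4 digits."""
    # \1-\2-\3 plays the role of the "$1-$2-$3" replacement.
    return PHONE.sub(r"\1-\2-\3", text)
```

Like the EasyPattern it mirrors, this sketch expects a punctuation character in front of the first group of digits, so "(978)692-1234" is converted but "978-692-1234" is left alone.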
This section introduces all of the essential elements of EasyPattern patterns. If you are new to pattern matching, there's no reason you have to read the entire section at once. Feel free to stop after any subsection and continue with the next one later, or spend time reviewing the examples. Also, don't worry if you don't immediately understand every point. The only way to learn pattern matching is to create your own patterns, including the inevitable trial and error required to get just the right match. TextPipe's trial run area was designed with this in mind.
Throughout this text and the reference section, double quotes are used around both the pattern and the example text that it would match. The double quotes should *NOT* be entered in TextPipe.
Unlike other pattern matching languages, EasyPatterns use plain English
keywords such as "letter" and "digit" to define text patterns. At the most basic
level, each keyword appears in [square brackets], e.g.
"[letter][digit]" - matches "a0" ... "a9", "b0" ... "b9", etc.
Why use brackets? Patterns can also include literal text; brackets distinguish
that from the pattern specification:
"abc[digit]" - matches "abc0", "abc1" ... "abc9"
"Z[letter][letter]" - matches "Zoo", "Zed", etc.
"whatever" - keywords are not required, EasyPattern can search just for literal text
In addition to simple keywords, the brackets can also contain expressions.
The most common expressions indicate repetition, e.g.
"[optional digit]" - 0 or 1
"[oneOrMore digits]" - 1, 2 or many (no spaces are allowed in "oneOrMore")
"[1+ digits]" - same meaning, different notation
"[3+ digits]" - 3 or more
"[3 digits]" - exactly 3
"[3 to 5 digits]" - 3, 4 or 5
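If you already know Perl-style quantifiers, the repetition expressions above line up roughly as follows. This is a sketch in Python's re module, assuming "digit" means 0-9; the QUANTIFIERS table and matches_fully helper are hypothetical names used only for illustration.

```python
import re

# Hypothetical lookup table: EasyPattern repetition -> regex quantifier.
QUANTIFIERS = {
    "[optional digit]": r"[0-9]?",       # 0 or 1
    "[oneOrMore digits]": r"[0-9]+",     # 1, 2 or many
    "[1+ digits]": r"[0-9]+",            # same meaning, different notation
    "[3+ digits]": r"[0-9]{3,}",         # 3 or more
    "[3 digits]": r"[0-9]{3}",           # exactly 3
    "[3 to 5 digits]": r"[0-9]{3,5}",    # 3, 4 or 5
}


def matches_fully(easy_pattern: str, text: str) -> bool:
    """Return True if the whole text fits the regex mapped to easy_pattern."""
    return re.fullmatch(QUANTIFIERS[easy_pattern], text) is not None
```

For example, matches_fully("[3 to 5 digits]", "1234") is true, while matches_fully("[3 digits]", "1234") is false.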
Simple keywords and expressions can be combined, e.g. here's a pattern that
matches many North American telephone numbers:
"[3 digits][punctuation][3 digits][punctuation][4 digits]"
That's a lot of brackets! The above pattern works, but EasyPattern also allows
multiple keywords and expressions to be combined with commas, e.g.
"[3 digits, punctuation, 3 digits, punctuation, 4 digits]"
In fact, even the commas are optional, though you may find they make the
pattern easier to read.
"[3 digits punctuation 3 digits punctuation 4 digits]"
Many patterns require mixing keywords with literal text, e.g. to find
telephone numbers in a certain local exchange:
"978[punctuation]692[punctuation, 4 digits]"
That pattern works fine; EasyPattern also lets you include the literal text
inside the bracketed expression using single quotes:
"['978', punctuation, '692', punctuation, 4 digits]"
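To see how keywords and literal text combine, here is a rough regex counterpart of the general phone pattern and the local-exchange pattern, written with Python's re module. It assumes ASCII punctuation and digits; ANY_PHONE, LOCAL_PHONE and find_phones are hypothetical names.

```python
import re
import string

# Assumption: "punctuation" is one ASCII punctuation character.
PUNCT = "[" + re.escape(string.punctuation) + "]"

# [3 digits, punctuation, 3 digits, punctuation, 4 digits]
ANY_PHONE = re.compile(r"[0-9]{3}" + PUNCT + r"[0-9]{3}" + PUNCT + r"[0-9]{4}")

# ['978', punctuation, '692', punctuation, 4 digits]
LOCAL_PHONE = re.compile("978" + PUNCT + "692" + PUNCT + r"[0-9]{4}")


def find_phones(text: str, local_only: bool = False) -> list:
    """List phone numbers in text, optionally only those in the 978/692 exchange."""
    pattern = LOCAL_PHONE if local_only else ANY_PHONE
    return pattern.findall(text)
```

The literal digits simply take the place of the "3 digits" keywords, which narrows the match to one exchange.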
One of the fundamental ideas in pattern matching is that each character belongs to one or more character sets. For example, "a" is a member of the set of letters, represented by the keyword "[letter]". "2" is a member of the set "[digit]", etc. In EasyPattern, character sets can be combined with "or". For example, both "a" and "2" are members of the combined set "[letter or digit]". You can combine any number of sets with "or", e.g. "[letter or punctuation or symbol or whitespace]" is nearly everything except digits.
You can also use "or" to combine individual characters into a set, e.g. "[space or tab]". See the EasyPattern Reference for many additional whitespace options.
Tip: Parentheses are not required. For example, "[1+ space or tab]" and "[1+ (space or tab)]" have the same meaning.
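In regex terms, combining character sets with "or" corresponds to listing them inside one character class. A small Python sketch, assuming "letter" means A-Z and a-z and "digit" means 0-9 (the names below are hypothetical):

```python
import re

# [letter or digit] -> one character class holding both sets
LETTER_OR_DIGIT = re.compile(r"[A-Za-z0-9]")

# [1+ space or tab] -> one or more characters from the set {space, tab}
SPACES_OR_TABS = re.compile(r"[ \t]+")


def collapse_blanks(text: str) -> str:
    """Replace each run of spaces and tabs with a single space."""
    return SPACES_OR_TABS.sub(" ", text)
```

Here collapse_blanks("a \t\t b") returns "a b", because the whole mixed run of spaces and tabs is one match.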
The keyword "[char]" or "[character]" represents the set of all characters,
including letters, digits, punctuation, symbols, space, tab, return, linefeed,
formfeed and control characters. (NULL is not included; see
EasyPattern Reference for details and a
workaround.) Sometimes you want to match every character in a paragraph, i.e.
every character except return (or formfeed). EasyPattern has a special keyword
"[paragraphChar]" for this situation. For example, given the text "abc" then a
return then "123":
"[1+ char]" - matches all 7 characters
"[1+ paragraphChar]" - matches the first 3 characters
Tip: Look up "[paragraphChar]" in the EasyPattern Reference; you'll find a whole family of keywords for words, columns, lines & paragraphs.
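The difference between the two keywords can be imitated with regex character classes. This Python sketch assumes "return" is the carriage return character and leaves out NULL as described above; ANY_CHARS, PARAGRAPH_CHARS and first_match are hypothetical names.

```python
import re

# [1+ char]: any character except NULL (assumed reading of the note above)
ANY_CHARS = re.compile(r"[^\x00]+")

# [1+ paragraphChar]: additionally stops at return or formfeed
PARAGRAPH_CHARS = re.compile(r"[^\r\f\x00]+")


def first_match(pattern, text: str) -> str:
    """Return the first match of pattern in text, or an empty string."""
    found = pattern.search(text)
    return found.group(0) if found else ""


sample = "abc\r123"  # "abc", then a return, then "123"
whole = first_match(ANY_CHARS, sample)            # all 7 characters
paragraph = first_match(PARAGRAPH_CHARS, sample)  # the first 3 characters
```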
Sometimes it's easier to specify the set that you don't want rather than
listing everything you want:
"[not letter]" - match anything except a letter
"[not letter or digit]" - match anything except a letter or digit
"[1+ not letter or digit]" - matches a sequence of characters that are neither letters nor digits
Tip: Since "not" is processed after the character set use of "or",
parentheses are still optional. For example, "[not space or tab]" and "[not
(space or tab)]" have the same meaning.
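A negated set corresponds to a negated character class in regex notation. The Python sketch below assumes ASCII letters and digits; the names are hypothetical.

```python
import re

NOT_LETTER = re.compile(r"[^A-Za-z]")              # [not letter]
NOT_LETTER_OR_DIGIT = re.compile(r"[^A-Za-z0-9]")  # [not letter or digit]
RUN_OF_OTHERS = re.compile(r"[^A-Za-z0-9]+")       # [1+ not letter or digit]


def separators(text: str) -> list:
    """Return each run of characters that are neither letters nor digits."""
    return RUN_OF_OTHERS.findall(text)
```

Note that the "not" covers the whole combined set, just as "[not space or tab]" and "[not (space or tab)]" mean the same thing.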
One of the most powerful features of EasyPattern isn't
related to patterns per se: the ability to find a match from among two or more
choices. Some examples:
"this[or]that"
"['this' or 'that']"
"EasyPattern[or]Player"
"StarTrek[or]StarWars[or]OSA[or]scripting"
Tip: If you include an alternative in a larger pattern, you must
include parentheses:
"[digit ('this' or 'that')]"
The keyword "or" has two uses: combining character sets and listing alternatives. The differences are sometimes subtle, but they will become clearer with experience. See the EasyPattern Reference for details.
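Alternatives behave much like the | operator in regex notation, and the regex analogue also hints at why parentheses matter inside a larger pattern. This Python sketch is an analogy only, not a description of how EasyPattern parses "or"; the names are hypothetical.

```python
import re

# "this[or]that" or "['this' or 'that']"
EITHER = re.compile(r"this|that")

# "[digit ('this' or 'that')]": a digit followed by one of the two words
DIGIT_THEN_EITHER = re.compile(r"[0-9](?:this|that)")

# Without the group, the regex means: (digit + "this") or just "that"
UNGROUPED = re.compile(r"[0-9]this|that")


def is_digit_then_word(text: str) -> bool:
    """True if text is exactly a digit followed by "this" or "that"."""
    return DIGIT_THEN_EITHER.fullmatch(text) is not None
```

With the group, "7that" matches as a whole; the ungrouped regex does not match "7that" in full, but does match "that" on its own.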
EasyPattern lets you group any part of a pattern with parentheses. Groups can also be identified with a number to capture the text that the group matched for later use, either in a replacement string, or in a later part of the pattern, e.g. [( ... )1]. The number given is irrelevant – it simply serves to mark the parentheses as a group.
There are many reasons to group parts of a pattern:
For example, you could change phone numbers from many
different formats to your preferred format, searching for:
"[punctuation, capture(3 digits), punctuation, capture(3 digits), punctuation, capture(4 digits)]"
and replacing with:
"$1-$2-$3"
Tip: Parentheses groups can be nested to any level, e.g. [1+ ((letter, digit) or (digit, letter))] will match "r2d2", "r22d", "2rd2", "2r2d", etc.
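The nested example in the tip can be checked against a comparable regex. This is a sketch in Python's re module, assuming ASCII letters and digits; PAIRS and is_pair_sequence are hypothetical names.

```python
import re

# [1+ ((letter, digit) or (digit, letter))]
# One or more two-character pairs, each either letter+digit or digit+letter.
PAIRS = re.compile(r"(?:[A-Za-z][0-9]|[0-9][A-Za-z])+")


def is_pair_sequence(text: str) -> bool:
    """True if the whole text is made of letter/digit pairs in either order."""
    return PAIRS.fullmatch(text) is not None
```

All four strings from the tip ("r2d2", "r22d", "2rd2", "2r2d") pass this check, while "rr22" does not, because its first pair is two letters.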
EasyPattern keywords are not allowed to contain spaces, and are not case sensitive; e.g. [uppercaseLetter], [UppercaseLetter], [UPPERCASELETTER], [UppErcAsELEttEr] are equivalent.
Although keywords are not case sensitive, the case sensitivity of literal text is controlled by TextPipe's 'Match Case' option.
In the simplest replacement, you will find a pattern and
replace it with literal text, e.g.
replace "*[or]_" with "-"
To add literal text before or after existing text, use $0 to represent the
entire found text, e.g.
replace "-[space]" with "\t$0" -- insert a tab before list items
The phone number example shows a common use of [(...)#] and $# (where # is a
number from 0-9); here's another example:
replace "[(column)1 tab (column)2]" with "$2\t$1" -- swap columns
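The three replacements in this section can be approximated with Python's re.sub, where \g<0> stands in for $0 and \1, \2 stand in for $1, $2. The sketch assumes that "column" means a run of characters containing no tab or line break; the function names are hypothetical.

```python
import re


def dashes(text: str) -> str:
    """replace "*[or]_" with "-"."""
    return re.sub(r"\*|_", "-", text)


def indent_list_items(text: str) -> str:
    """replace "-[space]" with "\\t$0": put a tab before the found text."""
    return re.sub(r"- ", r"\t\g<0>", text)


def swap_columns(text: str) -> str:
    """replace "[(column)1 tab (column)2]" with "$2\\t$1"."""
    # Assumption: a column is one or more characters other than tab, CR or LF.
    return re.sub(r"([^\t\r\n]+)\t([^\t\r\n]+)", r"\2\t\1", text)
```

For instance, swap_columns("name\tage") returns "age\tname", and indent_list_items("- milk") returns the same text with a tab in front.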
EasyPattern's core vocabulary was covered in this section. The best way to learn is by creating patterns for real-world tasks. When you have questions on the basics, refer back to this section. If you need a more detailed explanation of any topic or are looking for additional keywords to handle a new case, see the EasyPattern Reference.
Perl pattern | EasyPattern |
. | [char] |
.* | [zeroOrMore char], [0+ char] |
.+ | [oneOrMore char], [1+ char] |
[a-zA-Z] | [letter] |
[0-9] | [digit] |
\x20 | [space] |