EasyPatterns™ are a revolutionary new way to describe text patterns that is easy to understand, easy to use, and still just as powerful as the older, hard-to-understand Perl-style or grep-style pattern matching languages.

What is a Pattern?

A pattern describes text in a general way. Searching based on literal text yields only one match, whereas searching with a pattern can match a whole class of text that shares the specified characteristics. Phone numbers (in North America) are a perfect example: three groups of digits separated by various punctuation characters. Here's a simple EasyPattern that will find most phone numbers:
 [3 digits, punctuation, 3 digits, punctuation, 4 digits]

Let's suppose that you have a list of addresses and phone numbers gathered from different sources. The phone numbers are likely to be in many formats: some with / and -, some with periods, etc. To convert them to a consistent format you need to keep the three groups of digits and insert new punctuation in between. With TextPipe's EasyPatterns, this is as easy as:
  Search for: "[punctuation, capture(3 digits), punctuation, capture(3 digits), punctuation, capture(4 digits)]"
  Replace with: "$1-$2-$3"
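
If you already know conventional regular expressions, the following Python sketch may help you see what the search and replace above is doing. It is only a rough equivalent, not how TextPipe works internally: the PUNCT definition is an assumption (EasyPattern's own punctuation set may differ), and the names PHONE and normalize are made up for this illustration.

```python
import re

# Assumption: [punctuation] is approximated as "any character that is not a
# letter, digit, underscore or whitespace". EasyPattern's own set may differ.
PUNCT = r"[^\w\s]"

# Rough regex counterpart of
# [punctuation, capture(3 digits), punctuation, capture(3 digits), punctuation, capture(4 digits)]
PHONE = re.compile(PUNCT + r"(\d{3})" + PUNCT + r"(\d{3})" + PUNCT + r"(\d{4})")

def normalize(text):
    # \1-\2-\3 plays the role of the "$1-$2-$3" replacement string
    return PHONE.sub(r"\1-\2-\3", text)

for sample in ["(978)692-1234", "/978/692.1234"]:
    print(sample, "->", normalize(sample))
```

Both sample numbers come out as 978-692-1234. Text that does not fit the pattern, such as digits separated only by spaces, is left unchanged.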

This section introduces all of the essential elements of EasyPattern patterns. If you are new to pattern matching, there's no reason you have to read the entire section at once. Feel free to stop after any topic and continue with the next one later, or spend time reviewing the examples. Also, don't worry if you don't immediately understand every point. The only way to learn pattern matching is to create your own patterns, including the inevitable trial and error required to get just the right match. TextPipe's trial run area was designed with this in mind.

Throughout this text and the reference section, double quotes are used around both the pattern and the example text that it would match. The double quotes should *NOT* be entered in TextPipe.

Basics

Unlike other pattern matching languages, EasyPatterns use plain English keywords such as "letter" and "digit" to define text patterns. At the most basic level, each keyword appears in [square brackets], e.g.
 "[letter][digit]" - matches "a0" ... "a9", "b0" ... "b9", etc.

Why use brackets? Patterns can also include literal text; brackets distinguish that from the pattern specification:
 "abc[digit]" - matches "abc0", "abc1" ... "abc9"
 "Z[letter][letter]" - matches "Zoo", "Zed", etc.
 "whatever" - keywords are not required, EasyPattern can search just for literal text

In addition to simple keywords, the brackets can also contain expressions. The most common expressions indicate repetition, e.g.
 "[optional digit]" - 0 or 1
 "[oneOrMore digits]" - 1, 2 or many (no spaces are allowed in "oneOrMore")
 "[1+ digits]" - same meaning, different notation
 "[3+ digits]" - 3 or more
 "[3 digits]" - exactly 3
 "[3 to 5 digits]" - 3, 4 or 5

Simple keywords and expressions can be combined, e.g. here's a pattern that matches many North American telephone numbers:
 "[3 digits][punctuation][3 digits][punctuation][4 digits]"

That's a lot of brackets! The above pattern works, but EasyPattern also allows multiple keywords and expressions to be combined with commas, e.g.
 "[3 digits, punctuation, 3 digits, punctuation, 4 digits]"

In fact, even the commas are optional, though you may find they make the pattern easier to read.
 "[3 digits punctuation 3 digits punctuation 4 digits]"

Many patterns require mixing keywords with literal text, e.g. to find telephone numbers in a certain local exchange:
 "978[punctuation]692[punctuation, 4 digits]"

That pattern works fine; EasyPattern also lets you include the literal text inside the bracketed expression using single quotes:
 "['978', punctuation, '692', punctuation, 4 digits]"

Character Sets

One of the fundamental ideas in pattern matching is that each character belongs to one or more character sets. For example, "a" is a member of the set of letters, represented by the keyword "[letter]". "2" is a member of the set "[digit]", etc. In EasyPattern, character sets can be combined with "or". For example, both "a" and "2" are members of the combined set "[letter or digit]". You can combine any number of sets with "or", e.g. "[letter or punctuation or symbol or whitespace]" is nearly everything except digits.

You can also use "or" to combine individual characters into a set, e.g. "[space or tab]". See the EasyPattern Reference for many additional whitespace options.

Tip: Parentheses are not required. For example, "[1+ space or tab]" and "[1+ (space or tab)]" have the same meaning.

The keyword "[char]" or "[character]" represents the set of all characters, including letters, digits, punctuation, symbols, space, tab, return, linefeed, formfeed and control characters. (NULL is not included; see EasyPattern Reference for details and a workaround.) Sometimes you want to match every character in a paragraph, i.e. every character except return (or formfeed). EasyPattern has a special keyword "[paragraphChar]" for this situation. For example, given the text "abc" then a return then "123":
 "[1+ char]" - matches all 7 characters
 "[1+ paragraphChar]" - matches the first 3 characters

Tip: Look up "[paragraphChar]" in the EasyPattern Reference; you'll find a whole family of keywords for words, columns, lines & paragraphs.
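
The difference between "[char]" and "[paragraphChar]" in the "abc", return, "123" example can be imitated with Python regular expressions. This is a sketch under assumptions, not a definition of the keywords: "[paragraphChar]" is taken here to mean "anything except return or formfeed", as described above.

```python
import re

text = "abc\r123"   # "abc", then a return, then "123"

# Roughly [1+ char]: re.DOTALL lets "." match every character, line breaks included
all_chars = re.match(r".+", text, re.DOTALL).group()

# Roughly [1+ paragraphChar]: assumed to be anything except return or formfeed
paragraph = re.match(r"[^\r\f]+", text).group()

print(len(all_chars))   # 7
print(paragraph)        # abc
```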

Sometimes it's easier to specify the set that you don't want rather than listing everything you want:
 "[not letter]" - match anything except a letter  "[not letter or digit]" - match anything except a letter or digit  "[1+ not letter or digit]" - matches a sequence of characters Tip: Since "not" is processed after the character set use of "or", parentheses are still optional. For example, "[not space or tab]" and "[not (space or tab)]" have the same meaning.

Alternatives

One of the most powerful features of EasyPattern isn't related to patterns per se: the ability to find a match from among two or more choices. Some examples:
 "this[or]that"
 "['this' or 'that']"
 "EasyPattern[or]Player"
 "StarTrek[or]StarWars[or]OSA[or]scripting"

Tip: If you include an alternative in a larger pattern, you must include parentheses:
 "[digit ('this' or 'that')]"

The keyword "or" has two uses: combining character sets and listing alternatives. The differences are sometimes subtle, but they will become clearer with experience. See the EasyPattern Reference for details.
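
Conventional regular expressions have the same need for grouping around alternatives, which may help explain the parentheses in "[digit ('this' or 'that')]". The Python sketch below is an analogy only. It shows regex behaviour, and the variable names are invented for this example.

```python
import re

either = re.compile(r"this|that")                  # like "this[or]that"
digit_then_word = re.compile(r"\d(?:this|that)")   # like "[digit ('this' or 'that')]"
without_group = re.compile(r"\dthis|that")         # the digit now belongs only to "this"

print(either.findall("this and that"))                  # ['this', 'that']
print(digit_then_word.fullmatch("7that") is not None)   # True
print(without_group.fullmatch("that") is not None)      # True: no digit required
```

Without the group, the regex reads as "a digit followed by this" or just "that", which is rarely what was intended.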

Groups

EasyPattern lets you group any part of a pattern with parentheses. Groups can also be identified with a number to capture the text that the group matched for later use, either in a replacement string, or in a later part of the pattern, e.g. [( ... )1]. The number given is irrelevant – it simply serves to mark the parentheses as a group.

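There are many reasons to group parts of a pattern, such as applying repetition or alternatives to a whole sequence, or capturing matched text for use in a replacement.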

For example, you could change phone numbers from many different formats to your preferred format, searching for:
 "[punctuation, capture(3 digits), punctuation, capture(3 digits), punctuation, capture(4 digits)]"

and replacing with:
 "$1-$2-$3"

Tip: Parentheses groups can be nested to any level, e.g. [1+ ((letter, digit) or (digit, letter))] will match "r2d2", "r22d", "2rd2", "2r2d", etc.
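
The nested group in the tip above can be checked against its sample matches with an equivalent Python regular expression. This is a sketch that assumes ASCII letters and digits only; the name PAIRS is made up for the example.

```python
import re

# Roughly [1+ ((letter, digit) or (digit, letter))]
PAIRS = re.compile(r"(?:[A-Za-z]\d|\d[A-Za-z])+")

words = ["r2d2", "r22d", "2rd2", "2r2d", "rr22"]
results = {w: PAIRS.fullmatch(w) is not None for w in words}
print(results)
```

The first four words are made entirely of letter/digit pairs and match. "rr22" does not, because its first two characters are both letters.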

Case in Keywords

EasyPattern keywords are not allowed to contain spaces, and are not case sensitive; e.g. [uppercaseLetter], [UppercaseLetter], [UPPERCASELETTER] and [UppErcAsELEttEr] are all equivalent.

Case in Literals

Although keywords are not case sensitive, the case sensitivity of literal text is controlled by TextPipe's 'Match Case' option.

Patterns in Replace Strings

In the simplest replacement, you will find a pattern and replace it with literal text, e.g.
 replace "*[or]_" with "-"

To add literal text before or after existing text, use $0 to represent the entire found text, e.g.
 replace "-[space]" with "\t$0" -- insert a tab before list items

The phone number example shows a common use of [(...)#] and $# (where # is a number from 0-9); here's another example:
 replace "[(column)1 tab (column)2]" with "$2\t$1" -- swap columns
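
The two replacements above have close counterparts in Python's re.sub, where \g<0> stands in for $0 and \1, \2 stand in for $1, $2. The sketch below rests on an assumption: "[column]" is taken to mean a run of characters containing no tab or line break, which may not be exactly how EasyPattern defines it. The name swap_columns is hypothetical.

```python
import re

# Like: replace "-[space]" with "\t$0"  (insert a tab before list items)
bullet = re.sub(r"- ", r"\t\g<0>", "- item")

# Assumption: a column is a run of characters with no tab or line break
COLUMN = r"[^\t\r\n]*"
SWAP = re.compile("(" + COLUMN + r")\t(" + COLUMN + ")")

def swap_columns(line):
    # Like: replace "[(column)1 tab (column)2]" with "$2\t$1"
    return SWAP.sub(r"\2\t\1", line, count=1)

print(repr(bullet))                      # '\t- item'
print(repr(swap_columns("apple\tred")))  # 'red\tapple'
```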

What Next?

EasyPattern's core vocabulary was covered in this section. The best way to learn is by creating patterns for real-world tasks. When you have questions on the basics, refer back to this section. If you need a more detailed explanation of any topic or are looking for additional keywords to handle a new case, see the EasyPattern Reference.

Converting Perl patterns for readability

 Perl pattern    EasyPattern
 .               [char]
 .*              [zeroOrMore char], [0+ char]
 .+              [oneOrMore char], [1+ char]
 [a-zA-Z]        [letter]
 [0-9]           [digit]
 \x20            [space]
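
As a small illustration of the conversion table, the sketch below stores the same pairs in a Python dictionary and uses them to build a regular expression from a list of EasyPattern items. The names EASY_TO_REGEX and to_regex are hypothetical and exist only for this example. Note also that Python's "." does not match a linefeed unless re.DOTALL is used, so the correspondence with "[char]" is approximate.

```python
import re

# Hypothetical lookup table built from the conversion table above
EASY_TO_REGEX = {
    "[char]": r".",
    "[0+ char]": r".*",
    "[1+ char]": r".+",
    "[letter]": r"[a-zA-Z]",
    "[digit]": r"[0-9]",
    "[space]": r"\x20",
}

def to_regex(parts):
    """Join the regex equivalents of a sequence of bracketed EasyPattern items."""
    return "".join(EASY_TO_REGEX[p] for p in parts)

pattern = to_regex(["[letter]", "[digit]", "[space]", "[1+ char]"])
print(pattern)
print(re.fullmatch(pattern, "a1 rest of line") is not None)   # True
```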