POSIX Regular Expressions

Modified on Wed, 27 Nov 2019 at 10:40 AM

POSIX regular expressions are used in a wide range of systems to select files for inclusion in, or exclusion from, file sets and searches. For example, both the Asigra and Infrascale appliances allow POSIX expressions in their selection criteria.


POSIX regular expressions are much like the wildcard characters in Microsoft Windows, but much more flexible and powerful.


Standard Microsoft Windows wildcard matching uses two wildcards: * matches any number of characters (including none), and ? matches any single character. For example, ?.doc matches a.doc, but not file.doc; *.doc matches both.
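The wildcard behaviour described above can be tried out with Python's standard fnmatch module, which implements shell-style wildcards that treat * and ? the same way. This is only an illustrative sketch: the file names come from the example, and fnmatchcase is used so the result does not depend on the operating system's case rules.

```python
from fnmatch import fnmatchcase

names = ["a.doc", "file.doc"]

# ? stands for exactly one character, so only a one-letter base name matches.
single = [n for n in names if fnmatchcase(n, "?.doc")]

# * stands for any run of characters, so both names match.
any_run = [n for n in names if fnmatchcase(n, "*.doc")]

print(single)   # ['a.doc']
print(any_run)  # ['a.doc', 'file.doc']
```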


POSIX regular expressions have more options. These are some of the special characters used in POSIX regular expressions:

                             

| Character | Description |
| --- | --- |
| . | Matches any single character |
| [] | Matches any one of the characters listed within the square brackets |
| [^] | Matches any one character not listed within the square brackets |
| \ | Escape character. Toggles the special meaning of the following character. To match a literal backslash (\), enter \\ (the first backslash removes the special meaning of the second) |
| \d | Matches any digit. Shorthand for [0-9] |
| \D | Matches any character that is not a digit. Shorthand for [^0-9] |
| $ | Matches the end of the filename |
| ^ | Matches the start of the filename |
| * | Modifies the preceding character to match zero or more times |
| + | Modifies the preceding character to match one or more times |
| ? | Modifies the preceding character to match zero or one time |
| {m} | Modifies the preceding character to match exactly m times, e.g. .{3} matches any three characters |
| {m,} | Modifies the preceding character to match m or more times, e.g. .{3,} matches three or more characters |
| {m,n} | Modifies the preceding character to match from m to n times, e.g. .{3,4} matches any 3 or 4 characters |
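As a rough illustration of the ? modifier and the \d and \D shorthands, the sketch below uses Python's re module. Python is not a POSIX engine, but it handles these particular constructs in the way the table describes. The file names are made up for the example.

```python
import re

# ? makes the preceding character optional (zero or one occurrence).
optional = [n for n in ["a.dc", "a.doc", "a.dooc"] if re.search(r"a\.do?c$", n)]

reports = ["report1.doc", "report.doc"]

# \d finds a digit anywhere in the name.
has_digit = [n for n in reports if re.search(r"\d", n)]

# ^\D*$ requires every character in the name to be a non-digit.
no_digits = [n for n in reports if re.search(r"^\D*$", n)]

print(optional)   # ['a.dc', 'a.doc']
print(has_digit)  # ['report1.doc']
print(no_digits)  # ['report.doc']
```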


Regular expressions will match a file if the expression matches any part of the filename. So, a regular expression g will match any file with the letter g anywhere in the filename. $ matches the end of the filename (including the extension), so g$ will match any filename ending in g, e.g. .mpg, .png and .jpg files.
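The difference between an unanchored and an anchored expression can be sketched in Python. Here re.search stands in for the "matches any part of the filename" behaviour; the file names are invented for illustration, and Python's re module is assumed to treat these simple patterns the same way a POSIX engine would.

```python
import re

names = ["song.mpg", "logo.png", "photo.jpg", "graph.txt", "notes.doc"]

# Unanchored: g may appear anywhere in the filename.
anywhere = [n for n in names if re.search(r"g", n)]

# Anchored with $: g must be the last character of the filename.
at_end = [n for n in names if re.search(r"g$", n)]

print(anywhere)  # ['song.mpg', 'logo.png', 'photo.jpg', 'graph.txt']
print(at_end)    # ['song.mpg', 'logo.png', 'photo.jpg']
```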


Assume you have the following files: a.doc, a.dooc, a.dc, a.dac, and aldoc.


The regular expression a\.d.c matches both a.doc and a.dac. The backslash (\) makes the period (.) between a and d match a literal period, not any character.


The regular expression a.d.c matches a.doc, a.dac, and aldoc, because the unescaped period matches any character, including the l in aldoc.
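The effect of escaping the period can be checked against the five sample files. The sketch below uses Python's re module as a stand-in for a POSIX engine; the helper name matching is hypothetical and exists only for this example.

```python
import re

files = ["a.doc", "a.dooc", "a.dc", "a.dac", "aldoc"]

def matching(pattern, names):
    """Return the names in which the pattern is found anywhere."""
    return [n for n in names if re.search(pattern, n)]

# Escaped period: only a literal "." is accepted between a and d.
escaped = matching(r"a\.d.c", files)

# Unescaped period: any character is accepted, so aldoc matches too.
unescaped = matching(r"a.d.c", files)

print(escaped)    # ['a.doc', 'a.dac']
print(unescaped)  # ['a.doc', 'a.dac', 'aldoc']
```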


Wrapping characters in square brackets matches a single character against any character in the set. This means that the regular expression d[abcde]c matches a.dac, but not a.doc (because o isn’t listed). [] can also contain a range of characters, so the same regular expression can be written as d[a-e]c, which is easier to write. Square brackets can also be used to match anything not in a list. So d[^f-z]c matches a.dac, but not a.doc, since o is between f and z.
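The three bracket forms can be compared on the same sample files. This is a minimal sketch using Python's re module, which is assumed to handle these bracket expressions the same way; the helper name matching is hypothetical.

```python
import re

files = ["a.doc", "a.dooc", "a.dc", "a.dac", "aldoc"]

def matching(pattern, names):
    """Return the names in which the pattern is found anywhere."""
    return [n for n in names if re.search(pattern, n)]

listed = matching(r"d[abcde]c", files)   # explicit list of characters
ranged = matching(r"d[a-e]c", files)     # the same set written as a range
negated = matching(r"d[^f-z]c", files)   # any one character outside f-z

print(listed)   # ['a.dac']
print(ranged)   # ['a.dac']
print(negated)  # ['a.dac']
```

A negated set still consumes exactly one character, so d[^f-z]c does not match a.dc, and it would accept a digit or a period in the middle position (for example d1c).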


* modifies the preceding character to match zero or more times. So, the regular expression do*c matches a.dc, a.doc, and a.dooc (and also aldoc, which contains doc). * can also modify square brackets. The regular expression d[a-e]*c matches a.dc and a.dac, but not a.doc or a.dooc, because o is not in the set.


+ modifies the preceding character in the same way as *, except that it requires at least one match. So, the regular expression do+c matches a.doc and a.dooc (and aldoc), but not a.dc.


The {} modifier sets a fixed number or a range of repetitions. For example, .do{2}c matches a.dooc. You can also specify a range, e.g. the regular expression a\.do{0,1}c matches a.dc and a.doc, but not a.dooc.
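The repetition modifiers can be compared side by side on the sample files. The sketch below again uses Python's re module in place of a POSIX engine, with a hypothetical helper named matching. Because the patterns are unanchored, aldoc shows up wherever the text doc is enough to satisfy the expression.

```python
import re

files = ["a.doc", "a.dooc", "a.dc", "a.dac", "aldoc"]

def matching(pattern, names):
    """Return the names in which the pattern is found anywhere."""
    return [n for n in names if re.search(pattern, n)]

# * allows zero or more o characters.
zero_or_more = matching(r"do*c", files)

# + requires at least one o, so a.dc drops out.
one_or_more = matching(r"do+c", files)

# {2} requires exactly two o characters.
exactly_two = matching(r".do{2}c", files)

# {0,1} allows no o or a single o; the escaped period rules out aldoc.
at_most_one = matching(r"a\.do{0,1}c", files)

print(zero_or_more)  # ['a.doc', 'a.dooc', 'a.dc', 'aldoc']
print(one_or_more)   # ['a.doc', 'a.dooc', 'aldoc']
print(exactly_two)   # ['a.dooc']
print(at_most_one)   # ['a.doc', 'a.dc']
```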


Note: Use a forward slash (/) to separate directories, even on Windows, where directories are normally separated with a backslash (\).
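One way to picture this note is to write a Windows path with forward slashes before testing an expression against it. The sketch below is an assumption about how such matching could be reproduced locally, not a description of how any particular product handles paths; the path and the pattern are hypothetical.

```python
import re
from pathlib import PureWindowsPath

# Hypothetical Windows path, written with backslashes.
windows_path = r"C:\Users\alice\Documents\report.doc"

# as_posix() rewrites the separators as forward slashes.
normalized = PureWindowsPath(windows_path).as_posix()

# The expression uses / between directories, as the note recommends.
pattern = r"/Documents/.*\.doc$"

print(normalized)                                # C:/Users/alice/Documents/report.doc
print(re.search(pattern, normalized) is not None)  # True
print(re.search(pattern, windows_path) is not None)  # False
```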


Below are a few examples of standard Windows wildcard patterns and their equivalent regular expressions:

                          

| Windows wildcard | POSIX regular expression | Explanation |
| --- | --- | --- |
| *z*.*, *.*z* | z | Matches any file with z in its filename or extension |
| *.com | \.com$ | Matches all .com files |
| *.?om | \..om$ | Matches all .aom, .bom, .com, etc. files |
| *.aom, *.bom, *.zom | \.[abz]om$ | Matches all .aom, .bom and .zom files |
| a*.*, b*.*, c*.*, d*.*, e*.*, f*.*, g*.*, h*.*, i*.*, j*.* | ^[a-j] | Matches all files and directories starting with a letter from a to j |
| N/A | ^[0-9]*\.doc$ | Matches all .doc files whose names consist only of digits |
| N/A | ^[0-9].*\.doc$ | Matches all .doc files whose names start with a digit |
| N/A | ^[0-9]{6}\.doc$ | Matches all .doc files whose names consist of exactly six digits |
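A few of the rows above can be checked with a short script. This sketch compares a wildcard (via Python's fnmatch module) with its regular-expression counterpart (via re), and then tries the three digit-based expressions that have no wildcard equivalent. The file names are invented for illustration, and Python's re is assumed to agree with a POSIX engine on these patterns.

```python
import re
from fnmatch import fnmatchcase

names = ["setup.com", "telecom", "readme.doc",
         "123456.doc", "2019report.doc", "12345.doc"]

# *.com and \.com$ select the same files; "telecom" has no .com extension.
wildcard_com = [n for n in names if fnmatchcase(n, "*.com")]
regex_com = [n for n in names if re.search(r"\.com$", n)]

# Expressions with no wildcard equivalent.
only_digits = [n for n in names if re.search(r"^[0-9]*\.doc$", n)]
starts_with_digit = [n for n in names if re.search(r"^[0-9].*\.doc$", n)]
six_digits = [n for n in names if re.search(r"^[0-9]{6}\.doc$", n)]

print(wildcard_com)       # ['setup.com']
print(regex_com)          # ['setup.com']
print(only_digits)        # ['123456.doc', '12345.doc']
print(starts_with_digit)  # ['123456.doc', '2019report.doc', '12345.doc']
print(six_digits)         # ['123456.doc']
```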



