Regular Expressions Cookbook



Shroff Publishers & Distributors Pvt Ltd

Publication Year 2012

ISBN 9789350238691

ISBN-10 9350238691

Paperback

Edition 2nd
Number of Pages 632
Language English

Computers & Internet

Take the guesswork out of using regular expressions. With more than 140 practical recipes, this cookbook provides everything you need to solve a wide range of real-world problems. Novices will learn basic skills and tools, and programmers and experienced users will find a wealth of detail. Each recipe provides samples you can use right away.

This revised edition covers the regular expression flavors used by C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET. You'll learn powerful new tricks, avoid flavor-specific gotchas, and save valuable time with this huge library of practical solutions.

Learn regular expression basics through a detailed tutorial
Use code listings to implement regular expressions with your language of choice
Understand how regular expressions differ from language to language
Handle common user input with recipes for validation and formatting
Find and manipulate words, special characters, and lines of text
Detect integers, floating-point numbers, and other numerical formats
Parse source code and process log files
Use regular expressions in URLs, paths, and IP addresses
Manipulate HTML, XML, and data exchange formats
Discover little-known regular expression tricks and techniques

About the Authors
Jan Goyvaerts runs Just Great Software, where he designs and develops some of the most popular regular expression software. His products include RegexBuddy, the world's only regular expression editor that emulates the peculiarities of 15 regular expression flavors, and PowerGREP, the most feature-rich grep tool for Microsoft Windows.

Steven Levithan works at Facebook as a JavaScript engineer. He has enjoyed programming for nearly 15 years, working in Tokyo, Washington, D.C., Baghdad, and Silicon Valley. Steven is a leading JavaScript regular expression expert and has created a variety of open source regular expression tools, including RegexPal and the XRegExp library.

Table of Contents

Chapter 1 Introduction to Regular Expressions

Regular Expressions Defined
Search and Replace with Regular Expressions
Tools for Working with Regular Expressions

Chapter 2 Basic Regular Expression Skills

Match Literal Text
Match Nonprintable Characters
Match One of Many Characters
Match Any Character
Match Something at the Start and/or the End of a Line
Match Whole Words
Unicode Code Points, Categories, Blocks, and Scripts
Match One of Several Alternatives
Group and Capture Parts of the Match
Match Previously Matched Text Again
Capture and Name Parts of the Match
Repeat Part of the Regex a Certain Number of Times
Choose Minimal or Maximal Repetition
Eliminate Needless Backtracking
Prevent Runaway Repetition
Test for a Match Without Adding It to the Overall Match
Match One of Two Alternatives Based on a Condition
Add Comments to a Regular Expression
Insert Literal Text into the Replacement Text
Insert the Regex Match into the Replacement Text
Insert Part of the Regex Match into the Replacement Text
Insert Match Context into the Replacement Text

Chapter 3 Programming with Regular Expressions

Programming Languages and Regex Flavors
Literal Regular Expressions in Source Code
Import the Regular Expression Library
Create Regular Expression Objects
Set Regular Expression Options
Test If a Match Can Be Found Within a Subject String
Test Whether a Regex Matches the Subject String Entirely
Retrieve the Matched Text
Determine the Position and Length of the Match
Retrieve Part of the Matched Text
Retrieve a List of All Matches
Iterate over All Matches
Validate Matches in Procedural Code
Find a Match Within Another Match
Replace All Matches
Replace Matches Reusing Parts of the Match
Replace Matches with Replacements Generated in Code
Replace All Matches Within the Matches of Another Regex
Replace All Matches Between the Matches of Another Regex
Split a String
Split a String, Keeping the Regex Matches
Search Line by Line
Construct a Parser

Chapter 4 Validation and Formatting

Validate Email Addresses
Validate and Format North American Phone Numbers
Validate International Phone Numbers
Validate Traditional Date Formats
Validate Traditional Date Formats, Excluding Invalid Dates
Validate Traditional Time Formats
Validate ISO 8601 Dates and Times
Limit Input to Alphanumeric Characters
Limit the Length of Text
Limit the Number of Lines in Text
Validate Affirmative Responses
Validate Social Security Numbers
Validate ISBNs
Validate ZIP Codes
Validate Canadian Postal Codes
Validate U.K. Postcodes
Find Addresses with Post Office Boxes
Reformat Names From "FirstName LastName" to "LastName, FirstName"
Validate Password Complexity
Validate Credit Card Numbers
European VAT Numbers

Chapter 5 Words, Lines, and Special Characters

Find a Specific Word
Find Any of Multiple Words
Find Similar Words
Find All Except a Specific Word
Find Any Word Not Followed by a Specific Word
Find Any Word Not Preceded by a Specific Word
Find Words Near Each Other
Find Repeated Words
Remove Duplicate Lines
Match Complete Lines That Contain a Word
Match Complete Lines That Do Not Contain a Word
Trim Leading and Trailing Whitespace
Replace Repeated Whitespace with a Single Space
Escape Regular Expression Metacharacters

Chapter 6 Numbers

Integer Numbers
Hexadecimal Numbers
Binary Numbers
Octal Numbers
Decimal Numbers
Strip Leading Zeros
Numbers Within a Certain Range
Hexadecimal Numbers Within a Certain Range
Integer Numbers with Separators
Floating-Point Numbers
Numbers with Thousand Separators
Add Thousand Separators to Numbers
Roman Numerals

Chapter 7 Source Code and Log Files

Numeric Constants
Single-Line Comments
Multiline Comments
All Comments
Strings with Escapes
Regex Literals
Here Documents
Common Log Format
Combined Log Format
Broken Links Reported in Web Logs

Chapter 8 URLs, Paths, and Internet Addresses

Validating URLs
Finding URLs Within Full Text
Finding Quoted URLs in Full Text
Finding URLs with Parentheses in Full Text
Turn URLs into Links
Validating URNs
Validating Generic URLs
Extracting the Scheme from a URL
Extracting the User from a URL
Extracting the Host from a URL
Extracting the Port from a URL
Extracting the Path from a URL
Extracting the Query from a URL
Extracting the Fragment from a URL
Validating Domain Names
Matching IPv4 Addresses
Matching IPv6 Addresses
Validate Windows Paths
Split Windows Paths into Their Parts
Extract the Drive Letter from a Windows Path
Extract the Server and Share from a UNC Path
Extract the Folder from a Windows Path
Extract the Filename from a Windows Path
Extract the File Extension from a Windows Path
Strip Invalid Characters from Filenames

Chapter 9 Markup and Data Formats

Processing Markup and Data Formats with Regular Expressions
Find XML-Style Tags
Replace <b> Tags with <strong>
Remove All XML-Style Tags Except <em> and <strong>
Match XML Names
Convert Plain Text to HTML by Adding <p> and <br> Tags
Decode XML Entities
Find a Specific Attribute in XML-Style Tags
Add a cellspacing Attribute to <table> Tags That Do Not Already Include It
Remove XML-Style Comments
Find Words Within XML-Style Comments
Change the Delimiter Used in CSV Files
Extract CSV Fields from a Specific Column
Match INI Section Headers
Match INI Section Blocks
Match INI Name-Value Pairs