Regular Expressions Explained: Complete Regex Guide with Examples

By Suvom Das | March 12, 2026 | 18 min read

1. What Are Regular Expressions?

Regular expressions -- commonly abbreviated as "regex" or "regexp" -- are sequences of characters that define search patterns. They are one of the most powerful text-processing tools available to developers, used for matching, finding, extracting, and replacing text within strings. Originally rooted in formal language theory developed by mathematician Stephen Kleene in the 1950s, regular expressions became practical tools when Ken Thompson implemented them in the Unix text editor ed in the late 1960s.

Today, regular expressions are supported in virtually every programming language, text editor, and command-line tool. Whether you are validating user input in a web form, parsing log files on a server, searching through a codebase, or transforming data in a pipeline, regex gives you a concise and expressive way to describe text patterns.

Consider a simple example: you need to find all email addresses in a large body of text. Without regex, you would have to write dozens of lines of string-manipulation code, handling dots, at signs, domain suffixes, and edge cases. With regex, a single pattern like [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} accomplishes the same task in one line.
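As a rough illustration of the one-line claim, here is a minimal Python sketch that applies the pattern above with re.findall. The sample text and the name EMAIL_RE are invented for this example.

```python
import re

# Pattern from the text, left unanchored so it can find addresses inside prose.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical sample text, invented for this example.
sample = "Contact alice@example.com or bob.smith@mail.example.org for details."

found = EMAIL_RE.findall(sample)
print(found)  # ['alice@example.com', 'bob.smith@mail.example.org']
```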

The trade-off is readability. Regex patterns can look intimidating -- a dense line of dots, brackets, backslashes, and question marks. But once you learn the building blocks, you will find that regex is remarkably logical and systematic. This guide breaks down every concept from the basics through advanced features, with practical examples you can apply immediately.

2. Regex Syntax Fundamentals

Every regex pattern is built from a small set of building blocks. Understanding these fundamentals is the key to reading and writing any regular expression.

Literal Characters

The simplest regex is a literal string. The pattern hello matches the exact sequence of characters "hello" in the target string. Most characters -- letters, digits, spaces -- match themselves literally. If you search for cat, it will match "cat" in "concatenate", "catalog", and "the cat sat".

Metacharacters

Certain characters have special meaning in regex and do not match themselves literally. These are called metacharacters:

. ^ $ * + ? { } [ ] \ | ( )

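Each metacharacter serves a specific purpose:

# Metacharacter   Purpose
.                 Matches any single character except a newline
^                 Matches the start of the string (or negates a character class when first inside [ ])
$                 Matches the end of the string
*                 Zero or more of the preceding element
+                 One or more of the preceding element
?                 Zero or one of the preceding element (or makes a quantifier lazy)
{ }               Counted repetition, such as {3} or {2,5}
[ ]               Character class: any one character from the set
\                 Escapes a metacharacter or introduces a shorthand such as \d
|                 Alternation: match the left side or the right side
( )               Grouping and capturing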

To match a metacharacter literally, precede it with a backslash. For example, to match the string "3.14", use 3\.14. Without the backslash, 3.14 would match "3.14" but also "3X14", "3-14", or any string with any character between "3" and "14".
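The difference between the escaped and unescaped dot can be checked directly. The sketch below uses Python's re module; the candidate strings are the ones mentioned above, and the variable names are illustrative only.

```python
import re

unescaped = re.compile(r"3.14")   # dot matches any character except newline
escaped = re.compile(r"3\.14")    # dot matches only a literal "."

candidates = ["3.14", "3X14", "3-14"]
loose = [s for s in candidates if unescaped.fullmatch(s)]
strict = [s for s in candidates if escaped.fullmatch(s)]
print(loose)   # ['3.14', '3X14', '3-14']
print(strict)  # ['3.14']

# re.escape produces the escaped form of arbitrary literal text.
escaped_literal = re.escape("3.14")
print(escaped_literal)  # 3\.14
```

When the literal text comes from user input, re.escape is usually safer than escaping by hand.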

Character Classes

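Character classes (also called character sets) let you match any one character from a defined set. They are enclosed in square brackets:

# Class         Matches
[abc]           "a", "b", or "c"
[a-z]           Any lowercase letter from "a" to "z"
[A-Za-z0-9]     Any ASCII letter or digit
[^abc]          Any character except "a", "b", or "c"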

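Regex also provides shorthand character classes for common sets:

# Shorthand   Matches                                   Inverse
\d            A digit ([0-9] in ASCII mode)             \D
\w            A word character ([A-Za-z0-9_] in ASCII)  \W
\s            A whitespace character                    \S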

These shorthands make patterns more concise. For example, \d{3}-\d{4} matches a phone number fragment like "555-1234" and is much easier to read than [0-9]{3}-[0-9]{4}.
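One caveat worth knowing: in some engines the shorthands are wider than their ASCII equivalents. The following Python sketch shows that \d on str patterns also accepts non-ASCII digits unless re.ASCII is set. The variable names and the Arabic-Indic sample are assumptions chosen for the demonstration.

```python
import re

fragment = re.compile(r"\d{3}-\d{4}")
explicit = re.compile(r"[0-9]{3}-[0-9]{4}")

print(bool(fragment.fullmatch("555-1234")))  # True
print(bool(explicit.fullmatch("555-1234")))  # True

# Arabic-Indic digits: \d accepts them for str patterns, [0-9] does not.
arabic = "\u0665\u0665\u0665-\u0661\u0662\u0663\u0664"
print(bool(fragment.fullmatch(arabic)))      # True
print(bool(explicit.fullmatch(arabic)))      # False

# re.ASCII restricts \d, \w, and \s to their ASCII definitions.
ascii_fragment = re.compile(r"\d{3}-\d{4}", re.ASCII)
print(bool(ascii_fragment.fullmatch(arabic)))  # False
```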

3. Quantifiers

Quantifiers specify how many times a character, group, or character class must occur for a match. They are appended after the element they quantify.

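Basic Quantifiers

# Quantifier   Meaning                  Example
*              Zero or more             ab*c matches "ac", "abc", and "abbbc"
+              One or more              ab+c matches "abc" and "abbc" but not "ac"
?              Zero or one (optional)   colou?r matches "color" and "colour"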

Counted Quantifiers

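Curly braces let you specify exact counts or ranges:

# Quantifier   Meaning                 Example
{n}            Exactly n times         \d{4} matches exactly four digits
{n,}           n or more times         \d{2,} matches two or more digits
{n,m}          Between n and m times   \d{2,4} matches two to four digits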

Greedy vs. Lazy Quantifiers

By default, quantifiers are greedy -- they match as many characters as possible while still allowing the overall pattern to succeed. Adding a ? after a quantifier makes it lazy (also called reluctant), matching as few characters as possible.

This distinction matters most when your pattern includes a wildcard followed by a delimiter. Consider matching HTML tags in the string <b>hello</b> world <b>goodbye</b>:

# Greedy: matches from first < to LAST >
<.*>    matches "<b>hello</b> world <b>goodbye</b>"

# Lazy: matches from first < to NEXT >
<.*?>   matches "<b>", then "</b>", then "<b>", then "</b>"

The lazy versions of all quantifiers are *?, +?, ??, {n,}?, and {n,m}?.
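The HTML example above can be reproduced in Python as a quick check. This is only a sketch; the third pattern, a negated character class, is an extra alternative not discussed above that avoids the wildcard entirely.

```python
import re

html = "<b>hello</b> world <b>goodbye</b>"

greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)
negated = re.findall(r"<[^>]*>", html)  # alternative: never crosses a ">"

print(greedy)   # ['<b>hello</b> world <b>goodbye</b>']
print(lazy)     # ['<b>', '</b>', '<b>', '</b>']
print(negated)  # ['<b>', '</b>', '<b>', '</b>']
```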

4. Anchors and Boundaries

Anchors do not match characters -- they match positions in the string. They are essential for ensuring your pattern matches at the right location.

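Start and End Anchors

# Anchor   Matches
^          The start of the string (or of each line with the m flag)
$          The end of the string (or of each line with the m flag)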

Using both anchors together ensures the entire string matches your pattern, which is critical for validation:

# Without anchors: matches "abc" inside "xyzabc123"
abc

# With anchors: only matches if the ENTIRE string is "abc"
^abc$

# Validate a 5-digit ZIP code (entire string must be exactly 5 digits)
^\d{5}$
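
Engines differ slightly in how they treat $. The sketch below checks the ZIP pattern in Python, where $ also matches just before a trailing newline. For strict validation in Python, re.fullmatch is one way to close that gap. ZIP_RE is a name chosen for this example.

```python
import re

ZIP_RE = re.compile(r"^\d{5}$")

plain_ok = bool(ZIP_RE.search("12345"))        # True
embedded = bool(ZIP_RE.search("xyz12345"))     # False: ^ rejects the prefix
trailing_nl = bool(ZIP_RE.search("12345\n"))   # True: $ matches before a final "\n"

# fullmatch requires the whole string to match, trailing newline included.
strict = re.fullmatch(r"\d{5}", "12345\n")     # None

print(plain_ok, embedded, trailing_nl, strict)
```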

Word Boundaries

The \b metacharacter matches a word boundary -- the position between a word character (\w) and a non-word character (\W), or the start/end of the string when the adjacent character is a word character. It is invaluable for matching whole words:

# Without word boundary: "cat" matches in "concatenate"
cat

# With word boundary: only matches the standalone word "cat"
\bcat\b

The inverse, \B, matches any position that is NOT a word boundary. For example, \Bcat\B would match "cat" inside "concatenate" but not the standalone word "cat".
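To see the three variants side by side, here is a small Python sketch that records where each pattern matches. The sample sentence and the helper name starts are made up for the illustration.

```python
import re

text = "The cat sat; concatenate the catalog."

def starts(pattern: str) -> list[int]:
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

plain = starts(r"cat")      # [4, 16, 29]: "cat", "concatenate", "catalog"
whole = starts(r"\bcat\b")  # [4]: only the standalone word
inner = starts(r"\Bcat\B")  # [16]: only the "cat" inside "concatenate"

print(plain, whole, inner)
```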

5. Groups and Capturing

Parentheses serve multiple purposes in regex: grouping, capturing, and applying quantifiers to multi-character sequences.

Capturing Groups

Parentheses () create a capturing group. The text matched by the group is saved and can be referenced later -- either in the replacement string or within the pattern itself (backreference). Groups are numbered sequentially starting from 1, based on the position of the opening parenthesis:

# Pattern: (\d{4})-(\d{2})-(\d{2})
# Input:   "2026-03-12"
# Group 1: "2026"  (year)
# Group 2: "03"    (month)
# Group 3: "12"    (day)

Backreferences let you match the same text that was previously captured. The syntax \1 refers to the first group, \2 to the second, and so on:

# Match repeated words: "the the", "is is", etc.
\b(\w+)\s+\1\b
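
A short Python sketch of the repeated-word pattern, used both to find duplicates and to collapse them. The sample sentence is invented; in a replacement string, \1 refers to the same first group.

```python
import re

REPEATED = re.compile(r"\b(\w+)\s+\1\b")

text = "This is is a test of the the pattern."

duplicates = [m.group(1) for m in REPEATED.finditer(text)]
print(duplicates)  # ['is', 'the']

# Replace each doubled word with a single copy of the captured word.
cleaned = REPEATED.sub(r"\1", text)
print(cleaned)     # This is a test of the pattern.
```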

Non-Capturing Groups

Sometimes you need grouping for logical structure or to apply a quantifier, but you do not need to capture the matched text. Non-capturing groups use the syntax (?:):

# Capturing group (saves "http" or "https"):
(https?)://

# Non-capturing group (groups but does not save):
(?:https?)://

Non-capturing groups are slightly more efficient and keep your group numbering clean, which matters when you have many groups in a complex pattern.
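The choice of group type can also change what an API returns. In Python, for instance, re.findall returns the captured groups when the pattern contains any, and the full matches otherwise. A minimal sketch with invented sample URLs:

```python
import re

text = "http://a.example and https://b.example"

with_capture = re.findall(r"(https?)://", text)
without_capture = re.findall(r"(?:https?)://", text)

print(with_capture)     # ['http', 'https']        (group contents)
print(without_capture)  # ['http://', 'https://']  (full matches)
```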

Named Capturing Groups

Named groups assign a name to a capturing group, making the pattern self-documenting and the matched data easier to access in code. The syntax is (?<name>...) in JavaScript and .NET; Python's re module uses (?P<name>...) instead:

# Named groups for a date pattern:
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

# In JavaScript:
const match = "2026-03-12".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(match.groups.year);   // "2026"
console.log(match.groups.month);  // "03"
console.log(match.groups.day);    // "12"

Alternation Within Groups

The pipe character | inside a group provides alternatives:

# Match "cat", "dog", or "bird"
(?:cat|dog|bird)

# Match image file extensions
\.(?:jpg|jpeg|png|gif|svg|webp)$
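
As a usage sketch, the image-extension pattern can filter a list of file names in Python. The file names are hypothetical, and the re.IGNORECASE flag is an addition so that upper-case extensions are accepted too.

```python
import re

IMAGE_RE = re.compile(r"\.(?:jpg|jpeg|png|gif|svg|webp)$", re.IGNORECASE)

# Hypothetical file names for the demonstration.
files = ["photo.JPG", "notes.txt", "logo.svg", "archive.png.zip"]

images = [name for name in files if IMAGE_RE.search(name)]
print(images)  # ['photo.JPG', 'logo.svg']
```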

6. Lookahead and Lookbehind

Lookahead and lookbehind assertions (collectively called "lookaround") are zero-width assertions. They check whether a pattern exists ahead of or behind the current position without consuming any characters. This means the matched text does not include the lookaround content.

Positive Lookahead: (?=...)

Asserts that what immediately follows the current position matches the given pattern:

# Match "foo" only if followed by "bar"
foo(?=bar)

# Input: "foobar foobaz"
# Matches: "foo" in "foobar" (but NOT "foo" in "foobaz")

Negative Lookahead: (?!...)

Asserts that what immediately follows does NOT match:

# Match "foo" only if NOT followed by "bar"
foo(?!bar)

# Input: "foobar foobaz"
# Matches: "foo" in "foobaz" (but NOT "foo" in "foobar")

Positive Lookbehind: (?<=...)

Asserts that what immediately precedes the current position matches:

# Match a number only if preceded by "$"
(?<=\$)\d+

# Input: "$100 and 200"
# Matches: "100" (but NOT "200")

Negative Lookbehind: (?<!...)

Asserts that what immediately precedes does NOT match:

# Match a number only if NOT preceded by "$"
(?<!\$)\d+

# Input: "$100 and 200"
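# Matches: "00" and "200"
# ("00" comes from "$100": only the "1" is directly preceded by "$",
#  so the match starts one digit later)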
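
The partial match on "$100" is easy to miss, so here is a Python sketch of both lookbehind examples. The third pattern is one possible refinement, not part of the examples above: it also forbids a preceding digit so that a match cannot start in the middle of a number.

```python
import re

text = "$100 and 200"

after_dollar = re.findall(r"(?<=\$)\d+", text)
not_after_dollar = re.findall(r"(?<!\$)\d+", text)
whole_numbers_only = re.findall(r"(?<![$\d])\d+", text)

print(after_dollar)        # ['100']
print(not_after_dollar)    # ['00', '200']
print(whole_numbers_only)  # ['200']
```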

Practical Lookaround Examples

Lookaround is commonly used for password validation, where multiple conditions must be met simultaneously:

# Password must contain:
# - at least 8 characters
# - at least one uppercase letter
# - at least one lowercase letter
# - at least one digit
# - at least one special character
^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$

Each lookahead checks a condition independently without advancing the position, so all conditions are verified against the same starting point.
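A minimal Python sketch of this password check follows. The names is_strong, RULES, and missing_rules are hypothetical helpers. The second function is an alternative design that tests each rule with its own small pattern, which makes it possible to tell the user which requirement failed.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$"
)

def is_strong(password: str) -> bool:
    """Single-pattern check using the lookahead regex from the text."""
    return PASSWORD_RE.fullmatch(password) is not None

# Alternative: one simple pattern per rule, so failures can be reported.
RULES = {
    "uppercase letter": r"[A-Z]",
    "lowercase letter": r"[a-z]",
    "digit": r"\d",
    "special character": r"[!@#$%^&*]",
}

def missing_rules(password: str) -> list[str]:
    """Return the names of the requirements the password does not meet."""
    missing = [name for name, pat in RULES.items() if not re.search(pat, password)]
    if len(password) < 8:
        missing.append("at least 8 characters")
    return missing

print(is_strong("Passw0rd!"))     # True
print(is_strong("password"))      # False
print(missing_rules("password"))  # ['uppercase letter', 'digit', 'special character']
```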

7. Flags Explained

Flags (also called modifiers) change how the regex engine interprets your pattern. They are specified outside the pattern -- in JavaScript as a suffix like /pattern/flags, and in Python as constants like re.IGNORECASE.

g -- Global

Without the global flag, the regex engine stops after finding the first match. With g, it continues searching for all non-overlapping matches throughout the string:

// JavaScript example
"cat bat hat".match(/[a-z]at/);    // ["cat"]       (first match only)
"cat bat hat".match(/[a-z]at/g);   // ["cat", "bat", "hat"]  (all matches)

i -- Case-Insensitive

Makes the pattern match regardless of letter casing. Without i, /hello/ only matches "hello" but not "Hello" or "HELLO". With i, all casings match:

/hello/i   matches "hello", "Hello", "HELLO", "hElLo"

m -- Multiline

Changes the behavior of ^ and $. Without m, they match only the start and end of the entire string. With m, they also match the start and end of each line (after/before newline characters):

# Input (3 lines):
# "first line\nsecond line\nthird line"

/^second/      -- no match (start of string is "first")
/^second/m     -- matches "second" at the start of line 2
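
The same behavior in Python, where the flag is spelled re.MULTILINE. This is a small sketch using the three-line input from above.

```python
import re

text = "first line\nsecond line\nthird line"

no_flag = re.search(r"^second", text)
with_flag = re.search(r"^second", text, re.MULTILINE)
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

print(no_flag)            # None
print(with_flag.start())  # 11
print(line_starts)        # ['first', 'second', 'third']
```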

s -- DotAll (Single-Line)

By default, the dot . matches any character except newline (\n). The s flag makes the dot match newline characters as well, allowing patterns to span multiple lines:

# Match everything between START and END, even across lines
/START.*END/s
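
In Python the equivalent flag is re.DOTALL. The sketch below uses an invented multi-line input to show the difference.

```python
import re

text = "START\nline one\nline two\nEND"

without_dotall = re.search(r"START.*END", text)
with_dotall = re.search(r"START.*END", text, re.DOTALL)

print(without_dotall)             # None: "." stops at the first newline
print(with_dotall.group() == text)  # True: the match spans all four lines
```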

u -- Unicode

Enables full Unicode support. In JavaScript, this flag makes the regex engine treat the pattern and input as sequences of Unicode code points rather than UTF-16 code units. This is important for correctly handling characters outside the Basic Multilingual Plane, such as emojis and certain CJK characters:

// With 'u' the dot matches a whole code point; without it, a single UTF-16 code unit
/^.$/u.test("😀")   // true  (emoji treated as one code point)
/^.$/.test("😀")    // false (emoji is two UTF-16 code units)

The u flag also enables Unicode property escapes like \p{Letter} and \p{Script=Greek} for matching characters by their Unicode properties.

8. Common Regex Patterns

Here are battle-tested regex patterns for the most common validation and extraction tasks. Each pattern includes an explanation of how it works.

Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown: one or more alphanumeric characters (plus dots, underscores, percent signs, plus signs, and hyphens) before the @, followed by a domain name with at least one dot and a TLD of two or more letters. This handles the address formats most applications encounter. Note that the full RFC 5322 specification allows more exotic formats (quoted local parts, for example), which this practical pattern deliberately rejects.

URL

^https?:\/\/(?:www\.)?[\w-]+(?:\.[\w-]+)+(?:\/[\w.,@?^=%&:\/~+#-]*)?$

Matches HTTP and HTTPS URLs with optional "www." prefix, domain name with at least one dot, and optional path with query parameters and fragments.

IPv4 Address

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

Validates a proper IPv4 address where each octet is between 0 and 255. The pattern uses alternation to handle three ranges: 250-255, 200-249, and 0-199. A naive pattern like \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} would incorrectly accept values like "999.999.999.999".
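Below is a hedged Python sketch of the IPv4 pattern, shown next to the standard library's ipaddress module as a cross-check. The helper names are hypothetical, and re.ASCII is added so that \d means [0-9] only.

```python
import ipaddress
import re

IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$",
    re.ASCII,
)

def is_ipv4_regex(s: str) -> bool:
    """Validate with the pattern from the text."""
    return IPV4_RE.fullmatch(s) is not None

def is_ipv4_stdlib(s: str) -> bool:
    """Validate by letting ipaddress parse the string."""
    try:
        ipaddress.IPv4Address(s)
    except ValueError:
        return False
    return True

for candidate in ["192.168.1.1", "256.1.1.1", "999.999.999.999", "1.2.3"]:
    print(candidate, is_ipv4_regex(candidate), is_ipv4_stdlib(candidate))
```

When a parser like this is available, it is often the simpler choice; the regex remains useful for finding candidate addresses inside larger text.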

Phone Number (US)

^(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Matches US phone numbers in various formats: "555-123-4567", "(555) 123-4567", "555.123.4567", "+1 555 123 4567", and similar variations. The optional \+1 prefix handles the country code.

Date (YYYY-MM-DD)

^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$

Validates ISO 8601 date format with basic range checking: months 01-12, days 01-31. Note that this does not validate whether the specific day exists in the given month (for example, it will accept "2026-02-31"). Full calendar validation requires additional logic beyond regex.
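One way to add the missing calendar check is to let the regex handle the format and a date library handle the calendar. A minimal Python sketch, assuming the helper name is_real_date and adding capture groups to the pattern above:

```python
import re
from datetime import date

# Same pattern as above, with capturing groups for year, month, and day.
DATE_RE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_date(s: str) -> bool:
    """Return True if s is YYYY-MM-DD and names a day that exists."""
    m = DATE_RE.fullmatch(s)
    if m is None:
        return False
    year, month, day = (int(part) for part in m.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

print(is_real_date("2026-03-12"))  # True
print(is_real_date("2026-02-31"))  # False: February has no 31st
print(is_real_date("2024-02-29"))  # True: 2024 is a leap year
```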

Hex Color Code

^#(?:[0-9a-fA-F]{3}){1,2}$

Matches CSS hex color codes in both 3-character shorthand (#FFF) and 6-character full (#FFFFFF) formats.

Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Enforces a minimum of 8 characters with at least one lowercase letter, one uppercase letter, one digit, and one special character. Uses multiple positive lookaheads to check each requirement independently.

Username

^[a-zA-Z][a-zA-Z0-9._-]{2,19}$

Validates a username that starts with a letter, is 3-20 characters long, and contains only letters, digits, dots, underscores, and hyphens.

9. Regex in JavaScript, Python, and Go

While regex syntax is largely universal, each language has its own API for creating, compiling, and using regular expressions. Here is how to work with regex in three popular languages.

JavaScript

JavaScript provides regex as a first-class language feature with literal syntax and the RegExp constructor:

// Regex literal (preferred for static patterns)
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

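// RegExp constructor (for dynamic patterns); escape user input first
const searchTerm = "cat";  // example value
const escapedTerm = searchTerm.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const pattern = new RegExp(`\\b${escapedTerm}\\b`, 'gi');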

// Testing
emailRegex.test("user@example.com");  // true

// Matching
const matches = "2026-03-12".match(/(\d{4})-(\d{2})-(\d{2})/);
// matches[0] = "2026-03-12"  (full match)
// matches[1] = "2026"        (group 1)
// matches[2] = "03"          (group 2)
// matches[3] = "12"          (group 3)

// matchAll (returns iterator of all matches with groups)
const text = "Call 555-1234 or 555-5678";
for (const match of text.matchAll(/(\d{3})-(\d{4})/g)) {
    console.log(`Found: ${match[0]} at index ${match.index}`);
}

// Replacing
"Hello World".replace(/world/i, "Regex");  // "Hello Regex"

// Replace with captured groups
"2026-03-12".replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
// "03/12/2026"

JavaScript has supported the u (unicode) flag since ES2015. Named groups ((?<name>...)), lookbehind assertions, the s (dotAll) flag, and Unicode property escapes were added in ES2018.

Python

Python provides the re module in the standard library. Patterns are typically written as raw strings (r"") to avoid backslash conflicts:

import re

# Compile a pattern (recommended for repeated use)
email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

# Search: find first match anywhere in string
match = re.search(r'\d{4}-\d{2}-\d{2}', 'Date: 2026-03-12')
if match:
    print(match.group())  # "2026-03-12"

# Match: match only at the beginning of string
match = re.match(r'\d+', '123abc')
print(match.group())  # "123"

# findall: return all non-overlapping matches as a list
phones = re.findall(r'\d{3}-\d{4}', 'Call 555-1234 or 555-5678')
# ["555-1234", "555-5678"]

# finditer: return iterator of match objects
text = 'Call 555-1234 or 555-5678'
for m in re.finditer(r'(?P<area>\d{3})-(?P<number>\d{4})', text):
    print(f"Area: {m.group('area')}, Number: {m.group('number')}")

# sub: replace matches
result = re.sub(r'\bworld\b', 'Regex', 'Hello World', flags=re.IGNORECASE)
# "Hello Regex"

# split: split string by pattern
parts = re.split(r'[,;\s]+', 'one, two; three four')
# ["one", "two", "three", "four"]

Python uses (?P<name>...) for named groups (note the P). Flags are passed as arguments: re.IGNORECASE, re.MULTILINE, re.DOTALL, re.VERBOSE (allows comments and whitespace in patterns).

Go

Go provides the regexp package. Go uses the RE2 syntax, which intentionally omits backreferences and lookaround for guaranteed linear-time matching:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Compile a pattern (use MustCompile for known-good patterns)
    emailRegex := regexp.MustCompile(
        `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`,
    )

    // MatchString: test if string matches
    fmt.Println(emailRegex.MatchString("user@example.com"))  // true

    // FindString: find first match
    re := regexp.MustCompile(`\d{4}-\d{2}-\d{2}`)
    fmt.Println(re.FindString("Date: 2026-03-12"))  // "2026-03-12"

    // FindAllString: find all matches
    phoneRe := regexp.MustCompile(`\d{3}-\d{4}`)
    matches := phoneRe.FindAllString("Call 555-1234 or 555-5678", -1)
    fmt.Println(matches)  // [555-1234 555-5678]

    // FindStringSubmatch: capture groups
    dateRe := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
    parts := dateRe.FindStringSubmatch("2026-03-12")
    // parts[0] = "2026-03-12" (full match), parts[1..3] = the groups
    fmt.Println(parts[1], parts[2], parts[3])  // 2026 03 12

    // Named groups
    namedRe := regexp.MustCompile(
        `(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`,
    )
    match := namedRe.FindStringSubmatch("2026-03-12")
    for i, name := range namedRe.SubexpNames() {
        if name != "" {
            fmt.Printf("%s: %s\n", name, match[i])
        }
    }

    // ReplaceAllString
    result := re.ReplaceAllString("Date: 2026-03-12", "REDACTED")
    fmt.Println(result)  // "Date: REDACTED"
}

Go's RE2 engine does not support lookahead, lookbehind, or backreferences. This is a deliberate design choice to guarantee O(n) matching time and prevent catastrophic backtracking. If you need these features in Go, restructure the check as several simpler patterns combined with ordinary string logic, or use a third-party backtracking engine and accept the loss of the linear-time guarantee.

10. Performance Tips: Avoiding Catastrophic Backtracking

Regex engines that use backtracking (most engines, with exceptions such as RE2, Go's regexp package, and Rust's regex crate) can exhibit worst-case exponential time complexity on certain patterns. This phenomenon is called catastrophic backtracking, and it can freeze your application or make it vulnerable to ReDoS (Regular expression Denial of Service) attacks.

What Causes It

Catastrophic backtracking occurs when the engine has too many ways to match (or fail to match) a string. The classic example involves nested quantifiers:

# Dangerous: nested quantifiers
(a+)+b

# On input "aaaaaaaaaaaaaaaaac" (many a's, no b):
# The engine tries every possible way to split the a's
# between the inner a+ and the outer +, leading to
# 2^n combinations where n is the number of a's.
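
The effect can be observed with a small timing experiment. The Python sketch below keeps the input short (18 a's) so it finishes quickly. The helper name timed_match is hypothetical, and the measured times will vary by machine and interpreter version, so no specific numbers are claimed here.

```python
import re
import time

def timed_match(pattern: str, text: str):
    """Return (match_or_None, elapsed_seconds) for one re.match call."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start

# 18 a's followed by "c": neither pattern can match because there is no "b".
subject = "a" * 18 + "c"

nested_result, nested_time = timed_match(r"(a+)+b", subject)
flat_result, flat_time = timed_match(r"a+b", subject)

print(f"(a+)+b -> {nested_result}, {nested_time:.4f}s")
print(f"a+b    -> {flat_result}, {flat_time:.4f}s")
```

Both calls fail to match; the point of interest is how the elapsed time of the nested version grows as more a's are added, while the flat version stays fast.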

Other dangerous patterns include:

How to Prevent It

Safe Alternatives to Dangerous Patterns

# Dangerous                  Safe alternative
(a+)+                       a+
(\w+\s*)+$                  [\w\s]+$
(.*?,)+                     (?:[^,]*,)+
(.+)+                       .+

11. Regex Best Practices

Following these best practices will help you write regex patterns that are correct, maintainable, and performant.

Start Simple, Then Refine

Do not try to write the perfect pattern on the first attempt. Start with a simple version that handles the common case, test it against real data, and incrementally add edge case handling. A pattern that covers 95% of cases and is readable is often better than one that covers 100% but is incomprehensible.

Use Anchors for Validation

When validating input (emails, phone numbers, dates), always use ^ and $ anchors. Without them, the pattern \d{5} will match any string that contains five consecutive digits, not just strings that are five digits.

Prefer Character Classes Over Dot

The dot (.) matches almost anything, which makes it a blunt instrument. Whenever possible, use a specific character class. Instead of .* to match "everything up to a quote", use [^"]* to match "everything that is not a quote". This is both faster and more correct.
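The quote example can be verified with a short Python sketch; the sample string is invented for the demonstration.

```python
import re

text = 'say "hello" and "goodbye"'

with_dot = re.findall(r'".*"', text)
with_class = re.findall(r'"[^"]*"', text)

print(with_dot)    # ['"hello" and "goodbye"']  (runs to the last quote)
print(with_class)  # ['"hello"', '"goodbye"']   (stops at each closing quote)
```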

Comment Complex Patterns

Many regex flavors support a verbose or extended mode that allows whitespace and comments inside the pattern. In Python, use re.VERBOSE:

email_pattern = re.compile(r"""
    ^                       # Start of string
    [a-zA-Z0-9._%+-]+       # Local part (before @)
    @                       # Literal @ sign
    [a-zA-Z0-9.-]+          # Domain name
    \.                      # Literal dot
    [a-zA-Z]{2,}            # TLD (2+ letters)
    $                       # End of string
""", re.VERBOSE)

Do Not Use Regex for Everything

Regex is powerful but not always the right tool. Do not use regex to:

Test Extensively

Always test your regex against a diverse set of inputs, including:

Keep Patterns Readable

If a regex pattern is too complex to understand at a glance, break it into smaller parts. Most languages let you build patterns by concatenating strings, and you can compose named sub-patterns:

// JavaScript: build a complex pattern from readable parts
const year  = '(?<year>\\d{4})';
const month = '(?<month>0[1-9]|1[0-2])';
const day   = '(?<day>0[1-9]|[12]\\d|3[01])';
const dateRegex = new RegExp(`^${year}-${month}-${day}$`);

12. Using Our Free Regex Tester Tool

Writing and debugging regular expressions is significantly easier with a dedicated testing tool. Our free Regex Tester provides an interactive environment for building, testing, and refining your patterns in real time.

Key Features

How to Use It

Enter your regex pattern in the pattern field, type or paste your test string, and select the flags you need. Matches are highlighted in the test string in real time. Click any match to see its captured groups and details. You can copy the pattern or the match results with a single click.

Whether you are debugging a tricky pattern, learning regex for the first time, or validating a production pattern against edge cases, our tool gives you immediate, visual feedback that makes working with regular expressions faster and more intuitive.

