Regex Guide — Regular Expressions for Developers

Learn regular expressions from basics to advanced: character classes, quantifiers, groups, lookaheads, and practical patterns for common tasks.

Regex Basics

A regular expression (regex) is a pattern that describes a set of strings. Supported in virtually every programming language, they're used for search, validation, and text extraction.

# Python
import re

# Basic match
re.search(r"hello", "hello world")  # Match object
re.search(r"xyz", "hello world")    # None

# Find all matches
re.findall(r"\d+", "price: $42 and $99")  # ['42', '99']

# Replace
re.sub(r"\s+", " ", "too   many    spaces")  # 'too many spaces'

Character Classes and Quantifiers

# Character classes
.        # any character except newline
\d       # digit [0-9]
\D       # non-digit
\w       # word character [a-zA-Z0-9_]
\W       # non-word character
\s       # whitespace [	

 ]
\S       # non-whitespace
[aeiou]  # any vowel
[^aeiou] # any non-vowel
[a-z]    # lowercase letter

# Quantifiers
?        # 0 or 1
*        # 0 or more
+        # 1 or more
{3}      # exactly 3
{2,5}    # 2 to 5
{3,}     # 3 or more

# Greedy vs lazy
.*       # greedy: match as much as possible
.*?      # lazy: match as little as possible

Groups and Capturing

# Capturing groups
pattern = r"(\d{4})-(\d{2})-(\d{2})"
match = re.search(pattern, "2026-03-14")
year, month, day = match.groups()   # ('2026', '03', '14')

# Named groups
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(pattern, "2026-03-14")
match.group("year")   # '2026'

# Non-capturing group (grouping without capturing)
r"(?:https?|ftp)://\S+"

Lookaheads and Lookbehinds

# Positive lookahead: match X only if followed by Y
r"\d+(?= dollars)"     # matches '42' in '42 dollars' but not '42 euros'

# Negative lookahead: match X only if NOT followed by Y
r"\d+(?! dollars)"

# Positive lookbehind: match X only if preceded by Y
r"(?<=\$)\d+"          # matches '42' in '$42'

# Negative lookbehind
r"(?<!\$)\d+"

Common Patterns

# Email validation (simplified)
r"^[\w.+-]+@[\w-]+\.[\w.]+$"

# URL
r"https?://[\w./:%#&=?@-]+"

# IPv4
r"(?:\d{1,3}\.){3}\d{1,3}"

# Password: 8+ chars, upper, lower, digit
r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$"

# Extract JSON value
r'"key"\s*:\s*"([^"]+)"'

Frequently Asked Questions

Should I use regex to parse HTML?

No. Use a proper HTML parser like BeautifulSoup (Python) or cheerio (Node.js). HTML is not a regular language — regex can't reliably handle nested tags, attribute order, or encoding.

How do I test and debug regex patterns?

Use DevKits' online regex tester for real-time matching with explanations. regex101.com is another excellent tool with a full library of common patterns. In Python, compile patterns with re.compile() and use re.VERBOSE for multi-line patterns with comments.