Regex Basics
A regular expression (regex) is a pattern that describes a set of strings. Regexes are supported in virtually every programming language and are used for search, validation, and text extraction.
# Python
import re
# Basic match
re.search(r"hello", "hello world") # Match object
re.search(r"xyz", "hello world") # None
# Find all matches
re.findall(r"\d+", "price: $42 and $99") # ['42', '99']
# Replace
re.sub(r"\s+", " ", "too   many   spaces") # 'too many spaces'
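To make the return values concrete, here is a minimal sketch that stores the result of each call. The sample strings and variable names are made up for illustration.

```python
import re

text = "price: $42 and $99"  # sample input, chosen for illustration

# re.search returns a Match object for the first hit, or None if there is none
m = re.search(r"\d+", text)
first = m.group() if m else None   # the matched substring
span = m.span() if m else None     # (start, end) offsets into text

# re.findall returns every non-overlapping match as a list of strings
all_numbers = re.findall(r"\d+", text)

# re.sub replaces every match; here each run of whitespace becomes one space
collapsed = re.sub(r"\s+", " ", "too   many \t spaces")

print(first, span, all_numbers, repr(collapsed))
```

Checking the result of re.search against None before calling .group() avoids an AttributeError when nothing matches.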
Character Classes and Quantifiers
# Character classes
. # any character except newline
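\d # digit [0-9] (in Python 3 str patterns, also other Unicode digits unless re.ASCII is set)
\D # non-digit
\w # word character [a-zA-Z0-9_] (in Python 3 str patterns, also Unicode letters and digits unless re.ASCII is set)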
\W # non-word character
\s # whitespace [ \t\n\r\f\v]
\S # non-whitespace
[aeiou] # any lowercase vowel
[^aeiou] # any character that is not a lowercase vowel (including digits and punctuation)
[a-z] # lowercase letter
# Quantifiers
? # 0 or 1
* # 0 or more
+ # 1 or more
{3} # exactly 3
{2,5} # 2 to 5
{3,} # 3 or more
# Greedy vs lazy
.* # greedy: match as much as possible
.*? # lazy: match as little as possible
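The greedy/lazy difference is easiest to see side by side. The following is a small sketch on an invented sample string; it also shows a bounded quantifier and how re.ASCII changes what \w accepts.

```python
import re

sample = "<b>bold</b> and <i>italic</i>"  # illustrative input

# Greedy: .* runs to the end of the line, then backs up only to the last '>'
greedy = re.findall(r"<.*>", sample)

# Lazy: .*? stops at the first '>' it can reach
lazy = re.findall(r"<.*?>", sample)

# Bounded quantifier: the whole string must be 2 to 5 digits
ok = re.fullmatch(r"\d{2,5}", "1234") is not None
too_short = re.fullmatch(r"\d{2,5}", "1") is not None

# \w on str patterns is Unicode-aware unless re.ASCII is passed
unicode_word = re.fullmatch(r"\w+", "caf\u00e9") is not None
ascii_word = re.fullmatch(r"\w+", "caf\u00e9", re.ASCII) is not None

print(greedy)  # one match spanning the whole string
print(lazy)    # four separate tags
```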
Groups and Capturing
# Capturing groups
pattern = r"(\d{4})-(\d{2})-(\d{2})"
match = re.search(pattern, "2026-03-14")
year, month, day = match.groups() # ('2026', '03', '14')
# Named groups
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(pattern, "2026-03-14")
match.group("year") # '2026'
# Non-capturing group (grouping without capturing)
r"(?:https?|ftp)://\S+"
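As a rough sketch of how these group types behave in practice, the snippet below reuses the date pattern from above. The helper name parse_date and the sample strings are hypothetical.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

def parse_date(text):
    """Return the first date in text as a dict of ints, or None."""
    m = DATE.search(text)
    if m is None:
        return None
    # groupdict() maps each group name to its matched string
    return {name: int(value) for name, value in m.groupdict().items()}

# Named groups can be referenced in a replacement with \g<name>
reordered = DATE.sub(r"\g<day>/\g<month>/\g<year>", "2026-03-14")

# With a non-capturing group, findall returns whole matches...
links = "see ftp://a.b and http://c.d"
whole = re.findall(r"(?:https?|ftp)://\S+", links)
# ...with a capturing group, it returns only the captured part
captured = re.findall(r"(https?|ftp)://\S+", links)

print(parse_date("released 2026-03-14"), reordered, whole, captured)
```

The findall difference is a common surprise: switching a group from capturing to non-capturing changes what the function returns.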
Lookaheads and Lookbehinds
# Positive lookahead: match X only if followed by Y
r"\d+(?= dollars)" # matches '42' in '42 dollars' but not '42 euros'
# Negative lookahead: match X only if NOT followed by Y
r"\d+(?! dollars)" # matches '42' in '42 euros'; caution: in '42 dollars' it still matches '4' after backtracking
# Positive lookbehind: match X only if preceded by Y
r"(?<=\$)\d+" # matches '42' in '$42'
# Negative lookbehind: match X only if NOT preceded by Y
r"(?<!\$)\d+" # matches '42' in '€42'; caution: in '$42' it still matches '2'
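Negative lookarounds combined with \d+ have a well-known pitfall: the engine can backtrack to a shorter number that satisfies the assertion. The sketch below shows the effect on invented sample strings, along with one possible way to tighten each pattern (the tightened patterns are a suggestion, not the only fix).

```python
import re

prices = "42 dollars, 17 euros"  # illustrative input

# Positive lookahead: only the number followed by ' dollars'
followed = re.findall(r"\d+(?= dollars)", prices)

# Naive negative lookahead: '42' is rejected, but the shorter '4' is accepted
naive_ahead = re.findall(r"\d+(?! dollars)", prices)

# Requiring a word boundary after the digits blocks the partial match
strict_ahead = re.findall(r"\b\d+\b(?! dollars)", prices)

amounts = "$42 and 99"  # illustrative input

# Positive lookbehind: only the number preceded by '$'
preceded = re.findall(r"(?<=\$)\d+", amounts)

# Naive negative lookbehind: skips the '4' but starts a match at the '2'
naive_behind = re.findall(r"(?<!\$)\d+", amounts)

# Also forbidding a preceding digit keeps the match from starting mid-number
strict_behind = re.findall(r"(?<![\$\d])\d+", amounts)

print(followed, naive_ahead, strict_ahead)
print(preceded, naive_behind, strict_behind)
```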
Common Patterns
# Email validation (simplified)
r"^[\w.+-]+@[\w-]+\.[\w.]+$"
# URL
r"https?://[\w./:%#&=?@-]+"
# IPv4 (shape only; does not check that each octet is 0-255)
r"(?:\d{1,3}\.){3}\d{1,3}"
# Password: 8+ chars, upper, lower, digit
r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$"
# Extract JSON value
r'"key"\s*:\s*"([^"]+)"'
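A few of these patterns wrapped in small helpers might look like the sketch below. The function names are hypothetical, and the octet range check and the re.ASCII flag are additions on top of the IPv4 pattern above, since the pattern alone accepts values such as 999.

```python
import re

EMAIL = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")
IPV4_SHAPE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}", re.ASCII)
PASSWORD = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$")
JSON_VALUE = re.compile(r'"key"\s*:\s*"([^"]+)"')

def is_email(text):
    # Simplified check; real-world addresses allow more than this
    return EMAIL.search(text) is not None

def is_ipv4(text):
    # The regex checks the shape; the range check is done in plain Python
    if IPV4_SHAPE.fullmatch(text) is None:
        return False
    return all(int(part) <= 255 for part in text.split("."))

def is_strong_password(text):
    # Each lookahead scans from the start for one required character type
    return PASSWORD.search(text) is not None

def extract_key(text):
    # Returns the string value stored under "key", or None if absent
    m = JSON_VALUE.search(text)
    return m.group(1) if m else None

print(is_email("user.name+tag@example.co.uk"), is_ipv4("192.168.0.1"))
print(is_strong_password("Passw0rd"), extract_key('{"key": "value"}'))
```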
Frequently Asked Questions
Should I use regex to parse HTML?
No. Use a proper HTML parser like BeautifulSoup (Python) or cheerio (Node.js). HTML is not a regular language, so regex can't reliably handle nested tags, attribute order, or encoding.
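For a sense of what the parser-based approach looks like, here is a minimal sketch that collects link targets. It uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without extra installs; the class name LinkCollector, the helper extract_links, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, however deeply nested."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

sample = '<div><a class="x" href="/one">1</a><div><a href=\'/two\'>2</a></div></div>'
links = extract_links(sample)
print(links)
```

The parser copes with nesting, attribute order, and either quote style, which are the cases where a hand-written regex tends to break.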
How do I test and debug regex patterns?
Use DevKits' online regex tester for real-time matching with explanations. regex101.com is another excellent tool with a full library of common patterns. In Python, compile patterns with re.compile() and use re.VERBOSE for multi-line patterns with comments.
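As a sketch of the re.compile() plus re.VERBOSE combination, the date pattern from earlier can be spread over several commented lines. The variable names and sample strings are illustrative.

```python
import re

DATE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    """,
    re.VERBOSE,
)

m = DATE.search("due 2026-03-14")
parts = m.groupdict() if m else {}

# In verbose mode unescaped whitespace in the pattern is ignored,
# so a literal space has to be escaped (or put in a character class)
ignored_space = re.compile(r"a b", re.VERBOSE).fullmatch("ab") is not None
literal_space = re.compile(r"a\ b", re.VERBOSE).fullmatch("a b") is not None

print(parts, ignored_space, literal_space)
```

A compiled pattern can be reused across calls, and the comments stay next to the piece of the pattern they describe.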