Text Processing Classics
1. sed — Stream Editor
sed performs text transformations on input streams. The most common use case is substitution, optionally applied in place:
# Replace first occurrence per line
sed 's/foo/bar/' file.txt
# Replace ALL occurrences (global flag)
sed 's/foo/bar/g' file.txt
# In-place edit (creates backup with .bak)
sed -i.bak 's/localhost/production.example.com/g' config.yml
# Delete lines matching pattern
sed '/^#/d' config.ini # Remove comment lines
# Print lines 5-10
sed -n '5,10p' access.log
# Multiple operations
sed -e 's/foo/bar/g' -e '/^$/d' file.txt
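To see how these commands behave, the sketch below runs them against a small throwaway file. The file contents and the variable names (tmp, first_only, all_matches, cleaned) are made up for illustration and are not part of any real project.

```shell
# Hypothetical sample data written to a temporary file
tmp=$(mktemp)
printf '%s\n' '# settings' 'foo foo' '' 'host=foo' > "$tmp"

# Without g, only the first match on each line changes
first_only=$(sed 's/foo/bar/' "$tmp")

# With g, every match on each line changes
all_matches=$(sed 's/foo/bar/g' "$tmp")

# Chain expressions: drop comment lines, drop empty lines, then substitute
cleaned=$(sed -e '/^#/d' -e '/^$/d' -e 's/foo/bar/g' "$tmp")

printf '%s\n' "$cleaned"
rm -f "$tmp"
```

Working on a temporary copy like this is a cheap way to check an expression before pointing sed -i at a real file.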
2. awk — Data Extraction and Reporting
awk processes text line by line, splitting each line into fields. Think of it as a mini programming language for text:
# Print specific columns (fields are split on whitespace by default)
awk '{print $1, $3}' access.log
# Use custom field separator
awk -F: '{print $1, $3}' /etc/passwd # username and UID
# Sum a column
awk -F, '{sum += $5} END {print "Total:", sum}' sales.csv
# Filter rows
awk '$3 > 100 {print $1, $3}' data.txt
# Count occurrences
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' \
access.log | sort -rn | head -10
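The same patterns can be tried on inline data without any input files. This is a minimal sketch: the two-column region,units layout and the variable names are assumptions chosen for the example.

```shell
# Hypothetical comma-separated sales data: region,units
sales=$(printf '%s\n' 'north,120' 'south,80' 'north,50')

# Sum the second column; -F, sets the field separator to a comma
total=$(printf '%s\n' "$sales" | awk -F, '{sum += $2} END {print sum}')

# Keep only rows where units exceed 100
big=$(printf '%s\n' "$sales" | awk -F, '$2 > 100 {print $1}')

# Count rows per region, most frequent first
per_region=$(printf '%s\n' "$sales" |
  awk -F, '{count[$1]++} END {for (r in count) print count[r], r}' |
  sort -rn)

echo "total=$total big=$big"
printf '%s\n' "$per_region"
```

Note that awk does not guarantee any order when looping over an array, which is why the counts are piped through sort.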
3. jq — JSON Processor
jq is the de facto standard for parsing and transforming JSON on the command line. Essential for working with APIs:
# Pretty print JSON
curl -s api.example.com/users | jq .
# Extract a field
jq '.name' user.json
jq '.[].name' users.json # All names from array
# Filter array
jq '[.[] | select(.age > 18)]' users.json
# Transform structure
jq '[.[] | {name: .name, email: .email}]' users.json
# Get keys
jq 'keys' object.json
# Format for shell scripts
NAME=$(curl -s api.example.com/user/1 | jq -r '.name')
# Validate JSON — also try DevKits JSON Formatter at /tools/json-formatter.html
echo '{"key": "value"}' | jq . > /dev/null && echo "Valid JSON"
Modern Replacements
4. ripgrep (rg) — Faster grep
ripgrep is usually much faster than grep when searching directory trees recursively, respects .gitignore, and supports Unicode by default:
# Basic search
rg "pattern" .
# Search specific file types
rg "TODO" --type py
# Case insensitive
rg -i "error" logs/
# Show context (3 lines around match)
rg -C 3 "segfault" /var/log/
# Count matches per file
rg --count "import" src/
5. bat — Better cat
bat is a cat clone with syntax highlighting, line numbers, and Git diff indicators:
bat README.md
bat --language python script.py
bat --paging never config.yml # Disable paging
# As a man page reader
export MANPAGER="sh -c 'col -bx | bat -l man -p'"
6. fzf — Fuzzy Finder
fzf is an interactive fuzzy finder that filters any list piped into it, which makes it easy to combine with other commands:
# Interactive file finder
vim $(fzf)
# Fuzzy history search (Ctrl+R replacement)
history | fzf
# Kill a process interactively
kill -9 $(ps aux | fzf | awk '{print $2}')
# Git branch switcher
git checkout "$(git branch --format='%(refname:short)' | fzf)"
# Shell integration (requires fzf 0.48.0 or later)
# Ctrl+T: paste selected files
# Ctrl+R: search command history
# Alt+C: cd into selected directory
eval "$(fzf --bash)" # add to ~/.bashrc
source <(fzf --zsh) # add to ~/.zshrc
System Monitoring
7. htop — Interactive Process Viewer
htop # Interactive process viewer
htop -u username # Filter by user
htop -p 1234,5678 # Monitor specific PIDs
8. ncdu — Disk Usage Analyzer
ncdu / # Analyze entire filesystem
ncdu ~ # Analyze home directory interactively
9. ss — Socket Statistics (netstat replacement)
ss -tlnp # TCP listening sockets with process names
ss -tunp # Established TCP/UDP connections (add -a to include listening sockets)
ss -s # Summary statistics
File and Data Tools
10. fd — find replacement
fd -e py # Find Python files (a bare ".py" is a regex and matches too much)
fd -t f -e log # Regular files with a .log extension
fd --hidden '\.env' # Include hidden files (add --no-ignore for gitignored ones)
11. xargs — Build and Execute Commands
# Delete all .tmp files (breaks on filenames with spaces; prefer the null-delimited form below)
find . -name "*.tmp" | xargs rm
# Run in parallel (-P 4 = 4 parallel processes)
cat urls.txt | xargs -P 4 -I {} curl -sO {}
# Null-delimited (safe for filenames with spaces)
find . -name "*.log" -print0 | xargs -0 gzip
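The difference the null-delimited form makes is easiest to see with a filename that contains a space. The sketch below assumes a scratch directory created with mktemp -d; the filenames are invented for the example.

```shell
# Scratch directory with a filename that contains a space (hypothetical names)
dir=$(mktemp -d)
touch "$dir/a.tmp" "$dir/my notes.tmp" "$dir/keep.txt"

# -print0 and -0 pass names separated by NUL bytes, so spaces are safe
find "$dir" -name '*.tmp' -print0 | xargs -0 rm -f

remaining=$(ls "$dir")
echo "$remaining"
rm -rf "$dir"
```

Without -print0 and -0, xargs would split "my notes.tmp" into two arguments and rm would look for files named "my" and "notes.tmp".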
12. curl — HTTP Client
# GET with headers
curl -H "Authorization: Bearer $TOKEN" https://api.example.com/users
# POST JSON
curl -X POST -H "Content-Type: application/json" \
-d '{"name":"Alice"}' https://api.example.com/users
# Save to file, follow redirects, show progress
curl -L -o output.html https://example.com
# Show only HTTP status code
curl -o /dev/null -s -w "%{http_code}" https://example.com
Terminal Multiplexing
13. tmux — Terminal Multiplexer
tmux new -s myproject # New session named "myproject"
tmux attach -t myproject # Attach to existing session
# Key bindings (Ctrl+B prefix):
# Ctrl+B c: new window
# Ctrl+B ": split into top and bottom panes
# Ctrl+B %: split into left and right panes
# Ctrl+B d: detach (session persists)
# Ctrl+B [: copy mode (scroll back through output)
14. watch — Repeat Command Periodically
watch -n 2 "docker stats --no-stream" # Update every 2s
watch -d "ls -la /tmp" # Highlight changes
Text Manipulation
15. sort and uniq
# Count unique IP addresses in log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
# Sort by 3rd column numerically
sort -t, -k3,3n data.csv
# Find duplicate lines
sort file.txt | uniq -d
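A short worked run of the same pipeline, as a sketch on inline data. The log lines and variable names here are hypothetical stand-ins for a real access log.

```shell
# Hypothetical log: client IP followed by request path
log=$(printf '%s\n' '10.0.0.1 /a' '10.0.0.2 /b' '10.0.0.1 /c' '10.0.0.3 /a' '10.0.0.1 /a')

# uniq only collapses adjacent lines, so sort first
top=$(printf '%s\n' "$log" | awk '{print $1}' | sort | uniq -c |
  sort -rn | head -1 | awk '{print $2, $1}')

# Whole lines that appear more than once
dupes=$(printf '%s\n' "$log" | sort | uniq -d)

echo "$top"
echo "$dupes"
```

The initial sort matters: uniq compares each line only with its neighbor, so unsorted input would undercount.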
16. cut — Extract Columns
cut -d: -f1,3 /etc/passwd # Fields 1 and 3, colon delimiter
cut -c1-10 file.txt # First 10 characters per line
ls -la | cut -c40- # Characters 40 onward (column offsets vary, so this is fragile)
17. tr — Translate Characters
echo "hello world" | tr '[:lower:]' '[:upper:]' # HELLO WORLD
echo "hello world" | tr -s ' ' # Squeeze spaces
tr -d '\r' < file.txt # Remove Windows line endings
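cut and tr are often chained to clean up a record before extracting fields. The following sketch assumes a single passwd-style line with a Windows line ending; the record itself is invented for illustration.

```shell
# A line with a Windows line ending and repeated spaces (hypothetical input)
raw=$(printf 'alice:x:1000:dev   team\r\n')

# Strip carriage returns, then squeeze runs of spaces into one
clean=$(printf '%s\n' "$raw" | tr -d '\r' | tr -s ' ')

# Fields 1 and 3 with a colon delimiter
fields=$(printf '%s\n' "$clean" | cut -d: -f1,3)

# Upper-case field 4
upper=$(printf '%s\n' "$clean" | cut -d: -f4 | tr '[:lower:]' '[:upper:]')

echo "$fields"
echo "$upper"
```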
Debugging and Tracing
18. strace — System Call Tracer
strace -p 1234 # Trace running process
strace -e trace=network ls # Only network syscalls
strace -c python script.py # Count syscalls
19. lsof — List Open Files
lsof -i :8080 # What's using port 8080?
lsof -u username # Files opened by user
lsof +D /mnt/disk # Open files under a directory tree (recursive, can be slow)
20. tee — Split Output
# Write to file AND stdout simultaneously
make build 2>&1 | tee build.log
# Append mode
./test.sh | tee -a test-results.log
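The difference between the default and append modes can be checked with a temporary log file. This is a small sketch; the messages and variable names are placeholders.

```shell
log=$(mktemp)

# tee copies stdin to the file and to stdout
shown=$(echo "build ok" | tee "$log")

# -a appends to the file instead of truncating it
echo "tests ok" | tee -a "$log" > /dev/null

saved=$(cat "$log")
echo "$saved"
rm -f "$log"
```

Keep in mind that the exit status of such a pipeline is normally that of tee, not of the command feeding it.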
Frequently Asked Questions
What is the difference between sed and awk?
Use sed for simple substitutions and line deletions. Use awk when you need to work with specific columns, perform calculations, or apply conditional logic. awk is a complete programming language; sed is for stream editing.
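One way to picture the split, as a sketch on made-up data (name, age, score columns are assumptions for the example): sed rewrites text it matches, while awk can test a column and compute with it.

```shell
# Hypothetical whitespace-separated records: name age score
data=$(printf '%s\n' 'alice 30 90' 'bob 25 120')

# sed: a plain substitution with no notion of columns
renamed=$(printf '%s\n' "$data" | sed 's/^bob/robert/')

# awk: a condition on column 3 plus arithmetic on its value
report=$(printf '%s\n' "$data" | awk '$3 > 100 {print $1, $3 * 2}')

printf '%s\n' "$renamed"
echo "$report"
```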
How do I install fzf, ripgrep, and bat?
All three are available via package managers: apt install fzf ripgrep bat on Debian/Ubuntu (where the bat binary is installed as batcat), or brew install fzf ripgrep bat on macOS. ripgrep and bat can also be installed with cargo install ripgrep bat via Rust's package manager.
Is jq available on macOS?
Yes — brew install jq. Also available as a web tool: use DevKits JSON Formatter for visual JSON exploration without installing anything. For complex regex used in text processing, try DevKits Regex Tester.