Advanced Text Processing and Regular Expressions
Category: Advanced Linux Administration
Type: Linux Commands
Generated on: 2025-07-10 03:16:05
For: System Administration, Development & Technical Interviews
This cheatsheet provides a comprehensive guide to advanced text processing and regular expressions using common Linux commands, tailored for system administrators and developers.
## 1. Command Overview

| Command | Description | When to Use |
|---|---|---|
| `grep` | Globally search a Regular Expression and Print; finds lines matching a pattern. | Searching logs, configuration files, code, or any text-based data for specific information. |
| `sed` | Stream EDitor; performs text transformations on a stream or file. | Replacing text, deleting lines, inserting content, or performing more complex text manipulations. |
| `awk` | Pattern scanning and processing language; processes data based on fields and records. | Parsing structured data (e.g., CSV, log files), performing calculations, and generating reports. |
| `tr` | Translate or delete characters; replaces or removes characters in a stream. | Converting case, removing control characters, and performing simple text transformations. |
| `cut` | Remove sections from each line of files; extracts specific columns from delimited data. | Extracting fields from CSV files, log files, or any data where fields are separated by delimiters. |
| `paste` | Merge lines of files; joins lines from multiple files side by side. | Combining data from different files line by line. |
| `join` | Join lines of two files on a common field. | Merging data from two files based on a common identifier (e.g., user ID, product ID). |
## 2. Basic Syntax

```sh
grep  [OPTIONS] PATTERN [FILE...]
sed   [OPTIONS] 'COMMAND' [FILE...]
awk   [OPTIONS] 'PATTERN { ACTION }' [FILE...]
tr    [OPTIONS] SET1 [SET2]
cut   [OPTIONS] [FILE...]
paste [OPTIONS] [FILE...]
join  [OPTIONS] FILE1 FILE2
```

## 3. Practical Examples

### grep

- Search for a specific string in a file:

  ```sh
  grep "error" /var/log/syslog
  ```

  Output: lines from `/var/log/syslog` containing "error".

- Search for a string case-insensitively:

  ```sh
  grep -i "error" /var/log/syslog
  ```

  Output: lines from `/var/log/syslog` containing "error", "Error", "ERROR", etc.

- Search recursively in a directory:

  ```sh
  grep -r "password" /etc/
  ```

  Output: file paths and matching lines within `/etc/` and its subdirectories.
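The flags above can be combined. Here is a minimal self-contained check; the file name `sample.log` and its contents are made up for illustration:

```sh
# Create a small sample log (illustrative data).
printf 'Jan 01 sshd: error: auth failure\nJan 01 cron: job ok\nJan 02 sshd: ERROR: timeout\n' > sample.log

# Case-insensitive search with line numbers (-i -n):
grep -in "error" sample.log
# -> 1:Jan 01 sshd: error: auth failure
# -> 3:Jan 02 sshd: ERROR: timeout

# Count matching lines instead of printing them (-c):
grep -ic "error" sample.log
# -> 2
```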
### sed

- Replace the first occurrence of a string on each line:

  ```sh
  sed 's/old_string/new_string/' input.txt
  ```

  Output: modified content of `input.txt` printed to standard output.

- Replace all occurrences of a string:

  ```sh
  sed 's/old_string/new_string/g' input.txt
  ```

  Output: modified content of `input.txt` printed to standard output.

- Replace all occurrences and write changes to the file (in place):

  ```sh
  sed -i 's/old_string/new_string/g' input.txt
  ```

  WARNING: this modifies the file directly. Consider creating a backup first: `cp input.txt input.txt.bak`

- Delete lines containing a specific string:

  ```sh
  sed '/error/d' /var/log/syslog
  ```

  Output: lines from `/var/log/syslog` excluding those containing "error".
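A quick sketch of the difference between first-occurrence and global replacement, using made-up file contents (GNU sed assumed; on BSD/macOS, `-i` requires a suffix argument such as `-i ''`):

```sh
# Sample input (illustrative).
printf 'old_string here\nold_string and old_string\n' > input.txt

# Without /g only the first match on each line changes:
sed 's/old_string/new_string/' input.txt
# -> new_string here
# -> new_string and old_string

# With /g, and editing in place after taking a backup:
cp input.txt input.txt.bak
sed -i 's/old_string/new_string/g' input.txt
cat input.txt
# -> new_string here
# -> new_string and new_string
```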
### awk

- Print the first column of a CSV file:

  ```sh
  awk -F',' '{print $1}' data.csv
  ```

  Output: the first column of each row in `data.csv`, one per line.

- Print lines where the third column is greater than 10:

  ```sh
  awk '$3 > 10 {print}' data.txt
  ```

  Output: lines from `data.txt` where the third field is greater than 10.

- Calculate the sum of the second column:

  ```sh
  awk '{sum += $2} END {print sum}' data.txt
  ```

  Output: the sum of the values in the second column of `data.txt`.
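The field and `END`-block patterns above can be checked with a tiny made-up data file (names and numbers are illustrative):

```sh
# Three whitespace-separated columns: name, quantity, score.
printf 'alpha 5 12\nbeta 3 8\ngamma 7 15\n' > data.txt

# Sum of the second column:
awk '{sum += $2} END {print sum}' data.txt
# -> 15

# Lines whose third field exceeds 10 (a pattern with no action prints the line):
awk '$3 > 10' data.txt
# -> alpha 5 12
# -> gamma 7 15
```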
### tr

- Convert lowercase to uppercase:

  ```sh
  echo "hello world" | tr '[:lower:]' '[:upper:]'
  ```

  Output: `HELLO WORLD`

- Delete specific characters:

  ```sh
  echo "hello world!" | tr -d '!'
  ```

  Output: `hello world`

- Squeeze repeated characters (note the run of spaces in the input):

  ```sh
  echo "hello   world" | tr -s ' '
  ```

  Output: `hello world`
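`tr` stages compose well in pipelines, and `-c` (complement) is handy for keeping only one character class. A small sketch with made-up input strings:

```sh
# Squeeze runs of spaces, then normalise case, in one pipeline:
echo "HELLO    World" | tr -s ' ' | tr '[:upper:]' '[:lower:]'
# -> hello world

# -c complements the set: combined with -d, delete everything that is NOT a digit:
echo "Order #4521 shipped" | tr -cd '[:digit:]'
# -> 4521
```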
### cut

- Extract the first field from a comma-separated file:

  ```sh
  cut -d ',' -f 1 data.csv
  ```

  Output: the first field of each line in `data.csv`.

- Extract fields 1 and 3 from a tab-separated file:

  ```sh
  cut -f 1,3 --output-delimiter='|' data.tsv
  ```

  Output: fields 1 and 3 of each line in `data.tsv`, separated by `|`.

- Extract characters 1 to 5 from each line:

  ```sh
  cut -c 1-5 data.txt
  ```

  Output: the first 5 characters of each line in `data.txt`.
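A self-contained check of the field options; the CSV file and its contents are invented for the demo (`--output-delimiter` is a GNU cut extension):

```sh
# Sample CSV (illustrative).
printf 'id,name,city\n1,ana,lima\n2,bo,oslo\n' > data.csv

# First field of each line:
cut -d ',' -f 1 data.csv
# -> id
# -> 1
# -> 2

# Fields 1 and 3, re-delimited with '|':
cut -d ',' -f 1,3 --output-delimiter='|' data.csv
# -> id|city
# -> 1|lima
# -> 2|oslo
```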
### paste

- Paste two files side by side, separated by a tab:

  ```sh
  paste file1.txt file2.txt
  ```

  Output: lines from `file1.txt` and `file2.txt` merged side by side, separated by a tab.

- Paste two files side by side, separated by a comma:

  ```sh
  paste -d ',' file1.txt file2.txt
  ```

  Output: lines from `file1.txt` and `file2.txt` merged side by side, separated by a comma.
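A minimal sketch with invented file contents, also showing `-s`, which serialises one file's lines into a single row:

```sh
printf 'alice\nbob\n' > file1.txt
printf '1001\n1002\n' > file2.txt

paste -d ',' file1.txt file2.txt
# -> alice,1001
# -> bob,1002

# -s joins all lines of one file into a single line:
paste -s -d ',' file1.txt
# -> alice,bob
```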
### join

- Join two files on the first field (both files must be sorted on the key):

  ```sh
  join file1.txt file2.txt
  ```

  Output: lines from both files joined on matching values in the first field.

- Join on specific fields (field 2 of file1, field 1 of file2):

  ```sh
  join -1 2 -2 1 file1.txt file2.txt
  ```

  Output: lines joined where the key is the second field in `file1.txt` and the first field in `file2.txt`.

- Join two files, also showing unmatched lines from file1 (`-a 1`):

  ```sh
  join -a 1 file1.txt file2.txt
  ```

  Output: all matched lines, plus the unpaired lines from `file1.txt`.
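A self-contained sketch with invented key/value files (both already sorted on the key, as `join` requires):

```sh
printf '1 alice\n2 bob\n3 carol\n' > file1.txt
printf '1 admin\n3 dev\n' > file2.txt

# Inner join on the first field:
join file1.txt file2.txt
# -> 1 alice admin
# -> 3 carol dev

# -a 1 also keeps unpaired lines from file1 (a left-join of sorts):
join -a 1 file1.txt file2.txt
# -> 1 alice admin
# -> 2 bob
# -> 3 carol dev
```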
## 4. Common Options

**grep**

- `-i`: Case-insensitive search.
- `-v`: Invert match (show lines that do not match).
- `-r` or `-R`: Recursive search.
- `-n`: Show line numbers.
- `-c`: Count the number of matching lines.
- `-l`: List only the files containing matches.
- `-w`: Match whole words only.
- `-E`: Interpret PATTERN as an extended regular expression.

**sed**

- `-i`: Edit the file in place. WARNING: use with caution!
- `-n`: Suppress default output (useful with the `p` command).
- `-e`: Allow multiple commands (e.g., `sed -e 's/a/b/' -e 's/c/d/' file.txt`).
- `s/PATTERN/REPLACEMENT/`: Substitute the first occurrence of PATTERN with REPLACEMENT.
- `s/PATTERN/REPLACEMENT/g`: Substitute all occurrences of PATTERN with REPLACEMENT.
- `/PATTERN/d`: Delete lines matching PATTERN.
- `/PATTERN/p`: Print lines matching PATTERN (useful with `-n`).
- `i\`: Insert text before a line.
- `a\`: Append text after a line.

**awk**

- `-F`: Specify the field separator; defaults to whitespace (e.g., `-F','` for CSV files).
- `BEGIN { ACTION }`: Execute ACTION before processing any lines.
- `END { ACTION }`: Execute ACTION after processing all lines.
- `$0`: The entire line.
- `$1`, `$2`, …: The first, second, etc., fields.
- `NF`: Number of fields in the current record.
- `NR`: Number of the current record.

**tr**

- `-d`: Delete characters in SET1.
- `-s`: Squeeze repeating characters in SET1.
- `-c`: Complement SET1 (use characters not in SET1).

**cut**

- `-d`: Delimiter (character separating fields); defaults to tab.
- `-f`: Field list (comma-separated list of field numbers).
- `-c`: Character list (comma-separated list of character positions).
- `--output-delimiter`: Specify a different output delimiter.

**paste**

- `-d`: Delimiter to use between pasted files; default is tab.
- `-s`: Paste all lines from one file into a single line, separated by the delimiter.

**join**

- `-1 FIELD`: Field to use as the join key in the first file.
- `-2 FIELD`: Field to use as the join key in the second file.
- `-a FILE_NUMBER`: Show unpairable lines from the specified file (1 or 2).
- `-e STRING`: Replace missing input fields with STRING.
- `-t CHAR`: Use CHAR as the field separator.
## 5. Advanced Usage

### grep with Regular Expressions

- Find IP addresses in a file:

  ```sh
  grep -E '([0-9]{1,3}\.){3}[0-9]{1,3}' /var/log/nginx/access.log
  ```

  Output: lines containing IPv4-style addresses from the access log.

- Find lines starting with "ERROR" or "WARNING":

  ```sh
  grep -E '^(ERROR|WARNING)' /var/log/syslog
  ```

  Output: lines starting with ERROR or WARNING.
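Both patterns can be verified on invented input; `-o` prints only the matched text, which is useful for extracting the address itself:

```sh
# Extract just the IP address from a line:
echo "client 10.0.0.7 connected" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
# -> 10.0.0.7

# Anchored alternation on a sample log:
printf 'ERROR: disk full\nINFO: ok\nWARNING: high load\n' > sample.log
grep -E '^(ERROR|WARNING)' sample.log
# -> ERROR: disk full
# -> WARNING: high load
```

Note that this regex also matches strings like `999.999.999.999`; validating octet ranges needs a stricter pattern.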
### sed with Backreferences

- Swap the first two words on each line:

  ```sh
  sed -E 's/^([^ ]*) ([^ ]*)/\2 \1/' input.txt
  ```

  Output: modified lines with the first two words swapped.
### awk with Multiple Actions and Conditions

- Print the average of the second column, then the lines where the third column is above that average. Because the average is only known after all lines have been read, the file is passed twice (`NR==FNR` is true only during the first pass):

  ```sh
  awk 'NR==FNR {sum += $2; count++; next}
       FNR==1  {avg = sum / count; print "Average:", avg}
       $3 > avg' data.txt data.txt
  ```

  Output: the average of the second column, followed by the lines whose third field exceeds it.
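A worked check of the two-pass average idiom with made-up data (a single-pass version would compare against an `avg` that is not yet computed):

```sh
# Columns: name, quantity, score (illustrative values).
printf 'a 4 10\nb 6 2\nc 2 9\n' > data.txt

# Pass 1 accumulates the sum; pass 2 prints the average once, then filters.
awk 'NR==FNR {sum += $2; count++; next}
     FNR==1  {avg = sum / count; print "Average:", avg}
     $3 > avg' data.txt data.txt
# -> Average: 4
# -> a 4 10
# -> c 2 9
```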
### Combining Commands with Pipes

- Find all lines containing "error" in `/var/log/syslog`, then count how many of them also contain "authentication":

  ```sh
  grep "error" /var/log/syslog | grep "authentication" | wc -l
  ```

  Output: the number of lines containing both "error" and "authentication".

- Extract usernames from `/etc/passwd` and sort them alphabetically:

  ```sh
  cut -d ':' -f 1 /etc/passwd | sort
  ```

  Output: a sorted list of usernames from `/etc/passwd`.
### Complex sed Script for HTML Escaping

```sh
sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g' input.txt
```

This script replaces the special HTML characters `&`, `<`, `>`, `"`, and `'` with their corresponding HTML entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#39;`). The `&` substitution must come first so the other entities are not double-escaped, and the backslash in `\&amp;` is needed because a bare `&` in a sed replacement stands for the whole match. It demonstrates the power of chained sed commands.
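A quick check of the escaping chain on an invented input line (the single-quote substitution is dropped here to keep the shell quoting readable):

```sh
printf '<a href="x">Tom & Jerry</a>\n' > input.txt
sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g' input.txt
# -> &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```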
## 6. Tips & Tricks

- **Use `man` pages:** `man grep`, `man sed`, `man awk`, etc., for detailed documentation.
- **Test regular expressions:** use an online regex tester (e.g., regex101.com) to validate your patterns.
- **Back up files before using `sed -i`:** prevent data loss.
- **Use variables in `awk`:** makes your scripts more readable and maintainable.
- **Quote your patterns:** especially if they contain special characters.
- **Performance:** for large files, prefer `awk` or `grep` with optimized regular expressions, and avoid unnecessary piping.
- **`tee` for debugging:** use `tee` to save intermediate output to a file, e.g., `grep "pattern" input.txt | tee debug.txt | awk '{print $1}'`.
- **Use character classes:** `[:alnum:]`, `[:alpha:]`, `[:digit:]`, `[:lower:]`, `[:upper:]`, `[:punct:]`, `[:space:]` for more robust pattern matching.
## 7. Troubleshooting

- **`grep` not finding matches:**
  - Check for typos in the pattern.
  - Verify case sensitivity; use `-i` for a case-insensitive search.
  - Ensure the file exists and is readable.
  - Check for newline characters or other hidden characters that might be interfering with the pattern.
- **`sed -i` corrupting the file:**
  - Always create a backup before using `-i`.
  - Double-check the `sed` command syntax.
  - If possible, test the command on a copy of the file first.
- **`awk` not processing fields correctly:**
  - Verify the field separator set with `-F`.
  - Check for inconsistent field separators in the input data.
  - Make sure the fields you are referencing (`$1`, `$2`, etc.) actually exist.
- **`join` not working:**
  - Ensure both files are sorted on the join key; use `sort` if necessary.
  - Verify that the join key fields are correctly specified with `-1` and `-2`.
  - Check for leading or trailing whitespace in the join key fields.
- **Regular expression errors:**
  - Ensure your regex syntax is correct for the command you're using (basic vs. extended).
  - Escape special characters properly (e.g., `\.`, `\*`, `\+`).
## 8. Related Commands

- `sort`: sort lines of text files.
- `uniq`: report or omit repeated lines.
- `wc`: count lines, words, and characters.
- `head`: output the first part of files.
- `tail`: output the last part of files.
- `diff`: compare files line by line.
- `find`: search for files in a directory hierarchy.
- `xargs`: build and execute command lines from standard input.
- `iconv`: convert text from one character encoding to another.
This cheatsheet provides a solid foundation for advanced text processing and regular expressions in Linux. Remember to practice and experiment with these commands to become proficient in their use. Good luck!