
Text Processing (grep, sed, awk)

Category: Linux Command Basics
Type: Linux Commands
Generated on: 2025-07-10 03:05:57
For: System Administration, Development & Technical Interviews


Text Processing Cheatsheet: grep, sed, awk (Linux)


This cheatsheet provides a comprehensive guide to using grep, sed, and awk for text processing in Linux. It’s designed for both beginners and experienced users, covering basic syntax, practical examples, advanced techniques, and troubleshooting tips.

  • grep (Global Regular Expression Print): Searches for patterns within files and prints lines that match. Primarily used for finding text within files.

  • sed (Stream EDitor): Edits text streams. Used for find and replace, deleting lines, inserting text, and other text transformations. Operates on a line-by-line basis.

  • awk: A powerful text processing language that can be used for data extraction, report generation, and more complex text manipulations. Works by dividing each line into fields.

Basic Syntax

    grep [OPTIONS] PATTERN [FILE...]
    sed [OPTIONS] 'COMMAND' [FILE...]
    awk [OPTIONS] 'CONDITION { ACTION }' [FILE...]
grep: Practical Examples

  • Find all lines containing “error” in logfile.txt:

    grep "error" logfile.txt
    # Sample Output:
    # 2023-10-27 10:00:00 ERROR: Something went wrong
    # 2023-10-27 10:05:00 ERROR: Another error occurred
  • Find all lines that do not contain “success” in logfile.txt:

    grep -v "success" logfile.txt
  • Find all lines starting with “DEBUG” (case-insensitive):

    grep -i "^debug" logfile.txt
    # Sample Output:
    # DEBUG: Starting process
    # Debug: Another debug message
sed: Practical Examples

  • Replace the first occurrence of “old” with “new” in file.txt and print to stdout:

    sed 's/old/new/' file.txt
    # Sample Input (file.txt):
    # This is an old file with an old problem.
    # Sample Output:
    # This is an new file with an old problem.
  • Replace all occurrences of “old” with “new” in file.txt and print to stdout:

    sed 's/old/new/g' file.txt
    # Sample Input (file.txt):
    # This is an old file with an old problem.
    # Sample Output:
    # This is an new file with an new problem.
  • Replace all occurrences of “old” with “new” in file.txt and save the changes in place (use with caution!):

    sed -i 's/old/new/g' file.txt
  • Delete all lines containing “error” in file.txt and print to stdout:

    sed '/error/d' file.txt
    # Sample Input (file.txt):
    # This is a normal line.
    # This is an error line.
    # This is another normal line.
    # Sample Output:
    # This is a normal line.
    # This is another normal line.
  • Insert the line “New line” before the line containing “problem” in file.txt:

    sed '/problem/i New line' file.txt
    # Sample Input (file.txt):
    # This is a line with a problem.
    # Sample Output:
    # New line
    # This is a line with a problem.
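The i command has a counterpart, a, which appends text after the matching line instead of before it. The one-line form below is GNU sed syntax; POSIX sed requires a backslash and newline after a:

```shell
# Append "New line" after the line containing "problem"
printf 'This is a line with a problem.\n' > file.txt
sed '/problem/a New line' file.txt
# Sample Output:
# This is a line with a problem.
# New line
```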
awk: Practical Examples

  • Print the first field of each line in data.txt:

    awk '{print $1}' data.txt
    # Sample Input (data.txt):
    # John Doe 25
    # Jane Smith 30
    # Sample Output:
    # John
    # Jane
  • Print the first and third fields, separated by a comma, of each line in data.txt:

    awk '{print $1 "," $3}' data.txt
    # Sample Input (data.txt):
    # John Doe 25
    # Jane Smith 30
    # Sample Output:
    # John,25
    # Jane,30
  • Print lines where the third field is greater than 25:

    awk '$3 > 25 {print $0}' data.txt
    # Sample Input (data.txt):
    # John Doe 25
    # Jane Smith 30
    # Sample Output:
    # Jane Smith 30
  • Calculate the sum of the third field (assuming it’s a number):

    awk '{sum += $3} END {print "Sum:", sum}' data.txt
    # Sample Input (data.txt):
    # John Doe 25
    # Jane Smith 30
    # Sample Output:
    # Sum: 55
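awk also provides built-in variables that need no setup, such as NR (the current line number) and NF (the number of fields on the line). A quick sketch using the same data.txt:

```shell
printf 'John Doe 25\nJane Smith 30\n' > data.txt
awk '{print NR": "NF" fields"}' data.txt
# Sample Output:
# 1: 3 fields
# 2: 3 fields
```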
Common grep Options

  • -i: Case-insensitive search.
  • -v: Invert match (select non-matching lines).
  • -n: Show line numbers.
  • -c: Count matching lines.
  • -r or -R: Recursive search (through directories). -R follows symbolic links; -r does not (except those named on the command line).
  • -l: List only file names containing matches.
  • -w: Match whole words only.
  • -x: Match whole lines only.
  • -E: Use extended regular expressions (ERE).
  • -P: Use Perl-compatible regular expressions (PCRE).
  • -o: Print only the matching part of the line.
  • -A NUM: Print NUM lines after the matching line.
  • -B NUM: Print NUM lines before the matching line.
  • -C NUM: Print NUM lines around the matching line (context).
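Several of these options can be combined. Here is a quick sketch against a small, made-up log file (sample.log and its contents are hypothetical):

```shell
# Build a tiny sample log to demonstrate against
printf 'INFO: start\nERROR: disk full\nWARN: low memory\nERROR: timeout\n' > sample.log

grep -n "ERROR" sample.log   # prefix each match with its line number: 2:ERROR: disk full ...
grep -c "ERROR" sample.log   # count matching lines: 2
grep -o "ERROR" sample.log   # print only the matched text, one match per line
grep -C 1 "ERROR" sample.log # each match plus one line of context before and after
```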
Common sed Options

  • -i: Edit the file in-place. USE WITH CAUTION! Consider creating a backup first.
  • -n: Suppress default output (useful with p command).
  • -e: Execute multiple sed commands.
  • -f: Read sed commands from a file.
  • -r: Use extended regular expressions (ERE). Modern versions also accept -E, which is the more portable spelling.
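Paired with the p command, -n turns sed into a line selector (nums.txt below is a hypothetical file):

```shell
printf 'one\ntwo\nthree\nfour\n' > nums.txt

# Print only lines 2 through 3; without -n every line would also be echoed
sed -n '2,3p' nums.txt
# Output:
# two
# three
```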
Common awk Options

  • -F: Specify the field separator. Defaults to whitespace.
  • -v: Assign a variable value.
  • -f: Read awk program from a file.
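A quick sketch combining -F and -v (users.txt and the label variable are made up for illustration):

```shell
printf 'alice:x:1000\nbob:x:1001\n' > users.txt

# -F':' splits each line on colons; -v injects a shell-side value into the awk program
awk -F':' -v label="uid" '{print $1, label, $3}' users.txt
# Output:
# alice uid 1000
# bob uid 1001
```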
Advanced Examples

  • Using grep with a regular expression to find IP addresses in a file:

    grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" logfile.txt
    # Sample Output:
    # Connection from 192.168.1.100
    # Connection from 10.0.0.5
  • Combining grep and wc to count the number of lines containing a specific pattern:

    grep "error" logfile.txt | wc -l
    # Sample Output:
    # 23
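grep's own -c option gives the same count without the extra wc process. Note that both count matching lines, not total occurrences; a line containing "error" twice still counts once:

```shell
printf 'error one\nok\nerror two error\n' > logfile.txt

grep -c "error" logfile.txt          # matching lines: 2
grep -o "error" logfile.txt | wc -l  # every occurrence: 3
```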
  • Using sed to replace multiple patterns in a single command:

    sed -e 's/pattern1/replacement1/g' -e 's/pattern2/replacement2/g' file.txt
  • Using sed to extract a specific part of a line (e.g., a date) using capture groups:

    sed -n 's/.*\(20[0-9][0-9]-[0-1][0-9]-[0-3][0-9]\).*/\1/p' logfile.txt
    # This extracts the date from lines like:
    # 2023-10-27 10:00:00 Message
    # Sample Output:
    # 2023-10-27
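When you only need the matched text itself, grep -o with an extended regex is often simpler than a sed capture group:

```shell
printf '2023-10-27 10:00:00 Message\n' > logfile.txt

# -o prints only the part of each line that matches the pattern
grep -oE '20[0-9]{2}-[0-1][0-9]-[0-3][0-9]' logfile.txt
# Sample Output:
# 2023-10-27
```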
  • Backup before in-place edit:

    sed -i.bak 's/old/new/g' file.txt # Creates file.txt.bak
  • Custom Field Separator:

    awk -F':' '{print $1}' /etc/passwd # Prints usernames from /etc/passwd
  • Using awk to generate a CSV file from a space-separated file:

    awk 'BEGIN {OFS=","} {print $1, $2, $3}' data.txt > output.csv
  • Using awk to format output:

    awk '{printf "%-20s %5d\n", $1, $3}' data.txt
    # This prints the first field left-aligned in a 20-character field,
    # and the third field right-aligned in a 5-character field.
  • Using awk to process log files and generate reports:

    awk '/error/ {count++} END {print "Total errors:", count}' logfile.txt
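awk's associative arrays extend this pattern to per-category counts. This sketch assumes the log level is the first field (app.log and its contents are made up):

```shell
printf 'ERROR disk full\nWARN low memory\nERROR timeout\n' > app.log

# Tally each level as lines stream by, then report in END
# (for-in iteration order over the array is unspecified)
awk '{count[$1]++} END {for (level in count) print level, count[level]}' app.log
# Typical output:
# WARN 1
# ERROR 2
```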
Tips and Best Practices

  • Piping: Combine commands for powerful text processing: cat file.txt | grep "pattern" | sed 's/old/new/g'
  • Regular Expressions: Master regular expressions for precise pattern matching. Use online regex testers to experiment.
  • Shell Variables: Use shell variables to store patterns or replacements: pattern="error"; grep "$pattern" logfile.txt
  • Testing with sed and awk: Always test your sed and awk commands without the -i option first to preview the changes.
  • Readability: For complex awk scripts, consider putting the script in a separate file and using the -f option.
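For example, the sum script from earlier could live in its own file (sum.awk is a hypothetical name):

```shell
# Write the awk program to a file, with room for comments
cat > sum.awk <<'EOF'
# Sum the third field of every line, then print the total
{ sum += $3 }
END { print "Sum:", sum }
EOF

printf 'John Doe 25\nJane Smith 30\n' > data.txt
awk -f sum.awk data.txt
# Sample Output:
# Sum: 55
```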
Common Errors and Troubleshooting

  • grep: “Binary file (standard input) matches”: This means grep found a match in a binary file. Use -a to treat all files as text.
  • sed: “unterminated `s' command”: This usually means you forgot the closing / in your s/old/new/ command.
  • awk: Incorrect field separation: Double-check your -F option or the default whitespace separation. Consider using FS variable in BEGIN block for more complex separators.
  • sed -i overwrites files unexpectedly: Double-check your command thoroughly before using -i. Always back up important files.
  • Performance: For large files, awk is often faster than sed for complex operations. grep is generally the fastest for simple pattern matching.
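Setting FS in a BEGIN block, as suggested above, also handles multi-character separators (sep.txt is made up for illustration):

```shell
printf 'a::b::c\n' > sep.txt

# FS assigned in BEGIN takes effect before the first line is split;
# a multi-character FS is treated as a regular expression
awk 'BEGIN {FS="::"} {print $2}' sep.txt
# Output:
# b
```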
Related Commands

  • cut: Extract specific columns (fields) from a file.
  • tr: Translate or delete characters.
  • sort: Sort lines of text files.
  • uniq: Remove duplicate lines.
  • wc: Word, line, and character count.
  • head: Display the first few lines of a file.
  • tail: Display the last few lines of a file. Useful with -f for monitoring log files.
  • find: Find files based on various criteria, often used in conjunction with grep.
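These commands combine naturally in pipelines. A classic example, ranking the most frequent values in a column (access.log and its contents are hypothetical):

```shell
printf '10.0.0.1 GET /\n10.0.0.2 GET /a\n10.0.0.1 POST /b\n' > access.log

# Extract field 1, group duplicates together, count them, rank by count
cut -d' ' -f1 access.log | sort | uniq -c | sort -rn
# Typical output (uniq -c pads the counts with spaces):
#   2 10.0.0.1
#   1 10.0.0.2
```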

This cheatsheet provides a strong foundation for using grep, sed, and awk. Experiment with the examples and explore the advanced features to become proficient in text processing in Linux. Remember to always test your commands before applying them to critical data.