Monitoring and Alerting Setup

Category: DevOps and System Tools
Type: Linux Commands
Generated on: 2025-07-10 03:23:18
For: System Administration, Development & Technical Interviews


Monitoring and Alerting Setup (Linux Commands - DevOps and System Tools) - Cheat Sheet

This cheat sheet provides a comprehensive overview of Linux commands and tools used for monitoring and alerting, focusing on DevOps and system administration tasks.

This section covers commands for monitoring system resources, logs, and network traffic, and setting up alerts based on predefined thresholds.

  • top / htop: Real-time process monitoring and system resource usage. htop is a more user-friendly, interactive version of top.
  • vmstat: Virtual memory statistics - reports information about processes, memory, paging, block IO, traps, and CPU activity.
  • iostat: Input/output statistics for devices. Reports disk I/O activity.
  • df: Disk space usage. Reports file system disk space usage.
  • du: Disk usage per directory. Estimate file space usage.
  • free: Memory usage. Displays the total amount of free and used physical and swap memory in the system.
  • netstat / ss: Network statistics. ss is the modern replacement for netstat.
  • tcpdump: Network packet analyzer. Captures and analyzes network traffic.
  • ping: Tests network connectivity. Sends ICMP echo requests to a host.
  • traceroute: Traces the route packets take to a host.
  • uptime: System uptime and load average.
  • sar: System activity reporter. Collects, reports, and saves system activity information.
  • journalctl: View and manage systemd journal logs.
  • tail: Displays the last part of a file. Used for monitoring log files.
  • grep: Search for patterns in files. Used for filtering log files.
  • awk: Powerful text processing tool, useful for parsing log files and extracting data.
  • sed: Stream editor for transforming text.
  • watch: Executes a command periodically and displays the output.
  • sensors: Monitors hardware sensors, such as temperature and voltage.
  • uptime-kuma: Self-hosted monitoring tool with a web UI. (Requires installation)
  • Prometheus: Time-series database and monitoring system (Requires installation and configuration).
  • Grafana: Data visualization and dashboarding tool (Requires installation and configuration).
  • Alertmanager: Handles alerts sent by Prometheus (Requires installation and configuration).

This section outlines the basic syntax for each command.

  • top:

    top [options]
  • htop:

    htop [options]
  • vmstat:

    vmstat [options] [delay [count]]
  • iostat:

    iostat [options] [device...] [interval [count]]
  • df:

    df [options] [file...]
  • du:

    du [options] [file...]
  • free:

    free [options]
  • netstat / ss:

    netstat [options]
    ss [options]
  • tcpdump:

    tcpdump [options] [expression]
  • ping:

    ping [options] host
  • traceroute:

    traceroute [options] host
  • uptime:

    uptime
  • sar:

    sar [options] [interval [count]]
  • journalctl:

    journalctl [options]
  • tail:

    tail [options] [file...]
  • grep:

    grep [options] pattern [file...]
  • awk:

    awk 'pattern { action }' file
  • sed:

    sed 's/pattern/replacement/g' file
  • watch:

    watch [options] command
  • sensors:

    sensors

This section provides practical examples of using these commands.

  • top: Monitor CPU and memory usage.

    top
    top - 14:32:15 up 1 day, 2:15, 1 user, load average: 0.01, 0.05, 0.08
    Tasks: 154 total, 1 running, 153 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 1999948 total, 139048 free, 1356200 used, 504700 buff/cache
    KiB Swap: 2097148 total, 2097148 free, 0 used. 507156 avail Mem
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    1 root 20 0 167308 5544 3860 S 0.0 0.3 0:06.25 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
    3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
    4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
    6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kblockd
    8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
    9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
    10 root 20 0 0 0 0 I 0.0 0.0 0:00.00 rcu_sched
  • htop: Interactive process monitoring.

    htop

    (Requires installation: sudo apt install htop or sudo yum install htop)

  • vmstat 1 5: Show virtual memory stats every 1 second, 5 times.

    vmstat 1 5
    procs -----------memory---------- ---swap-- -----io---- -system-- --------cpu--------
    r b swpd free buff cache si so bi bo in cs us sy id wa st
    0 0 0 140364 9860 506008 0 0 0 0 11 12 0 0 99 0 0
    0 0 0 140364 9860 506008 0 0 0 0 10 12 0 0 100 0 0
    0 0 0 140364 9860 506008 0 0 0 0 10 12 0 0 100 0 0
    0 0 0 140364 9860 506008 0 0 0 0 10 12 0 0 100 0 0
    0 0 0 140364 9860 506008 0 0 0 0 10 12 0 0 100 0 0
  • iostat -x 1 5: Show extended I/O statistics every 1 second, 5 times.

    iostat -x 1 5
    Linux 5.15.0-101-generic (hostname) 11/03/2024 _x86_64_ (1 CPU)
    avg-cpu: %user %nice %system %iowait %steal %idle
    0.10 0.00 0.10 0.00 0.00 99.80
    Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
    sda 0.00 0.10 0.00 0.80 0.00 0.00 0.00 0.00 0.00 8.00 0.00 0.00 8.00 1.00 0.01
    avg-cpu: %user %nice %system %iowait %steal %idle
    0.00 0.00 0.00 0.00 0.00 100.00
    Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
    sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
  • df -h: Display disk space usage in human-readable format.

    df -h
    Filesystem Size Used Avail Use% Mounted on
    udev 959M 0 959M 0% /dev
    tmpfs 197M 1.1M 196M 1% /run
    /dev/sda1 20G 7.9G 11G 43% /
    tmpfs 984M 0 984M 0% /dev/shm
    tmpfs 5.0M 0 5.0M 0% /run/lock
    /dev/sdb1 100G 60G 40G 60% /data
    tmpfs 197M 4.0K 197M 1% /run/user/1000
  • du -sh /var/log: Show the size of /var/log directory in human-readable format.

    du -sh /var/log
    32M /var/log
  • free -m: Display memory usage in megabytes.

    free -m
    total used free shared buff/cache available
    Mem: 1953 1323 136 78 493 504
    Swap: 2047 0 2047
  • ss -ltnp: Show listening TCP ports with process names.

    ss -ltnp
    State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
    LISTEN 0 4096 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1133,fd=3))
    LISTEN 0 4096 [::]:22 [::]:* users:(("sshd",pid=1133,fd=4))
  • tcpdump -i eth0 -n port 80: Capture HTTP traffic on interface eth0.

    tcpdump -i eth0 -n port 80

    (This will output a stream of captured packets. Stop with Ctrl+C)

  • ping -c 4 google.com: Ping google.com 4 times.

    ping -c 4 google.com
    PING google.com (142.250.184.142) 56(84) bytes of data.
    64 bytes from fra16s36-in-f14.1e100.net (142.250.184.142): icmp_seq=1 ttl=117 time=6.41 ms
    64 bytes from fra16s36-in-f14.1e100.net (142.250.184.142): icmp_seq=2 ttl=117 time=6.50 ms
    64 bytes from fra16s36-in-f14.1e100.net (142.250.184.142): icmp_seq=3 ttl=117 time=6.74 ms
    64 bytes from fra16s36-in-f14.1e100.net (142.250.184.142): icmp_seq=4 ttl=117 time=6.65 ms
    --- google.com ping statistics ---
    4 packets transmitted, 4 received, 0% packet loss, time 3004ms
    rtt min/avg/max/mdev = 6.412/6.578/6.742/0.122 ms
  • traceroute google.com: Trace the route to google.com.

    traceroute google.com

    (This will output a series of hops to the destination)

  • uptime: Display system uptime and load average.

    uptime
    14:40:02 up 1 day, 2:22, 1 user, load average: 0.00, 0.01, 0.05
  • sar -u 1 5: Report CPU utilization every 1 second, 5 times.

    sar -u 1 5
    Linux 5.15.0-101-generic (hostname) 11/03/2024 _x86_64_ (1 CPU)
    14:40:50 CPU %user %nice %system %iowait %steal %idle
    14:40:51 all 0.00 0.00 0.00 0.00 0.00 100.00
    14:40:52 all 0.00 0.00 0.00 0.00 0.00 100.00
    14:40:53 all 0.00 0.00 0.00 0.00 0.00 100.00
    14:40:54 all 0.00 0.00 0.00 0.00 0.00 100.00
    14:40:55 all 0.00 0.00 0.00 0.00 0.00 100.00
    Average: all 0.00 0.00 0.00 0.00 0.00 100.00
  • journalctl -xe: Jump to the end of the systemd journal (-e) and add explanatory catalog text to entries where available (-x).

    journalctl -xe

    (This will display a large amount of log data. Use arrow keys to navigate.)

  • tail -f /var/log/syslog: Follow the syslog file and display new entries in real-time.

    tail -f /var/log/syslog

    (This will continuously display new log entries. Stop with Ctrl+C)

  • grep "error" /var/log/syslog: Search for “error” in the syslog file.

    grep "error" /var/log/syslog

    (This will output lines containing the string “error”. The match is case-sensitive; add -i to ignore case.)

  • awk '/error/ {print $0}' /var/log/syslog: Use awk to print lines containing “error” from syslog.

    awk '/error/ {print $0}' /var/log/syslog
  • sed 's/error/WARNING/g' /var/log/syslog: Replace all occurrences of “error” with “WARNING” in syslog (output to stdout, doesn’t modify the file). To modify the file in place: sed -i 's/error/WARNING/g' /var/log/syslog

    sed 's/error/WARNING/g' /var/log/syslog
  • watch -n 1 "free -m": Run free -m every 1 second and display the output.

    watch -n 1 "free -m"
  • sensors: Display hardware sensor information (requires lm-sensors package).

    sensors

    (Requires installation: sudo apt install lm-sensors or sudo yum install lm-sensors. You may need to run sudo sensors-detect after installation.)
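
The free -m output shown in the examples above can be reduced to a single number that is easier to compare against a threshold. The following is a minimal sketch rather than part of the original cheat sheet: the helper name mem_used_pct is hypothetical, and it assumes a free that prints an "available" column as the seventh field of the Mem: row (older versions do not).

```shell
# mem_used_pct (hypothetical helper): read `free -m` output on stdin and print
# the share of memory in use as a whole-number percentage.
# "In use" is taken to mean total minus the "available" column.
mem_used_pct() {
  awk '/^Mem:/ { printf "%d\n", ($2 - $7) * 100 / $2 }'
}

# Typical use:
#   free -m | mem_used_pct
```

With the sample free -m output above (1953 MB total, 504 MB available) this prints 74.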

This section lists common options for each command.

  • top:

    • -d <seconds>: Delay between updates.
    • -u <user>: Show processes for a specific user.
    • -p <pid>: Monitor only the process with the given PID.
    • Shift+M (interactive key): Sort by memory usage.
    • Shift+P (interactive key): Sort by CPU usage.
  • htop:

    • F1: Help.
    • F2: Setup.
    • F3: Search.
    • F6: Sort.
    • k: Kill process.
  • vmstat:

    • <delay>: Delay between updates in seconds.
    • <count>: Number of updates.
    • -s: Display event counters and memory statistics.
  • iostat:

    • -x: Extended statistics.
    • -d: Display only device statistics.
    • -p [device] : Display statistics for block devices and their partitions.
    • <interval>: Update interval in seconds.
    • <count>: Number of updates.
  • df:

    • -h: Human-readable format.
    • -T: Show file system type.
    • -i: Show inode information.
    • -a: Include pseudo, duplicate, inaccessible file systems.
  • du:

    • -h: Human-readable format.
    • -s: Summarize disk usage.
    • -c: Grand total.
    • -d <depth>: Limit directory depth.
  • free:

    • -m: Megabytes.
    • -g: Gigabytes.
    • -h: Human-readable.
    • -s <seconds>: Update interval.
    • -c <count>: Number of updates.
  • netstat / ss:

    • -l: Listening sockets.
    • -t: TCP sockets.
    • -u: UDP sockets.
    • -n: Numeric addresses (don’t resolve hostnames).
    • -p: Show process name and PID.
    • -a: All sockets.
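    • -i: With netstat, show the network interfaces table. (With ss, -i shows internal TCP information instead.)
    • -r: With netstat, show the routing table. (With ss, -r resolves addresses to host names instead.)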
  • tcpdump:

    • -i <interface>: Specify the interface to listen on.
    • -n: Numeric addresses (don’t resolve hostnames).
    • -nn: Don’t resolve hostnames or port names.
    • -v: Verbose output.
    • -vv: More verbose output.
    • -w <file>: Write packets to a file.
    • -r <file>: Read packets from a file.
    • -c <count>: Stop after capturing <count> packets.
  • ping:

    • -c <count>: Number of pings.
    • -i <interval>: Interval between pings, in seconds.
    • -s <size>: Payload size, in bytes.
    • -t <ttl>: Time to live.
  • traceroute:

    • -m <max_hops>: Maximum hops.
    • -n: Numeric addresses (don’t resolve hostnames).
  • sar:

    • -u: CPU utilization.
    • -r: Memory utilization.
    • -d: Disk utilization.
    • -n DEV: Network device statistics.
    • -P ALL: Per-processor statistics.
    • -f <file>: Read data from a file.
  • journalctl:

    • -xe: Jump to the end of the journal (-e) and add explanatory catalog text where available (-x).
    • -f: Follow the log.
    • -u <unit>: Show logs for a specific unit (e.g., nginx.service).
    • --since <date>: Show logs since a specific date/time.
    • --until <date>: Show logs until a specific date/time.
    • -k: Show kernel messages.
    • -b: Show logs from the current boot.
    • -n <lines>: Show the last <lines> entries of the log.
  • tail:

    • -f: Follow the file.
    • -n <lines>: Show the last <lines> lines of the file.
    • -n +<num>: Begin output at line <num>.
  • grep:

    • -i: Ignore case.
    • -v: Invert match (show lines that don’t match).
    • -r: Recursive search.
    • -n: Show line numbers.
    • -c: Count the number of matching lines.
    • -l: List file names containing matches.
    • -w: Match whole words only.
    • -A <num>: Print <num> lines after each matching line.
    • -B <num>: Print <num> lines before each matching line.
    • -C <num>: Print <num> lines before and after each matching line.
  • awk:

    • -F <delimiter>: Specify the field delimiter.
    • -v var=value: Assign a value to a variable.
    • -f <file>: Read awk commands from a file.
  • sed:

    • -i: Edit the file in place. WARNING: Use with caution, as this modifies the original file.
    • -n: Suppress automatic printing of pattern space.
    • s/pattern/replacement/g: Substitute pattern with replacement globally.
    • /pattern/d: Delete lines matching pattern (d is a sed command, not a command-line option).
  • watch:

    • -n <seconds>: Interval between updates.
    • -d: Highlight the differences between successive updates.
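
Options from the lists above are usually combined. As a small illustration (a sketch, not from the original cheat sheet), the hypothetical helper count_matches below joins grep's -c, -i, and -w options to count lines that contain a word regardless of case, without counting longer words that merely contain it.

```shell
# count_matches WORD FILE (hypothetical helper): print the number of lines in
# FILE that contain WORD as a whole word, ignoring case.
count_matches() {
  # -c prints the count, -i ignores case, -w matches whole words only.
  # grep exits with status 1 when the count is 0, so mask that status.
  grep -c -i -w -- "$1" "$2" || true
}

# Typical use:
#   count_matches error /var/log/syslog
```

A line such as "no errors here" is not counted, because -w requires "error" to stand alone as a word.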

This section covers more complex examples and combinations of commands.

  • Combining tail, grep, and awk for real-time log analysis:

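    tail -f /var/log/nginx/error.log | grep --line-buffered "error" | awk '{print $1, $3, $7}'

    This pipeline follows the Nginx error log, keeps lines containing “error”, and prints fields 1, 3, and 7 of each line. In the default Nginx error-log format, fields 1 and 2 hold the date and time and field 3 holds the log level, so adjust the field numbers to the parts of the message you need. The --line-buffered option (GNU grep) makes matches appear immediately instead of waiting for grep's output buffer to fill.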

  • Monitoring disk space usage and sending an email alert:

    #!/bin/bash
    THRESHOLD=90 # Disk usage threshold in percent
    # -P keeps each filesystem on one line, so the awk field positions stay stable
    USAGE=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
    if [ "$USAGE" -gt "$THRESHOLD" ]; then
      echo "Disk space on / is above $THRESHOLD%: $USAGE%" | mail -s "Disk Space Alert" admin@example.com
    fi

    This script checks disk usage on / and sends an email alert if it exceeds 90%. Save this script (e.g., disk_check.sh), make it executable (chmod +x disk_check.sh), and schedule it with cron (e.g., every 5 minutes).

  • Using sar to identify performance bottlenecks:

    sar -u -d 1 10 # CPU and disk utilization every 1 second for 10 seconds

    Analyze the output to identify high CPU usage, disk I/O bottlenecks, or other performance issues.

  • Creating a custom monitoring dashboard with watch:

    watch -n 5 'echo "CPU Usage:"; sar -u 1 1 | tail -1; echo "Memory Usage:"; free -m | grep Mem; echo "Disk I/O:"; iostat -dx 1 1 | tail -n +3'

    This command builds a simple dashboard that refreshes every 5 seconds: the sar average line for CPU, the Mem: row from free (the last line of free output is the Swap: row, so tail -1 would not show memory), and the per-device table from iostat. Note that a single iostat report shows averages since boot. This is a very basic example; more sophisticated dashboards can be built with tools like Grafana.

  • Using journalctl to debug service startup issues:

    journalctl -u myapp.service -b # Show logs for myapp.service from the current boot

    This command is useful for troubleshooting issues that occur during service startup.

  • Monitoring network traffic with tcpdump and analyzing with tshark (Wireshark CLI):

    tcpdump -i eth0 -w capture.pcap # Capture traffic to a file
    # (After capturing traffic)
    tshark -r capture.pcap -T fields -e ip.src -e ip.dst -e tcp.port -e frame.len # Extract specific fields

    This captures network traffic and then uses tshark to extract relevant data like source and destination IPs, ports, and frame length. This is helpful for network troubleshooting and security analysis.
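
The disk-space alert script earlier in this section checks only the root filesystem. One way to extend the same idea to every mounted filesystem is sketched below. The helper name over_threshold is hypothetical, and the sketch assumes mount points without spaces, since it reads the mount point from the sixth whitespace-separated field of df -P output.

```shell
# over_threshold LIMIT (hypothetical helper): read `df -P` output on stdin and
# print "mountpoint usage%" for every filesystem whose usage is strictly
# greater than LIMIT percent.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                      # strip the trailing percent sign
    if (use + 0 > limit + 0) print $6, use "%"
  }'
}

# Typical use (the mail command and address mirror the script above):
#   report=$(df -P | over_threshold 90)
#   [ -n "$report" ] && echo "$report" | mail -s "Disk Space Alert" admin@example.com
```

Reading df output from stdin keeps the parsing separate from the alerting, so the function can be checked against saved df output before it is wired into cron.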

  • Use aliases for frequently used commands: Add aliases to your ~/.bashrc or ~/.zshrc file. For example:

    alias dfh='df -h'
    alias topm='top -o %MEM' # Sort by memory
    alias tailf='tail -f'
  • Combine commands with pipes for powerful filtering and analysis: As demonstrated in the examples above, piping commands together allows for complex data processing.

  • Use nohup to run monitoring commands in the background:

    nohup tail -f /var/log/myapp.log > myapp.log.out 2>&1 &

    This command runs tail -f in the background, redirecting output to myapp.log.out. The 2>&1 redirects standard error to standard output.

  • Learn regular expressions for more powerful grep, awk, and sed usage: Regular expressions are essential for advanced text processing.

  • Consider using specialized monitoring tools like Prometheus and Grafana for production environments: These tools provide more advanced features like data visualization, alerting, and historical data analysis. They require setup and configuration but are well worth the effort for complex environments.

  • Use screen or tmux for persistent terminal sessions: This allows you to keep monitoring commands running even if your SSH connection is interrupted.
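
As an illustration of the regular-expression tip above, the sketch below counts server-error responses in a web access log. It is not from the original cheat sheet: the helper name count_5xx is hypothetical, and it assumes the common or combined access-log format, in which the HTTP status code is the ninth whitespace-separated field.

```shell
# count_5xx FILE (hypothetical helper): print how many requests in FILE were
# answered with a 5xx status code.
count_5xx() {
  # The anchored regex matches exactly three digits starting with 5.
  # "n + 0" prints 0 instead of an empty string when nothing matched.
  awk '$9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }' "$1"
}

# Typical use (the path is an assumption; adjust to your server):
#   count_5xx /var/log/nginx/access.log
```

Anchoring the pattern with ^ and $ and testing only field 9 avoids false matches on request paths or byte counts that happen to contain "500".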

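  • top or htop not showing all processes: Both normally list every process. If you see only your own, /proc may be mounted with the hidepid option, or a user filter (for example top -u) may be active. Running with sudo bypasses hidepid.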

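  • df and du reporting different disk usage: df counts space held by deleted files that a process still has open, while du does not; list such files with lsof +L1. Files hidden underneath a mount point can also account for the difference, so check that file systems are mounted as expected.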

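  • netstat not found: netstat belongs to the deprecated net-tools package, which many current distributions no longer install by default. Use ss instead, or install net-tools.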

  • tcpdump capturing no traffic: Double-check the interface name and filter expression. Ensure you have permissions to capture traffic.

  • journalctl showing no logs: Ensure the systemd journal is running and configured correctly. Check the /etc/systemd/journald.conf file.

  • sensors not working: Make sure you have the lm-sensors package installed and have run sudo sensors-detect.

  • Email alerts not being sent: Verify that your system is configured to send email and that the recipient address is valid. Check your mail server logs for errors.
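
Several of the problems above come down to a tool not being installed. A quick preflight check can be sketched as follows; the helper name missing_tools is hypothetical, and the tool list in the usage comment is only an example.

```shell
# missing_tools NAME... (hypothetical helper): print the name of every listed
# command that cannot be found on the current PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Typical use:
#   missing_tools ss sar iostat sensors mail tcpdump
```

Empty output means every listed command was found. Any name that is printed needs its package installed before the corresponding examples will work.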

  • ps: Process status. Provides a snapshot of current processes.
  • kill: Sends a signal to a process. Used to terminate or control processes.
  • killall: Kills processes by name.
  • systemctl: Controls systemd services.
  • crontab: Manages cron jobs (scheduled tasks).
  • lsof: List open files. Useful for identifying which processes are using specific files or network ports.
  • strace: Trace system calls made by a process. Useful for debugging.
  • dmesg: Display kernel messages.
  • iptables / nftables: Firewall configuration.

This cheat sheet provides a solid foundation for monitoring and alerting on Linux systems. Remember to adapt these commands and techniques to your specific needs and environment. Always test thoroughly before implementing changes in a production environment.