Troubleshooting Methodologies

Category: DevOps and System Tools
Type: Linux Commands
Generated on: 2025-07-10 03:23:47
For: System Administration, Development & Technical Interviews


Troubleshooting Methodologies Cheatsheet (Linux Commands - DevOps and System Tools)

This cheatsheet covers essential Linux commands and tools for troubleshooting, focusing on practical examples relevant to DevOps and System Administration.

1. Command Overview

Command | Description | When to Use
--- | --- | ---
ping | Test network connectivity to a host (sends ICMP echo requests). | Verify whether a host is reachable; measure latency and packet loss.
traceroute | Trace the route packets take to a destination. | Find where along the path latency rises or packets are lost; diagnose routing issues.
netstat/ss | Display network connections and sockets; netstat can also show routing tables and interface statistics (ss, from iproute2, is the modern replacement for netstat). | Identify listening ports, inspect connection states, diagnose connection problems.
tcpdump | Capture and analyze network traffic. | Inspect network communication in depth, debug protocol issues, analyze suspicious traffic.
lsof | List open files (including network sockets) and the processes using them. | Identify which process is using a specific file or port; troubleshoot file-locking issues.
ps | Display a snapshot of running processes. | Check resource usage, identify runaway processes, diagnose performance issues.
top/htop | Display real-time system and per-process resource usage. | Identify CPU and memory bottlenecks (top also reports CPU time spent waiting on I/O); monitor process activity.
vmstat | Report virtual memory, process, CPU, and I/O statistics. | Analyze memory usage, detect swapping, diagnose memory-related performance problems.
iostat | Report CPU utilization and disk I/O statistics. | Identify disk I/O bottlenecks, monitor disk performance, diagnose slow application performance.
df | Display filesystem disk space usage. | Find full filesystems before applications fail for lack of disk space.
du | Estimate file and directory space usage. | Identify large files and directories; reclaim disk space.
free | Display the amount of free and used memory and swap. | Monitor memory usage, spot steadily growing usage that may indicate a leak, diagnose memory-related performance problems.
uptime | Show how long the system has been running, the number of logged-in users, and load averages. | Quick check of system availability and load.
dmesg | Display kernel messages. | Diagnose hardware and driver problems; troubleshoot boot issues.
journalctl | Query the systemd journal. | Analyze system logs, troubleshoot service failures, investigate system events.
strace | Trace the system calls made by a process. | Debug application behavior, find failing system calls, understand how a process interacts with the kernel.
nc | Open arbitrary TCP and UDP connections or listen on a port (netcat). | Test network ports, transfer files, run simple client/server tests.
curl/wget | Transfer data from or to a server. | Test API endpoints, download files, debug HTTP requests.
systemctl | Control the systemd system and service manager. | Manage services, troubleshoot service failures, check service status.
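Several of these tools (htop, iostat, strace, and tcpdump, for example) are often not installed by default. A minimal POSIX sh sketch for checking a host before you start; the function name check_tools is our own illustration, not a standard utility:

```shell
# check_tools CMD...: report which of the named commands cannot be found.
# (Hypothetical helper.) command -v is the POSIX way to ask the shell how it
# would resolve a name; it also finds builtins, functions, and aliases.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every tool was found.
    [ "$missing" -eq 0 ]
}

# Example: check_tools ping traceroute ss tcpdump lsof htop iostat strace nc curl
```

Missing names are printed to standard error, and the exit status tells a script whether it can proceed.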

2. Basic Syntax

  • General: command [options] [arguments]
  • Piping: command1 | command2 | command3 (the standard output of each command becomes the standard input of the next).
  • Redirection:
    • >: Redirect standard output to a file (overwrite).
    • >>: Redirect standard output to a file (append).
    • 2>: Redirect standard error to a file.
    • &>: Redirect both standard output and standard error to a file (bash syntax; the portable form is > file 2>&1).

3. Practical Examples

  • ping: Check connectivity to Google. On Linux, ping runs until you press Ctrl+C (shown as ^C below); add -c 4 to stop after four packets.

    ping google.com
    PING google.com (142.250.185.142) 56(84) bytes of data.
    64 bytes from fra16s36-in-f14.1e100.net (142.250.185.142): icmp_seq=1 ttl=118 time=7.58 ms
    64 bytes from fra16s36-in-f14.1e100.net (142.250.185.142): icmp_seq=2 ttl=118 time=7.46 ms
    ^C
    --- google.com ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1002ms
    rtt min/avg/max/mdev = 7.465/7.526/7.588/0.061 ms
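  • traceroute: Trace the route to Google. A line of * * * means that hop sent no reply before the timeout, which is common for routers that drop or rate-limit these probes.
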
    traceroute google.com
    traceroute to google.com (142.250.185.142), 30 hops max, 60 byte packets
    1 _gateway (192.168.1.1) 1.344 ms 1.244 ms 1.187 ms
    2 10.0.0.1 (10.0.0.1) 10.239 ms 10.310 ms 10.267 ms
    3 * * *
    4 172.217.160.174 (172.217.160.174) 12.558 ms 13.294 ms 12.435 ms
    5 142.250.185.142 (142.250.185.142) 12.381 ms 12.384 ms 12.382 ms
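  • ss: Show listening TCP sockets and the processes that own them (-t TCP, -l listening, -p process, -n numeric). Run it with sudo; otherwise the Process column is filled in only for your own processes.

    sudo ss -tlpn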
    State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
    LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1234,fd=3))
    LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1234,fd=4))
  • tcpdump: Capture traffic to or from port 80 (typically plain HTTP) on interface eth0.

    sudo tcpdump -i eth0 port 80

    (Output varies with traffic. Interface names differ between systems; tcpdump -D lists the ones available.)

    15:34:56.789012 IP mymachine.example.com.54321 > webserver.example.com.80: Flags [S], seq 12345, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 123456789 ecr 0,sackOK,eol], length 0

    WARNING: tcpdump can generate large amounts of data. Use filters (and -c <count> to stop after a fixed number of packets) to narrow the capture. Capturing requires root privileges or equivalent capabilities, hence sudo.

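  • lsof: Find the process using port 80. Without sudo, lsof can only inspect your own processes, so a root-owned server such as nginx would normally be missing from the output.

    sudo lsof -i :80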
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    nginx 1234 root 6u IPv4 12345 0t0 TCP *:http (LISTEN)
  • ps: List all processes owned by the current user.

    ps -u $USER
    PID TTY TIME CMD
    1234 pts/0 00:00:00 bash
    5678 pts/0 00:00:00 ps
  • top: Monitor system resources.

    top

    (Interactive display of CPU, memory, and process information)

  • vmstat: Check virtual memory statistics every 5 seconds.

    vmstat 5

    (Output shows process, memory, swap, block I/O, system, and CPU statistics. The first line reports averages since boot; later lines cover each 5-second interval.)

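  • iostat: Check disk I/O statistics for device sda every 5 seconds (iostat is part of the sysstat package).

    iostat -d sda 5

    (Output shows transfers per second and read/write throughput for sda; -d omits the CPU report. The first report covers the time since boot. Add -x for extended statistics such as %util.)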

  • df: Show disk space usage in human-readable format.

    df -h
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 20G 10G 9.0G 53% /
    /dev/sdb1 100G 20G 80G 20% /data
  • du: Find the size of the current directory.

    du -sh .
    1.2G .
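  • free: Show memory usage in human-readable format. Read the available column rather than free: free excludes buff/cache, much of which the kernel can reclaim when applications need memory.
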
    free -h
    total used free shared buff/cache available
    Mem: 7.7G 2.1G 4.5G 172M 1.1G 5.3G
    Swap: 2.0G 0B 2.0G
  • uptime: Check system uptime and load average. The three load figures are 1-, 5- and 15-minute averages.

    uptime
    15:34:56 up 1 day, 2:34, 1 user, load average: 0.01, 0.02, 0.00
  • dmesg: Display recent kernel messages.

    dmesg | tail -n 20

    (Output shows recent kernel events, errors, and warnings. Add -T for human-readable timestamps; on systems where kernel.dmesg_restrict is set, run it with sudo.)

  • journalctl: Show logs for the nginx service.

    journalctl -u nginx.service

    (Output shows logs related to the nginx service.)

  • strace: Trace the system calls of the ls command.

    strace ls -l

    (strace writes each system call, with its arguments and return value, to standard error, interleaved with the normal output of ls.)

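  • nc: Test whether TCP port 80 is open on a remote server (-z: connect without sending data, -v: verbose). Options and messages vary slightly between netcat implementations.
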
    nc -vz google.com 80
    Connection to google.com 80 port [tcp/http] succeeded!
  • curl: Get the contents of a webpage.

    curl https://www.google.com

    (Output shows the HTML source code of the Google homepage.)

  • systemctl: Check the status of the nginx service.

    systemctl status nginx.service

    (Output shows the status, logs, and other information about the nginx service.)
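The df check can be automated for scripts and cron jobs. A minimal sketch, assuming a df that supports the POSIX -P format (GNU and BSD df both do); the function name full_filesystems and the 90% default are illustrative choices. -P keeps each filesystem on one line, so the columns stay fixed even when device names are long:

```shell
# full_filesystems [THRESHOLD]: read `df -P` output on stdin and print the
# mount point and usage of every filesystem at or above THRESHOLD percent.
# (Hypothetical helper.) Columns: Filesystem, blocks, Used, Available,
# Capacity, Mounted on.
full_filesystems() {
    threshold=${1:-90}
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                  # "53%" -> "53"
        if (use + 0 >= t + 0) print $6, $5
    }'
}

# Example: df -P | full_filesystems 85
```

Mount points that contain spaces would be split by awk; this sketch does not handle them.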
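Building on du -sh, a sketch that ranks what is using space inside a single directory (largest_entries is a hypothetical helper name). Sizes are reported in kilobytes so that sort -n can order them portably, and hidden entries (dotfiles) are not matched by the * glob:

```shell
# largest_entries [DIR] [N]: the N largest files or directories directly
# under DIR, largest first, as "KB<TAB>path". (Hypothetical helper.)
# -x keeps du on one filesystem, which matters when DIR is / and other
# disks are mounted below it; errors such as "Permission denied" are hidden.
largest_entries() {
    dir=${1:-.}
    n=${2:-10}
    du -xsk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Example: sudo sh -c '. ./largest_entries.sh; largest_entries /var 5'
```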
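The available column of free is based on the kernel's MemAvailable estimate in /proc/meminfo (Linux 3.14 and later). A sketch that turns it into a percentage for alerting; mem_available_pct is an illustrative name, and the optional file argument exists only so the function can be tested against sample data:

```shell
# mem_available_pct [FILE]: percentage of total memory the kernel estimates
# is available for new work without swapping. (Hypothetical helper.)
# FILE defaults to /proc/meminfo; values there are in kB.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Example: [ "$(mem_available_pct)" -lt 10 ] && echo "less than 10% memory available"
```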
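The load averages printed by uptime (also available in /proc/loadavg) count runnable tasks plus, on Linux, tasks in uninterruptible sleep (usually waiting on disk I/O), so they only mean something relative to the number of CPUs. A sketch that divides the 1-minute load by the CPU count; load_per_cpu is a hypothetical name. A value that stays above 1.0 suggests more demand than the CPUs can serve, or many tasks blocked on I/O:

```shell
# load_per_cpu [FILE] [CPUS]: 1-minute load average divided by CPU count.
# (Hypothetical helper.) FILE defaults to /proc/loadavg, whose first field
# is the 1-minute load; CPUS defaults to the online processor count.
load_per_cpu() {
    file=${1:-/proc/loadavg}
    cpus=${2:-$(getconf _NPROCESSORS_ONLN)}
    awk -v c="$cpus" '{ printf "%.2f\n", $1 / c; exit }' "$file"
}

# Example: load_per_cpu
```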
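nc options differ between netcat variants, and nc is not always installed. Where bash and the coreutils timeout command are available, bash's built-in /dev/tcp/HOST/PORT redirection can perform the same port check. A sketch under those assumptions (port_open is an illustrative name):

```shell
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened
# within 3 seconds. (Hypothetical helper; assumes bash and GNU timeout.)
# /dev/tcp/HOST/PORT is handled by bash itself (it is not a real file), so
# the check runs under bash even when this function is called from sh.
# HOST and PORT are passed as arguments rather than pasted into the script.
port_open() {
    host=$1
    port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' port_open "$host" "$port" 2>/dev/null
}

# Example: if port_open google.com 80; then echo "port 80 open"; fi
```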

4. Common Options

Command | Option / Subcommand | Description
--- | --- | ---
ping | -c <count> | Send <count> packets, then stop.
ping | -i <interval> | Wait <interval> seconds between packets.
traceroute | -m <max_hops> | Set the maximum number of hops (maximum TTL) to <max_hops>.
netstat/ss | -t | Show TCP sockets.
netstat/ss | -u | Show UDP sockets.
netstat/ss | -l | Show only listening sockets.
netstat/ss | -p | Show the PID and name of the program that owns each socket.
netstat/ss | -n | Show numeric addresses and ports instead of resolving names.
tcpdump | -i <interface> | Capture on the given network interface.
tcpdump | -n | Don't resolve hostnames.
tcpdump | -w <file> | Write the captured packets to a file (read it back with -r <file>).
lsof | -i [address] | List network (Internet) files, optionally filtered, e.g. -i :80 or -i TCP.
lsof | -p <pid> | List files opened by the process with PID <pid>.
ps | -ef | Show all processes in full format.
ps | -u <user> | Show processes owned by <user>.
top/htop | -u <user> | Show only processes owned by <user>.
df | -h | Display sizes in human-readable format.
df | -T | Display the filesystem type.
du | -h | Display sizes in human-readable format.
du | -s | Display only a total for each argument.
du | -x | Skip directories on other filesystems.
free | -h | Display sizes in human-readable format.
journalctl | -u <unit> | Show logs for the specified systemd unit.
journalctl | -f | Follow log output (like tail -f).
journalctl | -n <lines> | Show the most recent <lines> entries.
strace | -p <pid> | Attach to the running process with PID <pid>.
strace | -o <file> | Write the trace output to <file>.
curl | -I | Send a HEAD request and show only the response headers.
curl | -v | Verbose mode (show request and response headers).
curl | -X <method> | Specify the HTTP method (e.g., GET, POST, PUT, DELETE).
systemctl | start <service> | Start a service.
systemctl | stop <service> | Stop a service.
systemctl | restart <service> | Restart a service.
systemctl | reload <service> | Reload a service's configuration without restarting it (if the unit supports reloading).
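Combining -c and -i makes ping finish on its own, which lets a script act on its summary line. A sketch that extracts the packet-loss percentage (packet_loss is a hypothetical helper); the pattern matches both the iputils summary shown in the ping example and the BSD/macOS format, which prints a decimal such as 25.0%:

```shell
# packet_loss: read ping output on stdin and print the packet-loss
# percentage without the % sign. (Hypothetical helper.) Matches summary
# lines such as:
#   "2 packets transmitted, 2 received, 0% packet loss, time 1002ms"
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example (-q prints only the summary lines):
#   loss=$(ping -c 5 -i 0.5 -q google.com | packet_loss)
```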

5. Advanced Usage

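  • Combining ps, grep, and kill: Terminate all processes whose command line matches a pattern.

    ps -ef | grep "[m]y_process" | awk '{print $2}' | xargs -r kill

    The [m] bracket expression keeps grep from matching its own entry in the process list, and -r (GNU xargs) skips running kill when nothing matches. kill sends SIGTERM by default, giving processes a chance to shut down cleanly. pkill -f "my_process" does essentially the same in one command.

    WARNING: Use with caution: a loose pattern can match more processes than you expect, so check first with pgrep -af "my_process". Escalate to kill -9 (SIGKILL) only for processes that ignore SIGTERM; SIGKILL cannot be caught, so the process gets no chance to clean up, which can lead to data loss. Always test in a non-production environment first.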

  • Using tcpdump with BPF filters: Capture only traffic to or from a specific IP address and port.

    sudo tcpdump -i eth0 "host 192.168.1.100 and port 80"
  • Investigating a slow database server with strace: See how much time a MySQL server spends in network-related system calls.

    sudo strace -f -p <mysql_pid> -T -e trace=network -s 200 -o strace.log

    This attaches to the running MySQL server and writes its network-related system calls to strace.log. -f attaches to all of its threads (mysqld is multi-threaded), -T shows the time spent in each call, -e trace=network limits tracing to network calls, and -s 200 prints up to 200 characters of each string argument instead of the default 32. strace adds substantial overhead to the traced process, so keep sessions short on production systems.

  • Monitoring network traffic with nethogs: Identify bandwidth-hogging processes in real-time. (Install nethogs first).

    sudo nethogs
  • Troubleshooting DNS resolution issues with dig and nslookup:

    dig google.com
    nslookup google.com

    These utilities query DNS servers and show the records returned and the server that answered; dig +short google.com prints only the answer data.
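Because kill -9 gives a process no chance to clean up, a gentler pattern is to send SIGTERM first and fall back to SIGKILL only if the process does not exit. A minimal sketch for Linux, assuming /proc is mounted; graceful_kill and is_alive are hypothetical names, and the 10-second default grace period is an arbitrary choice:

```shell
# is_alive PID: true if PID exists and is not a zombie. (Hypothetical helper.)
# A zombie has already exited and is only waiting for its parent to
# collect its exit status, so it counts as gone.
is_alive() {
    [ -r "/proc/$1/status" ] &&
        ! grep -q '^State:[[:space:]]*Z' "/proc/$1/status" 2>/dev/null
}

# graceful_kill PID [GRACE_SECONDS]: SIGTERM, wait, then SIGKILL if needed.
graceful_kill() {
    pid=$1
    grace=${2:-10}
    if ! kill -TERM "$pid" 2>/dev/null; then
        # Signal failed: the process is already gone (success) or we lack
        # permission to signal it (failure).
        ! is_alive "$pid"
        return
    fi
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        is_alive "$pid" || return 0
        sleep 1
        waited=$((waited + 1))
    done
    is_alive "$pid" || return 0
    echo "PID $pid still running after ${grace}s; sending SIGKILL" >&2
    kill -KILL "$pid" 2>/dev/null
}

# Example: for pid in $(pgrep -f my_process); do graceful_kill "$pid" 10; done
```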
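dig and nslookup send queries straight to DNS servers, but most applications resolve names through the C library, which follows /etc/nsswitch.conf and usually consults /etc/hosts as well. When dig succeeds but an application cannot resolve a name (or the reverse), compare the two paths. A sketch, assuming getent is present (it ships with glibc); compare_resolution is an illustrative name:

```shell
# compare_resolution NAME: show how NAME resolves through NSS (what most
# applications see) and, if dig is installed, through DNS alone.
# (Hypothetical helper.)
compare_resolution() {
    name=$1
    echo "NSS (getent hosts):"
    getent hosts "$name" || echo "  no result"
    if command -v dig >/dev/null 2>&1; then
        echo "DNS only (dig +short):"
        # Keep the query short so an unreachable resolver does not stall the check.
        dig +short +time=2 +tries=1 "$name" || echo "  query failed"
    fi
}

# Example: compare_resolution db.internal.example
```

A name that appears under NSS but not under DNS is usually coming from /etc/hosts.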

6. Tips & Tricks

  • Use history | grep <keyword>: Search your command history for specific commands.
  • Use !!: Repeat the last command.
  • Use !$: Use the last argument of the previous command.
  • Use aliases: Create shortcuts for frequently used commands in your ~/.bashrc file, then run source ~/.bashrc (or open a new shell) to load them.
    alias gs="git status"
    alias la="ls -la"
  • Use tab completion: Press Tab to auto-complete commands, filenames, and options.
  • Use man <command>: Read the manual page for a command.
  • Use --help: Get a brief overview of command options.
  • Use watch: Run a command repeatedly and display the output. watch -n 1 df -h (Run df -h every 1 second).
  • Use set -x to debug shell scripts: Enable tracing to see each command executed. Use set +x to disable it.
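set -x can also be switched on for just the part of a script you suspect, which keeps the trace short. Each traced command is printed to standard error after expansion, prefixed with + (the PS4 prompt). A sketch; copy_config is a hypothetical example function:

```shell
# copy_config SRC DEST: copy a file, tracing only the copy step.
# (Hypothetical example.) The function returns cp's exit status.
copy_config() {
    set -x              # from here on, print each command to stderr before running it
    cp "$1" "$2"
    status=$?
    set +x              # stop tracing (this command itself is still printed)
    return "$status"
}

# To trace a whole script instead: sh -x ./myscript.sh (or bash -x)
```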

7. Troubleshooting

  • “Command not found”:
    • Verify that the command is installed.
    • Check that the command’s directory is in your PATH (echo $PATH); command -v <command> shows whether, and where, the shell finds it.
  • “Permission denied”:
    • Use sudo if you need root privileges.
    • Check file permissions with ls -l. Use chmod to modify permissions.
  • “Network is unreachable”:
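    • Verify that the network interface is up and has an address (ip addr).
    • Check the routing table for a route to the destination, usually the default route (ip route).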
    • Check firewall rules.
  • High CPU usage:
    • Use top or htop to identify the process consuming the most CPU.
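    • Use strace -c -p <pid> to summarize which system calls the process makes and how long they take (stop with Ctrl+C). Time spent purely in user-space code does not appear in that summary.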
  • High memory usage:
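    • Use top or htop to identify the process consuming the most memory (in top, press Shift+M to sort by memory).
    • Use free -h to check overall memory usage.
    • If a process’s resident memory keeps growing over time, suspect a memory leak in that application.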
  • Disk full:
    • Use df -h to identify the full filesystem.
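    • Use sudo du -xsh /* 2>/dev/null | sort -h to find the largest top-level directories (-x stays on the root filesystem).
    • Remove or archive unneeded files such as old logs and caches.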
  • Service failing to start:
    • Check the service’s logs using journalctl -u <service>.service.
    • Check the service’s configuration files for errors.
    • Verify that all dependencies are met.
  • Slow network performance:
    • Use ping and traceroute to identify network latency.
    • Use tcpdump to analyze network traffic.
    • Check interface error and drop counters with ip -s link (or the legacy ifconfig).
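When du points at a large directory but not at a specific culprit, find can list the individual large files. A minimal sketch (big_files is a hypothetical name); -xdev keeps find on the starting filesystem, and sizes use find's suffixes such as 500k or 100M:

```shell
# big_files [DIR] [SIZE]: regular files larger than SIZE under DIR, largest
# first, as "KB<TAB>path". (Hypothetical helper.)
# -exec ... {} + passes many files to each du call; permission errors are hidden.
big_files() {
    dir=${1:-.}
    size=${2:-100M}
    find "$dir" -xdev -type f -size +"$size" -exec du -k {} + 2>/dev/null | sort -rn
}

# Example: sudo sh -c '. ./big_files.sh; big_files /var 100M'
```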

8. Related Commands

  • iptables: Configure the Linux firewall.
  • firewalld: Another firewall management tool.
  • rsync: Remote file synchronization.
  • scp: Secure copy (copy files over SSH).
  • ssh: Secure shell (remote login).
  • tmux: Terminal multiplexer.
  • screen: Another terminal multiplexer.
  • awk: Text processing tool.
  • sed: Stream editor for text manipulation.
  • grep: Search for patterns in text.
  • find: Search for files and directories.
  • xargs: Build and execute command lines from standard input.
  • ethtool: Display and modify network interface settings.
  • ip: Show / manipulate routing, devices, policy routing and tunnels.

This cheatsheet provides a starting point for troubleshooting. Always consult the official documentation for each command for more detailed information. Remember to test commands in a non-production environment before applying them to a production system. Good luck!