Troubleshooting Methodologies
Category: DevOps and System Tools
Type: Linux Commands
Generated on: 2025-07-10 03:23:47
For: System Administration, Development & Technical Interviews
Troubleshooting Methodologies Cheatsheet (Linux Commands - DevOps and System Tools)
This cheatsheet covers essential Linux commands and tools for troubleshooting, focusing on practical examples relevant to DevOps and System Administration.
1. Command Overview
| Command | Description | When to Use |
|---|---|---|
| `ping` | Test network connectivity to a host. | Verify if a host is reachable, diagnose network latency. |
| `traceroute` | Trace the route packets take to a destination. | Identify network bottlenecks, diagnose routing issues. |
| `netstat`/`ss` | Display network connections, routing tables, interface statistics. | Monitor network traffic, identify listening ports, diagnose connection problems. |
| `tcpdump` | Capture and analyze network traffic. | Deep dive into network communication, debug protocol issues, analyze malicious traffic. |
| `lsof` | List open files and the processes using them. | Identify which process is using a specific file or port, troubleshoot file locking issues. |
| `ps` | Display running processes. | Monitor resource usage, identify runaway processes, diagnose performance issues. |
| `top`/`htop` | Display real-time system resource usage. | Identify CPU, memory, and I/O bottlenecks, monitor process activity. |
| `vmstat` | Report virtual memory statistics. | Analyze memory usage, identify swapping issues, diagnose performance problems related to memory. |
| `iostat` | Report CPU utilization and disk I/O statistics. | Identify disk I/O bottlenecks, monitor disk performance, diagnose slow application performance. |
| `df` | Display disk space usage. | Monitor disk space, identify full filesystems, prevent application failures due to lack of disk space. |
| `du` | Estimate file space usage. | Identify large files and directories, reclaim disk space. |
| `free` | Display amount of free and used memory in the system. | Monitor memory usage, identify memory leaks, diagnose performance problems related to memory. |
| `uptime` | Show how long the system has been running. | Quick check of system availability and load. |
| `dmesg` | Display kernel messages. | Diagnose hardware issues, identify driver problems, troubleshoot system boot issues. |
| `journalctl` | Query the systemd journal. | Analyze system logs, troubleshoot service failures, diagnose system events. |
| `strace` | Trace system calls made by a process. | Debug application behavior, identify system call errors, understand how a process interacts with the kernel. |
| `nc` | Arbitrary TCP and UDP connections and listens. | Test network ports, transfer files, simple server/client interactions. |
| `curl`/`wget` | Transfer data from or to a server. | Test API endpoints, download files, debug HTTP requests. |
| `systemctl` | Control the systemd system and service manager. | Manage services, troubleshoot service failures, monitor service status. |
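Some of the tools in the table (for example `traceroute`, `tcpdump`, or `strace`) may not be installed on a minimal system. The following is a minimal sketch, assuming a POSIX shell, of how to check what is missing before an incident forces the question. The function name `missing_tools` is hypothetical, not a standard utility.

```shell
#!/bin/sh
# missing_tools: print each command name from the argument list that
# cannot be found in PATH (hypothetical helper, not a standard utility).
missing_tools() {
  for tool in "$@"; do
    # command -v exits non-zero when the name cannot be resolved
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check a few of the tools listed in the table.
missing_tools ping traceroute ss tcpdump lsof strace curl
```

Anything printed is a candidate for installation through the distribution's package manager; no output means every listed tool was found.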
2. Basic Syntax
- General: `command [options] [arguments]`
- Piping: `command1 | command2 | command3` (the output of `command1` becomes the input of `command2`, and so on)
- Redirection:
  - `>`: Redirect standard output to a file (overwrite).
  - `>>`: Redirect standard output to a file (append).
  - `2>`: Redirect standard error to a file.
  - `&>`: Redirect both standard output and standard error to a file (bash and zsh; in a plain POSIX shell use `> file 2>&1`).
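The operators above can be tried safely in a scratch directory. This is a small sketch, assuming a POSIX shell; the file names and the `workdir` variable are made up for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing important is overwritten.
workdir=$(mktemp -d)

echo "first"  >  "$workdir/out.log"    # > creates or overwrites the file
echo "second" >> "$workdir/out.log"    # >> appends to it

# 2> captures only standard error; "|| true" ignores the expected failure.
ls "$workdir/does-not-exist" 2> "$workdir/err.log" || true

# "> file 2>&1" sends both streams to one file in any POSIX shell;
# in bash, "&> file" is a shorthand for the same thing.
{ echo "to stdout"; echo "to stderr" >&2; } > "$workdir/both.log" 2>&1

# A pipe feeds the output of one command into the next.
out_lines=$(sort "$workdir/out.log" | wc -l | tr -d ' ')
echo "out.log has $out_lines lines"
```

The portable `> file 2>&1` form is used here so the sketch behaves the same under `sh` and `bash`.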
3. Practical Examples
- `ping`: Check connectivity to Google.

      ping google.com
      PING google.com (142.250.185.142) 56(84) bytes of data.
      64 bytes from fra16s36-in-f14.1e100.net (142.250.185.142): icmp_seq=1 ttl=118 time=7.58 ms
      64 bytes from fra16s36-in-f14.1e100.net (142.250.185.142): icmp_seq=2 ttl=118 time=7.46 ms
      ^C
      --- google.com ping statistics ---
      2 packets transmitted, 2 received, 0% packet loss, time 1002ms
      rtt min/avg/max/mdev = 7.465/7.526/7.588/0.061 ms

- `traceroute`: Trace the route to Google.

      traceroute google.com
      traceroute to google.com (142.250.185.142), 30 hops max, 60 byte packets
       1  _gateway (192.168.1.1)  1.344 ms  1.244 ms  1.187 ms
       2  10.0.0.1 (10.0.0.1)  10.239 ms  10.310 ms  10.267 ms
       3  * * *
       4  172.217.160.174 (172.217.160.174)  12.558 ms  13.294 ms  12.435 ms
       5  142.250.185.142 (142.250.185.142)  12.381 ms  12.384 ms  12.382 ms

- `ss`: Show listening ports.

      ss -tlpn
      State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
      LISTEN  0       128     0.0.0.0:22          0.0.0.0:*          users:(("sshd",pid=1234,fd=3))
      LISTEN  0       128     [::]:22             [::]:*             users:(("sshd",pid=1234,fd=4))

- `tcpdump`: Capture HTTP traffic on port 80.

      sudo tcpdump -i eth0 port 80

  (Output will vary depending on traffic.)

      15:34:56.789012 IP mymachine.example.com.54321 > webserver.example.com.80: Flags [S], seq 12345, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 123456789 ecr 0,sackOK,eol], length 0

  WARNING: `tcpdump` can generate large amounts of data. Use filters to narrow the capture. Capturing packets normally requires root privileges, so run it with `sudo`.

- `lsof`: Find the process using port 80. Run it with `sudo` to see sockets owned by other users.

      lsof -i :80
      COMMAND  PID   USER  FD  TYPE  DEVICE  SIZE/OFF  NODE  NAME
      nginx    1234  root  6u  IPv4  12345   0t0       TCP   *:http (LISTEN)

- `ps`: List all processes owned by the current user.

      ps -u "$USER"
        PID TTY          TIME CMD
       1234 pts/0    00:00:00 bash
       5678 pts/0    00:00:00 ps

- `top`: Monitor system resources.

      top

  (Interactive display of CPU, memory, and process information.)
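- `vmstat`: Check virtual memory statistics every 5 seconds.

      vmstat 5

  (Output shows process, memory, swap, I/O, and CPU statistics. The first line reports averages since boot.)

- `iostat`: Check disk I/O statistics for device `sda` every 5 seconds.

      iostat -d sda 5

  (Output shows transfer rates and read/write throughput for the device. The `-d` flag limits the report to device statistics, so CPU utilization is omitted; run `iostat` without `-d` to include it.)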
- `df`: Show disk space usage in human-readable format.

      df -h
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/sda1        20G   10G  9.0G  53% /
      /dev/sdb1       100G   20G   80G  20% /data

- `du`: Find the size of the current directory.

      du -sh .
      1.2G    .

- `free`: Show memory usage in human-readable format.

      free -h
                    total        used        free      shared  buff/cache   available
      Mem:           7.7G        2.1G        4.5G        172M        1.1G        5.3G
      Swap:          2.0G          0B        2.0G

- `uptime`: Check system uptime and load average.

      uptime
       15:34:56 up 1 day,  2:34,  1 user,  load average: 0.01, 0.02, 0.00

- `dmesg`: Display recent kernel messages.

      dmesg | tail -n 20

  (Output shows recent kernel events, errors, and warnings.)
- `journalctl`: Show logs for the `nginx` service.

      journalctl -u nginx.service

  (Output shows logs related to the nginx service.)

- `strace`: Trace system calls of the `ls` command.

      strace ls -l

  (Output shows detailed system calls made by the `ls` command.)

- `nc`: Test if port 80 is open on a remote server.

      nc -vz google.com 80
      Connection to google.com 80 port [tcp/http] succeeded!

- `curl`: Get the contents of a webpage.

      curl https://www.google.com

  (Output shows the HTML source code of the Google homepage.)

- `systemctl`: Check the status of the `nginx` service.

      systemctl status nginx.service

  (Output shows the status, logs, and other information about the nginx service.)
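When a check like the `ping` example has to run unattended, the summary line is the part worth extracting. The sketch below assumes the iputils summary format shown in the example output (`N packets transmitted, M received, X% packet loss`); other `ping` implementations may word it differently. The helper name `packet_loss` is hypothetical.

```shell
#!/bin/sh
# packet_loss: read ping output on stdin and print the packet-loss
# percentage. Hypothetical helper; assumes the iputils summary line
# "N packets transmitted, M received, X% packet loss".
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Typical use (not executed here because it needs network access):
#   ping -c 4 google.com | packet_loss
```

Using `ping -c <count>` makes the command exit on its own, so the summary is always printed without pressing Ctrl+C.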
4. Common Options
| Command | Option | Description |
|---|---|---|
| `ping` | `-c <count>` | Send only `<count>` packets, then stop. |
| `ping` | `-i <interval>` | Wait `<interval>` seconds between sending each packet. |
| `traceroute` | `-m <max_hops>` | Set the maximum number of hops to `<max_hops>`. |
| `netstat`/`ss` | `-t` | Show TCP connections. |
| `netstat`/`ss` | `-u` | Show UDP connections. |
| `netstat`/`ss` | `-l` | Show listening sockets. |
| `netstat`/`ss` | `-p` | Show the PID and name of the program to which each socket belongs. |
| `netstat`/`ss` | `-n` | Show numerical addresses and ports instead of resolving names. |
| `tcpdump` | `-i <interface>` | Specify the network interface to listen on. |
| `tcpdump` | `-n` | Don’t resolve hostnames. |
| `tcpdump` | `-w <file>` | Write the captured packets to a file. |
| `lsof` | `-i` | List open network (Internet) files, optionally filtered by address or port. |
| `lsof` | `-p <pid>` | List files opened by process with PID `<pid>`. |
| `ps` | `-ef` | Show all processes in full format. |
| `ps` | `-u <user>` | Show processes owned by user `<user>`. |
| `top`/`htop` | `-u <user>` | Show processes owned by user `<user>`. |
| `df` | `-h` | Display sizes in human-readable format. |
| `df` | `-T` | Display filesystem type. |
| `du` | `-h` | Display sizes in human-readable format. |
| `du` | `-s` | Display only a total for each argument. |
| `du` | `-x` | Skip directories on different filesystems. |
| `free` | `-h` | Display sizes in human-readable format. |
| `journalctl` | `-u <unit>` | Show logs for the specified systemd unit. |
| `journalctl` | `-f` | Follow log output (like `tail -f`). |
| `journalctl` | `-n <lines>` | Show the last `<lines>` lines. |
| `strace` | `-p <pid>` | Attach to a running process with PID `<pid>`. |
| `strace` | `-o <file>` | Write the trace output to a file. |
| `curl` | `-I` | Send a HEAD request and show only the response headers. |
| `curl` | `-v` | Verbose mode (show request and response headers). |
| `curl` | `-X <method>` | Specify the HTTP method (e.g., GET, POST, PUT, DELETE). |
| `systemctl` | `start <service>` | Start a service. |
| `systemctl` | `stop <service>` | Stop a service. |
| `systemctl` | `restart <service>` | Restart a service. |
| `systemctl` | `reload <service>` | Reload a service’s configuration without restarting it. |
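Options from the table are often combined. As one sketch, `du -s` (one total per argument) and `du -x` (stay on one filesystem) can be paired with `-k` (sizes in KiB, a standard `du` option not listed above) so that `sort -n` can rank the results. The function name `largest_dirs` is hypothetical.

```shell
#!/bin/sh
# largest_dirs: list the biggest entries directly under a directory.
# Hypothetical helper. -s gives one total per argument, -k reports
# sizes in KiB so sort -n can compare them, -x stays on one filesystem.
largest_dirs() {
  dir=${1:-.}
  count=${2:-5}
  du -skx "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Typical use: the five largest entries under /var (may need sudo).
#   largest_dirs /var 5
```

The human-readable `-h` flag is deliberately left out here, because values such as `1.2G` and `900M` do not sort correctly with a plain numeric sort.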
5. Advanced Usage
- Combining `ps`, `grep`, and `kill`: Kill all processes matching a pattern.

      ps -ef | grep "my_process" | grep -v grep | awk '{print $2}' | xargs kill -9

  WARNING: Use with caution. The pattern is matched against the whole `ps` line (user name, command, and arguments), so review the list of matches before killing anything. `kill -9` forcefully terminates processes and can lead to data loss because the process gets no chance to shut down cleanly; try a plain `kill` (SIGTERM) first. Always test in a non-production environment first.

- Using `tcpdump` with BPF filters: Capture only traffic to or from a specific IP address and port.

      sudo tcpdump -i eth0 "host 192.168.1.100 and port 80"

- Analyzing slow queries with `strace`: Identify performance bottlenecks in database queries.

      strace -p <mysql_pid> -T -e trace=network -s 200 -o strace.log

  This traces network-related system calls for the MySQL process, shows the time spent in each call, and writes the output to `strace.log`. `-T` shows the time spent in each syscall, `-e trace=network` limits the trace to network-related calls, and `-s 200` raises the maximum printed string length to 200 bytes for better readability. Attaching to a process owned by another user normally requires root, and for a multi-threaded server you may need to add `-f` to follow all threads.

- Monitoring network traffic with `nethogs`: Identify bandwidth-hogging processes in real time. (Install `nethogs` first.)

      sudo nethogs

- Troubleshooting DNS resolution issues with `dig` and `nslookup`:

      dig google.com
      nslookup google.com

  These utilities provide information about DNS records and name resolution.
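A log written by `strace -T -o strace.log` ends each line with the elapsed time in angle brackets, such as `<0.000123>`, which makes it straightforward to rank calls by duration. The following is a rough sketch, assuming that format and a log captured without `-f` (with `-f`, each line gains a PID prefix that would end up in the printed name). The function name `slowest_syscalls` is hypothetical.

```shell
#!/bin/sh
# slowest_syscalls: rank the calls in an `strace -T` log by elapsed time.
# Hypothetical helper; assumes each line ends with "<seconds>" as written
# by -T, and that the log was captured without -f (no PID prefix).
slowest_syscalls() {
  log=$1
  count=${2:-10}
  awk '
    match($0, /<[0-9]+\.[0-9]+>$/) {
      secs = substr($0, RSTART + 1, RLENGTH - 2)   # strip the < and >
      name = $0
      sub(/\(.*/, "", name)                        # keep text before "("
      print secs, name
    }' "$log" | sort -rn | head -n "$count"
}

# Typical use after a capture:
#   slowest_syscalls strace.log 10
```

Each output line is the elapsed time in seconds followed by the syscall name, slowest first. Lines without a trailing timing, such as signal notices, are skipped.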
6. Tips & Tricks
- Use `history | grep <keyword>`: Search your command history for specific commands.
- Use `!!`: Repeat the last command.
- Use `!$`: Use the last argument of the previous command.
- Use aliases: Create shortcuts for frequently used commands in your `~/.bashrc` file.

      alias gs="git status"
      alias la="ls -la"

- Use tab completion: Press `Tab` to auto-complete commands, filenames, and options.
- Use `man <command>`: Read the manual page for a command.
- Use `--help`: Get a brief overview of command options.
- Use `watch`: Run a command repeatedly and display the output. `watch -n 1 df -h` (run `df -h` every 1 second).
- Use `set -x` to debug shell scripts: Enable tracing to see each command executed. Use `set +x` to disable it.
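To see what `set -x` tracing looks like, the sketch below enables it around a single arithmetic step and captures the trace, which the shell writes to standard error. The function and variable names are made up for illustration.

```shell
#!/bin/sh
# add_numbers: trivial function used to show what tracing looks like
# (hypothetical example, not part of any tool).
add_numbers() {
  set -x                 # start printing each command to stderr
  sum=$(( $1 + $2 ))
  set +x                 # stop tracing
  echo "$sum"
}

# Capture the trace (stderr) separately from the result (stdout).
trace_file=$(mktemp)
result=$(add_numbers 2 3 2> "$trace_file")
echo "result: $result"
cat "$trace_file"        # trace lines are prefixed with one or more "+"
```

Limiting the traced region this way keeps the output readable in long scripts, where tracing everything tends to bury the interesting lines.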
7. Troubleshooting
- “Command not found”:
  - Verify that the command is installed.
  - Check that the command’s directory is in your `PATH` environment variable: `echo $PATH`
- “Permission denied”:
  - Use `sudo` if you need root privileges.
  - Check file permissions with `ls -l`. Use `chmod` to modify permissions.
- “Network is unreachable”:
  - Verify network connectivity (for example, `ping` the default gateway).
  - Check routing tables (`ip route`).
  - Check firewall rules.
- High CPU usage:
  - Use `top` or `htop` to identify the process consuming the most CPU.
  - Use `strace` to analyze the process’s system calls.
- High memory usage:
  - Use `top` or `htop` to identify the process consuming the most memory.
  - Use `free -h` to check overall memory usage.
  - Check for memory leaks in applications.
- Disk full:
  - Use `df -h` to identify the full filesystem.
  - Use `du -sh /*` to identify large directories.
  - Remove unnecessary files.
- Service failing to start:
  - Check the service’s logs using `journalctl -u <service.service>`.
  - Check the service’s configuration files for errors.
  - Verify that all dependencies are met.
- Slow network performance:
  - Use `ping` and `traceroute` to identify network latency.
  - Use `tcpdump` to analyze network traffic.
  - Check network interface statistics with `ip -s link` (or the legacy `ifconfig`).
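The "Disk full" check can be turned into a quick filter that reports only the filesystems worth looking at. This is a minimal sketch, assuming the POSIX output format of `df -P` (one line per filesystem, usage in the fifth column, mount point in the sixth) and mount points without spaces. The function name `full_filesystems` and the 90 percent default are assumptions for illustration.

```shell
#!/bin/sh
# full_filesystems: read `df -P` output on stdin and print the mount
# points whose usage is at or above a threshold (default 90 percent).
# Hypothetical helper; assumes mount points contain no spaces.
full_filesystems() {
  threshold=${1:-90}
  awk -v limit="$threshold" '
    NR > 1 {                       # skip the header line
      use = $5
      sub(/%/, "", use)            # "95%" -> "95"
      if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Typical use on a live system:
#   df -P | full_filesystems 90
```

Reading from standard input keeps the function easy to test against saved `df` output, and the same filter works on output collected from a remote host.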
8. Related Commands
- `iptables`: Configure the Linux firewall.
- `firewalld`: Another firewall management tool.
- `rsync`: Remote file synchronization.
- `scp`: Secure copy (copy files over SSH).
- `ssh`: Secure shell (remote login).
- `tmux`: Terminal multiplexer.
- `screen`: Another terminal multiplexer.
- `awk`: Text processing tool.
- `sed`: Stream editor for text manipulation.
- `grep`: Search for patterns in text.
- `find`: Search for files and directories.
- `xargs`: Build and execute command lines from standard input.
- `ethtool`: Display and modify network interface settings.
- `ip`: Show / manipulate routing, devices, policy routing and tunnels.
This cheatsheet provides a starting point for troubleshooting. Always consult the official documentation for each command for more detailed information. Remember to test commands in a non-production environment before applying them to a production system. Good luck!