System Diagnostics and Debugging
Category: DevOps and System Tools
Type: Linux Commands
Generated on: 2025-07-10 03:19:09
For: System Administration, Development & Technical Interviews
System Diagnostics and Debugging - Linux Cheatsheet (DevOps & System Tools)
This cheatsheet provides a comprehensive overview of Linux commands used for system diagnostics and debugging, tailored for both DevOps engineers and system administrators.
1. Command Overview
These commands help you monitor system performance, identify bottlenecks, troubleshoot issues, and diagnose problems in your Linux environment. They are essential for maintaining system stability, optimizing performance, and resolving incidents.
2. Basic Syntax
General Syntax:

    command [options] [arguments]

Key Components:

- command: The name of the command to execute.
- options: Flags that modify the command’s behavior (e.g., -h, -v, -l).
- arguments: Parameters passed to the command (e.g., file paths, process IDs).
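As a quick illustration of how the three parts fit together, here is a minimal sketch using ls. The choice of ls is only an assumption made for convenience, since it is present on every system; the diagnostic tools below follow the same pattern.

```shell
# command: ls   options: -l (long format), -d (list the directory itself)   argument: /
ls -l -d /

# Single-letter options can usually be merged, so this is the same call.
ls -ld /

# "--" marks the end of options; anything after it is treated as an argument,
# even if it begins with a dash.
ls -ld -- /
```

All three calls should print the same line. Merging short options is a convention that most, but not all, tools follow.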
3. Practical Examples
top - Real-time process monitoring

Example: Display the top processes consuming CPU and memory.

    top

Sample Output:

    top - 10:30:00 up 1 day,  2:00,  1 user,  load average: 0.10, 0.15, 0.12
    Tasks: 200 total,   1 running, 199 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  1.0 us,  0.5 sy,  0.0 ni, 98.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem :  8000000 total,  2000000 free,  4000000 used,  2000000 buff/cache
    KiB Swap:  2000000 total,  2000000 free,        0 used.  4000000 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     1234 user      20   0 1000000  50000  20000 S   1.0  0.6   0:05.00 process1
     5678 user      20   0  500000  25000  10000 S   0.5  0.3   0:02.00 process2
     ...

ps - Process status
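Example: List all processes running under the current user.

    ps aux | grep <username>

Because grep matches <username> anywhere in the line, the result can also include other users’ processes whose command line contains the name, as well as the grep process itself. ps -u <username> selects by owner directly.

Sample Output:

    user  1234  0.0  0.1  12345  6789 ?  Ss  Jan01  0:00 process1
    user  5678  0.0  0.2  54321  9876 ?  Sl  Jan01  0:00 process2

df - Disk space usage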
Example: Display disk space usage in a human-readable format.

    df -h

Sample Output:

    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sda1        20G   10G  9.0G  53% /
    /dev/sdb1       100G   50G   50G  50% /data

du - Disk usage
Example: Display the disk usage of the current directory.

    du -sh .

Sample Output:

    1.2G    .

netstat or ss - Network statistics
Example: List all listening TCP ports.

    ss -ltn

The -n option prints numeric ports, as in the sample below. Without it, ss shows service names (for example ssh instead of 22).

Sample Output:

    State    Recv-Q   Send-Q   Local Address:Port   Peer Address:Port
    LISTEN   0        128            0.0.0.0:22          0.0.0.0:*
    LISTEN   0        128               [::]:22             [::]:*

tcpdump - Network packet analyzer
Example: Capture network traffic on interface eth0 and save it to a file.

    sudo tcpdump -i eth0 -w capture.pcap

vmstat - Virtual memory statistics
Example: Display virtual memory statistics every 5 seconds.

    vmstat 5

Sample Output (repeated every 5 seconds):

    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
     1  0      0 1900000 100000 2100000    0    0     1     1  100  500  1  1 98  0  0

iostat - I/O statistics
Example: Display I/O statistics for all devices every 5 seconds.

    iostat -x 5

Sample Output (repeated every 5 seconds):

    Linux 5.4.0-91-generic (hostname)  01/01/2024  _x86_64_  (1 CPU)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.00    0.00    0.50    0.00    0.00   98.50

    Device   r/s  w/s  rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
    sda     0.50 0.50   2.00  2.00   0.00   0.00  0.00  0.00    1.00    1.00   0.00     4.00     4.00  1.00  0.10

lsof - List open files
Example: List all files opened by a specific process (e.g., process with PID 1234).

    lsof -p 1234

Sample Output:

    COMMAND   PID  USER  FD   TYPE  DEVICE  SIZE/OFF  NODE   NAME
    process1  1234 user  cwd  DIR   253,0   4096      2      /home/user
    process1  1234 user  rtd  DIR   253,0   4096      2      /
    process1  1234 user  txt  REG   253,0   10000000  12345  /usr/bin/process1
    process1  1234 user  3u   IPv4  12345   0t0       TCP    localhost:12345->localhost:54321 (ESTABLISHED)

strace - Trace system calls
Example: Trace system calls made by a command (e.g., ls -l).

    strace ls -l

Sample Output (truncated):

    execve("/usr/bin/ls", ["ls", "-l"], [/* 27 vars */]) = 0
    brk(NULL) = 0x55b5d075a000
    access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
    fstat(3, {st_mode=S_IFREG, st_size=123456, ...}) = 0
    mmap(NULL, 123456, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2e7c7a9000
    close(3) = 0
    access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\01\0\0\0\0\0\0\0", 832) = 832
    fstat(3, {st_mode=S_IFREG, st_size=2222222, ...}) = 0
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2e7c797000
    mmap(NULL, 2222222, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2e7c570000
    mmap(0x7f2e7c78d000, 253952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0x7f2e7c78d000
    mmap(0x7f2e7c7ca000, 28672, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4a000) = 0x7f2e7c7ca000
    mmap(0x7f2e7c7d1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x50000) = 0x7f2e7c7d1000
    close(3) = 0
    ...

dmesg - Display kernel messages
Example: Display the kernel ring buffer messages.

    dmesg

Example: Filter for specific errors:

    dmesg | grep -i error

Sample Output:

    [Jan 1 00:00:00] ACPI Error: No handler for Region [ECSI] (ffff888000000000) [SystemIO] (20200326/evregion-166)
    [Jan 1 00:00:00] ACPI Error: Region SystemIO(7) has no handler (20200326/exfldio-261)

journalctl - Systemd journal
Example: Display system logs for the current boot.

    journalctl -b

Example: Filter logs for a specific service:

    journalctl -u nginx.service

Example: Show logs from the last hour:

    journalctl --since "1 hour ago"

4. Common Options
| Command | Option | Description | Example |
|---|---|---|---|
| top | -H | Show individual threads. | top -H |
| top | -d | Specify the delay between screen updates (seconds). | top -d 2 |
| ps | aux | Display all processes for all users with detailed information. | ps aux |
| ps | -ef | Display all processes in full-format listing, including the full command line. | ps -ef |
| df | -h | Display disk space usage in a human-readable format (e.g., K, M, G). | df -h |
| df | -i | Display inode usage. | df -i |
| du | -sh | Display the total size of a directory in a human-readable format. | du -sh /path/to/directory |
| du | -a | Display the size of all files and directories. | du -a /path/to/directory |
| netstat | -l | List only listening sockets. | netstat -lt |
| netstat | -n | Display addresses and port numbers numerically (don’t resolve hostnames). | netstat -an |
| ss | -l | List only listening sockets. | ss -lt |
| ss | -n | Display addresses and port numbers numerically (don’t resolve hostnames). | ss -an |
| tcpdump | -i | Specify the interface to capture traffic on. | tcpdump -i eth0 |
| tcpdump | -w | Write the captured packets to a file. | tcpdump -i eth0 -w capture.pcap |
| vmstat | <delay> | Specify the delay between updates in seconds. | vmstat 5 |
| iostat | -x | Display extended statistics. | iostat -x |
| iostat | <delay> | Specify the delay between updates in seconds. | iostat -x 5 |
| lsof | -p | Filter by process ID. | lsof -p 1234 |
| lsof | -i | Filter by network connection (e.g., lsof -i :80 for port 80). | lsof -i :80 |
| strace | -p | Attach to a running process. | strace -p 1234 |
| strace | -o | Write the trace output to a file. | strace -o trace.log ls -l |
| dmesg | -H | Make output human-readable. | dmesg -H |
| dmesg | -k | Display only kernel messages. | dmesg -k |
| journalctl | -b | Show logs from the current boot. | journalctl -b |
| journalctl | -u | Filter logs by systemd unit (service). | journalctl -u nginx.service |
| journalctl | --since | Show entries at or after the given time. | journalctl --since "1 hour ago" |
| journalctl | --until | Show entries at or before the given time. | journalctl --until "1 hour ago" |
| free | -m | Display memory usage in megabytes. | free -m |
| free | -g | Display memory usage in gigabytes. | free -g |
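Options from the table can be combined in a single call. The sketch below wraps two such combinations in shell functions. The function names listening_tcp and recent_errors are hypothetical, and the extra options used here (-p for ss, -p err and --no-pager for journalctl) are standard but do not appear in the table above.

```shell
# List listening TCP sockets with numeric addresses (-l, -t, -n from the table)
# plus the owning process (-p; usually needs root to show other users' processes).
listening_tcp() {
    ss -ltnp
}

# Show error-level and more severe messages for one unit over a recent window.
# $1: systemd unit name, $2: start time (defaults to "1 hour ago")
recent_errors() {
    journalctl -u "$1" --since "${2:-1 hour ago}" -p err --no-pager
}
```

For example, recent_errors nginx.service "30 min ago" would narrow the journal to one service, one time window, and one severity threshold at once.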
5. Advanced Usage
Combining commands with pipes

Example: Find the process consuming the most memory and then kill it.

    ps aux | sort -nrk 4 | head -n 1 | awk '{print $2}' | xargs kill -9

Explanation:

- ps aux: List all processes.
- sort -nrk 4: Sort numerically in reverse order based on the 4th column (%MEM, memory usage).
- head -n 1: Get the top process.
- awk '{print $2}': Extract the process ID (PID).
- xargs kill -9: Kill the process with the extracted PID.
WARNING: Use kill -9 with caution. It can lead to data corruption if used on critical processes. Consider using kill (without -9) first to allow the process to shut down gracefully.
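The graceful approach from the warning can be sketched as a small function: send SIGTERM, give the process time to exit, and fall back to SIGKILL only if it is still there. The function name terminate_gracefully and the 10-second default are assumptions made for this example.

```shell
# terminate_gracefully PID [TIMEOUT_SECONDS]
# Sends SIGTERM, waits up to TIMEOUT_SECONDS (default 10) for the process to
# exit, and only then falls back to SIGKILL.
terminate_gracefully() {
    tg_pid=$1
    tg_left=${2:-10}

    kill -TERM "$tg_pid" 2>/dev/null || {
        echo "cannot signal PID $tg_pid (already gone, or not permitted)" >&2
        return 1
    }

    while [ "$tg_left" -gt 0 ]; do
        # Signal 0 delivers nothing; it only checks that the PID still exists.
        kill -0 "$tg_pid" 2>/dev/null || return 0
        sleep 1
        tg_left=$((tg_left - 1))
    done

    kill -0 "$tg_pid" 2>/dev/null || return 0
    echo "PID $tg_pid did not exit after SIGTERM, sending SIGKILL" >&2
    kill -KILL "$tg_pid"
}
```

A call such as terminate_gracefully 1234 15 gives the process 15 seconds to clean up. Inspecting the target first, for instance with ps -p 1234 -o pid,user,%mem,comm, helps confirm it is the process you intend to stop.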
Monitoring I/O performance of a specific process
    pidstat -d -p <PID> 1

This command uses pidstat (part of the sysstat package) to monitor the I/O of the process with the specified <PID>. The -d option reports I/O statistics, and the trailing 1 sets a 1-second interval between reports.
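When pidstat is not installed, a rough equivalent can be sketched from the per-process counters in /proc/<PID>/io. This assumes a kernel built with task I/O accounting, and reading another user's process usually requires root. The helper names io_counter and io_rate are hypothetical, and the interval must be a whole number of seconds.

```shell
# io_counter FILE NAME
# Prints one counter (e.g. read_bytes) from a file in /proc/<PID>/io format.
io_counter() {
    sed -n "s/^$2: //p" "$1"
}

# io_rate PID [INTERVAL_SECONDS]
# Samples the counters twice and prints the average read/write rate in kB/s.
io_rate() {
    io_file="/proc/$1/io"
    io_secs=${2:-1}
    [ -r "$io_file" ] || { echo "cannot read $io_file" >&2; return 1; }

    io_r1=$(io_counter "$io_file" read_bytes)
    io_w1=$(io_counter "$io_file" write_bytes)
    sleep "$io_secs"
    io_r2=$(io_counter "$io_file" read_bytes)
    io_w2=$(io_counter "$io_file" write_bytes)

    echo "read $(( (io_r2 - io_r1) / 1024 / io_secs )) kB/s," \
         "write $(( (io_w2 - io_w1) / 1024 / io_secs )) kB/s"
}
```

Running io_rate 1234 5 would report averages over a 5-second window. The counters measure bytes that actually reached the storage layer, so cached reads do not show up.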
Analyzing network traffic with tcpdump and Wireshark
- Capture network traffic using tcpdump:

      sudo tcpdump -i eth0 -w capture.pcap

- Transfer the capture.pcap file to your local machine.
- Open the capture.pcap file in Wireshark for detailed analysis.
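The capture step above runs until interrupted, so on a busy interface the file can grow quickly. A minimal sketch of a bounded capture, plus a quick look at the result before transferring it, might use tcpdump's -c (packet count), -r (read from file), and -nn (no name resolution) options. The function names are hypothetical.

```shell
# capture_sample INTERFACE FILE COUNT [FILTER...]
# Captures COUNT packets and then stops, so the file cannot grow without bound.
# Any remaining arguments are passed to tcpdump as a capture filter.
capture_sample() {
    cap_if=$1
    cap_file=$2
    cap_count=$3
    shift 3
    sudo tcpdump -i "$cap_if" -c "$cap_count" -w "$cap_file" "$@"
}

# preview_capture FILE [LINES]
# Prints the first packets of a saved capture without resolving host or port names.
preview_capture() {
    tcpdump -nn -r "$1" | head -n "${2:-20}"
}
```

For example, capture_sample eth0 capture.pcap 500 port 443 keeps only 500 HTTPS packets, and preview_capture capture.pcap confirms the file contains the expected traffic before it is copied elsewhere.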
Using perf for performance analysis
perf is a powerful performance analysis tool.

Example: Profile the CPU usage of a command.

    perf record -g <command>
    perf report

perf record runs the command and writes the collected samples to a perf.data file in the current directory; the -g option also records call graphs. perf report then reads that file and shows how CPU time was distributed across functions.
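On a remote server it is often handier to name the output file explicitly and print a plain-text report instead of opening the interactive viewer. The sketch below assumes perf's -o (output file), -i (input file), and --stdio options. The function name profile_command and the 40-line cut-off are arbitrary choices for this example.

```shell
# profile_command OUTPUT_FILE COMMAND [ARGS...]
# Records call-graph samples for COMMAND into OUTPUT_FILE, then prints the
# beginning of a plain-text report.
profile_command() {
    perf_out=$1
    shift
    # "--" separates perf's own options from the command being profiled.
    perf record -g -o "$perf_out" -- "$@"
    perf report -i "$perf_out" --stdio | head -n 40
}
```

A call such as profile_command /tmp/myapp.perf ./myapp --input data.bin (with a hypothetical program and arguments) leaves the raw samples in /tmp/myapp.perf for later inspection.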
6. Tips & Tricks
- Use aliases: Create aliases for frequently used commands to save time and reduce typing errors. Pick names that do not shadow installed commands such as htop. For example:

      alias topthreads='top -H'
      alias mem='free -m'
      alias logs='journalctl -xe'

  Add these aliases to your .bashrc or .zshrc file to make them permanent.
- Use watch to monitor commands repeatedly: watch executes a command at a fixed interval and redraws its output.

      watch -n 2 df -h    # Run 'df -h' every 2 seconds

- Use shell history search (Ctrl+R): Quickly find previously executed commands.
- Use less or more to page through long output: command | less
- Learn awk and sed: These tools are invaluable for parsing and manipulating text output from commands.
- Use the -v (verbose) flag: Many commands accept -v for more detailed output, though not all do (in grep, for example, -v inverts the match).
- Use man <command>: Read the manual page for a command to learn about all its options and usage.
- Tab completion: Use tab to autocomplete commands, options, and file paths.
- Double-check destructive commands: Before running commands like kill -9 or rm -rf, double-check that you are targeting the correct process or file.
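watch redraws the screen on every run and keeps no history. When a record over time is more useful, a small loop can do a similar job. The following is a sketch under that assumption; the function name sample and the timestamp format are arbitrary.

```shell
# sample COUNT INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND COUNT times, INTERVAL_SECONDS apart, and prints a timestamp
# line before each run so the output can be saved and compared later.
sample() {
    sample_left=$1
    sample_gap=$2
    shift 2
    while [ "$sample_left" -gt 0 ]; do
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        "$@"
        sample_left=$((sample_left - 1))
        if [ "$sample_left" -gt 0 ]; then
            sleep "$sample_gap"
        fi
    done
}
```

For instance, sample 30 2 df -h / >> df.log records disk usage every 2 seconds for about a minute.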
7. Troubleshooting
| Error | Solution |
|---|---|
| “Permission denied” | Use sudo to run the command with root privileges if necessary. Ensure you have the required permissions to access the files or directories. |
| “Command not found” | Verify that the command is installed and in your system’s PATH. Use which <command> to check if the command is in the path. If not, install the command or add its location to your PATH. |
| “Address already in use” | Another process is already using the port. Use `netstat -an
| High CPU/memory usage | Use top or htop to identify the processes consuming the most resources. Investigate the processes and identify the cause of the high usage. Consider optimizing the application, increasing system resources, or restarting the process. |
| Disk space full | Use df -h to identify the full partition. Use du -sh /path/to/directory to find large directories. Remove unnecessary files or move them to another storage location. |
| Network connectivity issues | Use ping, traceroute, and netstat to diagnose network problems. Check firewall rules, routing tables, and DNS settings. |
| Logs not showing up in journalctl | Ensure the service is configured to log to the systemd journal. Check the service’s configuration file and verify that the logging settings are correct. Restart the service to apply the changes. Check the journald configuration (/etc/systemd/journald.conf). |
| strace not working | Ensure you have the necessary privileges to trace the process. You may need to run strace with sudo. |
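For the "Disk space full" case, checking directories one at a time with du -sh is slow. A common pattern, sketched here, ranks the immediate subdirectories by size in one pass. It assumes GNU du and sort (--max-depth and sort -h are GNU extensions), and the function name largest_dirs is hypothetical.

```shell
# largest_dirs DIRECTORY [LINES]
# Lists DIRECTORY and its immediate subdirectories, largest first.
# -x keeps du on a single filesystem, so other mounts are not counted.
# The first line of output is normally DIRECTORY's own total.
largest_dirs() {
    du -xh --max-depth=1 "${1:-.}" 2>/dev/null | sort -rh | head -n "${2:-10}"
}
```

Starting at the mount point reported as full by df -h (for example largest_dirs /var) and repeating the call on the largest entry narrows the search down level by level.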
8. Related Commands
- htop: An interactive process viewer (a friendlier alternative to top).
- iotop: Monitor disk I/O usage by process.
- iftop: Monitor network bandwidth usage by connection.
- nload: Display network usage.
- ncdu: NCurses Disk Usage analyzer.
- sar: System Activity Reporter (collects and reports system activity data).
- systemctl: Control the systemd system and service manager.
- journalctl: Query the systemd journal.
- lshw: Hardware information.
- lspci: List PCI devices.
- lsusb: List USB devices.
- free: Display amount of free and used memory in the system.
- pmap: Report memory map of a process.
- gdb: GNU Debugger.
This cheatsheet provides a solid foundation for diagnosing and debugging Linux systems. Remember to consult the manual pages (man <command>) for more detailed information on each command. Good luck!