System Diagnostics and Debugging

Category: DevOps and System Tools
Type: Linux Commands
Generated on: 2025-07-10 03:19:09
For: System Administration, Development & Technical Interviews


System Diagnostics and Debugging - Linux Cheatsheet (DevOps & System Tools)

This cheatsheet provides a comprehensive overview of Linux commands used for system diagnostics and debugging, tailored for both DevOps engineers and system administrators.

These commands help you monitor system performance, identify bottlenecks, troubleshoot issues, and diagnose problems in your Linux environment. They are essential for maintaining system stability, optimizing performance, and resolving incidents.

General Syntax:

Terminal window
command [options] [arguments]

Key Components:

  • command: The name of the command to execute.
  • options: Flags that modify the command’s behavior (e.g., -h, -v, -l).
  • arguments: Parameters passed to the command (e.g., file paths, process IDs).

Example: Display the top processes consuming CPU and memory.

Terminal window
top

Sample Output:

top - 10:30:00 up 1 day, 2:00, 1 user, load average: 0.10, 0.15, 0.12
Tasks: 200 total, 1 running, 199 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.0 us, 0.5 sy, 0.0 ni, 98.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8000000 total, 2000000 free, 4000000 used, 2000000 buff/cache
KiB Swap: 2000000 total, 2000000 free, 0 used. 4000000 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 user 20 0 1000000 50000 20000 S 1.0 0.6 0:05.00 process1
5678 user 20 0 500000 25000 10000 S 0.5 0.3 0:02.00 process2
...
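
The load average in the first line of top is easier to judge relative to the number of CPUs: a 1-minute load of 4.0 roughly saturates a 4-core machine but overloads a single core. The following is a minimal sketch, assuming a Linux system where /proc/loadavg and nproc are available; load_per_cpu is a hypothetical helper name, not a standard tool.

```shell
# Print the 1-minute load average divided by the CPU count.
# Reads a /proc/loadavg-style line on stdin.
# load_per_cpu is a hypothetical helper name.
load_per_cpu() {
    cpus="$1"
    awk -v cpus="$cpus" '{ printf "%.2f\n", $1 / cpus }'
}

# Usage on a live system (assumes nproc and /proc/loadavg exist):
#   load_per_cpu "$(nproc)" < /proc/loadavg
```

Values well above 1.00 sustained over time suggest more runnable (or I/O-blocked) tasks than the CPUs can serve.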

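Example: List processes belonging to a specific user. Note that grep matches the name anywhere in the line (including the grep process itself), so ps -u <username> is the more precise form.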

Terminal window
ps aux | grep <username>

Sample Output:

user 1234 0.0 0.1 12345 6789 ? Ss Jan01 0:00 process1
user 5678 0.0 0.2 54321 9876 ? Sl Jan01 0:00 process2

Example: Display disk space usage in a human-readable format.

Terminal window
df -h

Sample Output:

Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 10G 9.0G 53% /
/dev/sdb1 100G 50G 50G 50% /data
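
For scripted checks, the Use% column can be filtered with awk. The following is a minimal sketch, assuming the POSIX output format (df -P), which keeps each filesystem on one line; the function name over_threshold and the 80% limit are illustrative choices, and mount points containing spaces are not handled.

```shell
# Print mount points whose usage is at or above a limit (in percent).
# Reads `df -P`-style output on stdin.
# over_threshold is a hypothetical helper name.
over_threshold() {
    limit="$1"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the percent sign from the usage column
        if (use + 0 >= limit) print $6, use "%"
    }'
}

# Usage on a live system:
#   df -P | over_threshold 80
```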

Example: Display the disk usage of the current directory.

Terminal window
du -sh .

Sample Output:

1.2G .
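
To find what is taking up the space, du can be combined with sort. The following is a minimal sketch; largest_entries is a hypothetical helper name, sizes are reported in KiB, and hidden entries (names starting with a dot) are skipped because the shell glob does not match them.

```shell
# List the largest entries directly under a directory, biggest first.
# largest_entries is a hypothetical helper name.
largest_entries() {
    dir="${1:-.}"
    count="${2:-10}"
    # -s: one total per argument; -k: sizes in KiB; -x: stay on one filesystem
    du -skx "$dir"/* 2>/dev/null | sort -nr | head -n "$count"
}

# Usage:
#   largest_entries /var 5
```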

Example: List all listening TCP ports.

Terminal window
ss -ltn

Sample Output:

State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
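
A script can test whether a given port is listening by parsing this output. The following is a minimal sketch, assuming numeric ports (the -n option) and the column layout shown above, where the fourth field is Local Address:Port; port_listening is a hypothetical helper name.

```shell
# Exit 0 if some socket in `ss -ltn`-style input listens on the given port.
# port_listening is a hypothetical helper name.
port_listening() {
    port="$1"
    awk -v port="$port" '
        NR > 1 {
            n = split($4, parts, ":")      # port is the text after the last colon
            if (parts[n] == port) found = 1
        }
        END { exit !found }'
}

# Usage on a live system:
#   ss -ltn | port_listening 22 && echo "port 22 is listening"
```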

Example: Capture network traffic on interface eth0 and save it to a file.

Terminal window
sudo tcpdump -i eth0 -w capture.pcap

Example: Display virtual memory statistics every 5 seconds.

Terminal window
vmstat 5

Sample Output (repeated every 5 seconds):

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 1900000 100000 2100000 0 0 1 1 100 500 1 1 98 0 0
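
In this output, nonzero si/so values indicate swapping and a high wa value indicates time spent waiting on I/O. The following is a minimal sketch that flags such samples, assuming the default column layout shown above; flag_pressure is a hypothetical helper name and the 20% iowait threshold is an arbitrary illustration.

```shell
# Flag vmstat samples that show swapping or high I/O wait.
# flag_pressure is a hypothetical helper name; the threshold is illustrative.
flag_pressure() {
    awk 'NR > 2 {
        # Default layout: si is field 7, so is field 8, wa is field 16
        if ($7 > 0 || $8 > 0) print "swap activity: si=" $7 " so=" $8
        if ($16 >= 20)        print "high iowait: wa=" $16 "%"
    }'
}

# Usage on a live system (3 samples, 5 seconds apart):
#   vmstat 5 3 | flag_pressure
```

Keep in mind that the first sample vmstat prints reports averages since boot, so later samples reflect current behavior better.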

Example: Display I/O statistics for all devices every 5 seconds.

Terminal window
iostat -x 5

Sample Output (repeated every 5 seconds):

Linux 5.4.0-91-generic (hostname) 01/01/2024 _x86_64_ (1 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 0.50 0.00 0.00 98.50
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 0.50 0.50 2.00 2.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 4.00 4.00 1.00 0.10

Example: List all files opened by a specific process (e.g., process with PID 1234).

Terminal window
lsof -p 1234

Sample Output:

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
process1 1234 user cwd DIR 253,0 4096 2 /home/user
process1 1234 user rtd DIR 253,0 4096 2 /
process1 1234 user txt REG 253,0 10000000 12345 /usr/bin/process1
process1 1234 user 3u IPv4 12345 0t0 TCP localhost:12345->localhost:54321 (ESTABLISHED)
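
When lsof is not installed, a rough count of open descriptors can be read from /proc instead. This is a different technique from lsof, shown as a minimal sketch that assumes a Linux /proc filesystem and permission to read the target process's fd directory; count_fds is a hypothetical helper name.

```shell
# Count the open file descriptors of a process via /proc (Linux only).
# Each entry in /proc/<PID>/fd is a symlink to one open descriptor.
# count_fds is a hypothetical helper name.
count_fds() {
    pid="$1"
    ls "/proc/$pid/fd" 2>/dev/null | wc -l
}

# Usage: count descriptors held by the current shell
#   count_fds $$
```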

Example: Trace system calls made by a command (e.g., ls -l).

Terminal window
strace ls -l

Sample Output (truncated):

execve("/usr/bin/ls", ["ls", "-l"], [/* 27 vars */]) = 0
brk(NULL) = 0x55b5d075a000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG, st_size=123456, ...}) = 0
mmap(NULL, 123456, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2e7c7a9000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\01\0\0\0\0\0\0\0", 832) = 832
fstat(3, {st_mode=S_IFREG, st_size=2222222, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2e7c797000
mmap(NULL, 2222222, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2e7c570000
mmap(0x7f2e7c78d000, 253952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0x7f2e7c78d000
mmap(0x7f2e7c7ca000, 28672, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4a000) = 0x7f2e7c7ca000
mmap(0x7f2e7c7d1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x50000) = 0x7f2e7c7d1000
close(3) = 0
...

Example: Display the kernel ring buffer messages.

Terminal window
dmesg

Example: Filter for specific errors:

Terminal window
dmesg | grep -i error

Sample Output:

[Jan 1 00:00:00] ACPI Error: No handler for Region [ECSI] (ffff888000000000) [SystemIO] (20200326/evregion-166)
[Jan 1 00:00:00] ACPI Error: Region SystemIO(7) has no handler (20200326/exfldio-261)

Example: Display system logs for the current boot.

Terminal window
journalctl -b

Example: Filter logs for a specific service:

Terminal window
journalctl -u nginx.service

Example: Show logs from the last hour:

Terminal window
journalctl --since "1 hour ago"

| Command | Option | Description | Example |
| --- | --- | --- | --- |
| top | -H | Show individual threads. | top -H |
| top | -d | Set the delay between screen updates, in seconds. | top -d 2 |
| ps | aux | Display all processes for all users with detailed information. | ps aux |
| ps | -ef | Display all processes with full command lines. | ps -ef |
| df | -h | Display disk space usage in human-readable units (K, M, G). | df -h |
| df | -i | Display inode usage. | df -i |
| du | -sh | Display the total size of a directory in human-readable units. | du -sh /path/to/directory |
| du | -a | Display the size of every file as well as directories. | du -a /path/to/directory |
| netstat | -l | List only listening sockets. | netstat -lt |
| netstat | -n | Display addresses and port numbers numerically (don’t resolve hostnames). | netstat -an |
| ss | -l | List only listening sockets. | ss -lt |
| ss | -n | Display port numbers numerically (don’t resolve service names). | ss -an |
| tcpdump | -i | Specify the interface to capture traffic on. | tcpdump -i eth0 |
| tcpdump | -w | Write the captured packets to a file. | tcpdump -i eth0 -w capture.pcap |
| vmstat | <delay> | Specify the delay between updates, in seconds. | vmstat 5 |
| iostat | -x | Display extended statistics. | iostat -x |
| iostat | <delay> | Specify the delay between updates, in seconds. | iostat -x 5 |
| lsof | -p | Filter by process ID. | lsof -p 1234 |
| lsof | -i | Filter by network connection (e.g., :80 for port 80). | lsof -i :80 |
| strace | -p | Attach to a running process. | strace -p 1234 |
| strace | -o | Write the trace output to a file. | strace -o trace.log ls -l |
| dmesg | -H | Make output human-readable. | dmesg -H |
| dmesg | -k | Display only kernel messages. | dmesg -k |
| journalctl | -b | Show logs from the current boot. | journalctl -b |
| journalctl | -u | Filter logs by systemd unit (service). | journalctl -u nginx.service |
| journalctl | --since | Show entries on or newer than the given time. | journalctl --since "1 hour ago" |
| journalctl | --until | Show entries on or older than the given time. | journalctl --until "1 hour ago" |
| free | -m | Display memory usage in mebibytes (MiB). | free -m |
| free | -g | Display memory usage in gibibytes (GiB). | free -g |

Example: Find the process consuming the most memory and then kill it.

Terminal window
ps aux | sort -nrk 4 | head -n 1 | awk '{print $2}' | xargs kill -9

Explanation:

  1. ps aux: List all processes.
  2. sort -nrk 4: Sort numerically in reverse order based on the 4th column (memory usage).
  3. head -n 1: Get the top process.
  4. awk '{print $2}': Extract the process ID (PID).
  5. xargs kill -9: Kill the process with the extracted PID.

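WARNING: Use kill -9 with caution. It sends SIGKILL, which a process cannot catch, so the process gets no chance to flush data or clean up, and this can lead to data corruption. Try kill (without -9, which sends SIGTERM) first to allow the process to shut down gracefully. Also review the PID before killing it: the process using the most memory may be a critical service.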
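
The graceful-first approach from the warning can be sketched as a small function: send SIGTERM, wait for a grace period, and fall back to SIGKILL only if the process is still alive. This is a minimal sketch; stop_gracefully is a hypothetical helper name and the 5-second default is an arbitrary choice.

```shell
# Ask a process to exit with SIGTERM; send SIGKILL only if it is still
# alive after a grace period.
# stop_gracefully is a hypothetical helper name.
stop_gracefully() {
    pid="$1"
    grace="${2:-5}"                                  # seconds to wait before SIGKILL
    # If the signal cannot be sent, the process is gone or not ours to signal.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0       # process has exited
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null                    # last resort
}

# Usage:
#   stop_gracefully 1234 10
```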

Monitoring I/O performance of a specific process

Terminal window
pidstat -d -p <PID> 1

This command uses pidstat to monitor the I/O performance of a process with the specified <PID>. The -d option displays I/O statistics, and the 1 specifies a 1-second interval between updates.

Analyzing network traffic with tcpdump and Wireshark

  1. Capture network traffic using tcpdump:
Terminal window
sudo tcpdump -i eth0 -w capture.pcap
  2. Transfer the capture.pcap file to your local machine.
  3. Open the capture.pcap file in Wireshark for detailed analysis.

perf is a powerful performance analysis tool.

Example: Profile the CPU usage of a command.

Terminal window
perf record -g <command>
perf report

perf record writes its samples to a perf.data file in the current directory; the -g option also records call graphs. perf report then reads perf.data and shows which functions consumed the most CPU time.

  • Use aliases: Create aliases for frequently used commands to save time and reduce typing errors. For example:
Terminal window
alias topt='top -H' # avoid the name htop, which would shadow the real htop tool
alias mem='free -m'
alias logs='journalctl -xe'

Add these aliases to your .bashrc or .zshrc file to make them permanent.

  • Use watch to monitor commands repeatedly: watch executes a command repeatedly and displays the output.
Terminal window
watch -n 2 df -h # Run 'df -h' every 2 seconds
  • Use shell history search (Ctrl+R): Quickly find previously executed commands.
  • Use less or more to page through long output: command | less
  • Learn awk and sed: These tools are invaluable for parsing and manipulating text output from commands.
  • Use the -v (verbose) flag: Many commands accept -v for more detailed output, but not all do (grep -v inverts the match), so check the manual page first.
  • Use man <command>: Read the manual page for a command to learn about all its options and usage.
  • Tab completion: Use tab to autocomplete commands, options, and file paths.
  • Double-check destructive commands: Before running commands like kill -9 or rm -rf, double-check that you are targeting the correct process or file.
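
As an illustration of the awk tip above, the following minimal sketch totals resident memory per user from ps aux output. It assumes the standard ps aux column layout (USER in column 1, RSS in KiB in column 6); rss_by_user is a hypothetical helper name.

```shell
# Sum resident memory (RSS, in KiB) per user from `ps aux`-style input,
# largest total first.
# rss_by_user is a hypothetical helper name.
rss_by_user() {
    awk 'NR > 1 { rss[$1] += $6 }
         END { for (u in rss) printf "%s %d\n", u, rss[u] }' | sort -k2,2nr
}

# Usage on a live system:
#   ps aux | rss_by_user
```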

| Error | Solution |
| --- | --- |
| “Permission denied” | Use sudo to run the command with root privileges if necessary. Ensure you have the required permissions to access the files or directories. |
| “Command not found” | Verify that the command is installed and in your system’s PATH. Use which <command> to check if the command is in the path. If not, install the command or add its location to your PATH. |
| “Address already in use” | Another process is already using the port. Use `netstat -an |
| High CPU/memory usage | Use top or htop to identify the processes consuming the most resources. Investigate the processes and identify the cause of the high usage. Consider optimizing the application, increasing system resources, or restarting the process. |
| Disk space full | Use df -h to identify the full partition. Use du -sh /path/to/directory to find large directories. Remove unnecessary files or move them to another storage location. |
| Network connectivity issues | Use ping, traceroute, and netstat to diagnose network problems. Check firewall rules, routing tables, and DNS settings. |
| Logs not showing up in journalctl | Ensure the service is configured to log to the systemd journal. Check the service’s configuration file and verify that the logging settings are correct. Restart the service to apply the changes. Check the journald configuration (/etc/systemd/journald.conf). |
| strace not working | Ensure you have the necessary privileges to trace the process. You may need to run strace with sudo. |

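
To get ahead of “Command not found” errors, a script can check up front which tools are missing. The following is a minimal sketch using the shell built-in command -v; missing_tools is a hypothetical helper name, and the tool list in the usage line is only an example.

```shell
# Print the names of the listed tools that are not found in PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage:
#   missing_tools top ps df du ss tcpdump vmstat iostat lsof strace
```
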
  • htop: An interactive process viewer with scrolling and color, often preferred over top.
  • iotop: Monitor disk I/O usage by process.
  • iftop: Monitor network bandwidth usage by connection.
  • nload: Display network usage.
  • ncdu: NCurses Disk Usage analyzer.
  • sar: System Activity Reporter (collects and reports system activity data).
  • systemctl: Control the systemd system and service manager.
  • journalctl: Query the systemd journal.
  • lshw: Hardware information.
  • lspci: List PCI devices.
  • lsusb: List USB devices.
  • free: Display amount of free and used memory in the system.
  • pmap: Report memory map of a process.
  • gdb: GNU Debugger.

This cheatsheet provides a solid foundation for diagnosing and debugging Linux systems. Remember to consult the manual pages (man <command>) for more detailed information on each command. Good luck!