Exploring Linux Performance Monitoring's Hidden Gems

Performance monitoring is a core skill for Linux administrators and developers alike: the ability to monitor and optimize your systems is one of the most valuable things you can bring to the job. Whether you’re troubleshooting issues or simply trying to get the most out of your hardware, understanding the tools and commands available to you is crucial.

In this post, we’ll take a look at some of the most essential command-line tools for monitoring Linux performance. From system uptime to CPU and memory usage, we’ll cover the key metrics you need to know to keep your systems running smoothly. We’ll also explore some advanced commands that can help you dig deeper into performance issues and identify bottlenecks.

Whether you’re new to Linux or a seasoned pro, this guide will give you the knowledge you need to effectively monitor and optimize your systems. So let’s dive in and take a closer look at the tools and commands that can help you keep your Linux systems running at their best.

Uptime

The uptime command shows the current time, how long the system has been running, the number of users currently logged in, and the system load averages. It gives a quick first impression of the system’s health and can help identify potential issues.

The output of the command typically looks like this:

uptime
 21:33:44 up  2:34,  3 users,  load average: 10.00, 9.01, 1.05

The single line of output contains four fields:

The first field shows the current time (21:33:44 in this example).
The second field shows how long the system has been running (2 hours and 34 minutes here).
The third field shows the number of users currently logged in (3 here). A very large number means many people are working on the machine, which is worth factoring into your analysis.
The last field shows the system load averages for the past 1, 5, and 15 minutes. On Linux, the load average counts tasks that are running, waiting for a CPU, or blocked in uninterruptible (usually disk) I/O, so it should be read relative to the number of CPUs.

In the example above, the load averages are 10.00, 9.01, and 1.05. The 1- and 5-minute figures are far above the 15-minute figure, which indicates that the load climbed sharply within roughly the last five to ten minutes.

By running this command regularly and tracking the output, you can identify any trends that may indicate performance issues, such as a consistently high load average or a large number of logged-in users.
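The reading of the load averages described above can be sketched in Python. This is a minimal illustration, not a standard tool: the function names and the 10% tolerance are assumptions made for this example. os.getloadavg() returns the same three numbers that uptime prints.

```python
import os

def load_trend(one, fifteen, tolerance=0.10):
    """Classify the trend by comparing the 1-minute and 15-minute averages.

    The 10% tolerance is an arbitrary choice for this sketch.
    """
    if one > fifteen * (1 + tolerance):
        return "rising"
    if one < fifteen * (1 - tolerance):
        return "falling"
    return "steady"

def load_per_cpu(one, cpus):
    """Normalize the 1-minute load by CPU count; values above 1.0 suggest saturation."""
    return one / max(cpus, 1)

if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # the same figures uptime reports
    cpus = os.cpu_count() or 1
    print(load_trend(one, fifteen), round(load_per_cpu(one, cpus), 2))
```

With the figures from the example (10.00 and 1.05), load_trend reports a rising load. On a 4-CPU machine, a 1-minute load of 10.00 works out to 2.5 per CPU.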

dmesg | tail

The dmesg command is used to examine or control the kernel message buffer. The kernel message buffer is a circular buffer in which the kernel stores messages that are generated by the kernel and by device drivers. These messages can include information about system initialization, hardware detection, and other events that take place while the system is running.

By default, dmesg will print all the messages that are currently stored in the kernel message buffer. By piping the output of dmesg to the tail command, we can display only the last 10 lines of the kernel message buffer. This can be useful for displaying recent kernel messages, such as those generated during system initialization or during the detection of new hardware.

The output of the command typically looks like this:

dmesg | tail
[13142.732107] usb 1-1.3: new high-speed USB device number 5 using xhci_hcd
[13142.846475] usb 1-1.3: New USB device found, idVendor=8087, idProduct=07dc, bcdDevice= 0.01
[13142.846478] usb 1-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[13142.847110] hub 1-1.3:1.0: USB hub found
[13142.847555] hub 1-1.3:1.0: 6 ports detected
[13142.904383] usb 1-1.3.3: new high-speed USB device number 6 using xhci_hcd

The above output shows some recent kernel messages that have been generated by the system. The messages include information about USB device detection and a new USB hub being found.

By using dmesg | tail, you can quickly view the last 10 kernel messages and troubleshoot hardware or driver problems. On systems that restrict access to the kernel log, run it with sudo.
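When scanning a longer kernel log, it can help to split each line into its timestamp and message and filter for worrying keywords. The sketch below assumes the default "[seconds.microseconds] message" layout shown above. The function names and the keyword list are illustrative choices, not part of dmesg.

```python
import re

# Matches the default dmesg layout: "[seconds.micros] message"
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def parse_dmesg_line(line):
    """Return (seconds_since_boot, message), or None if the layout differs."""
    match = DMESG_LINE.match(line)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)

def find_suspicious(lines, keywords=("error", "fail", "oom", "segfault")):
    """Return parsed entries whose message contains a keyword (case-insensitive).

    The keyword list is a hypothetical starting point; tune it for your systems.
    """
    hits = []
    for line in lines:
        parsed = parse_dmesg_line(line)
        if parsed and any(word in parsed[1].lower() for word in keywords):
            hits.append(parsed)
    return hits

# Two lines taken from the sample output above.
SAMPLE_LINES = [
    "[13142.847110] hub 1-1.3:1.0: USB hub found",
    "[13142.847555] hub 1-1.3:1.0: 6 ports detected",
]
```

For the USB messages in the sample, find_suspicious(SAMPLE_LINES) returns an empty list, since none of them mention an error.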

vmstat

The vmstat command is a useful tool for monitoring system performance and identifying bottlenecks. It reports information about processes, memory, paging, block IO, traps, and CPU activity.

The vmstat command takes two optional arguments: the first is the sampling interval (in seconds), and the second is the number of samples.

For example, the command vmstat 1 will display a snapshot of the system’s performance every second, while vmstat 1 5 will display 5 snapshots of the system’s performance, each taken one second apart.

The output of the command typically looks like this:

vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 220216 7056048      0      0    3    4     7     1    6    3  3  2 94  0  0
 0  0 220216 7056396      0      0 1856    0    29     0 5351 6812  6  5 89  0  0
 0  0 220216 7054096      0      0 3072    0    48     0 3526 4215  2  1 97  0  0
 0  0 220212 7054064      0      0    0    0     0     0 3413 4357  2  2 96  0  0
 0  0 220212 7056180      0      0 3840    0    60     0 3299 4231  2  4 94  0  0

The two header lines name the column groups and the individual columns. The first row of data reports averages since the last boot; each later row covers one sampling interval.

procs: The number of processes in different states. ‘r’ is the number of runnable processes (running or waiting for a CPU), ‘b’ is the number of processes blocked in uninterruptible sleep, usually waiting for I/O.
memory: Memory usage, in KiB by default. ‘swpd’ is the amount of swap space in use, ‘free’ is the amount of idle memory, ‘buff’ is the amount of memory used as buffers, and ‘cache’ is the amount of memory used as cache.
swap: Swap activity. ‘si’ is the amount of memory swapped in from disk per second, ‘so’ is the amount of memory swapped out to disk per second.
io: Block device activity. ‘bi’ is the number of blocks received from a block device per second (block input), ‘bo’ is the number of blocks sent to a block device per second (block output).
system: Kernel event rates. ‘in’ is the number of interrupts per second, ‘cs’ is the number of context switches per second.
cpu: The percentage of CPU time by category. ‘us’ is time spent running user (non-kernel) code, ‘sy’ is time spent running kernel code, ‘id’ is idle time, ‘wa’ is time spent waiting for I/O to complete, and ‘st’ is time stolen from this virtual machine by the hypervisor.

By using vmstat, you can quickly identify bottlenecks and potential issues with the system’s performance. For example, high values for ‘si’ and ‘so’ may indicate that the system is running low on memory and is swapping frequently. High values for ‘us’ or ‘sy’ may indicate that the CPU is overutilized, and high values for ‘wa’ may indicate that there are I/O bottlenecks in the system.

vmstat also shows the number of runnable and blocked processes. An ‘r’ value that stays above the number of CPUs suggests the CPUs are saturated, while a persistently high ‘b’ value usually means processes are stuck waiting on I/O.
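The swap check described above is easy to automate once the output is parsed. The following sketch turns vmstat output into one dictionary per row and lists the rows that show swap activity. The function names are assumptions for this example. The first data row is skipped by default, on the assumption that it holds the since-boot averages.

```python
# Header plus three data rows taken from the sample output above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 220216 7056048      0      0    3    4     7     1    6    3  3  2 94  0  0
 0  0 220216 7056396      0      0 1856    0    29     0 5351 6812  6  5 89  0  0
 0  0 220212 7054064      0      0    0    0     0     0 3413 4357  2  2 96  0  0
"""

def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    # The column-name line is the one whose first two tokens are "r" and "b".
    header_idx = next(i for i, ln in enumerate(lines) if ln.split()[:2] == ["r", "b"])
    columns = lines[header_idx].split()
    rows = []
    for ln in lines[header_idx + 1:]:
        values = ln.split()
        if len(values) == len(columns):
            rows.append(dict(zip(columns, map(int, values))))
    return rows

def swapping_samples(rows, skip_first=True):
    """Return the rows with swap-in or swap-out activity."""
    samples = rows[1:] if skip_first else rows
    return [row for row in samples if row["si"] > 0 or row["so"] > 0]

if __name__ == "__main__":
    for row in swapping_samples(parse_vmstat(SAMPLE)):
        print("swap activity: si=%d so=%d" % (row["si"], row["so"]))
```

On the sample, only the second row (si of 1856) is reported; the first row is ignored as a since-boot average and the third shows no swap traffic.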

mpstat -P ALL 1

The mpstat command is used to monitor the performance of individual CPUs or cores in a multi-processor system. It provides detailed information about the usage of each CPU or core, including the percentage of time spent in user mode, system mode, and idle mode.

Several other options can be used with mpstat, including:

-A : Displays all available statistics for every processor, both individual and combined
-u : Reports CPU utilization (time spent in user mode, system mode, nice mode, etc.); this is the default report
-I SUM : Reports the total number of interrupts per processor
-I CPU : Reports the number of each individual interrupt received per second by each processor
-P 0-3 : Reports statistics for specific processors or cores, in this case 0 to 3

mpstat itself does not report disk I/O (use iostat for that), and every report line already starts with a timestamp.

The mpstat -P ALL 1 command will display the usage statistics for all CPUs or cores every second.

The output of the command typically looks like this:

mpstat -P ALL 1
Linux 4.19.0-14-amd64 (server)  01/26/2023  _x86_64_    (8 CPU)

18:02:35     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
18:02:37     all    1.15    0.00    2.54    0.00    0.00    0.00    0.00    0.00   96.31
18:02:37       0    0.00    0.00    7.80    0.00    0.00    0.00    0.00    0.00   92.20
18:02:37       1    1.50    0.00    3.20    0.00    0.00    0.00    0.00    0.00   95.30
18:02:37       2    1.50    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.50
18:02:37       3    0.00    0.00    3.10    0.00    0.00    0.00    0.00    0.00   96.90
18:02:37       4    0.00    0.00    1.60    0.00    0.00    0.00    0.00    0.00   98.40
18:02:37       5    1.60    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.40
18:02:37       6    3.10    0.00    1.50    0.00    0.00    0.00    0.00    0.00   95.40
18:02:37       7    1.50    0.00    3.10    0.00    0.00    0.00    0.00    0.00   95.40

After the banner line (kernel version, hostname, date, architecture, and CPU count), the column header describes the different columns of the output:

CPU: The CPU number, or ‘all’ for the average across all CPUs
%usr: The percentage of CPU time spent in user mode
%nice: The percentage of CPU time spent in user mode on niced (lower-priority) processes
%sys: The percentage of CPU time spent in system (kernel) mode
%iowait: The percentage of time the CPU was idle while the system had outstanding I/O requests
%irq: The percentage of CPU time spent handling hardware interrupts
%soft: The percentage of CPU time spent handling software interrupts (softirqs)
%steal: The percentage of time the virtual CPU spent in involuntary wait while the hypervisor was servicing another virtual processor
%guest: The percentage of CPU time spent running a virtual CPU for guest operating systems under the control of the Linux kernel
%gnice: The percentage of CPU time spent running niced guest virtual CPUs (printed by newer sysstat versions; absent from the sample above)
%idle: The percentage of CPU time spent idle with no outstanding I/O requests

By using mpstat, you can quickly identify which CPUs or cores are the most heavily utilized and spot potential bottlenecks related to I/O wait, interrupts, or softirqs. A single CPU pinned near 100% while the others sit idle often points to a single-threaded workload. mpstat does not show which process is responsible; use pidstat or top for that.

Note that mpstat reports average CPU usage over each interval, so it might not capture short bursts of high CPU usage.

Because mpstat shows the system-wide average alongside per-CPU usage, it is also useful for identifying imbalance or issues related to CPU scheduling.

Overall, mpstat is most useful on multi-processor systems, where per-CPU detail makes CPU-related performance problems quick to find and troubleshoot.
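To spot the busiest core or an imbalance between cores without reading the table by eye, the per-CPU rows can be reduced to a single busy percentage (100 minus %idle). This is a sketch that assumes the column layout shown above. The function name is made up for this example.

```python
# Header, the "all" row, and the first three CPUs from the sample output above.
SAMPLE = """\
18:02:35     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
18:02:37     all    1.15    0.00    2.54    0.00    0.00    0.00    0.00    0.00   96.31
18:02:37       0    0.00    0.00    7.80    0.00    0.00    0.00    0.00    0.00   92.20
18:02:37       1    1.50    0.00    3.20    0.00    0.00    0.00    0.00    0.00   95.30
18:02:37       2    1.50    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.50
"""

def per_cpu_busy(text):
    """Map each CPU number to its busy percentage (100 - %idle)."""
    busy = {}
    cpu_col = idle_col = None
    for line in text.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            # Header row: remember where the CPU and %idle columns sit.
            cpu_col = fields.index("CPU")
            idle_col = fields.index("%idle")
            continue
        if idle_col is None or len(fields) <= idle_col:
            continue
        if fields[cpu_col].isdigit():  # skips the "all" summary row
            busy[int(fields[cpu_col])] = round(100.0 - float(fields[idle_col]), 2)
    return busy

if __name__ == "__main__":
    busy = per_cpu_busy(SAMPLE)
    busiest = max(busy, key=busy.get)
    spread = round(max(busy.values()) - min(busy.values()), 2)
    print("busiest CPU:", busiest, "spread between cores:", spread)
```

A large spread between the busiest and the least busy core is the kind of imbalance the per-CPU view is meant to reveal. What counts as large depends on the workload.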

pidstat

The pidstat command is used to monitor the performance of individual processes on a Linux system. It provides detailed information about the usage of each process, including the percentage of CPU time spent in user mode, system mode, and the number of major and minor page faults.

The pidstat 1 command will display usage statistics every second for each process that was active during the interval.

The output of the command typically looks like this:

Linux 4.19.0-14-amd64 (server)  01/26/2023    _x86_64_    (2 CPU)

01:23:24 PM   UID       PID    %usr %system  %guest    %CPU   CPU  minflt/s  majflt/s     VSZ    RSS   cmd
01:23:25 PM     0          1    0.00    0.00    0.00    0.00     0    0.00    0.00    15292    868   systemd
01:23:25 PM     0          2    0.00    0.00    0.00    0.00     0    0.00    0.00       0      0   kthreadd
01:23:25 PM     0          4    0.00    0.00    0.00    0.00     0    0.00    0.00       0      0   kworker/0:0H
01:23:25 PM     0          6    0.00    0.00    0.00    0.00     0    0.00    0.00       0      0   mm_percpu_wq
01:23:25 PM     0          7    0.00    0.00    0.00    0.00     0    0.00    0.00       0      0   ksoftirqd/0

After the banner line, the column header describes the different columns of the output:

UID: The user ID of the process owner
PID: The process ID of the process
%usr: The percentage of CPU time spent in user mode by the process
%system: The percentage of CPU time spent in system mode by the process
%guest: The percentage of CPU time spent by the process running a virtual CPU for a guest operating system
%CPU: The total percentage of CPU time used by the process
CPU: The CPU or core on which the process is running
minflt/s: The number of minor page faults per second (faults resolved without reading a page from disk)
majflt/s: The number of major page faults per second (faults that required reading a page from disk)
VSZ: The virtual memory size of the process in kilobytes
RSS: The resident set size of the process (non-swapped physical memory) in kilobytes
cmd: The command name of the process (labelled Command in actual pidstat output)

By using pidstat, you can quickly identify which processes are using the most CPU, memory, and other resources. A high majflt/s value deserves attention, because each major fault stalls the process while a page is read from disk. CPU statistics are the default report; the page-fault and memory columns come from the -r option, and -d adds per-process disk I/O.

It is also possible to monitor a specific process by specifying its PID: pidstat -p PID 1.

pidstat is particularly useful once a system-wide tool such as vmstat or mpstat has shown that a problem exists and you need to find the process responsible for the CPU, memory, or I/O usage.
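Ranking processes by CPU usage is a natural next step once pidstat output has been captured. The sketch below parses the CPU report and sorts it. The sample rows are hypothetical values invented for this example, not real measurements. The function names are likewise assumptions. The parser skips the timestamp tokens by locating the UID column, so it handles both 12-hour and 24-hour timestamps.

```python
# Hypothetical pidstat-style rows, invented purely for illustration.
SAMPLE = """\
01:23:24 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
01:23:25 PM     0          1    0.00    1.00    0.00    1.00     0  systemd
01:23:25 PM  1000       2345   12.00    3.00    0.00   15.00     1  python3 worker.py
"""

def parse_pidstat(text):
    """Parse pidstat rows into dicts of strings keyed by column name."""
    rows, columns, offset = [], None, 0
    for line in text.splitlines():
        fields = line.split()
        if "UID" in fields and "PID" in fields:
            offset = fields.index("UID")  # number of timestamp tokens to skip
            columns = fields[offset:]
            continue
        if columns is None or len(fields) < offset + len(columns):
            continue
        # Limit the split so a command containing spaces stays in one piece.
        pieces = line.strip().split(None, offset + len(columns) - 1)
        rows.append(dict(zip(columns, pieces[offset:])))
    return rows

def top_by_cpu(rows, count=5):
    """Return the rows with the highest %CPU, highest first."""
    return sorted(rows, key=lambda row: float(row["%CPU"]), reverse=True)[:count]

if __name__ == "__main__":
    for row in top_by_cpu(parse_pidstat(SAMPLE), count=3):
        print(row["PID"], row["%CPU"], row["Command"])
```

The same approach works for the memory report by sorting on RSS or majflt/s instead of %CPU.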

iostat

The iostat command is used to monitor input/output (I/O) statistics on a Linux system. It provides detailed information about the usage of the disk drives, including the number of read and write operations per second, the number of sectors read and written per second, and the average time it takes for each operation.

Some of the other options that can be used with the iostat command are:

-c : Display the CPU utilization report
-d : Display the device utilization report
-n : Displayed NFS statistics in older sysstat releases; current releases no longer have it
-p : Display statistics for the specified block device(s) and their partitions
-t : Display the time of each report
-x : Display extended statistics
-y : Omit the first report (statistics since boot) when displaying multiple reports at an interval
-z : Omit devices that had no activity during the sample period

For example, the command iostat -x -p sda 1 will display extended statistics for the device sda and its partitions every second, and the command iostat -xz 1 will display extended statistics every second for all devices that had activity.

The output of the command typically looks like this:

Linux 4.19.0-14-amd64 (server)  01/26/2023    _x86_64_    (2 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

After the banner line, the column header describes the different columns of the output. The names below match the sample; newer sysstat versions rename some of them (for example, avgrq-sz becomes areq-sz and avgqu-sz becomes aqu-sz):

Device: The name of the disk drive
rrqm/s: The number of read requests merged per second before being queued to the device
wrqm/s: The number of write requests merged per second before being queued to the device
r/s: The number of read operations per second
w/s: The number of write operations per second
rkB/s: The number of kilobytes read per second
wkB/s: The number of kilobytes written per second
avgrq-sz: The average size of a request in sectors
avgqu-sz: The average queue length of the requests issued to the device
await: The average time (in milliseconds) for I/O requests to be served, including time spent waiting in the queue
r_await: The average time (in milliseconds) for read requests to be served
w_await: The average time (in milliseconds) for write requests to be served
svctm: The average service time (in milliseconds) for I/O requests. The sysstat documentation warns that this field is unreliable, and newer versions drop it
%util: The percentage of elapsed time during which I/O requests were issued to the device. Values close to 100% indicate saturation for devices that serve requests one at a time

By using iostat, you can quickly identify which devices are the most heavily utilized and whether I/O requests are queuing up or taking a long time to complete. iostat works per device, not per process; to find the process behind heavy I/O, use a per-process tool such as pidstat -d.

iostat is particularly useful for identifying performance issues related to disk drives and I/O operations, such as a system that feels slow while its CPUs are mostly idle.
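Flagging busy devices from the extended report can be scripted along these lines. This is a sketch under stated assumptions: the header row starts with Device (as in the sample), the sdb row is hypothetical data added so the example has something to flag, and the 80% threshold is an arbitrary illustrative cutoff rather than a recommended value.

```python
# The sda row comes from the sample above; the sdb row is hypothetical.
SAMPLE = """\
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00    12.00   40.00  310.00   640.00  9920.00    60.34     8.20   23.40   10.10   25.10   2.67  93.50
"""

def parse_iostat_x(text):
    """Map device name to a dict of its extended statistics as floats."""
    devices, columns = {}, None
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].rstrip(":") == "Device":
            columns = fields[1:]
            continue
        if columns and len(fields) == len(columns) + 1:
            try:
                devices[fields[0]] = dict(zip(columns, map(float, fields[1:])))
            except ValueError:
                continue  # not a numeric device row
    return devices

def saturated(devices, util_threshold=80.0):
    """Return the names of devices whose %util meets the threshold."""
    return sorted(name for name, stats in devices.items()
                  if stats["%util"] >= util_threshold)

if __name__ == "__main__":
    print(saturated(parse_iostat_x(SAMPLE)))
```

Because the parser keys on column names rather than positions, it should tolerate the renamed columns in newer sysstat versions as long as %util is still present.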

sar

The sar (System Activity Report) command is a powerful tool for monitoring and analyzing system performance on Linux systems. It is used to collect, report, and save system activity information such as CPU usage, memory usage, I/O activity, network activity, and more.

The sar command is typically used with options that specify the type of system activity to monitor, the interval at which to collect data, and the file or files in which to save the data.

For example, the sar -u 1 command will display the CPU usage statistics every 1 second.

Some of the other options that can be used with the sar command are:

-b : Display I/O and transfer rate statistics
-r : Display memory utilization statistics (use -S for swap space utilization)
-W : Display swapping statistics (use -B for paging statistics)
-P : Display statistics for the specified processor(s) or core(s)
-A : Display all available statistics
-u : Display CPU utilization statistics
-n : Display network statistics; takes a keyword such as DEV or TCP

It is also possible to restrict a report to a time range by using the -s and -e options. For example, the command sar -u -s 10:00:00 -e 11:00:00 will display the CPU usage statistics recorded between 10 am and 11 am. This reads from the saved daily activity data, so it only works when sysstat data collection is enabled.

The sar -n DEV 1 command will display the network device statistics every 1 second.

The sar -n TCP,ETCP 1 command will display TCP traffic statistics and TCP error statistics every 1 second.

free

The free command provides a quick way to check the system’s memory usage and can be useful for troubleshooting performance issues related to memory usage.

Here are some of the options that can be used with the free command:

-b : Display the memory information in bytes
-g : Display the memory information in gibibytes
-h : Display the memory information in human-readable format (e.g., 1K, 234M, 2G)
-k : Display the memory information in kibibytes (the default)
-l : Display detailed low and high memory statistics
-m : Display the memory information in mebibytes
-o : Suppress the -/+ buffers/cache line (older versions of free only; current versions have removed this option)
-t : Display an additional line with the column totals (memory plus swap)

For example, the command free -h will display the memory information in human-readable format.
The free -m command displays the amount of free and used memory in mebibytes, rather than the default kibibytes.

Here’s an example of the output of the free -m command:

              total        used        free      shared  buff/cache   available
Mem:           7864        6913         731         123         220        829
Swap:          8191           0        8191

The output has two rows:

The Mem row shows the total, used, free, shared, buff/cache, and available memory in the system.
The Swap row shows the total, used, and free swap space in the system.

The ‘total’ column shows the total amount of usable memory in the system. The ‘used’ column shows the amount of memory currently in use, and the ‘free’ column shows memory that is not being used for anything at all. The ‘shared’ column shows shared memory, which is mostly memory used by tmpfs. The ‘buff/cache’ column shows the memory used as buffers and cache, most of which the kernel can reclaim when applications need it. The ‘available’ column shows an estimate of how much memory is available for starting new applications without swapping. Because Linux deliberately uses spare memory for caching, ‘available’ is a better indicator of memory pressure than ‘free’.
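Since ‘available’ is the figure that matters for memory pressure, a small script can turn the free output into a single percentage. The sketch below parses the numeric output of free -m (it will not handle the suffixed values from free -h). The function names are assumptions made for this example.

```python
# The free -m output shown above.
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:           7864        6913         731         123         220        829
Swap:          8191           0        8191
"""

def parse_free(text):
    """Map each row label (Mem, Swap) to a dict of its columns as integers."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[0].split()
    table = {}
    for ln in lines[1:]:
        label, *values = ln.split()
        # The Swap row has fewer values; zip pairs it with the first columns only.
        table[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return table

def available_percent(table):
    """Return available memory as a percentage of total memory."""
    mem = table["Mem"]
    return round(100.0 * mem["available"] / mem["total"], 1)

if __name__ == "__main__":
    print(available_percent(parse_free(SAMPLE)), "% of memory available")
```

For the sample output, 829 MiB of 7864 MiB is available, which comes to 10.5%. Whether that is low enough to act on depends on the workload.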

top

The top command is a real-time system monitoring tool that displays a dynamic, scrolling view of the processes running on a Linux system. It provides detailed information about the performance of the system, including CPU usage, memory usage, and running processes.

Here’s an example of the output of the top command:

top - 11:38:54 up 3 days,  7:03,  2 users,  load average: 0.00, 0.01, 0.05
Tasks: 244 total,   2 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.1 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8077248 total,  7097368 used,   979880 free,    60788 buffers
KiB Swap:  8191996 total,        0 used,  8191996 free.   161720 cached Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  839 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u2:2
  838 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u2:1
  837 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u2:0
  836 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u2:3
  835 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/

The first line of the top output shows the current time, the system uptime, the number of users logged in, and the system’s load averages for the past 1, 5, and 15 minutes.

The second line shows the total number of processes running on the system, as well as the number of processes in different states (running, sleeping, stopped, and zombie).

The third line shows the CPU usage statistics, including the percentage of CPU time spent in user mode, system mode, and idle mode, as well as the percentage of CPU time spent waiting for I/O operations to complete.

The fourth line shows the memory usage statistics, including the total, used, and free physical memory, as well as the amount of memory being used as buffers and cache.

The fifth line shows the swap memory usage statistics, including the total, used, and free swap space, as well as the amount of memory being used as cache.

The remaining lines show a list of the processes running on the system, including the process ID, user, priority, nice value, virtual memory usage, resident memory usage, shared memory usage, state, CPU and memory usage, cumulative CPU time, and command. By default the list is sorted by CPU usage, so the process currently using the most CPU is at the top.

The top command updates the information in real-time, so the output will change as the system’s performance changes. You can press q to exit the top command.

The top command is a powerful tool for monitoring the performance of a Linux system. It can be used to identify which processes are consuming the most resources, and can help to identify performance bottlenecks. The top command can be used with a variety of options such as -d to specify the delay time, -n to specify the number of iterations and -u to specify a particular user’s process.

Another useful option is -p, which lets you monitor a specific process by giving its process ID. For example, top -p 1234 will only show the process with ID 1234. The -H option shows individual threads instead of one line per process.

You can also sort the process list interactively by pressing keys: M for memory usage, P for CPU usage, N for process ID, and T for cumulative CPU time.

In addition to these options, there are many other options available for the top command that can be used to customize the output and display the information that is most relevant for your specific use case.
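For scripting, top can also be run non-interactively in batch mode with top -b -n 1, and its summary lines parsed. The sketch below extracts the CPU summary line into a dictionary. It assumes the "value label" layout shown in the sample above; other top versions format this line differently. The function name is made up for this example.

```python
import re

# The CPU summary line from the sample output above.
SAMPLE_LINE = "%Cpu(s):  0.2 us,  0.1 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st"

def parse_cpu_line(line):
    """Turn top's CPU summary line into a dict such as {'us': 0.2, 'sy': 0.1}."""
    _, _, rest = line.partition(":")
    pairs = re.findall(r"([\d.]+)\s+([a-z]{2})", rest)
    return {label: float(value) for value, label in pairs}

if __name__ == "__main__":
    cpu = parse_cpu_line(SAMPLE_LINE)
    print("idle: %.1f%%, waiting on I/O: %.1f%%" % (cpu["id"], cpu["wa"]))
```

Parsing the summary this way makes it straightforward to log the idle or I/O-wait percentage over time instead of watching the interactive display.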

Conclusion

In conclusion, monitoring the performance of a Linux system is essential for ensuring that the system is running at optimal performance. The commands we discussed in this post – uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, sar, and top – are all powerful tools that provide detailed information about the system’s performance. By understanding and utilizing these commands, you can quickly identify performance bottlenecks and take steps to optimize the system’s performance. Whether you are a system administrator, developer, or power user, these commands are essential for understanding the performance of your Linux systems. Remember that these commands are just the tip of the iceberg in terms of Linux performance monitoring and there are many other tools and techniques available for monitoring and optimizing Linux systems.

By Admin
