More lsof command usage and examples : 10 lsof Command Examples in Linux
For more Iostat usage and examples visit : 6 Iostat Command Examples in Linux
9. IPTraf Real Time IP LAN Monitoring
IPTraf is an open-source, console-based, real-time network (IP LAN) monitoring utility for Linux. It
collects a variety of information about the IP traffic that passes over the network, including TCP
flag information, ICMP details, TCP/UDP traffic breakdowns, and TCP connection packet and byte counts.
It also gathers general and detailed interface statistics for TCP, UDP, IP, ICMP, non-IP traffic, IP
checksum errors, interface activity, etc.
IP Traffic Monitor
For more information and usage of IPTraf tool, please visit : IPTraf Network Monitoring Tool
Collectl Monitoring
1. CPU
You should understand the four critical performance metrics for CPU: context switch, run queue, CPU
utilization, and load average.
Context Switch
When the CPU switches from one process (or thread) to another, it is called a context switch.
When a process switch happens, the kernel stores the current state of the CPU (for that process or
thread) in memory.
The kernel also retrieves the previously stored state (of the next process or thread) from memory and
loads it into the CPU.
Context switching is essential for multitasking on the CPU.
However, a high rate of context switching can cause performance issues.
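The kernel exposes a cumulative context-switch counter on the "ctxt" line of /proc/stat (field name per the proc(5) man page). Here is a minimal sketch that parses it; the sample text below is a fabricated, shortened /proc/stat snippet for illustration:

```python
# Parse the cumulative context-switch counter from /proc/stat text.
# On a live system you would read open("/proc/stat").read() instead.
def context_switches(stat_text: str) -> int:
    """Return the total number of context switches since boot."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")

# Fabricated sample resembling /proc/stat output:
sample = "cpu  2255 34 2290 22625563 6290 127 456\nctxt 1990473\nbtime 1062191376"
print(context_switches(sample))  # → 1990473
```

Sampling this counter twice and dividing the delta by the interval gives the context switches per second that tools such as vmstat report in their "cs" column.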
Run Queue
The run queue indicates the total number of active processes currently queued for the CPU.
When the CPU is ready to execute a process, it picks one from the run queue based on the priority
of the process.
Please note that processes in the sleep state or in I/O wait state are not in the run queue.
So, a higher number of processes in the run queue can cause performance issues.
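One place to see the run queue is the fourth field of /proc/loadavg, which (per the proc(5) man page) holds "currently runnable entities / total entities". A small sketch, using a made-up sample line:

```python
# Extract the runnable/total process counts from /proc/loadavg text.
def run_queue(loadavg_text: str):
    fields = loadavg_text.split()
    # Fourth field has the form "running/total"
    running, total = fields[3].split("/")
    return int(running), int(total)

# Fabricated sample in /proc/loadavg format:
sample = "0.75 1.70 2.10 2/1217 4095"
print(run_queue(sample))  # → (2, 1217)
```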
Cpu Utilization
This indicates how much of the CPU is currently being used.
This is fairly straightforward, and you can view the CPU utilization from the top command.
100% CPU utilization means the system is fully loaded.
So, a high percentage of CPU utilization will cause performance issues.
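Under the hood, tools like top derive utilization from two samples of the "cpu" line in /proc/stat: the busy share is whatever fraction of elapsed jiffies was not idle (field layout assumed per the proc(5) man page; the two samples below are fabricated). A rough sketch:

```python
# Compute CPU utilization (%) between two /proc/stat "cpu" line samples.
def cpu_utilization(sample1: str, sample2: str) -> float:
    def split(line):
        fields = [int(f) for f in line.split()[1:]]
        return sum(fields), fields[3]  # (total jiffies, idle jiffies)
    total1, idle1 = split(sample1)
    total2, idle2 = split(sample2)
    busy = (total2 - total1) - (idle2 - idle1)
    return 100.0 * busy / (total2 - total1)

# Two fabricated samples, nominally taken one second apart:
s1 = "cpu  1000 0 500 8000 100 0 50"
s2 = "cpu  1060 0 520 8090 110 0 60"
print(round(cpu_utilization(s1, s2), 1))  # → 52.6
```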
Load Average
This indicates the average CPU load over a specific time period.
On Linux, load average is displayed for the last 1 minute, 5 minutes, and 15 minutes. This is
helpful to see whether the overall load on the system is going up or down.
For example, a load average of 0.75 1.70 2.10 indicates that the load on the system is coming
down. 0.75 is the load average in the last 1 minute. 1.70 is the load average in the last 5
minutes. 2.10 is the load average in the last 15 minutes.
Please note that this load average is calculated by combining both the total number of processes in
the run queue and the total number of processes in the uninterruptible task state.
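The rising-or-falling reading described above can be sketched as a tiny helper that compares the three figures (using the article's own example values):

```python
# Classify the trend from the 1-, 5- and 15-minute load averages.
def load_trend(one: float, five: float, fifteen: float) -> str:
    if one < five < fifteen:
        return "falling"   # recent load lower than older load
    if one > five > fifteen:
        return "rising"    # recent load higher than older load
    return "mixed"

print(load_trend(0.75, 1.70, 2.10))  # the article's example → "falling"
```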
2. Network
A good understanding of TCP/IP concepts is helpful while analyzing any network issues. We'll
discuss more about this in future articles.
For network interfaces, you should monitor the total number of packets (and bytes) received/sent
through the interface, the number of packets dropped, etc.
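These per-interface counters live in /proc/net/dev, one line per interface: after the "iface:" prefix come the receive counters (bytes, packets, errs, drop, ...) followed by the transmit counters starting at the ninth field (layout assumed per the proc(5) man page; the sample line and its values are fabricated). A sketch:

```python
# Pull basic receive/transmit counters out of one /proc/net/dev line.
def iface_stats(line: str) -> dict:
    name, counters = line.split(":")
    f = [int(x) for x in counters.split()]
    return {
        "iface": name.strip(),
        "rx_bytes": f[0], "rx_packets": f[1], "rx_dropped": f[3],
        "tx_bytes": f[8], "tx_packets": f[9], "tx_dropped": f[11],
    }

# Fabricated sample line (16 counter fields, as in /proc/net/dev):
sample = "  eth0: 500000 4000 0 2 0 0 0 0 350000 3000 0 1 0 0 0 0"
print(iface_stats(sample))
```

Sampling these counters at an interval and taking deltas gives the packets/bytes-per-second figures that tools such as sar -n DEV display.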
3. I/O
I/O wait is the amount of time the CPU spends waiting for I/O. If you see consistently high I/O wait on your
system, it indicates a problem in the disk subsystem.
You should also monitor reads/second and writes/second. This is measured in blocks, i.e., the
number of blocks read/written per second. These are also referred to as bi and bo (blocks in and blocks
out).
tps indicates the total transactions per second, which is the sum of rtps (read transactions per second)
and wtps (write transactions per second).
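The bi/bo figures mentioned above come from vmstat's data lines. A sketch that pulls them out of one line, assuming the default vmstat column order "r b swpd free buff cache si so bi bo in cs us sy id wa st" (the sample line below is fabricated):

```python
# Map one vmstat data line onto its column names and return bi/bo.
VMSTAT_COLS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

def blocks_io(line: str):
    row = dict(zip(VMSTAT_COLS, line.split()))
    return int(row["bi"]), int(row["bo"])

# Fabricated vmstat line (17 columns):
sample = "1  0      0 510144  64384 392768    0    0    12    25  100  210  5  2 92  1  0"
print(blocks_io(sample))  # → (12, 25)
```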
4. Memory
As you know, RAM is your physical memory. If you have 4GB RAM installed on your system,
you have 4GB of physical memory.
Virtual memory = Swap space available on the disk + Physical memory. The virtual memory
contains both user space and kernel space.
Using either 32-bit or 64-bit system makes a big difference in determining how much memory a
process can utilize.
On a 32-bit system a process can only access a maximum of 4GB virtual memory. On a 64-bit
system there is no such limitation.
The unused RAM will be used as file system cache by the kernel.
The Linux system will swap when it needs more memory than the physical memory available. When it
swaps, it writes the least-used memory pages from physical memory to the swap space on the disk.
A lot of swapping can cause performance issues, as the disk is much slower than physical
memory, and it takes time to move memory pages between RAM and disk.
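The "virtual memory = physical memory + swap" relationship above can be computed directly from the MemTotal and SwapTotal lines of /proc/meminfo, whose values are in kB (format per the proc(5) man page; the sample text below is fabricated). A sketch:

```python
# Sum MemTotal and SwapTotal (kB) from /proc/meminfo-style text.
def virtual_memory_kb(meminfo_text: str) -> int:
    values = {}
    for line in meminfo_text.splitlines():
        key, rest = line.split(":")
        values[key] = int(rest.split()[0])  # value in kB
    return values["MemTotal"] + values["SwapTotal"]

# Fabricated /proc/meminfo excerpt:
sample = "MemTotal:  4046848 kB\nMemFree:   1020400 kB\nSwapTotal: 2097148 kB"
print(virtual_memory_kb(sample))  # → 6143996
```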
All of the above four subsystems are interrelated. Just because you see high reads/second, or
writes/second, or I/O wait doesn't mean the issue lies with the I/O subsystem. It also depends on
what the application is doing. In most cases, the performance issue might be caused by the application
that is running on the Linux system.
Remember the 80/20 rule: 80% of the performance improvement comes from tuning the application,
and the remaining 20% comes from tuning the infrastructure components.
There are various tools available to monitor Linux system performance. For example: top, free, ps,
iostat, vmstat, mpstat, sar, tcpdump, netstat, iozone, etc. We'll be discussing more about these tools and
how to use them in the upcoming articles in this series.
Following is a 4-step approach to identify and solve a performance issue.
Step 1 - Understand (and reproduce) the problem: Half of the problem is solved when you
clearly understand what the problem is. Before trying to solve the performance issue, first work
on clearly defining it. The more time you spend understanding and defining the
problem, the more likely you are to look for answers in the right place. If possible, try
to reproduce the problem, or at least simulate a situation that you think closely resembles it.
This will later help you validate the solution you come up with to fix the performance
issue.
Step 2 - Monitor and collect data: After defining the problem clearly, monitor the system and
try to collect as much data as possible on the various subsystems. Based on this data, come up with
a list of potential issues.
Step 3 - Eliminate and narrow down issues: After you have a list of potential issues, dive into
each one of them and eliminate any non-issues. Narrow it down further to see whether it is an
application issue or an infrastructure issue. Drill down further and narrow it down to a specific
component. For example, if it is an infrastructure issue, identify the
subsystem that is causing it. If it is an I/O subsystem issue, narrow it down to a specific
partition, RAID group, LUN, or disk. Basically, keep drilling down until you put your finger
on the root cause of the issue.
Step 4 - One change at a time: Once you've narrowed it down to a small list of potential issues,
don't try to make multiple changes at one time. If you make multiple changes, you won't
know which one fixed the original issue. Multiple changes at one time might also cause new
issues, which you'll end up chasing instead of fixing the original issue. So, make one change at
a time, and see if it fixes the original problem.
In the upcoming articles of this performance series, we'll discuss more about how to monitor and
address performance issues in the CPU, memory, I/O, and network subsystems using various Linux
performance monitoring tools.