Key factors for finding resource bottlenecks on an overloaded Linux server

It is very common, even with affordable hardware, to run into server load issues. There can be a number of reasons for a high load on a server, such as inadequate RAM or CPU, slow hard drives, or simply software that isn’t optimized. This article will help you identify where the bottleneck is and where you need to invest. However, please do not take it as a substitute for professional advice or service. You should always seek professional service if you can afford the associated costs.

I) First of all, are you really in trouble?

Usually, people look up the load in control panels, or with the “uptime” or “top” command. You can run the “uptime” command in your root shell to find out what the load is, but I’d like you to use “top” for now (please), because it also helps you identify how many CPUs are reported*. You should be able to see something like cpu00, cpu01, etc.
A load of ~1 per CPU is reasonable. For example, it’s ok if the load is 3.50 and you have 4 CPUs.
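The rule of thumb above can be sketched as a small shell check. The function name `load_status` is made up for this illustration, and the limit of 1 per reported CPU is only the guideline given above, not a hard rule.

```shell
# load_status LOAD NCPU
# Prints "ok" when the load is at most 1 per reported CPU, "high" otherwise.
# (Hypothetical helper; the 1-per-CPU limit is only a rule of thumb.)
load_status() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (load + 0 <= ncpu + 0) print "ok"; else print "high"
    }'
}

# Usage on a live Linux system (1-minute load and number of reported CPUs):
#   load_status "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

With the numbers from the example, `load_status 3.50 4` prints `ok`.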

Another thing to consider when looking at the load through uptime or top is understanding what it shows. For example (on a server with 2 HT CPUs, reported as 4):

18:30:55 up 17 days, 5:17, 2 users, load average: 4.76, 2.97, 2.62

The first figure (4.76) shows the load average over the last minute, while the second (2.97) and the third (2.62) show the averages over the last 5 and 15 minutes respectively. The first value is probably just a peak here that I wouldn’t be too concerned about (a little carefree?), but if you are, keep reading!
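If you want to compare the three figures in a script rather than by eye, here is a minimal sketch. The three numbers printed by uptime are the 1-, 5- and 15-minute averages. The helper names and the 1.5x cut-off are assumptions made for this illustration, and the parsing expects the English "load average:" label with dots as decimal separators.

```shell
# load_averages: reads a line of uptime output on stdin and prints the
# 1-, 5- and 15-minute load averages separated by spaces.
load_averages() {
    sed -n 's/.*load average: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}

# load_trend ONE FIVE FIFTEEN
# Prints "spike" when the 1-minute average is more than 1.5 times the
# 15-minute average (an arbitrary cut-off), "steady" otherwise.
load_trend() {
    awk -v one="$1" -v fifteen="$3" 'BEGIN {
        if (one + 0 > 1.5 * fifteen) print "spike"; else print "steady"
    }'
}

# Usage on a live system:
#   LC_ALL=C uptime | load_averages
```

For the sample line above, `load_trend 4.76 2.97 2.62` prints `spike`, which matches the reading that this is probably a short peak.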

Pretty sure now that your server really is overloaded? Sorry to hear that, but you never know: sometimes a server can handle much more load than the number shown. After all, load averages aren’t that accurate and shouldn’t always be the final deciding factor. Confused? That was just technical background that you don’t have to worry about too much. Go ahead if your loads are anything to worry about.

*Note the use of the term “reported”. I used this term because a P4 CPU with HT technology will be reported as 2 CPUs even though you know your server has only one.

II) Where is the problem?

To identify the problem, you need to run a series of logical tests (okay, not as scary as it may sound). All you need is some free time, probably 30-45 minutes, and root access to your server (don’t expect magic ;)). Ready to start? Let’s go!

Note: Run the checks multiple times to reach a good conclusion.

1. Check RAM (most common bottleneck!).

# free -m

The output should be similar to this:

# free -m

             total       used       free     shared    buffers     cached
Mem:          1963       1912         50          0         28        906
-/+ buffers/cache:        978        985
Swap:         1027        157        869

Any reaction like, “Oh gosh, almost all the RAM is gone”? Do not panic. Take a look at the “-/+ buffers/cache” line, which says that 985 MB of RAM is still free once buffers and cache are counted as available. As long as that figure is large enough and your server isn’t swapping much, you’re pretty good on RAM. Once memory runs out, your server starts using swap (like the pagefile on Windows), which is a part of your disk allocated as memory. It is comparatively very slow and can slow down your system even more if you have a busy hard drive (which it most likely is if you’re using that much RAM). In summary: aim for at least 175 MB free on the “-/+ buffers/cache” line and no more than 200 MB of swap in use.
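The two thresholds from the summary can be checked with a short script. This is only a sketch: `ram_check` is a hypothetical helper, and it assumes the older `free -m` layout that has a “-/+ buffers/cache” line. Newer versions of free print an “available” column instead, so the parsing would need adjusting there.

```shell
# ram_check [MIN_AVAILABLE_MB] [MAX_SWAP_USED_MB]
# Reads old-style "free -m" output on stdin and applies the two rules of
# thumb from the text (defaults: 175 MB available, 200 MB swap used).
ram_check() {
    awk -v min_avail="${1:-175}" -v max_swap="${2:-200}" '
        $1 == "-/+"   { avail = $4 }   # free column of the -/+ buffers/cache line
        $1 == "Swap:" { swap = $3 }    # used column of the Swap line
        END {
            ok = (avail + 0 >= min_avail + 0 && swap + 0 <= max_swap + 0)
            printf "available=%d MB, swap used=%d MB: %s\n", avail, swap, (ok ? "ok" : "check RAM")
        }'
}

# Usage on a live system with the old layout:
#   free -m | ram_check
```

Fed the sample output above, it reports 985 MB available and 157 MB of swap used, which passes both checks.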

If RAM is the problem, you should probably look at optimizations in your PHP/Perl scripts, MySQL queries + server, and Apache.

2. Check if I/O (input/output) usage is excessive

If there are too many read/write requests on a single HDD, it will slow down and you will have to upgrade to a faster drive (with higher RPM and cache). The alternative option to a faster single drive is to split the load across multiple drives by spreading most of the requested content across multiple drives, which can be easily accomplished using “symbolic links” (soft links to files/folders). To identify if your I/O problem is slowing down your server:

# top

Read the output in the “iowait” column for each CPU (newer versions of top show it as “wa”). In ideal situations, it should be close to 0%. If you are checking during a load spike, recheck these values several times to reach a good conclusion. Anything above 15% is worrisome. You can then check the speed of your hard drive to see if it really is lagging:

If you know your hard drive exists at /dev/sda or /dev/hda, just do the following. Or run the command “df -h” to check which drive your data resides on.

# hdparm -Tt /dev/sda

The output:

/dev/sda:
 Timing cached reads:   1484 MB in  2.01 seconds = 739.00 MB/sec
 Timing buffered disk reads:  62 MB in  3.00 seconds =  20.66 MB/sec

The cached reads were impressive, most likely due to the onboard disk cache; however, the buffered disk reads are only 20.66 MB/sec. Anything under 25 MB/sec is something you should be concerned about.
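Both I/O checks in this step can be scripted. The sketch below is an illustration rather than a definitive tool: the helper names are made up, the iowait figure is computed from two copies of the aggregate “cpu” line in /proc/stat (where iowait is the fifth counter after the label), and the 25 MB/sec limit is the guideline given above.

```shell
# iowait_pct FIRST SECOND
# FIRST and SECOND are two copies of the aggregate "cpu" line from /proc/stat,
# taken a few seconds apart. Prints the iowait share in percent, rounded down.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # fields 2..9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
        }
        NR == 1 { total0 = total; io0 = $6 }
        NR == 2 {
            if (total > total0) printf "%d\n", ($6 - io0) * 100 / (total - total0)
            else print 0
        }'
}

# buffered_read_speed [LIMIT_MB_PER_SEC]
# Reads "hdparm -Tt" output on stdin and prints the buffered disk read speed,
# followed by "slow" when it is below the limit (default 25) or "ok" otherwise.
buffered_read_speed() {
    awk -v limit="${1:-25}" '/buffered disk reads/ {
        speed = $(NF - 1)
        print speed, (speed + 0 < limit + 0 ? "slow" : "ok")
    }'
}

# Usage on a live system:
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
#   hdparm -Tt /dev/sda | buffered_read_speed
```

Sampling over a few seconds, and repeating the measurement, gives a more useful figure than a single glance during a spike.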

3. Has all CPU power been consumed?

# top

Check the top output to find out if you are using too much CPU power. Look at the “idle” value next to each CPU entry. An idle value below 45% is something you should really worry about.
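The per-CPU idle check can also be done without watching top. The following sketch compares two snapshots of /proc/stat saved to files. The function name is hypothetical, idle is the fourth counter after the label on each cpuN line, and the 45% limit is the one suggested above.

```shell
# cpu_idle_report BEFORE AFTER [MIN_IDLE_PCT]
# BEFORE and AFTER are files holding two snapshots of /proc/stat.
# Prints one line per CPU with its idle percentage between the snapshots.
cpu_idle_report() {
    awk -v min_idle="${3:-45}" '
        $1 !~ /^cpu[0-9]+$/ { next }          # skip the aggregate line and non-CPU lines
        {
            total = 0
            # fields 2..9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
        }
        NR == FNR { total0[$1] = total; idle0[$1] = $5; next }   # first snapshot
        {
            dt = total - total0[$1]
            if (dt <= 0) next
            pct = int(($5 - idle0[$1]) * 100 / dt)
            printf "%s idle=%d%% %s\n", $1, pct, (pct < min_idle ? "busy" : "ok")
        }' "$1" "$2"
}

# Usage on a live system:
#   cat /proc/stat > /tmp/stat.1; sleep 5; cat /proc/stat > /tmp/stat.2
#   cpu_idle_report /tmp/stat.1 /tmp/stat.2
```

As with the other checks, run it several times before drawing a conclusion.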

III) Problem identified, what is the solution?

To conclude, let me offer some solutions for each problem:

A global solution to all problems is to optimize MySQL and the web server, including PHP/Perl scripts and queries. Or the least you can do is optimize Apache and MySQL server parameters to work better.

1. Too Much CPU Usage

In “ps auxf” or “top”, look for processes that are using too much CPU. If it’s the web server (httpd) or MySQL, you’d better optimize your scripts and queries, if possible. In most cases it is extremely difficult to optimize all the scripts and queries, and a better option is to simply change or upgrade the CPU. A dual CPU should work better, but the type of upgrade you’re looking for depends on your current CPU.
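A web server usually runs as many separate processes, so it can help to add up CPU usage per command name instead of reading the process list line by line. The sketch below does that. `cpu_by_command` is a made-up helper name, and it assumes command names without spaces.

```shell
# cpu_by_command [N]
# Reads "PCPU COMMAND" pairs on stdin (one process per line), sums the CPU
# percentage per command name and prints the top N (default 5), highest first.
cpu_by_command() {
    awk '{ cpu[$2] += $1 }
         END { for (c in cpu) printf "%.1f %s\n", cpu[c], c }' |
        LC_ALL=C sort -rn | head -n "${1:-5}"
}

# Usage on a live system (the empty "=" headers suppress the header line):
#   ps -e -o pcpu= -o comm= | cpu_by_command 5
```

If httpd or mysqld tops the list, that is where optimizing scripts and queries is most likely to pay off.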

2. RAM is exhausted

You’re in much the same situation as with the CPU. Optimize the web server, MySQL, scripts, etc., or go for a RAM upgrade. You can also install opcode caching software like APC (from PECL) for PHP to make it run better while decreasing the load.

3. The disk is all used up (hey, I don’t mean space)

Here you have to go for a faster drive, like SATA over regular IDE or SCSI over SATA. Well, I’m just speaking in general: you also have to consider factors like RPM and cache to end up with a worthwhile upgrade. The second option is to get multiple drives of the same class and spread the load across them. A common approach is to serve MySQL from a second drive.
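Spreading the load with symbolic links comes down to moving a busy directory onto the other drive and leaving a link at the old path. Here is a minimal sketch of that step. The function name and the paths in the usage comment are hypothetical. Stop whatever service uses the directory (MySQL, for instance) before moving it, and pass absolute paths so that the link stays valid.

```shell
# relocate_with_symlink SRC DEST_DIR
# Moves the directory SRC into DEST_DIR (a directory on the other drive) and
# leaves a symbolic link at the old location. Refuses to run if SRC is not a
# real directory or if the target name already exists.
relocate_with_symlink() {
    src=${1%/}                      # drop a trailing slash, if any
    dest_dir=${2%/}
    [ -d "$src" ] && [ ! -L "$src" ] || return 1
    [ -d "$dest_dir" ] || return 1
    name=$(basename "$src")
    [ ! -e "$dest_dir/$name" ] || return 1
    mv "$src" "$dest_dir/$name" && ln -s "$dest_dir/$name" "$src"
}

# Usage (example paths only; adjust to your own layout):
#   relocate_with_symlink /var/lib/mysql /mnt/disk2
```

Programs keep using the old path, while reads and writes now land on the second drive.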

IV) Conclusion

Wasn’t that helpful? My article may have flaws; excuse me for those. It’s my first article, and it really consumed quite a few of my brain cells. That’s a bit personal, isn’t it? Let’s get back to business.

For your information, in the example above the problem was I/O usage: the hard drive was too slow.

A guide can never be complete on its own or give you everything you need to reach expert level (you need to keep learning to reach that level). Whenever in doubt, hire experts to review your server. And if you don’t have money to spend, you’re still safe! You can go to our server optimization help section for help with optimizing your server.
