Monitoring 101/201 by Leon Adato (SolarWinds Evangelist)

These guides were written by Leon Adato, one of the best SolarWinds experts I have ever known. He is a Head Geek and technical evangelist at SolarWinds, and is a Cisco® Certified Network Associate (CCNA), MCSE, and SolarWinds Certified Professional (he was once a customer, after all). His 25 years of network management experience span the financial, healthcare, food and beverage, and other industries.


Excerpt from 101:

"If you have worked in the IT field for more than 15 minutes, the situation described above is neither unique nor rare, even if it is somewhat colorful. Systems crash unexpectedly, users make bizarre claims about how “the internet is slow”, and managers ask for historical statistics that leave you scratching your head wondering how to collect in a way that is meaningful and doesn’t consign you to the hell of hitting “refresh” and writing down numbers on a paper for half a day, just to get a baseline for a report.

The answer to all these challenges lies in effectively monitoring your environment – collecting statistics and/or checking for error conditions so that you can act or report effectively when needed."

Excerpt from 201:

"Do you know what is wrong with an alert that triggers when CPU utilization is over 90%? Everything. It says nothing about what is going wrong, or even if anything is going wrong. As a SysAdmin, if I see a server that is consistently running at 90% and keeping up with its workload, I call that “correctly sized.” It is likely that you do, too.

But what you’d really like to know is when the number of jobs waiting for CPU is greater than the number of CPUs in the system while there is high CPU utilization that has persisted for a significant length of time. Better still, the alert should tell me what the top running processes are at the time of the trigger.

That tells a clear story about what is wrong and how to go fix it.

These are the issues this guide is designed to address."
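
To make the alert described in the 201 excerpt concrete, here is a minimal Python sketch of that condition. It is not taken from the guide or from any SolarWinds product. The names Sample, should_alert, and alert_message are hypothetical, as are the 90% threshold default and the five-sample window standing in for "a significant length of time". It assumes you already collect CPU utilization, run-queue length, and the top process names at each polling interval, and it reads the excerpt as requiring both conditions to hold for the whole window.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sample:
    """One polling interval's data (hypothetical shape, not a real API)."""
    cpu_percent: float                   # overall CPU utilization, 0-100
    run_queue: int                       # jobs waiting for CPU
    top_processes: tuple[str, ...] = ()  # busiest process names at poll time


def should_alert(samples: Sequence[Sample], cpu_count: int,
                 cpu_threshold: float = 90.0,
                 sustained_samples: int = 5) -> bool:
    """True only if the last `sustained_samples` polls ALL show high CPU
    utilization AND more queued jobs than CPUs (assumed interpretation)."""
    if sustained_samples <= 0 or len(samples) < sustained_samples:
        return False  # not enough history to call the condition sustained
    recent = samples[-sustained_samples:]
    return all(s.cpu_percent >= cpu_threshold and s.run_queue > cpu_count
               for s in recent)


def alert_message(samples: Sequence[Sample], cpu_count: int,
                  cpu_threshold: float = 90.0,
                  sustained_samples: int = 5) -> Optional[str]:
    """Return alert text naming the top processes, or None if no alert."""
    if not should_alert(samples, cpu_count, cpu_threshold, sustained_samples):
        return None
    latest = samples[-1]
    procs = ", ".join(latest.top_processes) or "unknown"
    return (f"CPU saturated: {latest.cpu_percent:.0f}% busy, "
            f"{latest.run_queue} jobs queued for {cpu_count} CPUs; "
            f"top processes: {procs}")
```

Under these assumptions, a 4-CPU server that sits at 92% with only two jobs waiting never alerts (it is "correctly sized"), while the same server at 95% with six jobs waiting for five consecutive polls does, and the message names the processes to look at first.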

You can download both guides from the original source here

(In case the link above is broken, you can download copies below.)