Monit CPU Usage problem

I just recently fixed an issue I wanted my Monit monitoring process to restart a daemon who was segfaulting and causing 100% CPU usage according to top and most other system tools. I had seen configuration examples where Monit could detect that and restart the process, so I figured that adding a configuration like that below would fix it easily enough:

check process foo with pidfile /var/run/foo.pid
  start program = "/etc/init.d/foo start" timeout 10 seconds
  stop program  = "/etc/init.d/foo stop"
  if cpu usage > 90% for 8 cycles then restart

After letting that run for a bunch of cycles the process remained running, and monit didn’t do anything to acknowledge it even in log files. (FYI, a “cycle” is defined in the Monitrc config file in the “set daemon” line and defaults to 120 seconds).

After some research, I finally came upon this post on the Monit mailing list where somebody describes that the CPU usage that Monit bases its numbers off is a percentage of the CPU available for all processors. My machine had 4 processors, so what was seeing as 100% CPU usage in top, monit would only see that as 25%.

I quickly changed my Monit config to check for CPU Usage > 22% as ween in the following. That now works perfectly, even acknowledging in the log each of the 8 times that the CPU was over the limit before restarting it:

check process foo with pidfile /var/run/foo.pid
  start program = "/etc/init.d/foo start" timeout 10 seconds
  stop program  = "/etc/init.d/foo stop"
  if cpu usage > 22% for 8 cycles then restart

…. Now I need to solve the real problem and see why the latest Mongo PHP pecl module is segfaulting….

Leave a Reply

Your email address will not be published. Required fields are marked *