Why is monitoring important?
Server issues cost time and money to fix. Monitoring lets you catch small issues before they grow into major ones, and it is essential for ensuring service availability.
We offer to build a monitoring solution based on Icinga 2 on your server infrastructure. The solution includes the components and plugins described below.
What is server monitoring?
Server monitoring is the process of continuously scanning the servers on a designated network for failures and irregularities, which are detected by server monitoring software.
Icinga 2 is an open source monitoring system which checks the availability of your network resources, notifies users of outages and generates performance data for reporting. Scalable and extensible, Icinga 2 can monitor large, complex environments across multiple locations.
Grafana is an open platform for beautiful analytics of all your metrics. It allows you to visualize and understand your metrics.
InfluxDB is used as a data store for any use case involving large amounts of time-stamped data, including DevOps monitoring, log data, application metrics, IoT sensor data, and real-time analytics.
Icinga Director is the bleeding-edge configuration tool for Icinga 2. Developed as an Icinga Web 2 module, it aims to be your new favorite Icinga config deployment tool. Even if you prefer plain text files and manual configuration, chances are good that the Director will change your mind. Director is developed to make your life easier.
Plugins are a collection of tools for monitoring. Each plugin is a stand-alone command-line tool that provides a specific type of check.
Some of the plugins let you check local system metrics (such as load averages, processes, or disk space usage), others use various network protocols (such as ICMP, SNMP, or HTTP) to perform remote checks. This allows for monitoring a large number of common host and service types. For more specific needs, we can create a custom plugin.
Thousands of community-contributed plugins can be found on sites such as Nagios Exchange or Icinga Exchange.
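To illustrate what a stand-alone check looks like, here is a minimal sketch of a custom plugin's core logic in Python. It assumes the widely used Nagios/Icinga plugin convention: one line of output, optional performance data after a `|`, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names and the `queue_length` metric are hypothetical and are not part of our solution.

```python
# Sketch of the plugin output convention; all names here are illustrative.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a metric to a plugin state, assuming higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_result(label, value, warn, crit):
    """Return (exit_code, output_line); perfdata follows the pipe character."""
    state = evaluate(value, warn, crit)
    line = f"{STATE_NAMES[state]} - {label} is {value} | {label}={value};{warn};{crit}"
    return state, line


code, line = format_result("queue_length", 42, warn=30, crit=60)
print(line)  # a real plugin would then exit with `code` via sys.exit(code)
```

The monitoring system reads the exit code to decide the state of the service and stores the performance data for graphing.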
Our solution includes these plugins
CPU: checks idle time percentages via SNMP.
A computer processor is described as idle when it is not being used by any program. CPU time (or process time) is the amount of time for which a central processing unit (CPU) was used for processing instructions of a computer program or operating system, as opposed to, for example, waiting for input/output (I/O) operations or entering low-power (idle) mode. The CPU time is measured in clock ticks or seconds. Often, it is useful to measure CPU time as a percentage of the CPU’s capacity, which is called the CPU usage.
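As a rough sketch of how CPU usage follows from CPU time, the example below compares two samples of cumulative clock ticks and treats idle and I/O wait ticks as "not busy", in line with the description above. The state names are an assumption modeled on typical Linux counters; the actual plugin reads its values via SNMP and may group them differently.

```python
def cpu_usage_percent(before, after):
    """Compute CPU usage between two samples of cumulative ticks per state.

    `before` and `after` are dicts such as
    {"user": ..., "system": ..., "idle": ..., "iowait": ...} (assumed names).
    """
    total = sum(after.values()) - sum(before.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    # Ticks spent idle or waiting for I/O do not count as CPU usage.
    not_busy = (after["idle"] - before["idle"]) + (after["iowait"] - before["iowait"])
    return 100.0 * (total - not_busy) / total


before = {"user": 1000, "system": 500, "idle": 8000, "iowait": 500}
after = {"user": 1300, "system": 600, "idle": 8500, "iowait": 600}
print(cpu_usage_percent(before, after))  # 400 busy ticks out of 1000 -> 40.0
```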
Load: the average system load, calculated over periods of 1, 5 and 15 minutes.
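A load check typically compares the three averages against warning and critical thresholds. The sketch below shows one possible way to do that; the threshold values and function names are hypothetical, and on a Unix system the three averages could be read with Python's `os.getloadavg()`.

```python
def normalized_load(loads, cpu_count):
    """Divide each load average by the number of CPUs for a per-core view."""
    return tuple(round(load / cpu_count, 2) for load in loads)


def load_state(loads, warn, crit):
    """Compare (1 min, 5 min, 15 min) loads with matching threshold tuples."""
    if any(load >= limit for load, limit in zip(loads, crit)):
        return "CRITICAL"
    if any(load >= limit for load, limit in zip(loads, warn)):
        return "WARNING"
    return "OK"


sample = (4.0, 2.0, 1.0)  # illustrative values, not a real measurement
print(load_state(sample, warn=(3, 3, 3), crit=(6, 6, 6)))  # WARNING
print(normalized_load((8.0, 4.0, 2.0), cpu_count=4))       # (2.0, 1.0, 0.5)
```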
Nginx: the Nginx status page provides real-time data about Nginx’s health. It can help you tune a few Nginx configuration settings, and the status data can also be used in a load-balanced environment.
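For illustration, the sketch below parses a status page in the format produced by Nginx's `stub_status` module. That the status page comes from `stub_status` is an assumption on our part, and the sample text and function name are made up for the example.

```python
import re

# Sample text in the stub_status layout (values are illustrative).
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Extract connection and request counters from a stub_status page."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    counters = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and counters and states):
        raise ValueError("unexpected status page format")
    accepts, handled, requests = map(int, counters.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "dropped": accepts - handled,  # accepted but never handled
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


print(parse_stub_status(SAMPLE)["active"])  # 291
```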
PHP-FPM (FastCGI Process Manager) is an alternative PHP FastCGI implementation with some additional features useful for sites of any size, especially busier sites. It maintains “pools” of workers that are available to respond to PHP requests. If you receive an alert, you may need to increase the number of workers, provided you have enough free CPU and memory.
Fetches stats from a specified Memcached server. Memcached is an open source, distributed memory object caching system that alleviates database load to speed up dynamic web applications.
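As a sketch of what "fetching stats" involves: in Memcached's text protocol, the reply to the `stats` command is a series of `STAT <name> <value>` lines terminated by `END`. The example parses such a reply and derives a cache hit ratio. The sample reply and the function names are illustrative, and the actual plugin may report different values.

```python
# Illustrative reply to the "stats" command (values are made up).
RAW = "STAT get_hits 120\r\nSTAT get_misses 30\r\nSTAT curr_connections 10\r\nEND\r\n"


def parse_memcached_stats(raw):
    """Turn 'STAT name value' lines into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        if line.startswith("STAT "):
            _, name, value = line.split(" ", 2)
            stats[name] = value
    return stats


def hit_ratio(stats):
    """Share of GET requests answered from the cache (0.0 if no requests yet)."""
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    total = hits + misses
    return hits / total if total else 0.0


stats = parse_memcached_stats(RAW)
print(hit_ratio(stats))
```

A persistently low hit ratio suggests the cache is not relieving the database as much as intended.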
Fetches one or more stats from a specified Redis server. Redis is an open source, in-memory data structure store, used as a database, cache and message broker.
CPU and memory usage for an app
Checks what percentage of memory and CPU a given app uses, which makes it clear which apps are heavy memory or CPU consumers. You may provide the names of up to ten apps.
Bandwidth Usage Calculator In/Out
Bandwidth or data usage is the total amount of data (such as images, videos, and other files) that you send (upload) or receive (download) over a specific period of time. Regularly checking your bandwidth usage helps you learn what your average day-to-day network usage looks like. If your traffic suddenly grows, you may need to check whether your site has become very popular or has been hacked and is, for example, sending out a lot of email.
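Bandwidth checks usually work from interface byte counters that only ever increase, so the usage over a period is the difference between two readings. The sketch below shows that arithmetic, including the case where a counter wraps around. The function name and sample values are hypothetical.

```python
def rate_bits_per_second(prev_bytes, curr_bytes, seconds, counter_bits=64):
    """Average transfer rate between two readings of a byte counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # The counter wrapped around its maximum value between readings.
        delta += 2 ** counter_bits
    return delta * 8 / seconds


# 3,750,000 bytes received over a 5-minute interval.
print(rate_bits_per_second(1_000_000, 4_750_000, 300))  # 100000.0
```

Comparing this rate with your usual day-to-day figures is what makes a sudden jump in traffic visible.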
Checks free memory
Checks the amount of free memory available and alerts if the value is lower than the specified threshold.
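A minimal sketch of such a check, assuming a Linux host where memory figures are available in the `/proc/meminfo` format. The sample text, the threshold and the function names are illustrative only and do not describe the plugin's actual implementation.

```python
# Illustrative excerpt in the /proc/meminfo layout (values in kB).
SAMPLE_MEMINFO = """MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    4000000 kB
"""


def parse_meminfo(text):
    """Return a dict mapping field names to integer values in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def check_free_memory(meminfo, min_free_percent):
    """Alert when available memory drops below the given percentage."""
    percent = 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]
    state = "ALERT" if percent < min_free_percent else "OK"
    return state, percent


info = parse_meminfo(SAMPLE_MEMINFO)
print(check_free_memory(info, min_free_percent=10))  # ('OK', 25.0)
```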
IOSTAT: reports central processing unit (CPU) statistics and input/output statistics for devices and partitions. The iostat command is used for monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates.
- Average CPU Utilization – For multiprocessor systems, the CPU values are global averages among all processors.
- Average IO waits – The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in a queue and the time spent servicing them.
- Average read / write wait – The average time (in milliseconds) for read or write requests issued to the device to be served. This includes the time spent by the requests in a queue and the time spent servicing them.
- Average service wait – The average service time (in milliseconds) for I/O requests issued to the device.
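The average wait times above can be derived from two samples of cumulative per-device counters: the total time spent on requests divided by the number of requests completed in the interval. The sketch below assumes counters similar to those Linux exposes in `/proc/diskstats` (completed reads and writes, and milliseconds spent on each); the dictionary keys and sample values are made up for the example.

```python
def io_wait_times(before, after):
    """Average wait in ms per request between two cumulative samples."""
    reads = after["reads"] - before["reads"]
    writes = after["writes"] - before["writes"]
    read_ms = after["read_ms"] - before["read_ms"]
    write_ms = after["write_ms"] - before["write_ms"]

    def average(total_ms, count):
        # No completed requests in the interval means no wait to report.
        return total_ms / count if count else 0.0

    return {
        "await": average(read_ms + write_ms, reads + writes),
        "r_await": average(read_ms, reads),
        "w_await": average(write_ms, writes),
    }


before = {"reads": 1000, "read_ms": 4000, "writes": 500, "write_ms": 6000}
after = {"reads": 1200, "read_ms": 4800, "writes": 600, "write_ms": 7200}
result = io_wait_times(before, after)
print(result["r_await"], result["w_await"])  # 4.0 12.0
```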
MySQL statistics bin_log: The binary, relay, and DDL MySQL logs are mainly used for replication and recovery tasks. The binary log holds master server changes that can be sent for replication on slave servers, which in turn store these changes in their relay logs for recovery purposes. The DDL log stores metadata changes and is also used for recovery purposes when crashes occur during metadata operations.
MySQL statistics connections: The number of connection attempts (successful or not) to the MySQL server. Measures and outputs the connection response time in seconds.
- Aborted Connections, Aborted clients, Max connections, Max_used_connections, Slow launch threads, Threads connected, Threads running
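One way to read these connection figures is as a share of the configured connection limit. The sketch below assumes the values have already been fetched from the server (for example `Threads_connected` and `Max_used_connections` from the status counters and `max_connections` from the server variables). The function name and sample numbers are hypothetical.

```python
def connection_usage(status, variables):
    """Express current and peak connections as a percentage of the limit."""
    limit = int(variables["max_connections"])
    if limit <= 0:
        raise ValueError("max_connections must be positive")
    return {
        "current_percent": 100.0 * int(status["Threads_connected"]) / limit,
        "peak_percent": 100.0 * int(status["Max_used_connections"]) / limit,
    }


status = {"Threads_connected": "30", "Max_used_connections": "120"}
variables = {"max_connections": "150"}
print(connection_usage(status, variables))
```

A peak close to 100 percent indicates that the server has come near to refusing new connections.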
MySQL statistics innodb_data:
- Innodb_data_read, Innodb_data_writes
MySQL statistics innodb_io:
- Innodb_data_pending_fsyncs, Innodb_data_pending_reads, Innodb_data_pending_writes
MySQL statistics innodb_logfile:
- Innodb_log_waits, Innodb_log_writes, Innodb_log_write_requests
Also included are the MySQL statistics innodb_queries, myisam_key_buffer, network_traffic, slow_queries, table_cache, and uptime.
- Checks whether new updates are available on Debian-like Linux systems. Debian packages are standard Unix ar archives that include two tar archives.
- Checks the number of messages in the mail queue (supports multiple sendmail queues, qmail)
- Checks the number of users currently logged in on the local system and generates an error if the number exceeds the thresholds specified.
- Tries to connect to an SSH server at the specified host and port. SSH (Secure Shell, sometimes called Secure Socket Shell) is a network protocol that provides administrators with a secure way to access a remote computer.
- Checks the amount of used disk space on a mounted file system and generates an alert if free space is less than one of the threshold values.
- Tests the HTTPS service on the specified host and reports on certificate expiration times.
- Short information about the server.
- Checks Linux software RAID: uses the standard mdadm program to get the status of all the Linux md arrays on the local machine. RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive Disks) is a term for data storage schemes that divide and/or replicate data among multiple hard drives.
- Hardware RAID S.M.A.R.T. checks: MegaRAID, Intel RAID and HP RAID are supported.
- Check S.M.A.R.T. status of ATA/SCSI disks.
- HP Smart Array Controllers RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive Disks)
- PING (Packet Internet Groper) is the most commonly used tool for troubleshooting a network, included with most operating systems. It is invoked using the ping command.
- Checks the clock offset with the NTP server. Network Time Protocol is the most common method to synchronize the software clock of a GNU/Linux system with internet time servers. It is designed to mitigate the effects of variable network latency and can usually maintain time to within tens of milliseconds over the public Internet.
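To make one of the checks in the list above more concrete, here is a minimal sketch of the arithmetic behind a certificate expiration check. It assumes the certificate's `notAfter` value has already been obtained, for example from `ssl.SSLSocket.getpeercert()`. The function name and the dates are hypothetical, and the actual plugin may work differently.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter timestamp.

    `not_after` uses the format found in certificates,
    e.g. "Jun 15 12:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


reference = datetime(2030, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
print(days_until_expiry("Jun 15 12:00:00 2030 GMT", now=reference))  # 14
```

An alert can then be raised when the number of days left falls below a chosen threshold, leaving time to renew the certificate.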