To maximize system uptime and performance, eLabNext uses Zabbix to continuously monitor the health and application uptime of Private Cloud installations.
If the system becomes unavailable or behaves irregularly, our system engineers are automatically notified so they can resolve the issue before it escalates. All events on the server are automatically logged.
Internal Server Monitoring Rules
Platform | Component | Check frequency | Alert rule (09:00-17:30 CET) | Alert rule (24/7) |
Generic | Host availability | 1 min | Host unreachable for 5 minutes | Host unreachable for 5 minutes |
Generic | Host restarted | 1 min | Host restarted | |
Generic | Number of processes | 1 min | > 300 during the last 5 minutes | |
Generic | Time synchronization | 15 min | Time offset > 30 seconds | Time offset > 30 seconds |
Linux Server | Free inodes | 1 min | < 20% free | < 10% free |
Linux Server | Max number of open files | 1 hr | < 1024 | |
Database | Database down | 1 min | Database down | Database down |
Database | Database cluster status | 1 min | < 3 nodes | < 2 nodes |
Database | Database node status | 1 min | Database rejoining cluster | |
CPU | Load | 1 min | Load > 5 for 5 minutes | Load > 5 for 15 minutes |
Memory | Available memory | 1 min | < 150 MB available | < 50 MB available |
Swap | Free swap space | 1 min | < 25% | |
Disk space | Free disk space | 1 min | Disk space < 20% | Disk space < 10% |
Disk I/O performance | Disk queue | 1 min | Disk queue > 20% for 5 minutes | |
Windows Server | Services | 1 min | Service is down | |
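Several rules above combine a check frequency with a duration, for example "Load > 5 for 5 minutes" checked once per minute. The following Python sketch illustrates one way such a rule can be read: the alert fires only once every sample in the most recent window breaches the limit. It is an illustration only and not the Zabbix configuration eLabNext uses; the class name `SustainedThreshold` and its parameters are hypothetical.

```python
from collections import deque


class SustainedThreshold:
    """Hypothetical helper: fires when all of the last `window` checks breach `limit`."""

    def __init__(self, limit, window, above=True):
        self.limit = limit
        self.above = above  # True: breach means value > limit; False: value < limit
        # Keep only the most recent `window` samples (one sample per check).
        self.samples = deque(maxlen=window)

    def observe(self, value):
        """Record one check result and return True if the alert should fire."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet to cover the full window
        if self.above:
            return all(v > self.limit for v in self.samples)
        return all(v < self.limit for v in self.samples)


# CPU load rule from the table, assuming one sample per minute:
# business hours alert after 5 minutes, 24/7 alert after 15 minutes.
business_hours_rule = SustainedThreshold(limit=5, window=5)
always_on_rule = SustainedThreshold(limit=5, window=15)

loads = [6.0] * 15  # fifteen consecutive minutes of load above 5
business_hours_fired = [business_hours_rule.observe(v) for v in loads]
always_on_fired = [always_on_rule.observe(v) for v in loads]
```

With these sample values, the business hours rule fires from the fifth check onward, while the 24/7 rule fires only at the fifteenth check, which is why the stricter window is reserved for out-of-hours alerting.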
External Server Monitoring
Platform | Component | Check frequency | Alert rule (09:00-17:30 CET) | Alert rule (24/7) |
Generic | HTTP 200 response | 1 min | Code <> 200 for last 2 times | Code <> 200 for last 2 times |
Generic | Response time | 1 min | Last 3 times > 1 | Last 3 times > 1 |
Generic | Download speed | 1 min | Last 3 times < 2048 | Last 3 times < 2048 |
Generic | Certificate validity check | 6 hr | < 30 days remaining, < 15 days remaining | < 7 days remaining, 0 days remaining |
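The external rules follow two patterns: a check must fail several times in a row before an alert is raised, and the certificate check escalates from business hours to 24/7 alerting as expiry approaches. The Python sketch below illustrates both patterns using the thresholds from the table. The function names are hypothetical, and treating "0 days remaining" as zero or fewer days is an assumption; the actual rules are defined in Zabbix, not in this code.

```python
def last_n_failed(status_codes, n=2, expected=200):
    """Return True when the most recent n checks all returned a code other than expected."""
    recent = status_codes[-n:]
    return len(recent) == n and all(code != expected for code in recent)


def certificate_alert(days_remaining):
    """Map days until certificate expiry to (alert window, threshold), or None."""
    if days_remaining <= 0:  # assumption: an expired certificate counts as 0 days
        return ("24/7", "0 days remaining")
    if days_remaining < 7:
        return ("24/7", "< 7 days remaining")
    if days_remaining < 15:
        return ("09:00-17:30 CET", "< 15 days remaining")
    if days_remaining < 30:
        return ("09:00-17:30 CET", "< 30 days remaining")
    return None  # certificate is valid for 30 days or more: no alert


# A single failed check does not alert; two consecutive failures do.
single_failure = last_n_failed([200, 200, 500])
double_failure = last_n_failed([200, 500, 503])
```

Requiring consecutive failures filters out one-off network glitches, at the cost of delaying the alert by one check interval.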