The Webitel Architecture document describes all Webitel applications and their interactions. This document covers the main recommendations for monitoring and maintaining the Webitel infrastructure.
Resources
Hardware requirements may vary depending on the applications running on the server or VM, as well as the intensity of application usage. A properly functioning operating system, running on top of the infrastructure, is crucial to maintaining the stability of the entire application complex. Early detection of spikes in resource usage helps prevent possible incidents in the future.
Below are the key components and resource monitoring recommendations.
Database
Webitel uses PostgreSQL as the unified store for both historical and operational data across all its services.
- CPU - Complex queries can drive the database to full CPU utilization. Monitoring should ensure that CPU utilization does not stay above 80% for more than 5 minutes. If it frequently reaches 80% or higher, analyze the queries or increase the number of cores on the server.
- RAM - Large and complex queries use RAM to build their results, so monitor RAM usage and make sure occupied memory does not exceed 85%. Distinguish used memory from cached memory: cached memory is held temporarily and released automatically, so the metric to monitor is used memory (this also applies to other services where RAM monitoring is required).
- Disk - For the disks under the database, latency matters most: a slow disk means slow database performance, which affects all Webitel applications. As for capacity, more than 90% occupied space is a critical signal.
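The distinction between used and cached memory can be illustrated with a minimal sketch. It assumes a Linux host and treats `MemTotal - MemAvailable` from `/proc/meminfo` as "used" memory; the function name and the way the 85% threshold is applied are illustrative, not part of Webitel.

```python
def used_memory_percent(meminfo_text: str) -> float:
    """Return used memory as a percentage of total, excluding reclaimable cache.

    Assumption: "used" is MemTotal minus MemAvailable. MemAvailable is the
    kernel's estimate of memory that applications can still obtain, so page
    cache that can be released is not counted as used.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # /proc/meminfo values are in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def memory_alert(meminfo_text: str, limit_percent: float = 85.0) -> bool:
    """True when used memory is above the limit (85% for the database)."""
    return used_memory_percent(meminfo_text) > limit_percent
```

On a Linux server, passing the contents of `/proc/meminfo` to `memory_alert` gives a single boolean that a monitoring agent could report.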
It is good practice to serve analytical reports (for example, from Grafana or another external service) from a database replica rather than from the primary database.
Consul and RabbitMQ
Webitel services use Consul for service discovery and RabbitMQ for message exchange. The main criteria for monitoring resources are:
- CPU - CPU utilization per core should not exceed 80% for more than 1 minute.
- RAM - Occupied RAM should not exceed 80%.
- Disk - As with the database, disk latency affects the performance of the entire application complex. Also monitor free disk space, which should never drop below 5% of the total disk volume or below 5 GB.
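The free-space rule above combines a relative and an absolute floor. The sketch below reads it as "alert when either floor is crossed", which is an interpretation rather than something the text states explicitly; it also treats 5 GB as 5 GiB. The function names are hypothetical.

```python
import shutil

GIB = 1024 ** 3  # assumption: "5 GB" is interpreted as 5 GiB


def disk_space_ok(total_bytes, free_bytes,
                  min_free_fraction=0.05, min_free_bytes=5 * GIB):
    """True when free space is at or above both the 5% and the 5 GB floor."""
    return (free_bytes >= total_bytes * min_free_fraction
            and free_bytes >= min_free_bytes)


def check_path(path="/"):
    """Apply the rule to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free
    return disk_space_ok(usage.total, usage.free)
```

Keeping the rule in a pure function makes it easy to reuse with the 10% floor recommended for the other services by passing `min_free_fraction=0.10` and `min_free_bytes=0`.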
Telephony
Webitel's telephony services use several applications: OpenSIPS, FreeSWITCH, and rtpengine. For all three, CPU utilization is critical.
- CPU - If CPU utilization exceeds 60% for more than 5 minutes, it can negatively affect voice quality (metallic voice, distortion, dropped words).
- RAM - Occupied RAM should not exceed 80%.
- Disk - Free disk space should not drop below 10% of the total disk volume.
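Rules of the form "above X% for more than N minutes" need a sustained-breach check rather than a single-sample comparison. The following is a minimal sketch of one way to evaluate such a rule over collected samples; the sample format and function name are assumptions, and most monitoring systems offer an equivalent built-in.

```python
def sustained_breach(samples, threshold, duration_s):
    """Return True if the samples contain an unbroken run of values above
    `threshold` that spans at least `duration_s` seconds.

    samples: list of (unix_timestamp, value) pairs in ascending time order.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new run
            if ts - breach_start >= duration_s:
                return True
        else:
            breach_start = None  # run interrupted, start over
    return False
```

For the telephony servers this would be called with `threshold=60` and `duration_s=300`; a short dip below the threshold resets the timer, so brief spikes do not raise an alert.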
Call recordings
It is a good practice to use an S3-compatible storage for call recordings. If you use a local file system, we recommend monitoring the availability of free space on the disk used for recordings, which should not fall below 10% of the total disk capacity.
Webitel Services
Below are general recommendations for other Webitel services:
- CPU - CPU load on each core should not exceed 80% for more than 5 minutes.
- RAM - Memory usage should not exceed 90%.
- Disk - The level of free disk space should not drop below 10% of the total disk volume.
Network availability
Bandwidth
The network bandwidth between the servers hosting Webitel services should be at least 100 Mbps, with an average ping of up to 10 ms and no packet loss.
To ensure stable telephony operation, the channel between the telephony server and the SIP line provider must meet the following requirements:
- Guaranteed bandwidth towards the telephony provider should not be less than 10 Mbps (for 30 simultaneous calls using the G.711 codec).
- The average ping value to the telephony provider server should not exceed 100 ms.
- The packet loss value to the telephony provider server should not exceed 1%.
- The jitter value to the telephony provider server should not exceed 50 ms.
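To put the 10 Mbps figure in context, the raw bandwidth of a G.711 call can be estimated from the codec rate and the packet headers. The sketch below assumes 20 ms packetization and IPv4 (common defaults, not stated in this document) and counts one direction only; the function name is illustrative.

```python
def g711_call_kbps(ptime_ms=20, l2_overhead_bytes=0):
    """Estimate one-way bandwidth of a single G.711 RTP stream in kbit/s.

    G.711 encodes audio at 64 kbit/s. Each packet adds a 12-byte RTP header,
    an 8-byte UDP header and a 20-byte IPv4 header. `l2_overhead_bytes` can
    add link-layer framing (for example 18 bytes for Ethernet header + FCS).
    """
    payload_bytes = 64_000 // 8 * ptime_ms // 1000  # audio bytes per packet
    packet_bytes = payload_bytes + 12 + 8 + 20 + l2_overhead_bytes
    packets_per_second = 1000 / ptime_ms
    return packet_bytes * 8 * packets_per_second / 1000


calls = 30
one_way_mbps = calls * g711_call_kbps() / 1000  # 30 calls at the IP layer
```

Under these assumptions a single call needs 80 kbit/s per direction at the IP layer, so 30 calls need about 2.4 Mbps per direction. The recommended 10 Mbps therefore leaves headroom for signaling, link-layer overhead, and other traffic.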
There are no strict requirements for the guaranteed channel towards users when only chats are used, but the requirements below must be met for audio and video calls. Recommended bandwidth per user:
- Not less than 512 kbps per user for chats only.
- Not less than 1 Mbps per user for chats and audio calls.
- Not less than 5 Mbps per user for chats, audio, and video calls.
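The per-user figures above can be turned into a rough sizing estimate for an office link. This is a simple worst-case sum that assumes every user consumes their full allowance at the same time; the profile names and function are hypothetical.

```python
# Per-user bandwidth from the list above, in Mbps.
PER_USER_MBPS = {
    "chat": 0.512,
    "chat_audio": 1.0,
    "chat_audio_video": 5.0,
}


def required_mbps(user_counts):
    """Sum the recommended bandwidth for a mix of user profiles.

    user_counts maps a profile name from PER_USER_MBPS to a number of users.
    """
    return sum(PER_USER_MBPS[profile] * count
               for profile, count in user_counts.items())
```

For example, 10 chat-only users, 20 users with audio, and 2 users with video add up to 35.12 Mbps under this worst-case assumption.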
Packet loss to the telephony provider server should not exceed 1%, and the delay should not exceed 50 ms. If the delay exceeds 100 ms, voice quality may suffer (distorted phrases or missing words).
Traffic filtering
All Webitel services should be able to communicate freely with each other. We recommend referring to the Webitel Architecture document for details on network interaction.
Ports of services
Below is a table of the main ports that need to be monitored for availability:
| Application | Ports |
|---|---|
| Consul | 8500/tcp |
| RabbitMQ | 5672/tcp |
| PostgreSQL | 5432/tcp |
| OpenSIPS | 5060/udp, 5060/tcp, 5061/tcp |
| FreeSWITCH | 5080/udp, 5080/tcp |
| Nginx | 443/tcp |
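A basic availability probe for the TCP ports in the table can be sketched with the standard library. The `hosts` mapping is a hypothetical input that says where each application runs. UDP ports (5060/udp, 5080/udp) are not covered, because a plain connection attempt cannot confirm that a UDP service is listening; those are better checked with a SIP-level probe.

```python
import socket

# TCP ports taken from the table above.
TCP_PORTS = {
    "Consul": [8500],
    "RabbitMQ": [5672],
    "PostgreSQL": [5432],
    "OpenSIPS": [5060, 5061],
    "FreeSWITCH": [5080],
    "Nginx": [443],
}


def tcp_port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False


def closed_ports(hosts):
    """Return (application, port) pairs that did not accept a connection.

    hosts: hypothetical mapping of application name to hostname or IP.
    """
    return [(app, port)
            for app, host in hosts.items()
            for port in TCP_PORTS[app]
            if not tcp_port_open(host, port)]
```

A call such as `closed_ports({"PostgreSQL": "10.0.0.5", "Nginx": "10.0.0.6"})` (addresses are placeholders) returns an empty list when every listed port accepts connections.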
Service Availability
It is important to check the validity of the SSL certificate for Nginx.
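Certificate validity can be checked by reading the certificate's expiry date and alerting some days in advance. The sketch below uses Python's standard `ssl` module; the function names and the idea of a warning window are assumptions, not part of Webitel.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate expires.

    not_after: the "notAfter" string from a certificate,
               for example "Jan  5 09:34:43 2018 GMT".
    now: Unix time to compare against (defaults to the current time).
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(host, port=443, timeout=5.0):
    """Read the expiry date of the certificate served at host:port.

    The handshake itself fails with an ssl.SSLError if the certificate is
    already expired or otherwise invalid, which is also worth alerting on.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A monitoring job could call `days_until_expiry(fetch_not_after(host))` for the Nginx hostname and raise a warning when the result falls below a chosen number of days.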
All Webitel services must be running and operational:
- webitel-app
- webitel-uac
- webitel-api
- engine
- messages-srv
- messages-bot
- flow_manager
- call_center
- storage
- freeswitch
- ngcp-rtpengine-daemon
- opensips
- grafana-server
- nginx
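If these services run as systemd units under the names listed above (an assumption; unit names may differ per installation), their state can be polled with `systemctl is-active`. The sketch keeps the probe injectable so that the logic can be exercised without systemd.

```python
import subprocess

# Service names from the list above; assumed to match systemd unit names.
SERVICES = [
    "webitel-app", "webitel-uac", "webitel-api", "engine",
    "messages-srv", "messages-bot", "flow_manager", "call_center",
    "storage", "freeswitch", "ngcp-rtpengine-daemon", "opensips",
    "grafana-server", "nginx",
]


def systemd_is_active(unit):
    """True if `systemctl is-active --quiet <unit>` exits with status 0."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def inactive_services(units=SERVICES, is_active=systemd_is_active):
    """Return the units that are not reported as active."""
    return [unit for unit in units if not is_active(unit)]
```

On a host where only some of the services are installed, pass the relevant subset as `units` so that services deployed elsewhere are not reported as down.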
Monitoring of telephony
It is a good practice to monitor SIP and RTP protocols using Homer. The configuration is described in the article Monitoring SIP and RTP protocols. This will help diagnose telephony quality and availability issues more quickly.