Manual for the Maintenance and Problem Solving of the Aarenet VoIP Switch

From help.aarenet.com


Note The features and/or parameters listed in this article may not be available from your telephone service provider.






Introduction

The first part gives the VoIP Switch administrator instructions for the basic VoIP Switch Component commands, e.g.:

  • Start the VoIP Switch Component
  • Stop the VoIP Switch Component
  • Check the VoIP Switch Component status
  • Put Out of Service a VoIP Switch Component
  • Put Back to Service a VoIP Switch Component


The second part explains how to maintain the VoIP Switch in response to monitor events, e.g.:

  • VoIP Switch component messages
  • DataBase replication broken
  • IP connectivity or network problems
  • Server problems like hard-disk space low


The third part explains how to handle problems that affect the VoIP system as a whole, e.g.:

  • Fraud



VoIP Switch Component Handling


Warning

All described actions can jeopardize the VoIP Switch's telephony service or server functionality!

If in doubt, contact the "VoIP Switch Supplier Support".






Basic VoIP Switch Component Commands  

This section gives the VoIP Switch Administrator instructions for handling a VoIP Switch Component at OS console level:

  • Start the VoIP Switch Component
  • Stop the VoIP Switch Component
  • Check the VoIP Switch Component status
  • Restart the VoIP Switch Component
  • etc


The VoIP Switch Component command affects only the instance on this server and can be executed with root rights only!


Command syntax:

root# <AS_COMPONENT> <COMMAND_OPTION>



Example:

root# configcenter status



Warning

Do not use any other VoIP Switch Component command options, as they can cause serious problems!



Command: <AS_COMPONENT>, e.g. configcenter (the VoIP Switch Component command)

Command Option    Remark
version           Lists the VoIP Switch Component version
status            Lists the VoIP Switch Component status and process ID
stop              Stops the VoIP Switch Component
                  → The VoIP Switch Component stops immediately and any activity of the component will be interrupted!
start             Starts the VoIP Switch Component
                  → The VoIP Switch Component becomes immediately active and operative!
startpassive      Starts the VoIP Switch Component but it remains passive.
                  → To become operative, the VoIP Switch Component has to be started with the start option.
                  → Not all VoIP Switch Components offer this option.
restart           Stops and starts the VoIP Switch Component
                  → The VoIP Switch Component becomes immediately active and operative!
restartpassive    Stops and starts the VoIP Switch Component but it remains passive.
                  → To become operative, the VoIP Switch Component has to be started with the start option.
                  → Not all VoIP Switch Components offer this option.
error             Opens the error log file of the VoIP Switch Component
log               Opens the current log file of the VoIP Switch Component
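
Because only the command options listed above should be used, a small wrapper can refuse anything else before it reaches the component. The following is a minimal sketch, not part of the VoIP Switch: the function names is_documented_option and safe_component are hypothetical, and the wrapper assumes that the component command (e.g. configcenter) is in the PATH of the root user.

```shell
#!/bin/sh
# Sketch only: "is_documented_option" and "safe_component" are hypothetical
# helper names, not commands shipped with the VoIP Switch.

# Return 0 if $1 is one of the command options documented in this manual.
is_documented_option() {
    case "$1" in
        version|status|stop|start|startpassive|restart|restartpassive|error|log)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Run "<AS_COMPONENT> <COMMAND_OPTION>" only if the option is documented.
safe_component() {
    if [ "$#" -ne 2 ]; then
        echo "usage: safe_component <AS_COMPONENT> <COMMAND_OPTION>" >&2
        return 2
    fi
    if ! is_documented_option "$2"; then
        echo "refusing undocumented option: $2" >&2
        return 1
    fi
    "$1" "$2"
}
```

With these functions loaded into a root shell, "safe_component configcenter status" would run the documented status command, while any option outside the list is rejected with an error message.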






Put Out of / Back to Service a VoIP Switch Component in an Operative VoIP Switch

This section describes how to take a VoIP Switch Component out of service and how to put it back into service.





Put Out of Service a VoIP Switch Component

There are two ways to take a VoIP Switch Component out of service:


Variant 1: "Stop it hard"

Action:

A) Stop and check the component via the shell:

root# <AS_COMPONENT> stop
root# <AS_COMPONENT> status


As a consequence, the component immediately stops its operative work and all its running tasks.


The following VoIP Switch components may be stopped this way without jeopardizing the telephony service:

  • ConfigCenter
  • AdminCenter
  • DataAccessCenter
  • MediaCenter
  • RatingCenter
  • DataBase


Note

Make sure that:

  • The second component is active
  • The VoIP Switch administrators, operators and supporters are informed which ConfigCenter and AdminCenter are active
  • The users are able to use the active AdminCenter
  • The provider's CRM is able to use the active DataAccessCenter
  • The active RatingCenter is producing the CDR




Variant 2: "Stop it gracefully"

Action:

A) Stop gracefully the component via the ConfigCenter.

For the following components, flip the "active – passive" role:

  • HealthCenter
  • LoadBalancer
  • CallBalancer

do:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the active component HealthCheck
→ Click the fat right arrow at "Make component passive"
→ Confirm by clicking Button [ Yes ]


For the following components, do a "pre-bar":

  • ServiceCenter
  • MediaServer
  • FaxServer
  • CallAgent

do:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the desired VoIP Switch component
→ Change the parameter "Acceptance" to 0


C) Wait until the component no longer displays any activity.

ConfigCenter GUI → Menu "System" → Menu "Components"


D) Stop and check the component via the shell:

root# <AS_COMPONENT> stop
root# <AS_COMPONENT> status






Put Back to Service a VoIP Switch Component

There are two ways to put a VoIP Switch Component back into service:


Variant 1: "Start it"

Action:

A) Start and check the component via the shell:

root# <AS_COMPONENT> start
root# <AS_COMPONENT> status


As a consequence, the component immediately starts its operative work.


Variant 2: "Start it gracefully"

This variant may make sense when the following VoIP Switch components shall become active but not operative immediately:

  • ServiceCenter
  • MediaServer
  • FaxServer
  • CallAgent


Action:

A) Start the component "passive" via the shell:

root# <AS_COMPONENT> startpassive
root# <AS_COMPONENT> status


B) Make the component operative at the appropriate time:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the desired VoIP Switch component
→ Change the parameter "Acceptance" to 100
The "Acceptance" may be any value > 0. Choose it according to the load balancing scheme of the component.


C) Check if the component displays activity:

ConfigCenter GUI → Menu "System" → Menu "Components"





Work Flow for Analyzing VoIP Switch Problems  

Note

Not every red alarm jeopardizes the telephony service as a whole but a bulk of yellow warnings may endanger it!



This section gives the VoIP Switch Administrator and other service personnel a work flow for analyzing VoIP Switch problem indications and determining the appropriate action.

The main task is to find out if:

  1. The situation jeopardizes the telephony service as a whole, e.g.:
    • IP network issues
    • Several VoIP Switch servers failed or off line
       
  2. The database replication is broken
    • IP network issues
    • Server with running database failed
    • Linux service MySQL failed
       
  3. The situation hampers the configuration of customer accounts, addresses, etc.
    • Management server failed or off line
    • VoIP Switch component ConfigCenter, AdminCenter, DataAccessCenter or RatingCenter stopped working correctly


The work flow for analyzing VoIP Switch problems is as follows:



Analysis:

1. Check if it is a single alarm or a bulk alarm situation.

a) Connect to the VoIP Switch monitor Xymon "Main View"
→ As a rule of thumb: It is a single error if only one issue is displayed.



2. Analyze and treat a single alarm situation:

a) Check the contents of the error message.
b) Compare the error description against the Indication "Xymon Event" ones in chapter "VoIP Switch Maintenance"
c) Check whether the current situation is the same as or similar to the one described and whether the recommended actions are suitable.
d) Execute the suitable actions.
→ If you are not sure, contact the "VoIP Switch Supplier Support"
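
The comparison in step b) can be supported by a small lookup that maps an event text to the chapter of this manual that treats it. The following is a minimal sketch: the function name classify_indication is hypothetical, the patterns are only the indication strings quoted in this manual, and the exact format of a real Xymon event line is an assumption. Events that are documented without a distinguishing text are reported as unknown.

```shell
#!/bin/sh
# Sketch only: "classify_indication" is a hypothetical helper. It matches the
# indication strings quoted in this manual; more specific patterns come first.
classify_indication() {
    case "$1" in
        *JdbcReplicationMonitor*)
            echo 'ConfigCenter Message "DB Replication Check"' ;;
        *"SQL-Exception during statement"*)
            echo 'DataAccessCenter Message' ;;
        *Jdbc*)
            echo 'Messages from Java Framework' ;;
        *BalancerSwitch*"not available anymore"*)
            echo 'LoadBalancer Message "Missing ServiceCenter"' ;;
        *HealthCheck*)
            echo 'HealthCheck Message' ;;
        *"License Violation"*|*"grace-period remaining"*)
            echo 'ServiceCenter Message "License Violation"' ;;
        *"Could not establish priority-call"*)
            echo 'ServiceCenter Message "Failed Emergency Call"' ;;
        *"User Blocked"*)
            echo 'ConfigCenter Message "Wrong User Login"' ;;
        *)
            echo 'unknown' ;;
    esac
}
```

The order of the patterns matters: the replication and DataAccessCenter events also contain the text "Jdbc", so they are tested before the general Java framework pattern.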



3. Analyze the bulk alarm situation:

a) Get a first overview of the situation by analyzing the Xymon Monitor:
Check the server, component and IP status in the MS-01 Xymon monitor:
Xymon GUI → Xymon "Main View"
  1. Which types of servers are affected?
    • At least one LoadBalancer (LB) server must be active for the telephony service to work!
    • At least one ServiceCenter (SC) server must be active for the telephony service to work!
    • At least one server with the operative database must be active for the telephony service to work!

  2. Check the CPE registration statistics:
    • Are the CPE registrations dropping?

  3. Check the call statistics:
    • Is the number of calls on the VoIP Switch dropping?
      Xymon GUI → Management Server → Column "calls_sys"
    • Are the calls dropping on one or more ServiceCenters?
      Xymon GUI → ServiceCenter Server → Column "calls_sc"
    • Are the calls dropping on one or more gateways?
      Xymon GUI → Gateway → Column "calls_gw"

  4. Do the same checks on the MS-02 Xymon Monitor

  5. Does the comparison of the two Xymon Monitors show that:
    • The same single component on the same server failed?
    • All components of one side failed?
    • Each Xymon Monitor sees only the components on its own side?
    • The telephony service is running on at least one side?


b) Extend the overview by analyzing the ConfigCenter "System Component" overview:
Check the status of the VoIP Switch components in the MS-01 ConfigCenter:
ConfigCenter GUI → Menu "System" → Menu "Components"
  1. Are calls currently running and can new calls be established?

  2. Make test calls:
    • To and from a telephone number in the PSTN
    • On-net test calls
    • Call a well-known VoiceMail Box from on-net and from the PSTN

  3. Is the number of running calls dropping fast while no new calls are established?

  4. Which types of VoIP Switch components are affected?
    • At least one LoadBalancer component must be active for the telephony service to work!
    • At least one ServiceCenter component must be active for the telephony service to work!
    • At least one operative database must be active for the telephony service to work!
    • Does this picture correspond to the results of the first overview in the Xymon Monitor?

  5. Do the same checks on the MS-02 ConfigCenter

  6. Does the comparison of the two ConfigCenters show that:
    • The same single component on the same server failed?
    • All components of one side failed?
    • Each ConfigCenter sees only the components on its own side?
    • The telephony service is running on at least one side?
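
The minimum requirement named above (at least one active LoadBalancer, ServiceCenter and operative database) can be expressed as a simple check. The following is a minimal sketch under stated assumptions: the function name telephony_minimum_met is hypothetical, and the input format of one "<component type> <state>" pair per line is an assumption made for illustration. The ConfigCenter does not export such a list; the operator would note the states down from the "Components" overview.

```shell
#!/bin/sh
# Sketch only: reads lines "<component type> <state>" from stdin (assumed format).
# Prints one line per component type that has no active instance and returns
# a non-zero status if any required type is missing.
telephony_minimum_met() {
    awk '
        $2 == "active" { seen[$1] = 1 }
        END {
            missing = 0
            n = split("LoadBalancer ServiceCenter DataBase", need, " ")
            for (i = 1; i <= n; i++) {
                if (!(need[i] in seen)) {
                    print "no active " need[i]
                    missing = 1
                }
            }
            exit missing
        }
    '
}
```

For example, an input with a passive LoadBalancer, an active ServiceCenter and an active DataBase would be reported as "no active LoadBalancer", which corresponds to a situation in which the telephony service cannot work.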



4. Treat bulk alarm situations:

a) Is there a VoIP Switch server hardware, RAID or hard-disk problem?
→ Indications:
Indication:
<HOST_NAME> "snmptrapd" "failure"
<HOST_NAME> "snmptrapd" "degraded"


→ Actions:
For DELL servers, see: "Treating Problems of Servers from DELL Inc ®"



b) Is the IP connectivity to or between VoIP Switch servers affected?


Note

If VoIP Switch servers are affected, many additional alarm messages about missing VoIP Switch components will appear!
This can be one of the most troublesome error situations!



→ Indications:
Indication:
<HOST_NAME> conn "Host does not respond to ping" <IP_ADDRESS>
* Dropping CPE registrations !


→ Actions:
See: "Maintenance Due to IP Network Alarm"


c) → If you are not sure what to do, contact the "VoIP Switch Supplier Support"





VoIP Switch Server Maintenance



Maintenance Due to VoIP Switch Components General Alarms  



Maintenance Due to Messages from Java Framework

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Jdbc"



Description:
Java internal exceptions, mostly caused by database accesses. The application is expected to handle them.


Consequences:
→ For the VoIP Switch telephony service:

  • Mostly none

→ For the operations:

  • Mostly none

→ For the user:

  • Mostly none


Solution:
Observe the frequency of this event


Action:
1. Observe the frequency of this event

2. If the erroneous condition is too frequent, contact the "VoIP Switch Supplier Support"!





Maintenance Due to Messages from VoIP Switch Components Internals

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "EventQueue"
<HOST_NAME> msgs "SysCompDatabase - Cannot evalute status"



Description:
These events may happen on all VoIP Switch servers and are VoIP Switch component internal notes.


Consequences:
→ For the VoIP Switch telephony service:

  • Mostly none

→ For the operations:

  • Mostly none

→ For the user:

  • Mostly none


Solution:
Observe the frequency of this event


Action:
1. Observe the frequency of this event

2. If the erroneous condition is too frequent, contact the "VoIP Switch Supplier Support"!





Maintenance Due to Messages from LoadBalancer Server



Maintenance Due to HealthCheck Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "HealthCheck"



Description:
The HealthCheck supervises the status of virtual IP addresses and their associated physical IP addresses. If the HealthCheck on one server doesn't see the peer physical IP address, it takes over the virtual IP address. This event most probably indicates an IP network problem in the "Public Voice Segment".


Consequences:

Warning

This erroneous condition must be checked within reasonable time!


→ For the VoIP Switch telephony service:

  • None if concurrently no other IP network problems arise

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problem if needed.

Check the status of the VoIP Switch components with an active-passive scheme:

  • LoadBalancer
  • CallBalancer
  • RatingCenter


Action:
1. Check if the IP network is OK


2. Check the status of the LoadBalancer components

→ Confirm if the active LoadBalancer swapped, e.g. from *-lb-01 to *-lb-02


3. Check the status of the CallBalancer components

→ Confirm if the active CallBalancer swapped, e.g. from *-lb-01 to *-lb-02


4. Check the status of the RatingCenter components

→ Confirm if the active RatingCenter swapped, e.g. from *-ms-01 to *-ms-02
→ Confirm if the active RatingCenter is processing the CDRs


5. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a LoadBalancer problem try to restart the component:
  root# loadbalancer restart


c) If there is a CallBalancer problem try to restart the component:
  root# callbalancer restart


d) If there is a RatingCenter problem try to restart the component:
  root# ratingcenter restart


e) If the RatingCenter swapped, make sure that the CDRs are processed:
  1. ConfigCenter GUI → Menu "System" → Menu "Components"
    → Click line at "active" RatingCenter → In dialog select "Process CDRs"
    → Click button [ Close ]
  2. Check if the CDR CSV files are processed:
  root# cd /home/servicecenter/cdrs


Check if the CSV files have a current time stamp, which indicates that new CDRs were written:
  root# ls -ltra


Open a CSV file and check for new entries, e.g.:
  root# less monthly.csv



6. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!

If these events are logged repeatedly, report them to the "VoIP Switch Supplier Support"!





Maintenance Due to LoadBalancer Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Balancer"



Description:
LoadBalancer internal problem that is treated internally by the component. The LoadBalancer has an "active-passive" redundancy scheme.


Consequences:
→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Not defined yet


Action:
1. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!

If these events are logged repeatedly, report them to the "VoIP Switch Supplier Support"!





Maintenance Due to LoadBalancer Message "Missing ServiceCenter"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "BalancerSwitch" <SERVICECENTER> "not available anymore"



Description:
The LoadBalancer indicates that it doesn't see a certain ServiceCenter.

This happens when:

  • the ServiceCenter has restarted
→ the event will be transient
  • the ServiceCenter is stopped
→ the event will remain until the ServiceCenter is started again
  • no IP connectivity
→ the event will remain until the IP connectivity is reestablished


Consequences:

Warning This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • None, the other ServiceCenters take over the work load
  • If a ServiceCenter is missing then the VoIP Switch loses redundancy capability

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problems if needed:

→ Actions see: "Maintenance Due to IP Network Alarm"

Solve the server problem if needed

→ Actions see: "Treating Server Hardware Problems"


Action:
1. Check if the IP network is OK


2. Check the status of the ServiceCenter components

→ Confirm that the reported ServiceCenter server is affected


3. Check the reported ServiceCenter server with the "Server Administrator (OMSA)"


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a ServiceCenter problem try to restart the component:
  root# servicecenter restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!





Maintenance Due to CallBalancer Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The CallBalancer dispatches MGCP messages to the CallAgent components.

The CallBalancer has an "active-passive" redundancy scheme.


Consequences:

Warning

This erroneous condition must be checked within short time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • Users with MGCP MTA as telephone adapter may not be able to telephone


Solution:
Check the status of the CallBalancer active-passive scheme and whether the MGCP messages are processed.


Action:
1. Check if the IP network is OK


2. Check the status of the CallBalancer components:

a) Confirm if the active CallBalancer swapped, e.g. from *-ms-01 to *-ms-02


b) Confirm if the active CallBalancer is processing the MGCP messages
→ Check if the CallAgents handle MGCP connections and that the total number of MGCP connections is not dropping.


3. Check if the MGCP audits are not dropping:

a) Connect to a Xymon monitor and check in Xymon Column "regs" the numbers of MGCP-Active and MGCP-Brocken


b) Check the questions:
  • Is the number of MGCP-Active dropping?
→ If yes => There may be an IP backbone problem or a CallBalancer or CallAgent outage!


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"
b) If there is a CallBalancer problem try to restart the component:
  root# callbalancer restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!





Maintenance Due to MediaServer Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "MediaConnection (06) Cannot handle outgoing message"
<HOST_NAME> msgs "MediaServerProvider (MS) refreshing mediaserver mc1ms2 failed"



Description:
The MediaServer records or plays back announcements and VoiceMail messages. Occasionally it may not correctly record a message and transfer it to the MediaCenter or play back an announcement or message.

The MediaServer can act as media proxy for active connections and transcode media streams.


Consequences:

Warning

If in this VoIP Switch the MediaServer acts as media proxy then the erroneous situation must be checked soon!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • A VoiceMail Box message or announcement could not be recorded or played back correctly.
  • The user may not hear the other side or vice versa.


Solution:
Depends on the situation.


Action:
1. If the erroneous condition remains or happens too often, contact the "VoIP Switch Supplier Support"!





Maintenance Due to Messages from Management Server  



Maintenance Due to AdminCenter Message "Missing FMC Application Server"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "FmcRequest - Cannot post request"
<HOST_NAME> msgs "FmcProvider - could not provision pbx"



Description:
The AdminCenter tried to configure the FMC application.

Consequences:

Warning

This erroneous condition may be sporadic. If it is not, it must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A configuration on an FMC server failed

→ For the user:

  • A user's "an MC-Phone" is not working


Solution:
Check the state of the FMC servers and their IP connectivity toward the VoIP Switch servers.


Action:
1. Check if the IP network is OK


2. Check the status of the FMC server


3. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is an FMC server problem
→ Contact the "VoIP Switch Supplier Support"!


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!





Maintenance Due to AdminCenter Message "Missing Redirection Server"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "FmcProvider - could not provision user" <USER_TELEPHONE_NUMBER>



Description:
The mobile app "an MC-Phone" couldn't get the information about the location of its responsible configuration server from the associated redirection server (by default a Comdasys server located in Europe). Therefore the user's "an MC-Phone" couldn't obtain its configuration and will not work.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The mobile app "an MC-Phone" will not work


Solution:
Make sure to have good IP connectivity to the Internet


Action:
1. The user must find a reliable Internet connection and restart the app "an MC-Phone" until it gets its configuration





Maintenance Due to ConfigCenter Message "Wrong User Login"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "msgsAccessLogger - ADMIN:login; user" <USERNAME> "-> User Blocked"



Description:
A VoIP Switch Administrator, Operator or Supporter tried to log in to the ConfigCenter with wrong credentials. The user will be blocked for several minutes.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • The user will be blocked from the ConfigCenter for several minutes.

→ For the user:

  • None


Solution:
Wait


Action:
1. Retry after a few minutes with the correct login credentials.


2. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!





Maintenance Due to ConfigCenter Message "DB Replication Check"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs JdbcReplicationMonitor "Replication" '<BROKEN_REPLICATION_DIRECTION>' "is broken!"



Description:
The database replication check was not successful. This can happen from time to time when the database has to process heavy load.

In most cases the database replication recovers automatically even after several hours of failed replication. If it is not recovering then this is a severe problem and must be treated.


Consequences:

Warning If this erroneous condition remains then this is a SEVERE erroneous condition and must be treated within short time!


→ For the VoIP Switch telephony service:

  • The database redundancy is endangered

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Restore the MySQL DB replication if the erroneous condition remains.


Action:
1. Check periodically (ca. every half hour) the Xymon monitor for this error condition.

2. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
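
When reporting the condition, it can help to know whether the replication threads are running. The following is a minimal read-only sketch under stated assumptions: it assumes classic MySQL replication, where the statement SHOW SLAVE STATUS (SHOW REPLICA STATUS on newer MySQL versions) reports the fields Slave_IO_Running and Slave_SQL_Running, and it assumes that the administrator is permitted to query the database. The function name replication_running is hypothetical. It does not repair anything; restoring the replication remains a task for the "VoIP Switch Supplier Support".

```shell
#!/bin/sh
# Sketch only: reads the output of "SHOW SLAVE STATUS\G" (or
# "SHOW REPLICA STATUS\G") from stdin. Returns 0 only if both the IO thread
# and the SQL thread report "Yes"; any other or missing value returns 1.
replication_running() {
    awk -F': *' '
        $1 ~ /(Slave|Replica)_IO_Running$/  { io  = $2 }
        $1 ~ /(Slave|Replica)_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A possible use, assuming the mysql client is installed and configured for the local database, is: mysql -e 'SHOW SLAVE STATUS\G' | replication_running && echo "replication threads running".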





Maintenance Due to DataAccessCenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Jdbc" "SQL-Exception during statement"



Description:
A configuration via the DataAccessCenter may have failed.

This may happen if the database is under heavy load.


Consequences:

Warning This erroneous condition must be checked within reasonable time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A customer configuration may have failed (which is hopefully covered by the CRM application).

→ For the user:

  • None


Solution:
Inter-working between the DataAccessCenter and database must be optimized.


Action:
1. If this Java event is logged repeatedly, report it to the "VoIP Switch Supplier Support"!





Maintenance Due to RatingCenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The RatingCenter has an "active-passive" scheme. For every RatingCenter event, check that the active RatingCenter is working correctly and is processing the CDRs.


Consequences:

Warning

This erroneous condition must be checked within short time!



→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A CDR may not be written correctly into the CDR database and/or CSV files.
  • The customer billing may not contain all CDRs


→ For the user:

  • None


Solution:
Check the status of the RatingCenter active-passive scheme and whether the CDRs are processed.


Action:
1. Check the status of the RatingCenter component

→ Confirm if the active RatingCenter is processing the CDRs


2. Treat the problem:

a) If the RatingCenter swapped, make sure that the CDRs are processed:
Open the ConfigCenter Menu "Components"
→ Click line at "active" RatingCenter → In dialog select "Process CDRs"
→ Click button [ Close ]


b) Check if the CDR CSV-Files are processed:
Open the CDR directory:
  root# cd /home/ratingcenter/cdrs


Check if the CSV files have a current time stamp, which indicates that new CDRs were written:
  root# ls -ltra


Open a CSV file and check for new entries, e.g.:
  root# less monthly.csv
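
The time stamp check in step b) can also be done with find instead of reading the ls listing by eye. The following is a minimal sketch: the function name recent_csv_files is hypothetical, the default of 60 minutes is an arbitrary assumption, and the options -maxdepth and -mmin assume a GNU or BSD find as found on Linux servers.

```shell
#!/bin/sh
# Sketch only: list the CSV files directly in directory $1 that were modified
# within the last $2 minutes (default 60). An empty result suggests that no
# new CDRs were written in that period.
recent_csv_files() {
    dir=$1
    minutes=${2:-60}
    find "$dir" -maxdepth 1 -type f -name '*.csv' -mmin "-$minutes"
}
```

For example, "recent_csv_files /home/ratingcenter/cdrs 30" would list the CSV files in the CDR directory that changed during the last 30 minutes.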


3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!





Maintenance Due to Messages from ServiceCenter Server  



Maintenance Due to FaxServer Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
A fax may not be received correctly. The mailing of the PDF file may fail.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • An incoming fax may not be received correctly or transferred to the user. This situation is usually handled by the fax device either automatically or manually.


Solution:
Restart the FaxServer component.


Action:
1. Check whether any faxes are received at all.

→ Send a test fax.


2. Restart the FaxServer:

  root# faxserver restart



3. If the FaxServer keeps logging these events, report it to the "VoIP Switch Supplier Support"!





Maintenance Due to MediaCenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs MediaCenterCall
<HOST_NAME> msgs MediaServer
<HOST_NAME> msgs "file not found"



Description:
The MediaCenter handles the WAV files from announcements and VoiceMail messages. Occasionally it may fail to record a message correctly or may lose a message file. An order to the MediaServer to replay a message or announcement may also fail.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • A VoiceMail Box message or announcement could not be recorded or played back correctly


Solution:
Clean up the VoiceMail message database.

Optimize the inter-working of MediaCenter and MediaServer


Action:
1. If these events are logged repeatedly, report them to the "VoIP Switch Supplier Support"!





Maintenance Due to ServiceCenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The ServiceCenter is the main component of the VoIP Switch. It processes the connection signaling and the telephony features.

The ServiceCenter has an all-active redundancy scheme. If one ServiceCenter fails, the remaining ServiceCenters take over the work load.


Consequences:

Warning This erroneous condition must be checked and treated within short time!


→ For the VoIP Switch telephony service:

  • As long as one ServiceCenter remains, the VoIP Switch works!

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Depends on the analyzed problem.


Action:
1. Check how acute the problem is:

a) Check if the IP network is OK


b) Check the status of the ServiceCenter component
  • Are enough ServiceCenters active to handle the work load?
→ If NO then there is a most SEVERE erroneous situation


c) Check in the ConfigCenter Menu "Components" if the active ServiceCenter is processing the connections:
  • Is the total number of connections dropping?
→ If YES then there is a most SEVERE erroneous situation:
→ There may be an IP backbone problem!


d) Check in the Xymon Column "regs" the number of registered SIP-Devices:
  • Is the number of SIP-Devices dropping?
→ If YES then there is a most SEVERE erroneous situation:
→ There may be an IP backbone problem!


e) Check the reported ServiceCenter server with the "Server Administrator (OMSA)"
  • Are problems signaled?


2. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a ServiceCenter problem try to restart the component:
  root# servicecenter restart



c) If there is a hardware problem:
→ Actions see: "Treating Server Hardware Problems"


3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!





Maintenance Due to ServiceCenter Message "License Violation"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs License "License Violation"
<HOST_NAME> msgs License "grace-period remaining:"



Description:
This ServiceCenter has a license problem and will work only for the remaining grace period.


Consequences:

Warning

This erroneous condition must be checked and treated within the remaining grace period!


→ For the VoIP Switch telephony service:

  • As long as one ServiceCenter remains, the VoIP Switch works
  • The telephony service will be stopped on this ServiceCenter after passing of the grace period

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Get current licenses from the VoIP Switch Supplier.


Action:
1. Check in the ConfigCenter Menu "Components" which ServiceCenter component has a license problem and how long the grace period is.


2. Contact the "VoIP Switch Supplier Support"!





Maintenance Due to ServiceCenter Message "Failed Emergency Call"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs ServicePrioCallControl "Could not establish priority-call". Call from Connection/<SIP_CALL_ID>/<CALLING_NUMBER> to <CALLED_EMERGENCY_NUMBER>



Description:
A user's emergency call failed!


Consequences:

Warning Severe legal condition that must be handled!

This case can have legal consequences for the provider!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The emergency call did not work


Solution:
Check if the call failed because of the VoIP Switch's emergency call handling or routing. If yes, fix the handling or routing.

Check if the PSTN provider rejected the emergency call. If yes, contact the PSTN provider.


Action:
1. Archive traces for legal responsibilities:

  • Save the trace of this emergency call and all subsequent calls from this user toward emergency numbers


2. Check where the call was rejected.

  • If the call was rejected at the PSTN provider side, contact the PSTN provider and have them investigate this case.


3. Check the VoIP Switch's emergency routing:

  • Emergency numbers
  • Emergency number rewriter
  • Routing Tables toward the PSTN
  • RuleSet that may tag outgoing calls toward emergency numbers


4. Check if any IP network devices may interfere with the SIP signaling:

  • If an external Session Border Controller (SBC) or SIP-SS7 Gateway is involved, check its behavior concerning the emergency calls
  • If a firewall (FW) is involved, check that no SIP ALG or "SIP Helpers" are active


5. Treat the problem:

a) Adjust the emergency routing of the VoIP Switch if needed


b) Fix the IP network devices if needed


6. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ServiceCenter Message "TopStop"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs ServiceRatingControl (01) <CALLING_NUMBER> "max available charges reached for account:"
<HOST_NAME> msgs AlarmLogger "[TOPSTOP][ALARM] tenant" <TENANT> "topstop limit nearly reached for account"



Description:
A user's TopStop limit was reached!


Note

A TopStop alarm early in the month or for a lot of users indicates a possible fraud case!



Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A TopStop alarm early in the month indicates a possible fraud case

→ For the user:

  • No outgoing calls except emergency calls will work when the TopStop limit is reached


Solution:
If it is a regular TopStop then contact the user and raise the monthly TopStop limit.

If it is a fraud situation, handle it according to "Best Practice for Handling a "Fraud" Situation".


Action:
1. Check if it is a regular TopStop situation.


2. Check if it is a possible fraud case:

  • Reached TopStop limit early in the month?
  • Concurrently a lot of TopStop limits reached?
  • High call peak during the night or weekend?
→ Check the Xymon Column "calls_sys".


3. Treat a possible fraud case according to "Best Practice for Handling a "Fraud" Situation"


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
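
To judge whether many TopStop limits were reached at the same time (step 2), the "limit reached" messages can be counted in a saved excerpt of the monitor log. This is a minimal sketch only: the function name count_topstop_reached and the file name in the usage line are hypothetical, and it assumes that the excerpt holds one message per line with the text shown under "Indication".

```shell
#!/bin/sh
# Sketch: count "TopStop limit reached" messages in a log excerpt read from stdin.
# Assumption: one monitor message per line, containing the text from "Indication".
count_topstop_reached() {
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c 'max available charges reached for account'
}
```

Usage:
  root# count_topstop_reached < /tmp/xymon_msgs.txt

What counts as "a lot" depends on the installation and has to be decided by the VoIP Switch Administrator.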




→ Top

Maintenance Due to Nimbus Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "NimbusLink (ue) Cannot subscribe"



Description:
The Nimbus component is a VoIP Switch internal bus that connects the various VoIP Switch components on the servers. If a Nimbus endpoint on one server is missing the other Nimbus endpoints start to complain.

If a Nimbus endpoint is missing then the component may be stopped, the server may be offline, or there may be an IP network problem.

→ This error is often displayed during VoIP Switch software upgrades of the servers. In this situation just wait until the upgrade is finished.


Consequences:

Warning

This erroneous condition must be checked and treated within reasonable time!


→ For the VoIP Switch telephony service:

  • Usually none

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problems or server problems if needed.


Action:
1. Check if the IP network is OK


2. Check the status of the VoIP Switch components located on the server where the Nimbus is missing:

→ Is only Nimbus missing, or are other components on this server missing too?


3. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is not a planned outage then try to solve the server problem


c) If there is not a planned outage then try to restart the Nimbus on this server:
  root# nimbus restart



4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from CallAgent Server


→ Top

Maintenance Due to CallAgent Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The CallAgent handles the message exchange with the MGCP MTA. The CallAgent has an all-active redundancy scheme. If one CallAgent fails, the remaining CallAgents take over the workload.


Consequences:

Warning

This erroneous condition must be checked within short time!


→ For the VoIP Switch telephony service:

  • As long as one CallAgent remains, the VoIP Switch works

→ For the operations:

  • None

→ For the user:

  • A single MGCP MTA at the user's premises may not work correctly. The telephone service may not always work for these users.


Solution:
Depends on the analyzed problem.


Action:
1. Check if the IP network is OK


2. Check the status of the CallAgent components

→ Confirm that the reported CallAgent server is affected

3. Check the reported CallAgent server with the "Server Administrator (OMSA)"


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a CallAgent problem try to restart the component:
  root# callagent restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from CPECenter Server


→ Top

Maintenance Due to CPECenter Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs
<HOST_NAME> msgs "DevAdmProvider (-1) duplicated devicetype:" <DEVICE_TYPE>



Description:
During the preparation of a device configuration file two device configuration templates for the same device type were found. If a CPE loads a device configuration file which was produced under these conditions it may not work correctly.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The CPE may not work with the produced configuration file


Solution:
One device configuration template has to be deleted.


Action:
1. Contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to IP Network Alarms

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> conn "Host does not respond to ping" <IP_ADDRESS>



Description:
This test performs a "ping" toward the IP address of the host. If the "ping" is not answered then there is a problem with the IP network, e.g.:

  • Pinged host defective or offline
  • Layer 2 IP switch defective or offline
  • Broken IP backbone network


Consequences:

Warning

MOST SEVERE condition if several VoIP Switch servers are affected for a longer duration (approx. 15 min)!


→ For the VoIP Switch telephony service:

  • The telephone service may be interrupted

→ For the operations:

  • The MySQL databases may lose their replication

→ For the user:

  • The telephone service may be interrupted for the users!


Solution:
Solve the IP network problems!

Check the IP network devices:

  • Pinged host
  • Layer 2 IP switches
  • IP Routes
  • Firewalls

Check the VoIP Switch server IP connectivity.


Action:
1. Evaluate the severity of the IP network outage:

a) Check if it is an occasional ping failure:
  • Only one host doesn't respond
  • Only 1 or 2 poll cycles fail
→ Type "Occasional Failure":
  • In this situation the error may be neglected.


b) Check if it is only a single host:
  • One host doesn't respond anymore
→ Type "Host Failure":
  • Check the hardware condition and IP connectivity of this device
  • Check with the VoIP Switch Administrator in the ConfigCenter Menu "Components" how the VoIP Switch is affected


c) Check if more than one VoIP Switch server is affected:
  • More than one VoIP Switch server doesn't respond anymore
→ Type "VoIP Switch Failure":
1. Check with the VoIP Switch Administrator how the VoIP Switch is affected:
a) Connect to both (*-ms-01, *-ms-02) ConfigCenter Menu "Components" and check the component status
b) Check the questions:
  • Which VoIP Switch servers are not visible?
  • Are they the same on both ConfigCenters?
  • Does one ConfigCenter see only the servers on its side? E.g.:
Side A components complain that they don't see their peers on Side B?
Side B components complain that they don't see their peers on Side A?
→ If yes => There is a heavy IP backbone problem
c) Check in the ConfigCenter Menu "Channels" if new connections were established since the IP outage
→ If yes => Some users still can make phone calls


2. Check with the VoIP Switch Administrator how the users are affected:
a) Connect to both (*-ms-01, *-ms-02) Xymon Column "regs" and check the CPE and MTA registrations status.
b) Check the questions:
  • Check: Do the users' CPE registrations drop?
→ If yes => There is a heavy IP backbone problem and some users cannot use the telephony service anymore!


2. Treat the Type "Occasional Failure":

a) VoIP Switch Administrator:
In this situation the error may be neglected. Observe if the situation remains.


3. Treat the Type "Host Failure":

a) VoIP Switch Administrator:
If possible pre-bar the VoIP Switch component on this server
b) Solve the IP or hardware issue with the failed host


4. Treat the Type "VoIP Switch Failure":

a) VoIP Switch Administrator:
Contact the "VoIP Switch Supplier Support"


5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
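
The evaluation of the three failure types can be sketched as a small decision function. This is an illustration only: the function name classify_ping_failure and its two arguments are hypothetical, the numbers have to be collected by hand or from Xymon, and the sketch treats every case with more than one silent server as a "VoIP Switch Failure", regardless of the number of failed poll cycles.

```shell
#!/bin/sh
# Sketch: map ping observations to the failure types described above.
#   $1 = number of VoIP Switch servers that do not answer the ping
#   $2 = number of consecutive poll cycles in which the ping failed
classify_ping_failure() {
    hosts_down=$1
    failed_cycles=$2
    if [ "$hosts_down" -le 0 ]; then
        echo "No Failure"
    elif [ "$hosts_down" -gt 1 ]; then
        # more than one server is silent
        echo "VoIP Switch Failure"
    elif [ "$failed_cycles" -le 2 ]; then
        # a single host, only 1 or 2 poll cycles
        echo "Occasional Failure"
    else
        # a single host that stays silent
        echo "Host Failure"
    fi
}
```

Usage:
  root# classify_ping_failure 1 5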




→ Top

Maintenance Due to Operating System Alarms

The VoIP Switch Administrator and/or server service personnel find here instructions for managing problems indicated by the operating system supervision.




→ Top

Maintenance Due to Supervised Processes Missing

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> procs "Processes not OK" <MISSING_PROCESS>



Description:
One or more supervised processes of a Linux service or VoIP Switch component are missing.


Consequences:

Warning

SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • If a VoIP Switch component is missing then the VoIP Switch loses redundancy capability
  • If a Linux service is missing the VoIP Switch may be hampered or the server is not working correctly

→ For the operations:

  • Depends on the VoIP Switch components or Linux service

→ For the user:

  • Depends on the VoIP Switch components or Linux service


Solution:
Restart the VoIP Switch component or Linux service.


Action:
1. Check with the VoIP Switch Administrator if it is possible to restart the component or service without endangering the VoIP Switch telephony service.

→ If possible pre-bar the VoIP Switch component via the ConfigCenter!


2. Restart the VoIP Switch component or Linux service:

a) Restart the VoIP Switch component
  root# <COMPONENT> restart


  • Example:
  root# servicecenter restart



b) Restart the service:
  root# /etc/init.d/<SERVICE> restart


  • Example:
  root# /etc/init.d/monit restart



3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
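
Before and after a restart it can help to verify which supervised processes are really missing. The following is a minimal sketch, not part of the VoIP Switch tooling: the function name missing_processes is hypothetical, and the process names in the usage line are assumptions, because a VoIP Switch component may run under a process name that differs from its command name.

```shell
#!/bin/sh
# Sketch: print every expected process name (arguments) that does not appear
# in the list of running command names read from stdin (one name per line).
missing_processes() {
    running=$(cat)
    for name in "$@"; do
        # -x: whole line must match, -F: take the name literally
        printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
    done
}
```

Usage:
  root# ps -e -o comm= | missing_processes monit mysqld

An empty output means that all listed processes were found.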




→ Top

Maintenance Due to Supervised IP Ports

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> ports "Ports not OK" <MISSING_PROCESS_PORTS>



Description:
One or more supervised IP ports of a Linux service or VoIP Switch component are missing.


Consequences:

Warning

SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • If a VoIP Switch component is missing then the VoIP Switch loses redundancy capability
  • If a Linux service is missing the VoIP Switch may be hampered or the server is not working correctly

→ For the operations:

  • Depends on the VoIP Switch components or Linux service

→ For the user:

  • Depends on the VoIP Switch components or Linux service


Solution:
Restart the VoIP Switch component or Linux service.


Action:
1. Check with the VoIP Switch Administrator if it is possible to restart the component or service without endangering the VoIP Switch telephony service.

→ If possible pre-bar the VoIP Switch component via the ConfigCenter!


2. Restart the VoIP Switch component or Linux service:

a) Restart the VoIP Switch component
  root# <COMPONENT> restart


  • Example:
  root# servicecenter restart



b) Restart the service:
  root# /etc/init.d/<SERVICE> restart


  • Example:
  root# /etc/init.d/monit restart



3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
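
After the restart it can be verified on the console whether the port is open again. This is a minimal sketch under assumptions: it expects the "ss" tool from iproute2 on the server, it covers listening TCP sockets only, the function name port_is_listening is hypothetical, and port 5060 in the usage line is only an example and not necessarily the port reported in the alarm.

```shell
#!/bin/sh
# Sketch: read `ss -ltn` output on stdin; succeed if a TCP socket listens on port $1.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # column 4 is "Local Address:Port"; the port is the part after the last ":"
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage:
  root# ss -ltn | port_is_listening 5060 && echo "port is listening"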




→ Top

Maintenance Due to Supervised Hard-Disk Usage

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> disk "File systems not OK"



Description:
A hard-disk or hard-disk partition is full. If a hard-disk is full then the Linux operating system behaves unpredictably and the server will most probably crash.


Consequences:

Warning SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server

→ For the operations:

  • Depends on the VoIP Switch components running on the server

→ For the user:

  • Depends on the VoIP Switch components running on the server


Solution:
Identify big files or directories. Delete or archive files externally.


Action:
1. Check hard-disk usage:

  root# df -h



2. Find large files:

  root# ls -lahS $(find / -type f -size +100k)



  • Example: find files larger than 60 MByte:
  root# ls -lahS $(find /opt/backup/ -type f -size +60000k)



  • Check for large files in the following suspicious directories:
    /opt/backup/
  • Do not touch big files in:
    /var/lib/mysql/


3. Find big directories:

  root# du -hs



Example of a more specific search → find directory sizes >1GByte:
  root# du -hs /home/ratingcenter/* | grep G
  root# du -hs /home/*/* | grep G



  • Check the following suspicious directories:
    /opt/backup/
    /home/mediacenter/messages
    /home/ratingcenter/cdrs


4. Before deleting files or directories, check with the VoIP Switch Administrator that they are no longer needed!

→ If you are suspicious but not sure if it is wise to delete a certain file or directory then contact the "VoIP Switch Supplier Support"!
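
The one-line searches in steps 2 and 3 have two weak points: "ls ... $(find ...)" breaks on file names that contain spaces and can exceed the shell's argument limit when started at "/", and "grep G" also matches every path that contains the letter G. The following is a more robust sketch of the same searches. The function names big_files and big_dirs are hypothetical, and all sizes are given and printed in KiB.

```shell
#!/bin/sh
# Sketch: list regular files below directory $1 that are larger than $2 KiB,
# biggest first. -xdev keeps the search on one file system.
big_files() {
    find "$1" -xdev -type f -size +"$2"k -exec du -k {} + | sort -rn
}

# Sketch: list the direct sub-directories of $1 that use more than $2 KiB,
# biggest first.
big_dirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] && du -sk "$d"
    done | awk -v limit="$2" '$1 > limit' | sort -rn
}
```

Usage (files larger than about 60 MByte, directories larger than 1 GByte):
  root# big_files /opt/backup 60000
  root# big_dirs /home/ratingcenter 1048576

Both functions only list candidates. Nothing is deleted.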




→ Top

Maintenance Due to Supervised Memory Usage

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> memory "Memory low"



Description:
One or more processes consume a lot of memory space. If the memory becomes low the Linux operating system starts to swap memory to and from hard-disk. This reduces the performance of the server.


Consequences:

Warning

This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Identify which process consumes the memory. Restart the process in order to free memory. Stop and restart the swapping on the server.


Action:

1. If a LoadBalancer *-lb-* or ServiceCenter *-sc-* server is affected:

→ Contact the "VoIP Switch Supplier Support"!


2. Find which processes use the memory:

  • This is a difficult task!
  root# top



3. Stop and restart the swapping:

Preconditions:
  • Choose a time of day when the server is not under high load.
  • If possible pre-bar the VoIP Switch components on this server via the ConfigCenter
  • Make sure that the redundant VoIP Switch component is running


a) Restart the responsible process:
  root# /etc/init.d/<PROCESS_NAME> restart



b) Stop the swapping:
  • Don't do this during high load!
  • It will take some time until accomplished!
  root# swapoff -a



c) Restart the swapping:
  root# swapon -a



d) Check if the swap is working regularly:
  root# swapon -s
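
"swapoff -a" has to move all swapped-out pages back into RAM. If less memory is available than swap is in use, the command can fail or push the server into an out-of-memory situation. The following sketch compares both values before step b) is executed. It is an illustration under assumptions: the function name swapoff_is_safe is hypothetical, and it relies on the "MemAvailable" field of /proc/meminfo, which older Linux kernels do not provide (in that case the sketch reports "not safe").

```shell
#!/bin/sh
# Sketch: succeed if the available memory exceeds the swap currently in use.
#   $1 = path to a meminfo file (default: /proc/meminfo)
swapoff_is_safe() {
    awk '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { free  = $2 }
        END {
            used = total - free
            printf "swap used: %d kB, memory available: %d kB\n", used, avail
            exit (avail > used ? 0 : 1)
        }
    ' "${1:-/proc/meminfo}"
}
```

Usage:
  root# swapoff_is_safe && swapoff -a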





→ Top

Maintenance Due to Supervised CPU Load

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> cpu "Load is High"



Description:
One or more processes consume excessive CPU power. This may reduce the performance of the server.


Consequences:

Warning This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • Reduced performance on the affected server and VoIP Switch component

→ For the operations:

  • None

→ For the user:

  • None


Solution:
The CPU consuming process has to be identified. If a process is identified it has to be checked if it is a regular or erroneous situation.

If it is a regular situation then it has to be investigated if the server's computing power is still sufficient for this VoIP Switch. If the server hosts a VoIP Switch component which offers a configurable load acceptance via the ConfigCenter then it is worth a try to reduce the component's workload.

An erroneous situation can mostly be solved by restarting the process.


Action:
1. Identify the responsible process:

a) Check the process situation with:
  root# top
  root# ps aux



b) If a process is suspicious check for multiple processes of the same name:
  root# ps -aef



c) If a process is suspicious check for zombie processes (lists the zombie processes with their process id):
  root# ps aux | awk '$8 ~ /^Z/'



d) Evaluate with the VoIP Switch Administrator if the suspicious process is in a regular or erroneous state.


2. Handle an erroneous Linux process state.

a) Restart a Linux process:
  root# /etc/init.d/<PROCESS_NAME> restart



b) Kill a process, e.g. double started process, zombie:

  root# kill -9 <PROCESS_ID>



3. Handle a VoIP Switch component:

a) Restart an erroneous VoIP Switch component:
  root# <COMPONENT_NAME> restart



b) If the VoIP Switch components ServiceCenter or MediaServer produce high load then the VoIP Switch Administrator may reduce their accepted workload via the ConfigCenter.


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
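
The checks for processes that were started more than once and for zombie processes can be sketched as two small filters on the "ps" output. This is an illustration only, and the function names list_duplicates and list_zombies are hypothetical. Note that a zombie process is already dead and cannot be removed with "kill" itself. It disappears when its parent process collects it or when the parent is restarted, which is why the sketch also prints the parent process id.

```shell
#!/bin/sh
# Sketch: read one command name per line (ps -e -o comm=) on stdin and print
# "<name> <count>" for every name that is running more than once.
list_duplicates() {
    sort | uniq -c | awk '$1 > 1 { print $2, $1 }'
}

# Sketch: read "ps -e -o stat= -o pid= -o ppid= -o comm=" output on stdin and
# print "<pid> <parent pid> <name>" for every zombie (state starts with Z).
list_zombies() {
    awk '$1 ~ /^Z/ { print $2, $3, $4 }'
}
```

Usage:
  root# ps -e -o comm= | list_duplicates
  root# ps -e -o stat= -o pid= -o ppid= -o comm= | list_zombies

Many processes legitimately run more than once, so the duplicate list is only a starting point for the evaluation with the VoIP Switch Administrator.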




→ Top

Maintenance Due to Supervised Files Missing or too Big

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> ????



Description:


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • None


Solution:


Action:
1. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

VoIP System Maintenance


→ Top

Best Practice for Handling a "Fraud" Situation  

The Aarenet VoIP Switch Administrator finds here instructions for managing fraud problems.


1. Immediate action:

  • Block call routing to the destination (usually somewhere in the Caribbean, west or central Africa)
  • If the fraudulent calls come from only one source IP address then block this IP address on the firewall (FW)


2. Investigate if the fraud is due to "Direct Registrations" with correct SIP credentials on the VoIP Switch:

  • Check if the calling number has multiple SIP registrations from a suspicious source IP range or user agent!
→ If YES then:
→ The SIP credentials were not kept secret or were stolen from the user's CPE
Action:
  • Block this user account for outgoing calls (blocking international calls is usually sufficient)
  • Change the SIP credential in the user account and the user's CPE.
  • Change the CPE administration login credentials


3. Investigate if the fraud is due to "Hacked Users CPE":

a) Analyze the traces of some fraud connections.
Check if the source IP address is still the one of a registered user CPE!
→ If YES then:
→ The user's CPE has most probably been hacked
Action:
  • Block this user account for outgoing calls (blocking international calls is usually sufficient)
  • Inform the user about the fraud and its reason
  • Change the SIP credential in the user account and the user's CPE.
  • Change the CPE administration login credentials


4. Post Work:

  • Undo the "immediate action"
  • Enable the customer account when the SIP credentials and CPE administration login credentials are changed




→ Top



© Aarenet Inc 2018

Version: 3.0     Author:  Aarenet     Date: December 2015