Difference between revisions of "Support book"

From help.aarenet.com

Revision as of 14:03, 5 September 2017


Note The features and/or parameters listed in this article may not be available from your telephone service provider.





Introduction

Support personnel for the Aarenet VoIP System find links here to detailed information about:

  • Supporting telephony users and solving user problems
  • An introduction to the VoIP signaling protocols
  • The on-board support tools of the Aarenet VoIP Switch
  • Monitoring and alarming of the Aarenet VoIP System
  • Maintenance and problem solving of the Aarenet VoIP Switch
  • Maintenance and problem solving of DELL servers


Contents


Level 1 Support: Check and Solve Subscriber Basic Problems

User support



Level 2 Support: Subscriber Problems

Support user level-2



Level 3 Support: Introduction VoIP Protocols



Knowhow Connection Signaling with "Session Initiation Protocol SIP"

The Session Initiation Protocol SIP is a communications protocol for signaling and controlling multimedia communication sessions. One of the most common applications of SIP is in Internet telephony for voice and video calls.

For an extended overview of the SIP protocol visit:

Wikipedia: Session Initiation Protocol SIP




Basics: Session Initiation Protocol SIP

Example of a "SIP dialog" with the minimum messages needed for a connection setup or connection renegotiation:

SIP Connection Establishing



Example of a "SIP dialog" with the minimum messages needed for a connection release:

SIP Connection Release
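
As a rough orientation for reading the two figures, a basic SIP dialog can be written down as an ordered list of messages. The sketch below assumes the standard flow from RFC 3261 and is not a transcription of the figures; the names `SETUP_FLOW`, `RELEASE_FLOW`, `is_final_response` and `final_responses` are illustrative only.

```python
# Basic SIP dialog, assumed from RFC 3261 (the figures may show more detail).
# Each entry is (direction, message); "A" is the calling side, "B" the called side.
SETUP_FLOW = [
    ("A->B", "INVITE"),
    ("B->A", "100 Trying"),
    ("B->A", "180 Ringing"),
    ("B->A", "200 OK"),
    ("A->B", "ACK"),
]

# Either side may send the BYE; here side A releases the connection.
RELEASE_FLOW = [
    ("A->B", "BYE"),
    ("B->A", "200 OK"),
]


def is_final_response(message: str) -> bool:
    """Return True for responses with a status code from 200 to 699.

    Requests (INVITE, ACK, BYE) and provisional 1xx responses are not final.
    """
    code = message.split(" ", 1)[0]
    return code.isdigit() and 200 <= int(code) <= 699


def final_responses(flow):
    """Return the final responses of a flow in order of appearance."""
    return [message for _, message in flow if is_final_response(message)]
```

In a trace, a setup that never reaches a final response, or a 200 OK that is never followed by an ACK, is usually the first thing worth checking.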






Examples: SIP Signaling Flows

Example of a regular outgoing call into the PSTN:

SIP Flow PSTN Outgoing



Example of a regular incoming call from the PSTN:

SIP Flow PSTN Incoming



Example of an outgoing call into the PSTN with three exceptional signaling situations:

  1. PSTN Gateway 1 doesn't respond, so the VoIP Switch has to re-route to PSTN Gateway 2.
  2. The telephone on side A offers an invalid "Session Time" value, which PSTN Gateway 2 refuses. The telephone on side A has to send a re-INVITE with an acceptable "Session Time" value.
  3. End point B is busy.
SIP Flow GW Fail Over



Example of a connection where the VoIP Switch checks the presence of the end points with OPTIONS messages. The VoIP Switch releases the connection if one of the end points doesn't respond with "200 OK":

SIP Flow PSTN Outgoing Options






SIP Response Codes

A list of SIP response codes and their meaning can be found here:

Wikipedia: List of SIP Response Codes
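
The first digit of a response code determines its class, which is often all a supporter needs for a first assessment. A minimal sketch, assuming the six classes used as headings in this chapter; the function name `sip_response_class` is illustrative:

```python
# Response classes as used in the headings of this chapter.
SIP_RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Successful",
    3: "Redirection",
    4: "Client Failure",
    5: "Server Failure",
    6: "Global Failure",
}


def sip_response_class(code: int) -> str:
    """Return the class name for a SIP response code between 100 and 699."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP response code: {code}")
    return SIP_RESPONSE_CLASSES[code // 100]
```

For example, `sip_response_class(486)` yields "Client Failure" and `sip_response_class(503)` yields "Server Failure".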





Most Important 1xx—Provisional Responses

100 Trying
An extended search is being performed and may take significant time, so a forking proxy must send a 100 Trying response.

180 Ringing
The destination user agent received the INVITE and is alerting the user of the call.

183 Session Progress
This response may be used to send extra information for a call which is still being set up.





Most Important 2xx—Successful Responses

200 OK
Indicates the request was successful.





Most Important 3xx—Redirection Responses

302 Moved Temporarily
The client should try at the address in the Contact field. If an Expires field is present, the client may cache the result for that period of time.





Most Important 4xx—Client Failure Responses

400 Bad Request
The request could not be understood due to malformed syntax.

401 Unauthorized
The request requires user authentication. This response is issued by UASs and registrars.

403 Forbidden
The server understood the request, but is refusing to fulfil it.

404 Not Found
The server has definitive information that the user does not exist at the domain specified in the Request-URI. This status is also returned if the domain in the Request-URI does not match any of the domains handled by the recipient of the request.

406 Not Acceptable
The resource identified by the request is only capable of generating response entities whose content characteristics are not acceptable according to the Accept header field sent in the request.

408 Request Timeout
Couldn't find the user in time. The server could not produce a response within a suitable amount of time, for example, if it could not determine the location of the user in time. The client MAY repeat the request without modifications at any later time.

410 Gone
The user existed once, but is not available here any more.

480 Temporarily Unavailable
Callee currently unavailable.

486 Busy Here
Callee is busy.

487 Request Terminated
The request was terminated by a BYE or CANCEL request.

488 Not Acceptable Here
Some aspect of the session description or the Request-URI is not acceptable.





Most Important 5xx—Server Failure Responses

503 Service Unavailable
The server is undergoing maintenance or is temporarily overloaded and so cannot process the request. A "Retry-After" header field may specify when the client may reattempt its request.





Most Important 6xx—Global Failure Responses

603 Decline
The destination does not wish to participate in the call, or cannot do so, and additionally the destination knows there are no alternative destinations (such as a voicemail server) willing to accept the call.





Knowhow Media Stream Signaling with "Session Description Protocol SDP"

The Session Description Protocol SDP describes multimedia sessions for the purposes of session announcement, session invitation, and parameter negotiation. During connection setup the end points use it to negotiate the parameters of the media exchange. SDP does not deliver media itself but is used between end points to negotiate the media type, format, and all associated properties for voice, fax, DTMF, bit-transparent data, etc.

For an extended overview of the SDP protocol visit Wikipedia.


Note

The VoIP Switch doesn't interfere in the SDP negotiation of the end points! There may be exceptions for certain Customer Premises Equipment CPE devices where interoperation problems are known. Check with the VoIP Switch administrator for which CPE devices the VoIP Switch is known to manipulate the SDP.






Basics: Session Description Protocol SDP

The SDP is embedded in the SIP messages during connection setup or connection renegotiation:

SIP with SDP



The following SDP properties and parameters are important for supporting customer problems:

SDP Basic Properties and Parameters



Example of an SDP offer from the calling side A:

SDP Offer Calling Side A



Example of an SDP offer for a fax transfer with T.38 from the calling side A:

SDP Offer Calling Side A with T.38



Interpretation of the "Media Attributes":

Index | Type | Attribute | Remark
0 | PCMU | ISDN G.711 µ-law | Very good quality VoIP codec
8 | PCMA | ISDN G.711 A-law | Very good quality VoIP codec
2 | G.726-32 | | Good quality VoIP codec
18 | G.729 | | Low quality VoIP codec
125 | x-clear-channel | Data service, bit transparent | Echo canceling is switched off and the data is transferred bit by bit
101 | telephone-event | DTMF, RFC 2833 | DTMF is not transferred inband but as RTP events according to RFC 2833
18 | annexb=0 | Special information for the codec with index 18 | Special directive for codec G.729
101 | 0-16 | Special information for telephone-event with index 101 | 0-15: DTMF characters 0-9, *, #, A, B, C, D; 0-16: DTMF characters 0-9, *, #, A, B, C, D, Flash
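
When comparing the offers of two end points it can help to extract the index-to-codec mapping from the `a=rtpmap` lines of the SDP body. The following is a minimal sketch; the SDP fragment `EXAMPLE_SDP` (including the port number) and the function name `parse_rtpmap` are made up for illustration and are not taken from the figures.

```python
def parse_rtpmap(sdp_text: str) -> dict:
    """Map RTP payload type index to encoding name, read from 'a=rtpmap:' lines."""
    codecs = {}
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            # Format: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<parameters>]
            payload, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            codecs[int(payload)] = encoding.split("/")[0]
    return codecs


# Hypothetical SDP fragment for illustration only.
EXAMPLE_SDP = """\
m=audio 49170 RTP/AVP 8 0 101
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
"""
```

Static payload types such as 0 and 8 may appear in the `m=` line without an `a=rtpmap` line, so an empty result does not necessarily mean that no codec was offered.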






Basics: RTP/RTCP

The Real-time Transport Protocol RTP is used to transfer media data, e.g. speech in VoIP based telephony.

The RTP Control Protocol RTCP periodically transfers statistical media data between the peers of a connection.

If RTP packets are lost, delayed or subject to jitter, we speak of a Quality of Service QoS problem. For support it is of interest to know how many packets were transferred between the peers of a connection, whether the numbers in the receive and send paths are reasonably equal, whether packets were lost on a call leg, etc. With this statistical media information it may be possible to identify the path or transfer direction where QoS problems occur.
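
The comparison of send and receive counters boils down to simple arithmetic. A minimal sketch, assuming the packet counters of one transfer direction are already known; the function name and parameters are illustrative and do not correspond to fields of the ConfigCenter dialogs:

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of packets sent by one peer that did not arrive at the other."""
    if sent <= 0:
        raise ValueError("sent must be a positive packet count")
    # Duplicated packets can make 'received' exceed 'sent'; treat that as no loss.
    lost = max(sent - received, 0)
    return 100.0 * lost / sent
```

Computing the value separately for each direction and each call leg shows on which path the packets disappear.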


Note

The media stream must be proxied via the MediaServer of the VoIP Switch in order to compute statistical numbers of a connection.



The Aarenet VoIP Switch supports the collection of RTP/RTCP statistics for a connection. How to obtain them is described in the article "Manual of the Aarenet VoIP Switch Support Tools", chapter "The ConfigCenter Call Data".


Overview of "RTP/RTCP" information collection:

Dialog: "RTP/RTCP"



Details of "RTP/RTCP" information collection:

Dialog: "RTP/RTCP Details"





Level 3 Support: VoIP System Support Tools



VoIP Switch ConfigCenter Support Tools



The ConfigCenter Support Log

The "Support Log" provides the supporter with information from the internal processes of the ServiceCenter:

  • Registration
  • Connection setup, release and exceptions
  • Call Routing
  • Used Ruleset
  • Emergency calls
  • etc


The "Support Log" provides filters for:

  • Time based selection: From – Until, From – Duration
  • Text filter
  • Registration events
  • Call events
  • etc.


The "Support Log" has a limited history. The history may last from a few hours up to some days. The length of the history may be different from VoIP switch to VoIP switch and depends on the length of log files and amount of logging events.


Note

The "Support Log" is tenant sensitive. This means a supporter of tenant A is not able to see events of tenant B!






Navigate to the "Support Log"

ConfigCenter:

→ Menu "Support"
→ Menu "Support Log"





Get a "Support Log"

Dialog: "Support Log":

Dialog: Support Log



When the dialog "Support Log" opens, "From" contains by default the current date/time minus 5 minutes and "Duration" contains 5 minutes:

  1. Click the Button [ Download ]
  2. An ASCII formatted file with the last 5 minutes will be downloaded via HTTP


Retrieving a "Support Log" in the past:

  1. Enter in "From" the desired start date/time
  2. Insert in "Duration" the needed length
  3. Press on the PC keyboard the 'Enter' key : The "Until" date/time will be computed
  4. Click the Button [ Download ]

or

  1. Enter in "From" the desired start date/time
  2. Enter in "Until" the desired stop date/time
  3. Press on the PC keyboard the 'Enter' key: The "Duration" will be computed
  4. Click the Button [ Download ]
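
The relation between "From", "Duration" and "Until" is plain date arithmetic. The sketch below only illustrates that relation; it is not the ConfigCenter's implementation, and all function names are illustrative.

```python
from datetime import datetime, timedelta


def compute_until(start: datetime, duration_min: float) -> datetime:
    """'Until' is 'From' plus 'Duration' (in minutes)."""
    return start + timedelta(minutes=duration_min)


def compute_duration_min(start: datetime, until: datetime) -> float:
    """'Duration' in minutes is the difference between 'Until' and 'From'."""
    if until < start:
        raise ValueError("'Until' must not be earlier than 'From'")
    return (until - start).total_seconds() / 60


def default_window(now: datetime, duration_min: float = 5):
    """Default selection when the dialog opens: the last 5 minutes."""
    return now - timedelta(minutes=duration_min), now
```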


Best Practice

Get the events of a connection in the past:

  1. Search the Call ID of the connection in the "Call Data"
  2. Use the Call ID in the "Text" filter of the Support Log dialog
  3. Make sure that the connection date/time lies between "From" and "Until"
  4. Download the Support Log
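
If a larger Support Log was already downloaded, the same selection can be made offline by keeping only the lines that contain the Call ID. A minimal sketch; the sample lines used in the usage note are invented and do not reflect the real Support Log format.

```python
def filter_log_lines(lines, call_id: str):
    """Keep only the log lines that mention the given Call ID."""
    return [line for line in lines if call_id in line]
```

For example, `filter_log_lines(open("supportlog.txt", encoding="utf-8"), "abc123")` would return the matching lines of a file named `supportlog.txt`; both the file name and the Call ID are placeholders.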


Get the events of a just finished connection:

  1. Set the "Duration" to 5min (or shorter)
  2. Download the Support Log
  3. Search for the connection






Interpretation of a "Support Log"

The interpretation of a "Support Log" is quite easy and straightforward. With a little experience one soon becomes familiar with it.

Interpretation and example of a call setup and release:

Example Support Log






ConfigCenter Trace

The "Trace" provides the supporter with information from the message traffic between the VoIP switch and external VoIP devices, such as PSTN gateway, SIP CPE, SIP or MGCP telephones.

The "Trace" contains:

  • Session Initiation Protocol SIP registration and connection signaling messages
  • Media Gateway Control Protocol MGCP audit and endpoint control messages
  • Session Description Protocol SDP streaming media initialization parameters


The "Trace" provides filters for:

  • Time based selection: From – Until, From – Duration
  • Text filter


The "Trace" has a limited history. The history may last from a few hours up to some days. The length of the history may be different from VoIP switch to VoIP switch and depends on the length of log files and amount of logging events.


The interpretation of a "Trace" (PCAP formatted file) has to be done in an external application such as the Wireshark network protocol analyzer. Wireshark offers deep and rich VoIP analysis.
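
Before opening a downloaded file in Wireshark it can be useful to verify that it really is a capture file, for instance when the download returned an error page instead. The sketch below checks the first four bytes against the magic numbers of the classic pcap file format; the function name `looks_like_pcap` is illustrative.

```python
# Magic numbers of the classic pcap file format as they appear on disk.
PCAP_MAGICS = {
    bytes.fromhex("d4c3b2a1"),  # microsecond timestamps, little-endian writer
    bytes.fromhex("a1b2c3d4"),  # microsecond timestamps, big-endian writer
    bytes.fromhex("4d3cb2a1"),  # nanosecond timestamps, little-endian writer
    bytes.fromhex("a1b23c4d"),  # nanosecond timestamps, big-endian writer
}


def looks_like_pcap(header: bytes) -> bool:
    """Return True if the data starts with a classic pcap magic number."""
    return header[:4] in PCAP_MAGICS
```

Typical use is `looks_like_pcap(open(path, "rb").read(4))`, where `path` is the downloaded file.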


Note

The "Trace" is not tenant sensitive. This means a supporter of tenant A is able to see signaling messages of tenant B!

Due to this open display of information it may be possible that the "Trace" is not available for the supporters and operators on a multi tenant VoIP Switch.






Navigate to the "Trace"

ConfigCenter:

→ Menu "Support"
→ Menu "Trace"





Get a "Trace"

Dialog: "Trace":

Dialog: Trace



When the dialog "Trace" opens, "From" contains by default the current date/time minus 5 minutes and "Duration" contains 5 minutes:

  1. Click the Button [ Download ]
  2. A PCAP formatted file with the last 5 minutes will be downloaded via HTTP


Retrieving a "Trace" in the past:

  1. Enter in "From" the desired start date/time
  2. Insert in "Duration" the needed length
  3. Press on the PC keyboard the 'Enter' key: The "Until" date/time will be computed
  4. Click the Button [ Download ]

or

  1. Enter in "From" the desired start date/time
  2. Enter in "Until" the desired stop date/time
  3. Press on the PC keyboard the 'Enter' key: The "Duration" will be computed
  4. Click the Button [ Download ]


Best Practice

Get the events of a connection in the past:

  1. Search the connection in the "Call Data"
  2. Click the Button [ Trace ]


Get the events of a just finished connection:

  1. Set the "Duration" to 5min (or shorter)
  2. Download the Trace
  3. Search for the connection






Interpretation of a "Trace"

The interpretation of a "Trace" needs experience!

For more information:


Example of a Wireshark call capture, SIP setup and release:

Example "Trace"



Example of a Wireshark call list:

Navigate in Wireshark:

→ Menu "Statistics"
→ Menu "VoIP Calls"


Wireshark dialog listing all calls of the current trace:

Example Trace VoIP Call List



Example of a Wireshark call flow:


Navigate in Wireshark:

→ Menu "Statistics"
→ Menu "VoIP Calls"
→ Select the call of interest
→ Click Button [ Graph ]


Wireshark dialog showing the message flow of the selected call:

Example Trace VoIP Call Flow






The ConfigCenter Call Data

The "Call Data" lists the call detail records (CDR) of all incoming or outgoing connections or connection attempts. Extended filters enable the supporter to search for specific calls. The filters can be combined with logical AND.

Filter CDRs according to:

  • Call start and end date/time
  • Call duration
  • Call charges
  • Telephone number of caller and/or callee.
  • Tenants & account
  • Price list attributes "Destination Type" & "Destination"
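
Combining filters with logical AND means that a CDR is listed only if it satisfies every active filter. A minimal sketch of that behavior on exported records; the field names `caller`, `callee` and `duration_s` and the sample data are hypothetical and do not reflect the real CDR layout.

```python
def filter_cdrs(cdrs, *predicates):
    """Return the CDRs for which all given filter predicates are true (logical AND)."""
    return [cdr for cdr in cdrs if all(predicate(cdr) for predicate in predicates)]


# Hypothetical records for illustration only.
EXAMPLE_CDRS = [
    {"caller": "0311234567", "callee": "0449876543", "duration_s": 65},
    {"caller": "0311234567", "callee": "0445550000", "duration_s": 0},
    {"caller": "0327654321", "callee": "0449876543", "duration_s": 120},
]

# Calls from one caller that were actually connected.
connected_from_caller = filter_cdrs(
    EXAMPLE_CDRS,
    lambda cdr: cdr["caller"] == "0311234567",
    lambda cdr: cdr["duration_s"] > 0,
)
```

Each additional filter can only narrow the result; with no filter at all every CDR is returned.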

The "Call Data" has a limited history. The length of the history may differ from VoIP switch to VoIP switch and depends on how long the CDRs are stored in the database.


Selected CDR details allow direct access to the information of:

  • SIP Trace:
The SIP message contents of this specific connection or call attempt are shown. For the interpretation of the trace consult the article "Brief Tutorial of the SIP Signaling and SDP Media Protocols", chapter "Knowhow SIP Signaling".
  • RTP/RTCP Media:
The RTP/RTCP information and statistics of this specific connection or call attempt are shown. For the interpretation of the media information consult the article "Brief Tutorial of the SIP Signaling and SDP Media Protocols", chapter "Knowhow Media Stream".


Note
  • The "Call Data" has a limited history. The length of the history may differ from VoIP switch to VoIP switch and depends on how long the CDRs are stored in the database.
  • Not all filter options may be available on the VoIP Switch.
  • The "Call Data" is tenant sensitive. This means a supporter/operator of tenant A is not able to see events of tenant B!



Warning

Depending on the settings of a VoIP system it may be possible to change values in a CDR.

Changing a CDR's contents may be a legal violation in the country of operation of the VoIP Switch!






Navigate to the "Call Data"

ConfigCenter:

→ Menu "Rating"
→ Menu "Call Data"





Get the "Call Data"

Dialog: "Call Data":

Dialog: "Call Data"



By clicking on the line of a CDR a dialog pops up, which provides a) more details of the connection and b) one click access to the call's SIP trace and media RTP/RTCP information and statistics:

Dialog: "Call Data Details"







The ConfigCenter Address Registration

The ConfigCenter "Address Registration" shows whether a SIP device or MGCP MTA has registered the telephone number. The supporter finds the following information about the registering devices:

  • Type of registration: SIP, notifications, presence, etc.
  • IP address
  • SIP user agent
  • Registration time left

Registrations can be forcibly de-registered on the VoIP Switch.

Hint:
The device cannot be informed that it was de-registered on the VoIP Switch. That means you have to wait until it re-registers automatically or force the device manually to re-register.





Navigate to "Registrations"

ConfigCenter:

→ Menu "Addresses"

or

→ Menu "Accounts"
→ Click on the line of the desired account
→ Click on the right arrow at "Addresses"


For details:

→ Click on the line of the desired address
→ Click on the right arrow at "Registration"





Interpretation of "Registrations" Information

Display of "Addresses" and registration overview:

Dialog: "Addresses & Registration"



By clicking on the line of an address and then on the right arrow at "Registration" a dialog pops up, which provides information about all registrations of the address:

Dialog: "Registration Details"






The ConfigCenter Components

The "Components" displays the state and activity of the VoIP Switch components. The components are the entities of the VoIP Switch that provide all functionality and features. The display is automatically updated every few seconds and shows the actual state and load of every component.


Note

On most VoIP Switches the "Components" display is not available for the supporters and operators.






Navigate to "Components"

ConfigCenter:

→ Menu "System"
→ Menu "Components"





Interpretation of "Components" Information

Display of "Components":

Dialog: "Components"



By clicking on the line of a component a dialog pops up, which provides more information and allows sending messages to the component or handling its work load:

Dialog: "Component Details"






The ConfigCenter Channels

The ConfigCenter "Channels" is a live display of the currently active connections and of connections being set up. The administrator can filter and search the connections. If needed, a connection can be forcibly released.


Note

On most VoIP Switches the "Channels" display is not available for the supporters and operators.






Navigate to "Channels"

ConfigCenter:

→ Menu "Channels"





Interpretation of "Channels" Information

Display of "Channels":

Dialog: "Channels"






The ConfigCenter System Utilization

The "System Utilization" gives a statistical overview of the VoIP Switch resource utilization:

  • Number of accounts
  • Number of addresses (telephone numbers)
  • Number of registrations
  • etc


Note

On most VoIP Switches the "System Utilization" display is not available for the supporters and operators.






Navigate to "System Utilization"

ConfigCenter:

→ Menu "System"
→ Menu "Utilization"





Interpretation of the "System Utilization" Information

The "System Utilization" provides the numbers of used resources:

Dialog: "System Utilization"





Level 3 Support: VoIP System Monitoring & Alarming

Level 3 Support: VoIP System Maintenance



VoIP Switch Component Handling


Warning

All described actions can jeopardize the VoIP Switch's telephony service or server functionality!

If there are uncertainties then contact the "VoIP Switch Supplier Support".






Basic VoIP Switch Component Commands  

The VoIP Switch Administrator finds here instructions for handling VoIP Switch Components on OS console level:

  • Start the VoIP Switch Component
  • Stop the VoIP Switch Component
  • Check the VoIP Switch Component status
  • Restart the VoIP Switch Component
  • etc


The VoIP Switch Component command affects only the instance on this server and can be executed with root rights only!


Command syntax:

root# <AS_COMPONENT> <COMMAND_OPTION>



Example:

root# configcenter status



Warning

Do not use other VoIP Switch Component command options as they can produce heavy problems!



Command | Command Option | Remark
<AS_COMPONENT> (e.g. configcenter) | | VoIP Switch Component command
 | version | Lists the VoIP Switch Component version
 | status | Lists the VoIP Switch Component status and process ID
 | stop | Stops the VoIP Switch Component. → The VoIP Switch Component stops immediately and any activity of the component will be interrupted!
 | start | Starts the VoIP Switch Component. → The VoIP Switch Component becomes immediately active and operative!
 | startpassive | Starts the VoIP Switch Component but it remains passive. → For becoming operative the VoIP Switch Component has to be started with the start option. → Not all VoIP Switch Components offer this option.
 | restart | Stops and starts the VoIP Switch Component. → The VoIP Switch Component becomes immediately active and operative!
 | restartpassive | Stops and starts the VoIP Switch Component but it remains passive. → For becoming operative the VoIP Switch Component has to be started with the start option. → Not all VoIP Switch Components offer this option.
 | error | Opens the error log file of the VoIP Switch Component
 | log | Opens the current log file of the VoIP Switch Component
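
Because other command options must not be used, a helper script that wraps the component command could check the option against the list in the table before anything is executed. This is only a sketch of such a guard: the names `ALLOWED_OPTIONS` and `build_component_command` are hypothetical, and the function merely builds the argument list without running it.

```python
# Command options documented in the table of this chapter.
ALLOWED_OPTIONS = (
    "version", "status", "stop", "start", "startpassive",
    "restart", "restartpassive", "error", "log",
)


def build_component_command(component: str, option: str) -> list:
    """Return the argument list for '<AS_COMPONENT> <COMMAND_OPTION>'.

    Raises ValueError for any option that is not documented in the table.
    """
    if option not in ALLOWED_OPTIONS:
        raise ValueError(f"undocumented command option: {option}")
    return [component, option]
```

`build_component_command("configcenter", "status")` returns `["configcenter", "status"]`, which corresponds to the example `root# configcenter status`.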






Put Out of / Back to Service a VoIP Switch Component in an Operative VoIP Switch

The VoIP Switch Administrator finds here instructions for putting a VoIP Switch Component out of service or back into service.





Put Out of Service a VoIP Switch Component

There are two ways to put out of service a VoIP Switch Component:


Variant 1: "Stop it hard"

Action:

A) Stop and check the component via the shell:

root# <AS_COMPONENT> stop
root# <AS_COMPONENT> status


The consequences are that the component stops immediately its operative work and all its running tasks.


The following VoIP Switch components may be stopped this way without jeopardizing the telephony service:

  • ConfigCenter
  • AdminCenter
  • DataAccessCenter
  • MediaCenter
  • RatingCenter
  • DataBase


Note

Make sure that:

  • The second component is active
  • The VoIP Switch administrators, operators and supporters are informed which ConfigCenter, AdminCenter are active
  • The users are able to use the active AdminCenter
  • The provider's CRM is able to use the active DataAccessCenter
  • The active RatingCenter is producing the CDR




Variant 2: "Stop it gracefully"

Action:

A) Stop the component gracefully via the ConfigCenter.

For the following components flip the "active – passive" role:

  • HealthCenter
  • LoadBalancer
  • CallBalancer

do:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the active component HealthCheck
→ Click the fat right arrow at "Make component passive"
→ Confirm by clicking Button [ Yes ]


For the following components do a "pre-bar":

  • ServiceCenter
  • MediaServer
  • FaxServer
  • CallAgent

do:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the desired VoIP Switch component
→ Change the parameter "Acceptance" to 0


C) Wait until the component displays no activity anymore.

ConfigCenter GUI → Menu "System" → Menu "Components"


D) Stop and check the component via the shell:

root# <AS_COMPONENT> stop
root# <AS_COMPONENT> status






Put Back to Service a VoIP Switch Component

There are two ways to put back to service a VoIP Switch Component:


Variant 1: "Start it"

Action:

A) Start and check the component via the shell:

root# <AS_COMPONENT> start
root# <AS_COMPONENT> status


The consequence is that the component starts immediately its operative work.


Variant 2: "Start it gracefully"

This variant may make sense when the following VoIP Switch components shall become active but not operative immediately:

  • ServiceCenter
  • MediaServer
  • FaxServer
  • CallAgent


Action:

A) Start the component "passive" via the shell:

root# <AS_COMPONENT> startpassive
root# <AS_COMPONENT> status


B) Make the component operative at the appropriate time:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the desired VoIP Switch component
→ Change the parameter "Acceptance" to 100
The "Acceptance" may be any value > 0. Choose it according to the load balancing scheme of the component.


C) Check if the component displays activity:

ConfigCenter GUI → Menu "System" → Menu "Components"





Work Flow for Analyzing VoIP Switch Problems  

Note

Not every red alarm jeopardizes the telephony service as a whole but a bulk of yellow warnings may endanger it!



The VoIP Switch Administrator and other service personnel find here a work flow for analyzing VoIP Switch problem indications and determining the appropriate action.

The main task is to find out if:

  1. The situation jeopardizes the telephony service as a whole, e.g.:
    • IP network issues
    • Several VoIP Switch servers failed or are off line
       
  2. The database replication is broken
    • IP network issues
    • Server with running database failed
    • Linux service MySQL failed
       
  3. The situation hampers the operation or configuration of customer accounts, addresses etc.
    • Management server failed or is off line
    • VoIP Switch component ConfigCenter, AdminCenter, DataAccessCenter or RatingCenter stopped working correctly


The VoIP Switch Administrator finds here the work flow for analyzing VoIP Switch problems:



Analysis:

1. Check if it is a single alarm or a bulk alarm situation.

a) Connect to the VoIP Switch monitor Xymon "Main View"
→ As a rule of thumb: It is a single error if only one issue is displayed.



2. Analyze and treat a single alarm situation:

a) Check the contents of the error message.
b) Compare the error description against the "Xymon Event" indications in chapter "VoIP Switch Maintenance"
c) Check whether the current situation is equal or similar to the one described and whether the recommended actions are suitable.
d) Execute the suitable actions.
→ If you are not sure contact the "VoIP Switch Supplier Support"



3. Analyze the bulk alarm situation:

a) Get a first overview of the situation by analyzing the Xymon Monitor :
Check in the MS-01 Xymon monitor the server, component and IP status:
Xymon GUI → Xymon "Main View"
  1. Which types of server are affected?
    • At least one LoadBalancer LB server must be active so that the telephony service can work!
    • At least one ServiceCenter SC server must be active so that the telephony service can work!
    • At least one server with the operative database must be active so that the telephony service can work!

  2. Check the CPE registration statistic:
    • Are the CPE registrations dropping?

  3. Check the call statistic:
    • Is the number of calls on the VoIP Switch dropping?
      Xymon GUI → Management Server → Column "calls_sys"
    • Are the calls dropping on one or more ServiceCenters?
      Xymon GUI → ServiceCenter Server → Column "calls_sc"
    • Are the calls dropping on one or more gateways?
      Xymon GUI → Gateway → Column "calls_gw"

  4. Do the same check as above on the MS-02 Xymon Monitor

  5. Does the comparison of the two Xymon Monitors point out that:
    • The same single component on the same server failed?
    • All components of one side failed?
    • The Xymon Monitor sees only the components on its side?
    • The telephony service is running on at least one side?
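
The three "at least one ... must be active" conditions of the first check can be expressed as a small test on the number of active servers per role. A minimal sketch; the role names used as dictionary keys and the function name `missing_roles` are illustrative, and the counts would have to be read manually from the Xymon "Main View".

```python
# Roles of which at least one server must be active for the telephony service.
REQUIRED_ROLES = ("LoadBalancer", "ServiceCenter", "Database")


def missing_roles(active_servers: dict) -> list:
    """Return the required roles that currently have no active server.

    active_servers maps a role name to the number of active servers of that role.
    """
    return [role for role in REQUIRED_ROLES if active_servers.get(role, 0) < 1]
```

An empty result means the minimum for the telephony service is met; any role in the result points to a situation that jeopardizes the service as a whole.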


b) Extend the overview by analyzing the ConfigCenter "System Component" Overview :
Check in the MS-01 ConfigCenter the status of the VoIP Switch components:
ConfigCenter GUI → Menu "System" → Menu "Components"
  1. Are calls currently running and can new calls be established?

  2. Make test calls:
    • To and from a telephone number in the PSTN
    • On-net test calls
    • Call a well known VoiceMail Box from on-net and from the PSTN

  3. Is the number of running calls dropping fast and are no new calls being established?

  4. Which types of VoIP Switch components are affected?
    • At least one LoadBalancer component must be active so that the telephony service can work!
    • At least one ServiceCenter component must be active so that the telephony service can work!
    • At least one operative database must be active so that the telephony service can work!
    • Does this picture correspond to the results of the first overview in the Xymon Monitor?

  5. Do the same check as above on the MS-02 ConfigCenter

  6. Does the comparison of the two ConfigCenters point out that:
    • The same single component on the same server failed?
    • All components of one side failed?
    • The ConfigCenter sees only the components on its side?
    • The telephony service is running on at least one side?



4. Treat bulk alarm situations:

a) Is there a VoIP Switch server hardware, RAID or hard-disk problem?
→ Indications:
Indication:
<HOST_NAME> "snmptrapd" "failure"
<HOST_NAME> "snmptrapd" "degraded"


→ Actions:
For DELL server see: "Treating Problems of Servers from DELL Inc ®"



b) Is the IP connectivity affected to or between VoIP Switch servers?


Note

If VoIP Switch servers are affected then a lot of additional alarm messages about missing VoIP Switch components will pop up!
This can be one of the most annoying error situations!



→ Indications:
Indication:
<HOST_NAME> conn "Host does not respond to ping" <IP_ADDRESS>
* Dropping CPE registrations !


→ Actions:
See: "Maintenance Due to IP Network Alarm"


c) → If you are not sure what to do then contact the "VoIP Switch Supplier Support"





VoIP Switch Server Maintenance



Maintenance Due to VoIP Switch Components General Alarms  



Maintenance Due to Messages from Java Framework

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Jdbc"



Description:
Java internal exceptions, mostly caused by database accesses, which the application is expected to handle.


Consequences:
→ For the VoIP Switch telephony service:

  • Mostly none

→ For the operations:

  • Mostly none

→ For the user:

  • Mostly none


Solution:
Observe the frequency of this event


Action:
1. Observe the frequency of this event

2. If the erroneous condition is too frequent then contact the "VoIP Switch Supplier Support"!
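
Observing the frequency can be as simple as counting how many events fall into a recent time window. A minimal sketch, assuming the event time stamps were collected from the monitor log beforehand; the function name is illustrative, and what counts as "too frequent" is not defined in this article and has to be agreed with the supplier support.

```python
from datetime import datetime, timedelta


def events_in_window(timestamps, now: datetime, window: timedelta = timedelta(hours=1)) -> int:
    """Count the events whose time stamp lies within the last 'window' before 'now'."""
    return sum(1 for stamp in timestamps if now - window <= stamp <= now)
```

Comparing the count of the last hour with the count of the last day shows whether the event rate is stable or rising.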





Maintenance Due to Messages from VoIP Switch Components Internals

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "EventQueue"
<HOST_NAME> msgs "SysCompDatabase - Cannot evalute status"



Description:
These events may happen on all VoIP Switch servers and are VoIP Switch component internal notes.


Consequences:
→ For the VoIP Switch telephony service:

  • Mostly none

→ For the operations:

  • Mostly none

→ For the user:

  • Mostly none


Solution:
Observe the frequency of this event


Action:
1. Observe the frequency of this event

2. If the erroneous condition is too frequent then contact the "VoIP Switch Supplier Support"!





Maintenance Due to Messages from LoadBalancer Server



Maintenance Due to HealthCheck Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "HealthCheck"



Description:
The HealthCheck supervises the status of virtual IP addresses and their associated physical IP addresses. If the HealthCheck on one server doesn't see the peer's physical IP address, it takes over the virtual IP address. This most probably points to an IP network problem in the "Public Voice Segment".


Consequences:

Warning

This erroneous condition must be checked within reasonable time!


→ For the VoIP Switch telephony service:

  • None if concurrently no other IP network problems arise

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problem if needed.

Check the status of the VoIP Switch components with an active-passive scheme:

  • LoadBalancer
  • CallBalancer
  • RatingCenter


Action:
1. Check if the IP network is OK


2. Check the status of the LoadBalancer components

→ Confirm if the active LoadBalancer swapped, e.g. from *-lb-01 to *-lb-02


3. Check the status of the CallBalancer components

→ Confirm if the active CallBalancer swapped, e.g. from *-lb-01 to *-lb-02


4. Check the status of the RatingCenter components

→ Confirm if the active RatingCenter swapped, e.g. from *-ms-01 to *-ms-02
→ Confirm if the active RatingCenter is processing the CDRs


5. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a LoadBalancer problem try to restart the component:
  root# loadbalancer restart


c) If there is a CallBalancer problem try to restart the component:
  root# callbalancer restart


d) If there is a RatingCenter problem try to restart the component:
  root# ratingcenter restart


e) If the RatingCenter swapped make sure that the CDR are processed:
  1. ConfigCenter GUI → Menu "System" → Menu "Components"
    → Click line at "active" RatingCenter -> In dialog select "Process CDRs"
    → Click button [ Close ]
  2. Check if the CDR CSV files are processed:
  root# cd /home/servicecenter/cdrs


Check if the CSV files have a current time stamp, which indicates that new CDRs were written:
  root# ls -ltra


Open a CSV file and check for new entries, e.g.:
  root# less monthly.csv



6. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!

If these events are logged repeatedly then report them to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to LoadBalancer Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Balancer"



Description:
LoadBalancer internal problem that is treated internally by the component. The LoadBalancer has an "active-passive" redundancy scheme.


Consequences:
→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Not defined yet


Action:
1. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!

If these events are logged repeatedly then report them to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to LoadBalancer Message "Missing ServiceCenter"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "BalancerSwitch" <SERVICECENTER> "not available anymore"



Description:
The LoadBalancer indicates that it doesn't see a certain ServiceCenter.

This happens when:

  • the ServiceCenter has restarted
→ the event will be transient
  • the ServiceCenter is stopped
→ the event will remain until the ServiceCenter is started again
  • no IP connectivity
→ the event will remain until the IP connectivity is reestablished


Consequences:

Warning This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • None, the other ServiceCenters take over the workload
  • If a ServiceCenter is missing then the VoIP Switch loses redundancy capability

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problems if needed:

→ Actions see: "Maintenance Due to IP Network Alarm"

Solve the server problem if needed

→ Actions see: "Treating Server Hardware Problems"


Action:
1. Check if the IP network is OK


2. Check the status of the ServiceCenter components

→ Confirm that the reported ServiceCenter server is affected


3. Check the reported ServiceCenter server with the "Server Administrator (OMSA)"


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a ServiceCenter problem try to restart the component:
  root# servicecenter restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to CallBalancer Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The CallBalancer dispatches MGCP messages to the CallAgent components.

The CallBalancer has an "active-passive" redundancy scheme.


Consequences:

Warning

This erroneous condition must be checked within short time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • Users with an MGCP MTA as telephone adapter may not be able to make calls


Solution:
Check the status of the CallBalancer active-passive scheme and whether the MGCP messages are processed.


Action:
1. Check if the IP network is OK


2. Check the status of the CallBalancer components:

a) Confirm if the active CallBalancer swapped, e.g. from *-ms-01 to *-ms-02


b) Confirm if the active CallBalancer is processing the MGCP messages
→ Check that the CallAgents handle MGCP connections and that the total number of MGCP connections is not dropping.


3. Check if the MGCP audits are not dropping:

a) Connect to a Xymon monitor and check in Xymon Column "regs" the numbers of MGCP-Active and MGCP-Brocken


b) Check the questions:
  • Is the number of MGCP-Active dropping?
→ If yes => There may be an IP backbone problem or a CallBalancer or CallAgent outage!


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"
b) If there is a CallBalancer problem try to restart the component:
  root# callbalancer restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to MediaServer Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "MediaConnection (06) Cannot handle outgoing message"
<HOST_NAME> msgs "MediaServerProvider (MS) refreshing mediaserver mc1ms2 failed"



Description:
The MediaServer records or plays back announcements and VoiceMail messages. Occasionally it may not correctly record a message and transfer it to the MediaCenter or play back an announcement or message.

The MediaServer can act as media proxy for active connections and transcode media streams.


Consequences:

Warning

If in this VoIP Switch the MediaServer acts as media proxy then the erroneous situation must be checked soon!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • A VoiceMail Box message or announcement could not be correctly recorded or played back.
  • The user may not hear the other side or vice versa.


Solution:
Depends on the situation.


Action:
1. If the erroneous condition remains or happens too often then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from Management Server  


→ Top

Maintenance Due to AdminCenter Message "Missing FMC Application Server"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "FmcRequest - Cannot post request"
<HOST_NAME> msgs "FmcProvider - could not provision pbx"



Description:
The AdminCenter tried to configure the FMC application.

Consequences:

Warning

This erroneous condition may be sporadic. If it persists it must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A configuration on an FMC server failed

→ For the user:

  • A user's "an MC-Phone" app is not working


Solution:
Check the state of the FMC servers and their IP connectivity toward the VoIP Switch servers.


Action:
1. Check if the IP network is OK


2. Check the status of the FMC server


3. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is an FMC server problem
→ Contact the "VoIP Switch Supplier Support"!


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to AdminCenter Message "Missing Redirection Server"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "FmcProvider - could not provision user" <USER_TELEPHONE_NUMBER>



Description:
The mobile app "an MC-Phone" could not learn from the associated redirection server (by default a Comdasys server located in Europe) where its responsible configuration server is located. Therefore the user's "an MC-Phone" could not obtain its configuration and will not work.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The mobile app "an MC-Phone" will not work


Solution:
Make sure to have good IP connectivity to the Internet


Action:
1. The user must find a reliable Internet connection and restart the app "an MC-Phone" until it gets its configuration




→ Top

Maintenance Due to ConfigCenter Message "Wrong User Login"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "msgsAccessLogger - ADMIN:login; user" <USERNAME> "-> User Blocked"



Description:
A VoIP Switch Administrator, Operator or Supporter tried to log in to the ConfigCenter with wrong credentials. The user will be blocked for several minutes.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • The user will be blocked from the ConfigCenter for several minutes.

→ For the user:

  • None


Solution:
Wait


Action:
1. Retry after a few minutes with the correct login credentials.


2. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ConfigCenter Message "DB Replication Check"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs JdbcReplicationMonitor "Replication" '<BROKEN_REPLICATION_DIRECTION>' "is broken!"



Description:
The database replication check was not successful. This can happen from time to time when the database has to process heavy load.

In most cases the database replication recovers automatically even after several hours of failed replication. If it is not recovering then this is a severe problem and must be treated.


Consequences:

Warning If this erroneous condition remains then this is a SEVERE erroneous condition and must be treated within short time!


→ For the VoIP Switch telephony service:

  • The database redundancy is endangered

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Restore the MySQL DB replication if the erroneous condition remains.


Action:
1. Check periodically (ca. every half hour) the Xymon monitor for this error condition.

2. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
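In addition to the Xymon monitor, the replication state can be read on the database server itself. The sketch below assumes that the VoIP Switch uses standard MySQL replication, so that the MySQL statement SHOW SLAVE STATUS reports the fields Slave_IO_Running and Slave_SQL_Running (newer MySQL releases rename the statement and its fields). The function name replication_ok is hypothetical, and the credentials needed by the mysql client are site specific.

```shell
# Succeeds (exit code 0) only if both replication threads report "Yes".
# Expects the output of "SHOW SLAVE STATUS\G" on stdin.
replication_ok() {
  awk -F': *' '
    /Slave_IO_Running:/  { io = $2 }
    /Slave_SQL_Running:/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}
```

Usage: `mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication running"`. The function only reads the status and changes nothing. Restoring a broken replication remains a task for the "VoIP Switch Supplier Support".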




→ Top

Maintenance Due to DataAccessCenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Jdbc" "SQL-Exception during statement"



Description:
A configuration via the DataAccessCenter may have failed.

This may happen if the database is under heavy load.


Consequences:

Warning This erroneous condition must be checked within reasonable time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A customer configuration may have failed (which is hopefully covered by the CRM application).

→ For the user:

  • None


Solution:
Inter-working between the DataAccessCenter and database must be optimized.


Action:
1. If this Java event is logged repeatedly then report it to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to RatingCenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The RatingCenter has an "active-passive" scheme. For every RatingCenter event, check whether the active RatingCenter is working correctly and is processing the CDRs.


Consequences:

Warning

This erroneous condition must be checked within short time!



→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A CDR may not be written correctly into the CDR database and/or CSV files.
  • The customer billing may not contain all CDRs


→ For the user:

  • None


Solution:
Check the status of the RatingCenter active-passive scheme and whether the CDRs are processed.


Action:
1. Check the status of the RatingCenter component

→ Confirm if the active RatingCenter is processing the CDRs


2. Treat the problem:

a) If the RatingCenter swapped make sure that the CDRs are processed:
Open the ConfigCenter Menu "Components"
→ Click line at "active" RatingCenter -> In dialog select "Process CDRs"
→ Click button [ Close ]


b) Check if the CDR CSV-Files are processed:
Open the CDR directory:
  root# cd /home/ratingcenter/cdrs


Check if the CSV files have a current time stamp, which indicates that new CDRs were written:
  root# ls -ltra


Open a CSV file and check for new entries, e.g.:
  root# less monthly.csv


3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
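The time stamp check from step 2 b) can also be done without reading the `ls -ltra` listing by eye. The following is a minimal sketch, assuming GNU find as shipped with common Linux distributions. The function name recent_csv_files and the 60 minute window are assumptions for illustration. How often new CDRs are expected depends on the call volume.

```shell
# List the CSV files in a directory that were modified within the last N minutes.
# An empty result suggests that no new CDRs were written in that period.
# usage: recent_csv_files DIRECTORY MINUTES
recent_csv_files() {
  dir=$1
  minutes=$2
  find "$dir" -maxdepth 1 -type f -name '*.csv' -mmin "-$minutes" | sort
}
```

Usage: `recent_csv_files /home/ratingcenter/cdrs 60`.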




→ Top

Maintenance Due to Messages from ServiceCenter Server  


→ Top

Maintenance Due to FaxServer Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
A fax may not be received correctly. The mailing of the PDF file may fail.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • A fax may not be correctly received and transferred to the user. This situation is usually handled by the fax device, either automatically or manually.


Solution:
Restart the FaxServer component.


Action:
1. Check whether any faxes are received at all.

→ Send test fax.


2. Restart the FaxServer:

  root# faxserver restart



3. If the FaxServer keeps logging these events then report it to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to MediaCenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs MediaCenterCall
<HOST_NAME> msgs MediaServer
<HOST_NAME> msgs "file not found"



Description:
The MediaCenter handles the WAV files of announcements and VoiceMail messages. Occasionally it may fail to record a message correctly or may lose a message file. A request to the MediaServer to replay a message or announcement may also fail.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • A VoiceMail Box message or announcement could not be correctly recorded or played back


Solution:
Clean up the VoiceMail message database.

Optimize the inter-working of MediaCenter and MediaServer


Action:
1. If these events are logged repeatedly then report them to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ServiceCenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The ServiceCenter is the main component of the VoIP Switch. It processes the connection signaling and the telephony features.

The ServiceCenter has an all-active redundancy scheme. If one ServiceCenter fails the remaining ServiceCenters take over the workload.


Consequences:

Warning This erroneous condition must be checked and treated within short time!


→ For the VoIP Switch telephony service:

  • As long as one ServiceCenter remains, the VoIP Switch works!

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Depends on the analyzed problem.


Action:
1. Check how acute the problem is:

a) Check if the IP network is OK


b) Check the status of the ServiceCenter component
  • Are enough ServiceCenters active to handle the workload?
→ If NO then there is a most SEVERE erroneous situation


c) Check in the ConfigCenter Menu "Components" if the active ServiceCenter is processing the connections:
  • Is the total number of connections dropping?
→ If YES then there is a most SEVERE erroneous situation:
→ There may be an IP backbone problem!


d) Check in the Xymon Column "regs" the number of registered SIP-Devices:
  • Is the number of SIP-Devices dropping?
→ If YES then there is a most SEVERE erroneous situation:
→ There may be an IP backbone problem!


e) Check the reported ServiceCenter server with the "Server Administrator (OMSA)"
  • Are problems signaled?


2. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a ServiceCenter problem try to restart the component:
  root# servicecenter restart



c) If there is a hardware problem:
→ Actions see: "Treating Server Hardware Problems"


3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
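Steps 1 c) and 1 d) ask whether a counter is dropping. A simple way to quantify this is to note two readings a few minutes apart and compute the relative drop. The sketch below assumes that the two readings (total connections or registered SIP-Devices) were copied by hand from the ConfigCenter or the Xymon Column "regs". The function name drop_percent is hypothetical, and no alarm threshold is defined in this manual.

```shell
# Print the drop between two counter readings as a whole percentage.
# Prints 0 if the counter did not drop or if the previous reading is not positive.
# usage: drop_percent PREVIOUS CURRENT
drop_percent() {
  prev=$1
  cur=$2
  if [ "$prev" -le 0 ] || [ "$cur" -ge "$prev" ]; then
    echo 0
  else
    echo $(( (prev - cur) * 100 / prev ))
  fi
}
```

Usage: `drop_percent 1000 900` prints 10. A steadily growing value over several readings supports the suspicion of an IP backbone problem.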




→ Top

Maintenance Due to ServiceCenter Message "License Violation"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs License "License Violation"
<HOST_NAME> msgs License "grace-period remaining:"



Description:
This ServiceCenter has a license problem and will work only for the remaining grace period.


Consequences:

Warning

This erroneous condition must be checked and treated within the remaining grace period!


→ For the VoIP Switch telephony service:

  • As long as one ServiceCenter remains, the VoIP Switch works
  • The telephony service will be stopped on this ServiceCenter after passing of the grace period

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Get current licenses from the VoIP Switch Supplier.


Action:
1. Check in the ConfigCenter Menu "Components" which ServiceCenter component has a license problem and how long the grace period is.


2. Contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ServiceCenter Message "Failed Emergency Call"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs ServicePrioCallControl "Could not establish priority-call". Call from Connection/<SIP_CALL_ID>/<CALLING_NUMBER> to <CALLED_EMERGENCY_NUMBER>



Description:
A user's emergency call failed!


Consequences:

Warning Severe legal condition that must be handled!

This case can have legal consequences for the provider!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The emergency call did not work


Solution:
Check whether the call failed due to the emergency call handling or routing of the VoIP Switch. If yes, fix it.

Check whether the PSTN provider rejected the emergency call. If yes, contact the PSTN provider.


Action:
1. Archive traces for legal responsibilities:

  • Save the trace of this emergency call and all subsequent calls from this user toward emergency numbers


2. Check where the call was rejected.

  • If the call was rejected at the PSTN provider side, contact the PSTN provider and have them investigate this case.


3. Check the VoIP Switch's emergency routing:

  • Emergency numbers
  • Emergency number rewriter
  • Routing Tables toward the PSTN
  • RuleSet that may tag outgoing calls toward emergency numbers


4. Check if any IP network devices may interfere with the SIP signaling:

  • If an external Session Border Controller (SBC) or SIP-SS7 Gateway is involved, check its behavior concerning the emergency calls
  • If a firewall FW is involved check that no SIP ALG or "SIP Helpers" are active


5. Treat the problem:

a) Adjust the emergency routing of the VoIP Switch if needed


b) Fix the IP network devices if needed


6. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
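To find the trace that must be archived in step 1, the SIP Call-ID and the two numbers can be extracted from the event line. The sketch below follows the indication format shown above ("Call from Connection/<SIP_CALL_ID>/<CALLING_NUMBER> to <CALLED_EMERGENCY_NUMBER>"). It assumes that the Call-ID contains no "/" and that the numbers contain no spaces. The function name parse_failed_emergency_call is hypothetical.

```shell
# Extract "<SIP_CALL_ID> <CALLING_NUMBER> <CALLED_EMERGENCY_NUMBER>" from
# "Could not establish priority-call" event lines read on stdin.
# Lines that do not match the expected format are skipped.
parse_failed_emergency_call() {
  sed -n 's#.*Call from Connection/\([^/]*\)/\([^ ]*\) to \([^ ]*\).*#\1 \2 \3#p'
}
```

The first output field is the value to search for in the call traces.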




→ Top

Maintenance Due to ServiceCenter Message "TopStop"

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs ServiceRatingControl (01) <CALLING_NUMBER> "max available charges reached for account:"
<HOST_NAME> msgs AlarmLogger "[TOPSTOP][ALARM] tenant" <TENANT> "topstop limit nearly reached for account"



Description:
A user's TopStop limit was reached!


Note

A TopStop alarm early in the month or for a lot of users indicates a possible fraud case!



Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A TopStop alarm early in the month indicates a possible fraud case

→ For the user:

  • No outgoing calls except emergency calls will work when the TopStop limit is reached


Solution:
If it is a regular TopStop then contact the user and raise the monthly TopStop limit.

If it is a fraud situation, handle it according to "Best Practice: Fraud"


Action:
1. Check if it is a regular TopStop situation.


2. Check if it is a possible fraud case:

  • Reached TopStop limit early in the month?
  • Concurrently a lot of TopStop limits reached?
  • High call peak during the night or weekend?
→ Check the Xymon Column "calls_sys".


3. Treat the case according to the Best Practice for "Fraud Situation"


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
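The questions of step 2 can be turned into a rough first triage. The sketch below flags a possible fraud case if the alarm arrives early in the month or if many TopStop alarms arrive at the same time. Both thresholds (day 10, five alarms) and the function name topstop_fraud_hint are assumptions for illustration only. They are not values defined by the VoIP Switch and must be tuned to the customer base.

```shell
# Rough triage of a TopStop alarm.
# usage: topstop_fraud_hint DAY_OF_MONTH CONCURRENT_ALARM_COUNT
topstop_fraud_hint() {
  day=$1
  alarms=$2
  early_day=10     # hypothetical: "early in the month" means up to day 10
  many_alarms=5    # hypothetical: "a lot of" means five or more alarms
  if [ "$day" -le "$early_day" ] || [ "$alarms" -ge "$many_alarms" ]; then
    echo "possible fraud"
  else
    echo "regular"
  fi
}
```

The result is only a hint. The call peak check in the Xymon Column "calls_sys" is still needed before a case is treated as fraud.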




→ Top

Maintenance Due to Nimbus Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "NimbusLink (ue) Cannot subscribe"



Description:
The Nimbus component is a VoIP Switch internal bus that connects the various VoIP Switch components on the servers. If a Nimbus endpoint on one server is missing the other Nimbus endpoints start to complain.

If a Nimbus endpoint is missing then the component may be stopped, the server may be offline, or there may be an IP network problem.

→ This error is often displayed during VoIP Switch software upgrades of the servers. In this situation just wait until the upgrade is finished.


Consequences:

Warning

This erroneous condition must be checked and treated within reasonable time!


→ For the VoIP Switch telephony service:

  • Usually none

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problems or server problems if needed.


Action:
1. Check if the IP network is OK


2. Check the status of the VoIP Switch components located on the server where the Nimbus is missing:

→ Is only Nimbus missing, or are other components on this server missing too?


3. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is not a planned outage then try to solve the server problem


c) If there is not a planned outage then try to restart the Nimbus on this server:
  root# nimbus restart



4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from CallAgent Server


→ Top

Maintenance Due to CallAgent Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The CallAgent handles the message exchange with the MGCP MTA. The CallAgent has an all-active redundancy scheme. If one CallAgent fails the remaining CallAgents take over the workload.


Consequences:

Warning

This erroneous condition must be checked within short time!


→ For the VoIP Switch telephony service:

  • As long as one CallAgent remains, the VoIP Switch works

→ For the operations:

  • None

→ For the user:

  • A single MGCP MTA at the user's premises may not work correctly. The telephone service may not always work for these users.


Solution:
Depends on the analyzed problem.


Action:
1. Check if the IP network is OK


2. Check the status of the CallAgent components

→ Confirm that the reported CallAgent server is affected

3. Check the reported CallAgent server with the "Server Administrator (OMSA)"


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a CallAgent problem try to restart the component:
  root# callagent restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from CPECenter Server


→ Top

Maintenance Due to CPECenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs
<HOST_NAME> msgs "DevAdmProvider (-1) duplicated devicetype:" <DEVICE_TYPE>



Description:
During the preparation of a device configuration file two device configuration templates were found. If a CPE loads a device configuration file which was produced under these conditions it may not work correctly.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The CPE may not work with the produced configuration file


Solution:
One device configuration template has to be deleted.


Action:
1. Contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to IP Network Alarms

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> conn "Host does not respond to ping" <IP_ADDRESS>



Description:
This test performs a "ping" toward the IP address of the host. If the "ping" is not answered then there is a problem with the IP network, e.g.:

  • Pinged host defective or offline
  • Layer 2 IP switch defective or offline
  • Broken IP backbone network


Consequences:

Warning

MOST SEVERE condition if several VoIP Switch servers are affected for a longer duration (ca. 15 min)!


→ For the VoIP Switch telephony service:

  • The telephone service may be interrupted

→ For the operations:

  • The MySQL databases may lose their replication

→ For the user:

  • The telephone service may be interrupted for the users!


Solution:
Solve the IP network problems!

Check the IP network devices:

  • Pinged host
  • Layer 2 IP switches
  • IP Routes
  • Firewalls

Check the VoIP Switch server IP connectivity.


Action:
1. Evaluate the severity of the IP network outage:

a) Check if it is an occasional ping failure:
  • Only one host doesn't respond
  • Only 1 or 2 poll cycles fail
→ Type "Occasional Failure":
  • In this situation the erroneous situation may be neglected.


b) Check if it is only a single host:
  • One host doesn't respond anymore
→ Type "Host Failure":
  • Check the hardware condition and IP connectivity of this device
  • Check with the VoIP Switch Administrator in the ConfigCenter Menu "Components" how the VoIP Switch is affected


c) Check if more than one VoIP Switch server is affected:
  • More than one VoIP Switch server doesn't respond anymore
→ Type "VoIP Switch Failure":
1. Check with the VoIP Switch Administrator how the VoIP Switch is affected:
a) Connect to both (*-ms-01, *-ms-02) ConfigCenter Menu "Components" and check the component status
b) Check the questions:
  • Which VoIP Switch servers are not visible?
  • Are they the same on both ConfigCenter?
  • Does one ConfigCenter see only the servers on its side? E.g.:
Side A components complain that they don't see their peers on Side B?
Side B components complain that they don't see their peers on Side A?
→ If yes => There is a heavy IP backbone problem
c) Check in the ConfigCenter Menu "Channels" if new connections were established since the IP outage
→ If yes => Some users still can make phone calls


2. Check with the VoIP Switch Administrator how the users are affected:
a) Connect to both (*-ms-01, *-ms-02) Xymon Column "regs" and check the CPE and MTA registrations status.
b) Check the questions:
  • Is the number of user CPE registrations dropping?
→ If yes => There is a heavy IP backbone problem and some users cannot use the telephony service anymore!


2. Treat the Type "Occasional Failure":

a) VoIP Switch Administrator:
In this situation the erroneous situation may be neglected. Observe if the situation remains.


3. Treat the Type "Host Failure":

a) VoIP Switch Administrator:
If possible pre-bar the VoIP Switch component on this server
b) Solve the IP or hardware issue with the failed host


4. Treat the Type "VoIP Switch Failure":

a) VoIP Switch Administrator:
Contact the "VoIP Switch Supplier Support"


5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
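The severity evaluation of step 1 can be summarized as a small decision rule: more than one affected VoIP Switch server is a "VoIP Switch Failure", a single host that fails for only 1 or 2 poll cycles is an "Occasional Failure", and a single host that stays silent is a "Host Failure". The sketch below encodes this rule. The function name classify_ping_failure and its two inputs (number of hosts not responding, number of consecutive failed poll cycles) are assumptions for illustration and must be read from the Xymon monitor by the operator.

```shell
# Classify a ping alarm according to the severity evaluation in step 1.
# usage: classify_ping_failure FAILED_HOSTS FAILED_POLL_CYCLES
classify_ping_failure() {
  hosts=$1
  cycles=$2
  if [ "$hosts" -gt 1 ]; then
    echo "VoIP Switch Failure"
  elif [ "$hosts" -eq 1 ] && [ "$cycles" -le 2 ]; then
    echo "Occasional Failure"
  elif [ "$hosts" -eq 1 ]; then
    echo "Host Failure"
  else
    echo "No Failure"
  fi
}
```

Usage: `classify_ping_failure 1 5` prints "Host Failure". The printed type names match the treatment steps above.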




→ Top

Maintenance Due to Operating System Alarms

The VoIP Switch Administrator and/or server service personnel find here instructions for managing problems indicated by the operating system supervision.




→ Top

Maintenance Due to Supervised Processes Missing

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> procs "Processes not OK" <MISSING_PROCESS>



Description:
One or more supervised processes of a Linux service or VoIP Switch component are missing.


Consequences:

Warning

SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • Depends: If a VoIP Switch component is missing then the VoIP Switch loses redundancy capability
  • If a Linux service is missing the VoIP Switch may be hampered or the server is not working correctly

→ For the operations:

  • Depends on the VoIP Switch components or Linux service

→ For the user:

  • Depends on the VoIP Switch components or Linux service


Solution:
Restart the VoIP Switch component or Linux service.


Action:
1. Check with the VoIP Switch Administrator if it is possible to restart the component or service without endangering the VoIP Switch telephony service.

→ If possible pre-bar the VoIP Switch component via the ConfigCenter!


2. Restart the VoIP Switch component or Linux service:

a) Restart the VoIP Switch component
  root# <COMPONENT> restart


  • Example:
  root# servicecenter restart



b) Restart the service:
  root# /etc/init.d/<SERVICE> restart


  • Example:
  root# /etc/init.d/monit restart



3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
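After a restart it is useful to confirm that the process is really back before waiting for the next Xymon poll. The following is a minimal sketch that compares a list of expected process names against the output of `ps -e -o comm=`. The function name missing_processes and the process names in the usage line are assumptions. The real list of supervised processes is part of the Xymon configuration. Components that run inside a Java virtual machine may all appear under the same process name, so this check may not distinguish them.

```shell
# Print every expected process name that does not appear in the process list.
# Expects one process name per line on stdin, e.g. from "ps -e -o comm=".
# usage: ps -e -o comm= | missing_processes NAME...
missing_processes() {
  running=$(cat)
  for name in "$@"; do
    printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}
```

Usage: `ps -e -o comm= | missing_processes monit mysqld`. No output means that all named processes are running.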




→ Top

Maintenance Due to Supervised IP Ports

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> ports "Ports not OK" <MISSING_PROCESS_PORTS>



Description:
One or more supervised IP ports of a Linux service or VoIP Switch component are missing.


Consequences:

Warning

SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • Depends: If a VoIP Switch component is missing then the VoIP Switch loses redundancy capability
  • If a Linux service is missing the VoIP Switch may be hampered or the server is not working correctly

→ For the operations:

  • Depends on the VoIP Switch components or Linux service

→ For the user:

  • Depends on the VoIP Switch components or Linux service


Solution:
Restart the VoIP Switch component or Linux service.


Action:
1. Check with the VoIP Switch Administrator if it is possible to restart the component or service without endangering the VoIP Switch telephony service.

→ If possible pre-bar the VoIP Switch component via the ConfigCenter!


2. Restart the VoIP Switch component or Linux service:

a) Restart the VoIP Switch component
  root# <COMPONENT> restart


  • Example:
  root# servicecenter restart



b) Restart the service:
  root# /etc/init.d/<SERVICE> restart


  • Example:
  root# /etc/init.d/monit restart



3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
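
Before restarting anything it can help to confirm which supervised port is actually missing. The following is a minimal sketch and not part of the VoIP Switch tooling: the function name port_listening and the example port 5060 are assumptions chosen for illustration, and the filter expects the column layout that "ss -ltn" prints on current Linux systems.

```shell
# Succeeds (exit status 0) if a TCP listener exists on the given port.
# $1    = port number to look for
# stdin = output of `ss -ltn` (column 4 is "Local Address:Port")
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

Possible usage on the affected server: root# ss -ltn | port_listening 5060 || echo "port 5060 is not listening"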




→ Top

Maintenance Due to Supervised Hard-Disk Usage

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> disk "File systems not OK"



Description:
A hard-disk or hard-disk partition is full. If a hard-disk is full then the Linux operating system behaves unpredictably and the server will most probably crash.


Consequences:

Warning SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server

→ For the operations:

  • Depends on the VoIP Switch components running on the server

→ For the user:

  • Depends on the VoIP Switch components running on the server


Solution:
Identify big files or directories. Delete or archive files externally.


Action:
1. Check hard-disk usage:

  root# df -h



2. Find fat files:

  root# find / -type f -size +100k -exec ls -lahS {} +



  • Example find file sizes >60MByte:
  root# find /opt/backup/ -type f -size +60000k -exec ls -lahS {} +



  • Check for fat files in the following suspicious directories:
    /opt/backup/
  • Do not touch big files in:
    /var/lib/mysql/


3. Find big directories:

  root# du -hs



Example of a more specific search → find directory sizes >1GByte:
  root# du -hs /home/ratingcenter/* | grep -E '^[0-9.,]+G'
  root# du -hs /home/*/* | grep -E '^[0-9.,]+G'



  • Check the following suspicious directories:
    /opt/backup/
    /home/mediacenter/messages
    /home/ratingcenter/cdrs


4. Before deleting files or directories, check with the VoIP Switch Administrator that they are no longer needed!

→ If you are suspicious but not sure if it is wise to delete a certain file or directory then contact the "VoIP Switch Supplier Support"!
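
To see at a glance which file systems triggered the event, the "df" output can be filtered by usage. This is a minimal sketch under assumptions: the function name fs_over and the default threshold of 90 % are illustrative and are not the thresholds configured in Xymon. It expects the POSIX output format of "df -P" and does not handle mount points that contain spaces.

```shell
# Prints "<mount point> <usage>" for every file system at or above a threshold.
# $1    = usage threshold in percent (default 90, an illustrative value)
# stdin = output of `df -P`
fs_over() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)              # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

Possible usage: root# df -P | fs_over 85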




→ Top

Maintenance Due to Supervised Memory Usage

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> memory "Memory low"



Description:
One or more processes consume a lot of memory. If the memory becomes low, the Linux operating system starts to swap memory to and from the hard-disk. This reduces the performance of the server.


Consequences:

Warning

This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Identify which process consumes the memory. Restart the process in order to free memory. Stop and restart the swapping on the server.


Action:

1. If a LoadBalancer *-lb-* or ServiceCenter *-sc-* server is affected:

→ Contact the "VoIP Switch Supplier Support"!


2. Find which processes use the memory:

  • This is a difficult task!
  root# top
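
As a complement to "top", the processes can be ranked by their resident memory. The sketch below is an illustration only: the function name top_rss is an assumption, and it simply sorts the output of "ps -eo pid,rss,comm" (RSS in kByte) in descending order.

```shell
# Prints the processes with the largest resident set size (RSS).
# $1    = number of rows to print (default 5)
# stdin = output of `ps -eo pid,rss,comm`, including its header line
top_rss() {
    tail -n +2 | sort -k2,2nr | head -n "${1:-5}"
}
```

Possible usage: root# ps -eo pid,rss,comm | top_rss 10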



3. Stop and restart the swapping:

Preconditions:
  • Choose a time of day when the server is not under high load.
  • If possible pre-bar the VoIP Switch components on this server via the ConfigCenter
  • Make sure that the redundant VoIP Switch component is running


a) Restart the responsible process:
  root# /etc/init.d/<PROCESS_NAME> restart



b) Stop the swapping:
  • Don't do this during high load!
  • It will take some time until accomplished!
  root# swapoff -a



c) Restart the swapping:
  root# swapon -a



d) Check if the swap is working regularly:
  root# swapon -s
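
Before running "swapoff -a" in step b) it is worth checking that the swapped-out data still fits into the available memory, because everything in swap has to be moved back to RAM. The following is a minimal sketch: the function name swapoff_is_safe is an assumption, and it relies on the MemAvailable, SwapTotal and SwapFree fields of /proc/meminfo (MemAvailable is only present on newer Linux kernels).

```shell
# Succeeds if the available memory is larger than the swap space in use.
# stdin = contents of /proc/meminfo
swapoff_is_safe() {
    awk '
        /^MemAvailable:/ { avail = $2 }
        /^SwapTotal:/    { total = $2 }
        /^SwapFree:/     { free  = $2 }
        END {
            used = total - free
            printf "swap used: %d kB, memory available: %d kB\n", used, avail
            if (avail > used) exit 0
            exit 1
        }'
}
```

Possible usage: root# swapoff_is_safe < /proc/meminfo && swapoff -a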





→ Top

Maintenance Due to Supervised CPU Load

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> cpu "Load is High"



Description:
One or more processes consume excessive CPU power. This may reduce the performance of the server.


Consequences:

Warning This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • Reduced performance on the affected server and VoIP Switch component

→ For the operations:

  • None

→ For the user:

  • None


Solution:
The CPU consuming process has to be identified. If a process is identified it has to be checked if it is a regular or erroneous situation.

If it is a regular situation then it has to be investigated whether the server's computing power is still sufficient for this VoIP Switch. If the server hosts a VoIP Switch component which offers a configurable load acceptance via the ConfigCenter then it is worth trying to reduce the component's workload.

An erroneous situation can mostly be solved by restarting the process.
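
To judge whether the computing power is still sufficient, the load average can be related to the number of CPU cores. The sketch below is an illustration under assumptions: the function name load_per_core is hypothetical and the core count must be greater than zero. As a rule of thumb, a value that stays well above 1.00 for a longer period suggests that processes are queuing for CPU time.

```shell
# Prints the 1-minute load average divided by the number of CPU cores.
# $1    = number of CPU cores (must be > 0)
# stdin = contents of /proc/loadavg
load_per_core() {
    awk -v cores="$1" '{ printf "%.2f\n", $1 / cores }'
}
```

Possible usage: root# load_per_core "$(nproc)" < /proc/loadavg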


Action:
1. Identify the responsible process:

a) Check the process situation with:
  root# top
  root# ps aux



b) If a process is suspicious check for multiple processes of the same name:
  root# ps -aef



c) If a process is suspicious check for zombie processes (lists the zombie process id):
  root# ps aux | awk '$8 ~ /^Z/ { print $2 }'



d) Evaluate with the VoIP Switch Administrator if the suspicious process is in a regular or erroneous state.
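
The check for multiple processes of the same name in step b) can be sketched as a simple count per process name. This is an illustration only; the function name count_by_name is an assumption. A high count is not necessarily an error, since some services regularly run several worker processes.

```shell
# Prints "<count> <name>" per process name, highest count first.
# stdin = one process name per line, e.g. from `ps -eo comm=`
count_by_name() {
    sort | uniq -c | sort -k1,1nr
}
```

Possible usage: root# ps -eo comm= | count_by_name | head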


2. Handle an erroneous Linux process state.

a) Restart a Linux process:
  root# /etc/init.d/<PROCESS_NAME> restart



b) Kill a process, e.g. double started process, zombie:

  root# kill -9 <PROCESS_ID>



3. Handle a VoIP Switch component :

a) Restart an erroneous VoIP Switch component:
  root# <COMPONENT_NAME> restart



b) If the VoIP Switch components ServiceCenter or MediaServer produce high load then the VoIP Switch Administrator may reduce their accepted workload via the ConfigCenter.


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Supervised Files Missing or too Big

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> ????



Description:


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • None


Solution:


Action:
1. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

VoIP System Maintenance


→ Top

Best Practice for Handling a "Fraud" Situation  

The Aarenet VoIP Switch Administrator finds here instructions for managing fraud problems.


1. Immediate action:

  • Block call routing to the destination (usually somewhere in the Caribbean, west or central Africa)
  • If only from one source IP address then block this IP address on the FW


2. Investigate if the fraud is due to "Direct Registrations" with correct SIP credentials on the VoIP Switch:

  • Check if the calling number has multiple SIP registrations from a suspicious source IP range or user agent!
→ If YES then:
→ The SIP credentials were not kept secret or were hacked from the user's CPE
Action:
  • Block this user account for outgoing calls (blocking international calls is usually sufficient)
  • Change the SIP credential in the user account and the user's CPE.
  • Change the CPE administration login credentials
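
The check for multiple registrations can be supported by counting the distinct source IP addresses per calling number. The sketch below is purely illustrative: the function name multi_ip_numbers and the input format (one "<calling number> <source IP>" pair per line) are assumptions, not an export format of the VoIP Switch. The registration data has to be brought into that form first.

```shell
# Prints "<calling number> <distinct IP count>" for numbers seen from several IPs.
# $1    = minimum number of distinct source IPs to report (default 2)
# stdin = "<calling number> <source IP>" per line (hypothetical format)
multi_ip_numbers() {
    awk -v min="${1:-2}" '
        !seen[$1 SUBSEP $2]++ { count[$1]++ }   # count each number/IP pair once
        END { for (n in count) if (count[n] >= min + 0) print n, count[n] }
    ' | sort
}
```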


3. Investigate if the fraud is due to "Hacked Users CPE":

a) Analyze the traces of some fraud connections.
Check if the source IP remains the one of a registered user CPE!
→ If YES then:
Action:
  • Block this user account for outgoing calls (blocking international calls is usually sufficient)
  • Inform the user about the fraud and its reason
  • Change the SIP credential in the user account and the user's CPE.
  • Change the CPE administration login credentials


4. Post Work:

  • Undo the "immediate action"
  • Enable the customer account when the SIP credentials and CPE administration login credentials are changed




→ Top



Level 3 Support: Treating Problems of Servers from DELL Inc ®


→ Top

Best Practice When a Hardware HW Problem is Indicated

It is assumed that a server hardware problem is indicated by some source, e.g.:

  • Monitor Log
  • Alerting email
  • SNMP trap
  • system engineer observation
  • etc


Best Practice
  1. Access the server's "OpenManage Server Administrator (OMSA)" GUI
     
  2. Check the server's hardware problem
     
  3. Prepare documentation for a ticket at the DELL support:
     
  4. Organize the hardware part replacement if needed
     
  5. Treat the hardware problem:





→ Top

Server Monitoring


→ Top

Manual Server Monitoring With DELL's "Server Administrator (OMSA)"

DELL OpenManage Server Administrator (OMSA) is a software agent that provides a comprehensive, one-to-one systems management solution in two ways: from an integrated, Web browser-based graphical user interface (GUI) and from a command line interface (CLI) through the operating system.


Note

This chapter gives just enough information to be dangerous!

If there are uncertainties contact the "DELL Support" or the "VoIP Switch Supplier Support".





→ Top

Access the "OpenManage Server Administrator (OMSA)"

Connect with any Web browser to the server's "OpenManage Server Administrator (OMSA)" GUI:

  1. Insert the following URI:
    https://<IP_ADDRESS>:1311
    Example:
    https://172.100.100.100:1311
  2. Insert the user "root" login credentials:
    • Username: root
    • Password: the server root password
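
Before opening the browser, the reachability of the OMSA web interface can be checked from a shell. This is a minimal sketch: the function name omsa_url is an assumption, and only port 1311 is taken from this chapter.

```shell
# Builds the OMSA GUI address for a server.
# $1 = server IP address or host name; 1311 is the OMSA port used in this chapter
omsa_url() {
    printf 'https://%s:1311\n' "$1"
}
```

Possible usage: root# curl -k -s -o /dev/null -w '%{http_code}\n' --max-time 5 "$(omsa_url 172.100.100.100)"

The option -k skips the certificate verification. It is used here on the assumption that the server presents a self-signed certificate.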




→ Top

Check the Type of Server and Service Tags

Access the server's "OpenManage Server Administrator (OMSA)" GUI.

Check the server type:

  • In the OMSA home page menu bar at the top the server type is listed, e.g.: "PowerEdge620"
or
  • Menu "System" → Tab "Properties" → Tab "Summary"


Check the Service Tag:

  • Menu "System" → Tab "Properties" → Tab "Summary"
In frame "Main System Chassis" the Service Tag is displayed, e.g. : 47X....
In frame "Main System Chassis" the "Express Service Code" is displayed, e.g. : 9187....
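
If only the Service Tag is at hand, the Express Service Code can usually be derived from it: it is commonly described as the Service Tag read as a base-36 number and written in decimal. The sketch below illustrates this conversion; the function name is an assumption and the tag 47X0000 in the usage line is a made-up example. The value shown in OMSA remains authoritative.

```shell
# Converts a Service Tag (base 36) into its decimal representation.
# $1 = Service Tag, letters in upper or lower case
service_tag_to_express_code() {
    printf '%s\n' "$1" | awk '{
        digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        tag = toupper($0)
        code = 0
        for (i = 1; i <= length(tag); i++) {
            v = index(digits, substr(tag, i, 1)) - 1
            if (v < 0) { print "invalid character in tag" > "/dev/stderr"; exit 1 }
            code = code * 36 + v
        }
        printf "%.0f\n", code
    }'
}
```

Possible usage: root# service_tag_to_express_code 47X0000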




→ Top

Check the Server's Hardware Status

Access the server's "OpenManage Server Administrator (OMSA)" GUI.

Check the Server's Hardware Status:

  • Menu "System" → Tab "Properties" → Tab "Health"
  • Click "Main System Chassis"
The status of all server hardware components is displayed and can be checked in detail.




→ Top

Check the Server's RAID and Hard-Disk HD Status

Access the server's "OpenManage Server Administrator (OMSA)" GUI.

Check the RAID Controller Type:

  • Menu "System" → Tab "Properties" → Tab "Health"
  • Click "Storage"
In frame "RAID Controller(s)" the RAID controller type is displayed, e.g. : "PERC 6/i integrated"


Check the RAID Controller Status:

  • Menu "System" → Tab "Properties" → Tab "Health"
  • Click "Storage"
In frame "RAID Controller(s)" the name and status of the RAID is displayed: "Virtual Disk 0 RAID-1"




→ Top

Check the Hard-Disk HD Replication Status

Access the server's "OpenManage Server Administrator (OMSA)" GUI.

Check the Hard-Disk HD Status:
You have to dig in via the left navigation tree:

  • Menu "Storage" → Menu "PERC ..." → Menu "Connector ..." → Menu "Enclosure ..." → Menu "Physical Disks ..."
Check the disk state: Column "State"

States:

  • Online:
  • The disk is online and working productively in the RAID. The replication is working.
  • Ready:
  • The disk is ready for integration into a RAID. The replication is not active.
  • Rebuilding:
  • The disk is currently being integrated into the RAID. The progress is displayed in %.


If there is an indication of a hard-disk replication problem then see chapter "Treating RAID and Hard-Disk Problems" for further maintenance actions.
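
The disk states can also be read via the OMSA command line interface. The following sketch filters the output of "omreport storage pdisk controller=<N>" down to one line per disk. It is an illustration under assumptions: the function name pdisk_states is hypothetical, and the "Name : Value" line layout with the fields "ID" and "State" is assumed and may differ between OMSA versions.

```shell
# Prints "<disk ID> <state>" for every physical disk.
# stdin = output of `omreport storage pdisk controller=<N>` (layout assumed)
pdisk_states() {
    awk -F' +: +' '
        $1 == "ID"    { id = $2 }        # remember the disk this block refers to
        $1 == "State" { print id, $2 }
    '
}
```

Possible usage: root# omreport storage pdisk controller=0 | pdisk_states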




→ Top

Get the Server's Log Data

Access the server's "OpenManage Server Administrator (OMSA)" GUI.

Get the OMSA log:

  • Menu "System" → Tab "Logs"
  • Save the "Embedded System Management (ESM) Log" on the server:
Click "Save AS" and follow the instructions
  • Copy the saved ESM Log file to the support directory of the case




→ Top

Server Monitoring by Xymon

The VoIP Switch default monitor Xymon is described in "VoIP Switch Monitoring"




→ Top

Indication of a Server Hardware Defect

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> "snmptrapd" "failure"



Description:
The server indicates any hardware failure:

  • Failed power module
  • Failed main board
  • Failed RAID controller
  • Failed hard-disk
  • Any other hardware problem


Consequences:

Warning

It may be a SEVERE server condition that must be immediately investigated and treated!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server

→ For the operations:

  • Depends on the VoIP Switch components running on the server

→ For the user:

  • Depends on the VoIP Switch components running on the server


Solution:
The server must be repaired or exchanged.


Action:

  1. Check the details on the server with the "Server Administrator (OMSA)"
  2. Organize DELL repair parts according to the maintenance agreement with your "VoIP Switch Supplier"
  3. Repair the server:
    • Fix main board
    • Fix RAID controller
    • Fix defect or worn-out batteries
    • Fix fan
    • Fix RAM modules
    or
    • Processing of hardware problems that can be done hot, e.g.:




→ Top

Indication of a Server Hard-Disk or RAID Controller Problem

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> "snmptrapd" "degraded"



Description:
The server indicates a problem with the virtual disk:

  • Failed RAID controller
  • Failed hard-disk
  • Failed hard-disk replication


Consequences:

Warning

SEVERE server condition that must be immediately investigated and treated!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server

→ For the operations:

  • Depends on the VoIP Switch components running on the server

→ For the user:

  • Depends on the VoIP Switch components running on the server


Solution:
The RAID controller must be repaired or a hard-disk exchanged.


Action:

  1. Check the details on the server with the "Server Administrator (OMSA)"
  2. Organize DELL repair parts according to the maintenance agreement with your "VoIP Switch Supplier"
  3. Repair the server:
    or
    • Processing of hardware problems that can be done hot, e.g.:




→ Top

Procedure for Replacing Defect HW Parts with DELL

The procedure for exchanging defect hardware (HW) of DELL servers differs from country to country and may also change from time to time.

The following basic procedure for HW exchange seems more or less stable:

  1. Detect the HW problem
  2. Make sure to have ready the DELL server details:
    • Server Type
    • Service-Tag number or the "ExpressService Code"
    • Check the warranty period of the server
  3. Report the problem to DELL support
    • DELL will analyze the case and request more information if needed
  4. DELL will organize and send the exchange part
  5. The VoIP Switch Administrator has to organize the replacing of the part
    Usually this has to be done within 1 - 3 working days
  6. The VoIP Switch Administrator has to prepare the defect part for its return to DELL
    • Do not dispose of the defect part!
    Either the defect part will be picked up at the location or it has to be sent back to DELL.




→ Top

Treating Server Hardware Problems

The VoIP Switch Administrator and/or server service personnel find here instructions for managing HW defects.




→ Top

Default Process for Fixing Hardware Problems

Indication:

  • Xymon Event either email and/or SNMP trap:
  • The provider's system monitoring indicates no access to the server
  • Server Administrator (OMSA): Displays the error condition
  • Server Display: The server front display is yellow and indicates the error condition
  • Server Console: The server doesn't respond to console input


Description:
Any hardware problem.
Most probably:

  • Defect main board
  • Defect RAID controller
  • Defect or worn-out batteries
  • Defect fan
  • Defect power module


Note

The telephony service for the customers is not endangered as long as only one server fails!
It becomes disastrous if the two LoadBalancer servers or all ServiceCenter servers are not working anymore.



Consequences:

Warning

It may be a SEVERE server condition that must be immediately investigated and treated!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server
  • If a ServiceCenter server fails the capability of concurrent connection handling may decline.


→ For the operations:

  • Depends on the VoIP Switch components running on the server


→ For the user:

  • Depends on the VoIP Switch components running on the server


Solution:
The server must be repaired or exchanged.


Action:
Analyze the situation and organize spare parts:

  1. Check the details on the server with the "Server Administrator (OMSA)"
  2. Organize DELL repair parts according to the maintenance agreement with your "VoIP Switch Supplier"


Treat the VoIP Switch operation if the defect stops the proper server functionality:

  1. Disable Xymon Alarming
  2. Stop provider alarming
  3. Gracefully pre-bar the VoIP Switch component


Repair the server:
If the main board or RAID controller had to be replaced then follow these special instructions:


If the power-module or hard-disk have to be replaced, see:


Warning For the following actions the server casing has to be opened!


The effects of EMC must be considered and the appropriate precautions must be taken to prevent further hardware damage.


  1. Shut down and power off the server if the part has to be replaced on the main board
  2. Repair the server → Follow the server manufacturer's instructions!


Put back the server to normal working state:

  1. Start the server (if needed):
    → This automatically starts the VoIP Switch components!
  2. Checks:
    1. Check the server status with "Server Administrator (OMSA)"
    2. Check in the ConfigCenter if all VoIP Switch components on the server are ok:
      ConfigCenter GUI → Menu "System" → Menu "Components"
    3. Check if the Xymon monitor doesn't show any error


If the VoIP Switch doesn't get back to normal telephony service operation:

  1. Investigate what is wrong and solve it
  2. Contact the "VoIP Switch Supplier Support" for helping setting up the server and recovering the missing VoIP Switch functionality


Enable the alarming again:

  1. Enable Xymon Alarming
  2. Start provider alarming




→ Top

Fix Defect Main Board or RAID Controller

See section "Default Process for Fixing Hardware Problems" for the general description of the problem.


Actions:

Repair the server:

  1. Shut down and power off the server if the part has to be replaced on the main board
  2. Repair the server hardware → Follow the server manufacturer's instructions
  3. Connect a VGA monitor to the console port of the server


If the RAID controller was repaired then a RAID problem will still remain. Continue at "Default Process for Fixing RAID Problems", Case 2.


If the main board was repaired continue here:

  1. Insert the original hard-disk 1 in bay 0 (do not insert the hard-disk 2 yet)


Put back the server to normal working state:

  1. Power on and start the server
    → This automatically starts the VoIP Switch components!
  2. Checks:
    1. Check the console output on the VGA monitor if any exceptions are displayed during the BIOS booting
      → If the booting gets stuck during the virtual hard-disk initialization (RAID controller) then check the replication issues.
    2. Check the server status with "Server Administrator (OMSA)"
    3. Check in the ConfigCenter if all VoIP Switch components on the server are ok:
      ConfigCenter GUI → Menu "System" → Menu "Components"
    4. Check if the Xymon monitor doesn't show any error:
      → After a certain time all supervised objects should get green except the missing hard-disk 2


If the VoIP Switch doesn't get back to normal telephony service operation:

  1. Investigate what is wrong and solve it
  2. Contact the "VoIP Switch Supplier Support" for helping setting up the server and recovering the missing VoIP Switch functionality


When the server and the telephony service are working correctly again then:

  1. Insert the original hard-disk 2 in bay 1


Enable the alarming again:

  1. Enable Xymon Alarming
  2. Start provider alarming




→ Top

Fix Defect Power Module

Indication:

  • Xymon Event either email and/or SNMP trap:
  • Server Administrator (OMSA): Displays the error condition
  • Server Display: The server front display is yellow and indicates the error condition


Description:
Defect power module


Consequences:

Note

This erroneous condition must be checked and treated within reasonable time!


→ For the VoIP Switch telephony service:

  • No immediate consequences
  • The server is running just with one power module


→ For the operations:

  • No immediate consequences


→ For the user:

  • No immediate consequences


Solution:
The power module must be replaced


Actions:

Analyze the situation and organize spare parts:

  1. Check the details on the server with the "Server Administrator (OMSA)"
  2. Organize DELL repair parts according to the maintenance agreement with your "VoIP Switch Supplier"


Treat the VoIP Switch operation if the defect stops the proper server functionality :

  1. Disable Xymon Alarming
  2. Stop provider alarming


Replace the power module:

  1. Remove the defect power module (hot plug out possible)
  2. Insert the new power module (hot plug in possible)
  3. Connect the power cord


Put back the server to normal working state:

  1. Checks:
    1. Check the server status with "Server Administrator (OMSA)"
    2. Check if the Xymon monitor doesn't show any error


If the server doesn't go back to normal operation:

  1. Investigate what is wrong and solve it
  2. Contact the "VoIP Switch Supplier Support" for helping recovering the server


Enable the alarming again:

  1. Enable Xymon Alarming
  2. Start provider alarming




→ Top

Treating RAID and Hard-Disk Problems

All servers of the VoIP Switch run a RAID type 1 which mirrors the contents of the two installed hard-disks. The "RAID controller" manages the replication between the two hard-disks.


Several conditions may interrupt the hard-disk replication and/or degrade the RAID virtual disk:

  • Main board defect
  • RAID controller defect
  • Hard-disk defect


The consequence is that the server is not running at all or only with one hard-disk. The good news is that as long as one hard-disk is running, the server will work as expected.


Note

These types of defect have to be solved as fast as possible!





→ Top

Fix Defect Hard Disk

Indication:

  • Xymon Event either email and/or SNMP trap:
  • Server Administrator (OMSA): Displays the error condition
  • Server Display: The server front display is yellow and indicates the error condition


Description:
Defect hard-disk


Consequences:

Note

This erroneous condition must be checked and treated within reasonable time!


→ For the VoIP Switch telephony service:

  • No immediate consequences
  • The server is running just with one hard-disk


→ For the operations:

  • No immediate consequences


→ For the user:

  • No immediate consequences


Solution:
The hard-disk must be replaced


Actions:

Analyze the situation and organize spare parts:

  1. Check the details on the server with the "Server Administrator (OMSA)"
  2. Organize DELL repair parts according to the maintenance agreement with your "VoIP Switch Supplier"


Treat the VoIP Switch operation if the defect stops the proper server functionality :

  1. Disable Xymon Alarming
  2. Stop provider alarming


Replace the hard-disk:

  1. Remove the defect hard-disk (hot plug out possible)
  2. Insert the new hard-disk (hot plug in possible):
    → If the hard-disk is brand-new the replication starts immediately
    → If the hard-disk was already used then the replication may not start automatically. In that case check the instructions at "Default Process for Fixing RAID Problems", Case 1.


Put back the server to normal working state:

  1. Checks:
    1. Check if the hard-disk replication is in progress
    2. Check the server status with "Server Administrator (OMSA)"
    3. Check if the Xymon monitor doesn't show any error


If the server doesn't go back to normal operation:

  1. Investigate what is wrong and solve it
  2. Contact the "VoIP Switch Supplier Support" for helping setting up the hard-disk replication


Enable the alarming again:

  1. Enable Xymon Alarming
  2. Start provider alarming



→ Top

Default Process for Fixing RAID Problems

Indication:

  • Xymon Event either email and/or SNMP trap:
  • The provider's system monitoring may indicate no access to the server
  • Server Administrator (OMSA): Displays the error condition
  • Server Display: The server front display is yellow and indicates the error condition
  • Server Console: The server may not respond to console input


Description:
Any hardware problem.
Most probably:

  • Defect RAID controller
  • Defect hard-disk


Consequences:

Warning

It may be a SEVERE server condition that must be immediately investigated and treated!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server
  • If a ServiceCenter server fails the capability of concurrent connection handling may decline.


→ For the operations:

  • Depends on the VoIP Switch components running on the server


→ For the user:

  • Depends on the VoIP Switch components running on the server


Solution:
The server must be repaired or exchanged.


Action:

A) Analyze the degraded situation and organize spare parts:

  1. Check the details on the server with the "Server Administrator (OMSA)"
  2. Check the VoIP Switch documentation for the server type and used RAID controller
  3. Organize DELL repair parts according to the maintenance agreement with your "VoIP Switch Supplier"


B) Treat the VoIP Switch operation if the defect stops the proper server functionality :

  1. Disable Xymon Alarming
  2. Stop provider alarming
  3. Gracefully pre-bar the VoIP Switch component


C) Evaluate the repair case for DELL RAID controller type: PERC5 / PERC 6 / H310 Mini / H320 Mini / H330 Mini:


Case 1: "One Hard-Disk Defect"
Precondition:
  • Main board is ok
  • RAID controller is ok
  • 1 operative hard-disk is ok
  • Server is still operative within the VoIP Switch
  • The replacement hard-disk has the same form factor and the same capacity (size in bytes)


To-Do:
  1. Remove the defect hard-disk (hot plug-out is no problem)
  2. Insert the new hard-disk (hot plug-in is no problem) either:
    • a brand-new hard-disk
    • an already used spare hard-disk
  3. Check the hard-disk replication status
→ If the replication did not start automatically then start the replication manually !


Case 2: "Main Board or RAID Controller Defect"
Precondition:
  • The main board or RAID controller has been repaired according to the description above
  • 2 operative hard-disks are ok
  • Server is shut down
  • Disconnect all Ethernet patch cables from the server GB ports.
  • Connect a VGA monitor and a USB keyboard and mouse to the console port of the server


To-Do:
  1. Insert the original hard-disk 1 in bay 0 (do not insert the hard-disk 2 yet)
  2. Power up the server
  3. Check the console output on the VGA monitor:
    During the BIOS startup the following message may be displayed:
    Foreign configuration(n) found on adapter.
    Press any key … or 'F' to import foreign configuration and continue.
  4. If requested press key F on the keyboard!
    Note:
    If you fail to press F in time then restart the BIOS booting by pressing the keys [Ctrl Alt Delete], otherwise the server booting stops after the BIOS start-up.
  5. Check the console output on the VGA monitor:
    A security question may be displayed which enables you to stop the procedure:
    All of the disks from your previous configuration are gone. If this is an unexpected message ...
  6. Do not press any key!
    Note:
    If no key is pressed then the RAID controller takes over the hard-disk as part of its new virtual disk.
     
    → Wait until the server has booted!
     
  7. Insert the original hard-disk 2 in bay 1
  8. Check the hard-disk replication status
    Note:
    It is very probable that the replication did not start automatically!
    Then:
    At Menu "Storage" a yellow warning triangle is displayed
    Upon click on "Storage" the status is displayed:
    Virtual Disk 0: degraded
→ If the replication did not start automatically then start the replication manually !


For all other cases:


D) Put back the server to normal working state:

  1. If needed connect all Ethernet patch cables to the correct server GB ports
  2. Checks:
    1. Check the server status with "Server Administrator (OMSA)"
    2. Check in the ConfigCenter if all VoIP Switch components on the server are ok:
      ConfigCenter GUI → Menu "System" → Menu "Components"
    3. Check if the Xymon monitor doesn't show any error


E) If the VoIP Switch doesn't get back to normal telephony service operation:

  1. Investigate what is wrong and solve it
  2. Contact the "VoIP Switch Supplier Support" for helping setting up the server and recovering the missing VoIP Switch functionality


F) Enable the alarming again:

  1. Enable Xymon Alarming
  2. Start provider alarming




→ Top

Manually Restart the Hard-Disk Replication

In this situation the RAID's virtual disk is in state degraded (only one hard-disk is operative, but two are expected). The RAID controller will automatically grab a free "hot spare" hard-disk and associate it with its degraded virtual disk and start the replication.


Restart the hard-disk replication manually:

  1. Connect with any Web browser to the server's "Server Administrator (OMSA)" GUI:
    • Login as user "root"
     
  2. From the inserted 2nd hard-disk the foreign RAID configuration has to be deleted:
    → Menu "Storage" → Menu "PERC xxxxx"
    → Select at [ Available Task ]: "Clear Foreign Configuration"
    → Click button [ Execute ]
    → Confirm the security check click button [ Clear ]
     
  3. The inserted 2nd hard-disk has to be declared as "hot spare":
    → Menu "Storage" → Menu "PERC xxxxx" → "Connector 0" → Menu "Enclosure (Backplane)" → Menu "Physical Disks"
    → Select at [ Available Task ]: "Assign Global Hot Spare"
    → Click button [ Execute ]
     
  4. Check the virtual disk replication state:
    → Column "State"


If the hard-disk replication is not starting then contact the appropriate DELL Support or the "VoIP Switch Supplier Support".





Level 3 Support: VoIP System Maintenance


→ Top

VoIP Switch Component Handling


Warning

All described actions can jeopardize the VoIP Switch's telephony service or server functionality!

If there are uncertainties then contact the "VoIP Switch Supplier Support"





→ Top

Basic VoIP Switch Component Commands  

The VoIP Switch Administrator finds here instructions for VoIP Switch Component handling on OS console level:

  • Start the VoIP Switch Component
  • Stop the VoIP Switch Component
  • Check the VoIP Switch Component status
  • Restart the VoIP Switch Component
  • etc


The VoIP Switch Component command affects only the instance on this server and can be executed with root rights only!


Command syntax:

root# <AS_COMPONENT> <COMMAND_OPTION>



Example:

root# configcenter status



Warning

Do not use other VoIP Switch Component command options as they can produce heavy problems!
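
The warning above can be enforced with a small wrapper that only passes on the command options documented in this section. This is a minimal sketch and not part of the VoIP Switch: the function name component_cmd is an assumption, and the wrapper only forwards the call to the component command unchanged.

```shell
# Runs a VoIP Switch Component command only with a documented option.
# $1 = component command (e.g. configcenter), $2 = command option
component_cmd() {
    case "$2" in
        version|status|stop|start|startpassive|restart|restartpassive|error|log)
            "$1" "$2" ;;
        *)
            echo "refusing undocumented option: $2" >&2
            return 1 ;;
    esac
}
```

Possible usage: root# component_cmd configcenter status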



Command         Command Option   Remark
<AS_COMPONENT>                   VoIP Switch Component command, e.g.: configcenter
                version          Lists the VoIP Switch Component version
                status           Lists the VoIP Switch Component status and process ID
                stop             Stops the VoIP Switch Component
                                 → The VoIP Switch Component stops immediately and any activity of the component will be interrupted!
                start            Starts the VoIP Switch Component
                                 → The VoIP Switch Component becomes immediately active and operative!
                startpassive     Starts the VoIP Switch Component but it remains passive.
                                 → To become operative, the VoIP Switch Component has to be started with the start option.
                                 → Not all VoIP Switch Components offer this option.
                restart          Stops and starts the VoIP Switch Component
                                 → The VoIP Switch Component becomes immediately active and operative!
                restartpassive   Stops and starts the VoIP Switch Component but it remains passive.
                                 → To become operative, the VoIP Switch Component has to be started with the start option.
                                 → Not all VoIP Switch Components offer this option.
                error            Opens the error log file of the VoIP Switch Component
                log              Opens the current log file of the VoIP Switch Component
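
The warning above restricts the usable command options to those listed in the table. As a minimal sketch, a small shell wrapper could refuse any other option before it reaches the component command. The function names vs_option_allowed and vs_component are hypothetical and not part of the VoIP Switch; only the option names are taken from the table.

```shell
# Hypothetical helper: succeeds only for the command options documented
# in the table above.
vs_option_allowed() {
    case "$1" in
        version|status|stop|start|startpassive|restart|restartpassive|error|log)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Hypothetical wrapper: $1 = component command (e.g. configcenter),
# $2 = command option. Runs the component command only for a documented option.
vs_component() {
    if [ "$#" -ne 2 ]; then
        echo "usage: vs_component <AS_COMPONENT> <COMMAND_OPTION>" >&2
        return 2
    fi
    if ! vs_option_allowed "$2"; then
        echo "refused: '$2' is not a documented command option" >&2
        return 1
    fi
    "$1" "$2"
}
```

With these definitions loaded in a root shell, "vs_component configcenter status" runs "configcenter status", while an undocumented option is refused without calling the component.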





→ Top

Put Out of / Back to Service a VoIP Switch Component in an Operative VoIP Switch

The VoIP Switch Administrator finds here instructions for putting a VoIP Switch Component out of service or back into service.




→ Top

Put Out of Service a VoIP Switch Component

There are two ways to put a VoIP Switch Component out of service:


Variant 1: "Stop it hard"

Action:

A) Stop and check the component via the shell:

root# <AS_COMPONENT> stop
root# <AS_COMPONENT> status


As a consequence, the component immediately stops its operative work and all its running tasks.


The following VoIP Switch components may be stopped this way without jeopardizing the telephony service:

  • ConfigCenter
  • AdminCenter
  • DataAccessCenter
  • MediaCenter
  • RatingCenter
  • DataBase


Note

Make sure that:

  • The second component is active
  • The VoIP Switch administrators, operators and supporters are informed which ConfigCenter and AdminCenter are active
  • The users are able to use the active AdminCenter
  • The provider's CRM is able to use the active DataAccessCenter
  • The active RatingCenter is producing the CDR




Variant 2: "Stop it gracefully"

Action:

A) Gracefully stop the component via the ConfigCenter.

For the following components, flip the "active – passive" role:

  • HealthCenter
  • LoadBalancer
  • CallBalancer

do:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the active component HealthCheck
→ Click the fat right arrow at "Make component passive"
→ Confirm by clicking Button [ Yes ]


For the following components do a "pre-bar":

  • ServiceCenter
  • MediaServer
  • FaxServer
  • CallAgent

do:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the desired VoIP Switch component
→ Change the parameter "Acceptance" to 0


C) Wait until the component displays no activity anymore.

ConfigCenter GUI → Menu "System" → Menu "Components"


D) Stop and check the component via the shell:

root# <AS_COMPONENT> stop
root# <AS_COMPONENT> status





→ Top

Put Back to Service a VoIP Switch Component

There are two ways to put a VoIP Switch Component back into service:


Variant 1: "Start it"

Action:

A) Start and check the component via the shell:

root# <AS_COMPONENT> start
root# <AS_COMPONENT> status


As a consequence, the component immediately starts its operative work.


Variant 2: "Start it gracefully"

This variant may make sense when the following VoIP Switch components shall become active but not operative immediately:

  • ServiceCenter
  • MediaServer
  • FaxServer
  • CallAgent


Action:

A) Start the component in "passive" mode and check it via the shell:

root# <AS_COMPONENT> startpassive
root# <AS_COMPONENT> status


B) Make the component operative at the appropriate time:

ConfigCenter GUI → Menu "System" → Menu "Components"
→ Click the desired VoIP Switch component
→ Change the parameter "Acceptance" to 100
The "Acceptance" may be any value > 0. Choose it according to the load balancing scheme of the component.


C) Check if the component displays activity:

ConfigCenter GUI → Menu "System" → Menu "Components"




→ Top

Work Flow for Analyzing VoIP Switch Problems  

Note

Not every red alarm jeopardizes the telephony service as a whole, but an accumulation of yellow warnings may endanger it!



The VoIP Switch Administrator and other service personnel find here a work flow for analyzing VoIP Switch problem indications and determining the appropriate action.

The main task is to find out if:

  1. The situation jeopardizes the telephony service as a whole, e.g.:
    • IP network issues
    • Several VoIP Switch servers failed or off line
       
  2. The database replication is broken
    • IP network issues
    • Server with running database failed
    • Linux service MySQL failed
       
  3. The situation hampers the operation or configuration of customer accounts, addresses etc.
    • Management server failed or off line
    • VoIP Switch component ConfigCenter, AdminCenter, DataAccessCenter or RatingCenter stopped working correctly


The VoIP Switch Administrator finds here the work flow for analyzing VoIP Switch problems:



Analysis:

1. Check if it is a single alarm or a bulk alarm situation.

a) Connect to the VoIP Switch monitor Xymon "Main View"
→ As a rule of thumb: It is a single error if only one issue is displayed.



2. Analyze and treat a single alarm situation:

a) Check the contents of the error message.
b) Compare the error description against the "Xymon Event" indications in chapter "VoIP Switch Maintenance".
c) Check whether the current situation is equal or similar to the one described and whether the recommended actions are suitable.
d) Execute the suitable actions.
→ If you are not sure contact the "VoIP Switch Supplier Support"



3. Analyze the bulk alarm situation:

a) Get a first overview of the situation by analyzing the Xymon Monitor:
Check in the MS-01 Xymon monitor the server, component and IP status:
Xymon GUI → Xymon "Main View"
  1. Which types of servers are affected?
    • At least one LoadBalancer LB server must be active so that the telephony service can work!
    • At least one ServiceCenter SC server must be active so that the telephony service can work!
    • At least one server with the operative database must be active so that the telephony service can work!
     
  2. Check the CPE registration statistic:
    • Are the CPE registrations dropping?
     
  3. Check the call statistic:
    • Is the number of calls of the VoIP Switch dropping?
      Xymon GUI → Management Server → Column "calls_sys"
    • Are the calls dropping on one or more ServiceCenters?
      Xymon GUI → ServiceCenter Server → Column "calls_sc"
    • Are the calls dropping on one or more gateways?
      Xymon GUI → Gateway → Column "calls_gw"
     
  4. Do the same check as above on MS-02 Xymon Monitor
     
  5. Does the comparison of the two Xymon Monitors point out that:
    • The same single component on the same server failed?
    • All components of one side failed?
    • The Xymon Monitor sees only the components on its side?
    • The telephony service is running at least on one side?


b) Extend the overview by analyzing the ConfigCenter "System Component" Overview:
Check in the MS-01 ConfigCenter the status of the VoIP Switch components:
ConfigCenter GUI → Menu "System" → Menu "Components"
  1. Are calls currently running and can new calls be established?
     
  2. Make test calls:
    • To and from a telephone number in the PSTN
    • On-net test calls
    • Call a well-known VoiceMail Box from on-net and from the PSTN
     
  3. Is the number of running calls dropping fast and are no new calls being established?
     
  4. Which types of VoIP Switch components are affected?
    • At least one LoadBalancer component must be active so that the telephony service can work!
    • At least one ServiceCenter component must be active so that the telephony service can work!
    • At least one operative database must be active so that the telephony service can work!
    • Does this picture correspond to the results of the first overview in the Xymon Monitor?
     
  5. Do the same check as above on MS-02 ConfigCenter
     
  6. Does the comparison of the two ConfigCenters point out that:
    • The same single component on the same server failed?
    • All components of one side failed?
    • The ConfigCenter sees only the components on its side?
    • The telephony service is running at least on one side?



4. Treat bulk alarm situations:

a) Is there a VoIP Switch server hardware, RAID or hard-disk problem?
→ Indications:
Indication:
<HOST_NAME> "snmptrapd" "failure"
<HOST_NAME> "snmptrapd" "degraded"


→ Actions:
For DELL server see: "Treating Problems of Servers from DELL Inc ®"



b) Is the IP connectivity to or between VoIP Switch servers affected?


Note

If VoIP Switch servers are affected then a lot of additional alarm messages about missing VoIP Switch components will pop up!
This can be one of the most annoying erroneous situations!



→ Indications:
Indication:
<HOST_NAME> conn "Host does not respond to ping" <IP_ADDRESS>
* Dropping CPE registrations !


→ Actions:
See: "Maintenance Due to IP Network Alarm"


c) → If you are not sure what to do then contact the "VoIP Switch Supplier Support"
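
Step 2 b) of the analysis compares an error message against the "Xymon Event" indications of the maintenance chapter. As a rough sketch, this lookup can be expressed as a shell function that maps an indication text to the title of the matching "Maintenance Due to ..." section. The function name classify_indication is hypothetical. The patterns are only the indication strings quoted in this chapter, and plain substring matching is an assumption that may misclassify messages not listed here.

```shell
# Sketch only: maps an indication text to the matching maintenance section.
# The more specific patterns come first because "case" takes the first match.
classify_indication() {
    case "$1" in
        *JdbcReplicationMonitor*)
            echo 'ConfigCenter Message "DB Replication Check"' ;;
        *"SQL-Exception during statement"*)
            echo 'DataAccessCenter Message' ;;
        *Jdbc*)
            echo 'Messages from Java Framework' ;;
        *EventQueue*|*SysCompDatabase*)
            echo 'Messages from VoIP Switch Components Internals' ;;
        *HealthCheck*)
            echo 'HealthCheck Message' ;;
        *BalancerSwitch*)
            echo 'LoadBalancer Message "Missing ServiceCenter"' ;;
        *Balancer*)
            echo 'LoadBalancer Message' ;;
        *MediaConnection*|*MediaServerProvider*)
            echo 'MediaServer Message' ;;
        *FmcRequest*|*"could not provision pbx"*)
            echo 'AdminCenter Message "Missing FMC Application Server"' ;;
        *"could not provision user"*)
            echo 'AdminCenter Message "Missing Redirection Server"' ;;
        *"User Blocked"*)
            echo 'ConfigCenter Message "Wrong User Login"' ;;
        *"License Violation"*|*"grace-period remaining"*)
            echo 'ServiceCenter Message "License Violation"' ;;
        *ServicePrioCallControl*)
            echo 'ServiceCenter Message "Failed Emergency Call"' ;;
        *TOPSTOP*|*"max available charges reached"*)
            echo 'ServiceCenter Message "TopStop"' ;;
        *NimbusLink*)
            echo 'Nimbus Message' ;;
        *)
            # No documented indication matched.
            echo 'unknown'
            return 1 ;;
    esac
}
```

An unmatched message prints "unknown" and returns a non-zero status, which corresponds to the case where the "VoIP Switch Supplier Support" should be contacted.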




→ Top

VoIP Switch Server Maintenance


→ Top

Maintenance Due to VoIP Switch Components General Alarms  


→ Top

Maintenance Due to Messages from Java Framework

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Jdbc"



Description:
Java internal exceptions, mostly caused by database accesses. They should normally be handled by the application.


Consequences:
→ For the VoIP Switch telephony service:

  • Mostly none

→ For the operations:

  • Mostly none

→ For the user:

  • Mostly none


Solution:
Observe the frequency of this event


Action:
1. Observe the frequency of this event

2. If the erroneous condition is too frequent then contact the "VoIP Switch Supplier Support"!
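
Observing the frequency of an event can be reduced to counting the matching lines in a saved copy of the monitor messages. The following is a minimal sketch under that assumption. The function name count_events is hypothetical, and the file to scan is left to the administrator because this article does not state where the messages are stored.

```shell
# Sketch only: prints how many lines of a file contain a fixed string.
# $1 = string to look for (e.g. Jdbc), $2 = file with the saved messages
count_events() {
    if [ ! -r "$2" ]; then
        echo "cannot read $2" >&2
        return 2
    fi
    # -c counts matching lines, -F treats the pattern as a fixed string.
    # grep exits with 1 when nothing matches, so "|| true" keeps the
    # function successful while "0" is still printed.
    grep -c -F -- "$1" "$2" || true
}
```

Running the function at regular intervals and comparing the printed numbers shows whether the event becomes more frequent.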




→ Top

Maintenance Due to Messages from VoIP Switch Components Internals

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "EventQueue"
<HOST_NAME> msgs "SysCompDatabase - Cannot evalute status"



Description:
These events may happen on all VoIP Switch servers and are VoIP Switch component internal notes.


Consequences:
→ For the VoIP Switch telephony service:

  • Mostly none

→ For the operations:

  • Mostly none

→ For the user:

  • Mostly none


Solution:
Observe the frequency of this event


Action:
1. Observe the frequency of this event

2. If the erroneous condition is too frequent then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from LoadBalancer Server


→ Top

Maintenance Due to HealthCheck Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "HealthCheck"



Description:
The HealthCheck supervises the status of virtual IP addresses and their associated physical IP addresses. If the HealthCheck on one server doesn't see the peer physical IP address it takes over the virtual IP address. This most probably indicates an IP network problem in the "Public Voice Segment".


Consequences:

Warning

This erroneous condition must be checked within reasonable time!


→ For the VoIP Switch telephony service:

  • None if concurrently no other IP network problems arise

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problem if needed.

Check the status of the VoIP Switch components with an active-passive scheme:

  • LoadBalancer
  • CallBalancer
  • RatingCenter


Action:
1. Check if the IP network is OK


2. Check the status of the LoadBalancer components

→ Confirm if the active LoadBalancer swapped, e.g. from *-lb-01 to *-lb-02


3. Check the status of the CallBalancer components

→ Confirm if the active CallBalancer swapped, e.g. from *-lb-01 to *-lb-02


4. Check the status of the RatingCenter components

→ Confirm if the active RatingCenter swapped, e.g. from *-ms-01 to *-ms-02
→ Confirm if the active RatingCenter is processing the CDRs


5. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a LoadBalancer problem try to restart the component:
  root# loadbalancer restart


c) If there is a CallBalancer problem try to restart the component:
  root# callbalancer restart


d) If there is a RatingCenter problem try to restart the component:
  root# ratingcenter restart


e) If the RatingCenter swapped make sure that the CDRs are processed:
  1. ConfigCenter GUI → Menu "System" → Menu "Components"
    → Click line at "active" RatingCenter → In dialog select "Process CDRs"
    → Click button [ Close ]
  2. Check if the CDR CSV files are processed:
  root# cd /home/servicecenter/cdrs


Check if the CSV files have a current time stamp, which indicates that new CDRs were written:
  root# ls -ltra


Open a CSV file and check for new entries, e.g.:
  root# less monthly.csv



6. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!

If these events are logged repeatedly then report them to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to LoadBalancer Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Balancer"



Description:
LoadBalancer internal problem that is treated internally by the component. The LoadBalancer has an "active-passive" redundancy scheme.


Consequences:
→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Not defined yet


Action:
1. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!

If these events are logged repeatedly then report them to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to LoadBalancer Message "Missing ServiceCenter"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "BalancerSwitch" <SERVICECENTER> "not available anymore"



Description:
The LoadBalancer indicates that it doesn't see a certain ServiceCenter.

This happens when:

  • the ServiceCenter has restarted
→ the event will be transient
  • the ServiceCenter is stopped
→ the event will remain until the ServiceCenter is started again
  • no IP connectivity
→ the event will remain until the IP connectivity is reestablished


Consequences:

Warning This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • None, the other ServiceCenters take over the work load
  • If a ServiceCenter is missing then the VoIP Switch loses redundancy capability

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problems if needed:

→ Actions see: "Maintenance Due to IP Network Alarm"

Solve the server problem if needed

→ Actions see: "Treating Server Hardware Problems"


Action:
1. Check if the IP network is OK


2. Check the status of the ServiceCenter components

→ Confirm that the reported ServiceCenter server is affected


3. Check the reported ServiceCenter server with the "Server Administrator (OMSA)"


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a ServiceCenter problem try to restart the component:
  root# servicecenter restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
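
Step 1 of the action above checks the IP network. A first, very coarse check can be sketched as a reachability test over a list of servers. The function names probe_host and unreachable_hosts and the PROBE variable are hypothetical, the ping options follow the Linux iputils syntax, and a successful ping alone does not prove that the network is OK.

```shell
# Default probe: one ICMP echo request with a 2 second timeout
# (Linux iputils ping syntax).
probe_host() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Sketch only: prints every host of the argument list that does not answer
# the probe. Returns 0 if all hosts answered, 1 otherwise.
# The probe command can be replaced by setting PROBE to another command name.
unreachable_hosts() {
    rc=0
    for host in "$@"; do
        if ! "${PROBE:-probe_host}" "$host"; then
            echo "$host"
            rc=1
        fi
    done
    return "$rc"
}
```

The function would be called with the host names or IP addresses of the affected servers, for instance the ServiceCenter server reported in the indication and the LoadBalancer servers.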




→ Top

Maintenance Due to CallBalancer Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The CallBalancer dispatches MGCP messages to the CallAgent components.

The CallBalancer has an "active-passive" redundancy scheme.


Consequences:

Warning

This erroneous condition must be checked within short time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • Users with MGCP MTA as telephone adapter may not be able to telephone


Solution:
Check the status of the CallBalancer active-passive scheme and whether the MGCP messages are processed.


Action:
1. Check if the IP network is OK


2. Check the status of the CallBalancer components:

a) Confirm if the active CallBalancer swapped, e.g. from *-ms-01 to *-ms-02


b) Confirm if the active CallBalancer is processing the MGCP messages
→ Check if the CallAgents handle MGCP connections and that the total number of MGCP connections is not dropping.


3. Check if the MGCP audits are not dropping:

a) Connect to a Xymon monitor and check in Xymon Column "regs" the numbers of MGCP-Active and MGCP-Brocken


b) Check the questions:
  • Is the number of MGCP-Active dropping?
→ If yes => There may be an IP backbone problem or a CallBalancer or CallAgent outage!


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"
b) If there is a CallBalancer problem try to restart the component:
  root# callbalancer restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to MediaServer Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "MediaConnection (06) Cannot handle outgoing message"
<HOST_NAME> msgs "MediaServerProvider (MS) refreshing mediaserver mc1ms2 failed"



Description:
The MediaServer records or plays back announcements and VoiceMail messages. Occasionally it may not correctly record a message and transfer it to the MediaCenter or play back an announcement or message.

The MediaServer can act as media proxy for active connections and transcode media streams.


Consequences:

Warning

If in this VoIP Switch the MediaServer acts as media proxy then the erroneous situation must be checked soon!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • A VoiceMail Box message or announcement could not be correctly recorded or played back.
  • User may not hear the other side or vice versa.


Solution:
Depends on the situation.


Action:
1. If the erroneous condition remains or happens too often then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from Management Server  


→ Top

Maintenance Due to AdminCenter Message "Missing FMC Application Server"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "FmcRequest - Cannot post request"
<HOST_NAME> msgs "FmcProvider - could not provision pbx"



Description:
The AdminCenter tried to configure the FMC application.

Consequences:

Warning

This erroneous condition is sporadic or must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A configuration on an FMC server failed

→ For the user:

  • A user's "an MC-Phone" is not working


Solution:
Check the state of the FMC servers and their IP connectivity toward the VoIP Switch servers.


Action:
1. Check if the IP network is OK


2. Check the status of the FMC server


3. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is an FMC server problem
→ Contact the "VoIP Switch Supplier Support"!


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to AdminCenter Message "Missing Redirection Server"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "FmcProvider - could not provision user" <USER_TELEPHONE_NUMBER>



Description:
The mobile app "an MC-Phone" couldn't get from the associated redirection server (by default a Comdasys server located in Europe) the information where its responsible configuration server is located. Therefore the user's "an MC-Phone" couldn't obtain its configuration and will not work.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The mobile app "an MC-Phone" will not work


Solution:
Make sure to have good IP connectivity to the Internet


Action:
1. The user must find a reliable Internet connection and restart the app "an MC-Phone" until it gets its configuration




→ Top

Maintenance Due to ConfigCenter Message "Wrong User Login"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "msgsAccessLogger - ADMIN:login; user" <USERNAME> "-> User Blocked"



Description:
A VoIP Switch Administrator, Operator or Supporter tried to log in to the ConfigCenter with wrong credentials. The user will be blocked for several minutes.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • The user will be blocked from the ConfigCenter for several minutes.

→ For the user:

  • None


Solution:
Wait


Action:
1. Retry after a few minutes with the correct login credentials.


2. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ConfigCenter Message "DB Replication Check"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs JdbcReplicationMonitor "Replication" '<BROKEN_REPLICATION_DIRECTION>' "is broken!"



Description:
The database replication check was not successful. This can happen from time to time when the database has to process heavy load.

In most cases the database replication recovers automatically even after several hours of failed replication. If it is not recovering then this is a severe problem and must be treated.


Consequences:

Warning If this erroneous condition remains then this is a SEVERE erroneous condition and must be treated within short time!


→ For the VoIP Switch telephony service:

  • The database redundancy is endangered

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Restore the MySQL DB replication if the erroneous condition remains.


Action:
1. Check periodically (ca. every half hour) the Xymon monitor for this error condition.

2. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to DataAccessCenter Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "Jdbc" "SQL-Exception during statement"



Description:
A configuration via the DataAccessCenter may have failed.

This may happen if the database is under heavy load.


Consequences:

Warning This erroneous condition must be checked within reasonable time!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A customer configuration may have failed (which is hopefully covered by the CRM application).

→ For the user:

  • None


Solution:
Inter-working between the DataAccessCenter and database must be optimized.


Action:
1. If this Java event is logged repeatedly then report it to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to RatingCenter Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The RatingCenter has an "active-passive" scheme. For every RatingCenter event, check whether the active RatingCenter is working correctly and is processing the CDRs.


Consequences:

Warning

This erroneous condition must be checked within short time!



→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A CDR may not be written correctly into the CDR database and/or CSV files.
  • The customer billing does not contain all CDRs


→ For the user:

  • None


Solution:
Check the status of the RatingCenter active-passive scheme and whether the CDRs are processed.


Action:
1. Check the status of the RatingCenter component

→ Confirm if the active RatingCenter is processing the CDRs


2. Treat the problem:

a) If the RatingCenter swapped make sure that the CDRs are processed:
Open the ConfigCenter Menu "Components"
→ Click line at "active" RatingCenter → In dialog select "Process CDRs"
→ Click button [ Close ]


b) Check if the CDR CSV-Files are processed:
Open the CDR directory:
  root# cd /home/ratingcenter/cdrs


Check if the CSV files have a current time stamp, which indicates that new CDRs were written:
  root# ls -ltra


Open a CSV file and check for new entries, e.g.:
  root# less monthly.csv


3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
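
The time stamp check of step 2 b) can also be scripted. The following is a minimal sketch, assuming that a recently modified CSV file in the CDR directory is a sufficient sign that new CDRs were written. The function name cdr_files_recent and the age limit are hypothetical, and the -maxdepth and -mmin tests are taken from GNU find.

```shell
# Sketch only: lists the CSV files in a directory that were modified recently.
# $1 = CDR directory, $2 = maximum age in minutes
# Prints the matching files; returns 1 if there are none.
cdr_files_recent() {
    recent=$(find "$1" -maxdepth 1 -type f -name '*.csv' -mmin "-$2")
    [ -n "$recent" ] || return 1
    printf '%s\n' "$recent"
}
```

For example, "cdr_files_recent /home/ratingcenter/cdrs 60" lists the CSV files written during the last hour. An empty result with a non-zero return status suggests that the active RatingCenter is not processing the CDRs.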




→ Top

Maintenance Due to Messages from ServiceCenter Server  


→ Top

Maintenance Due to FaxServer Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
A fax may not be received correctly. The mailing of the PDF file may fail.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • A received Fax may not be correctly received and transferred to the user. This situation is usually handled by the Fax device either automatically or manually.


Solution:
Restart the FaxServer component.


Action:
1. Check whether no faxes at all are received.

→ Send test fax.


2. Restart the FaxServer:

  root# faxserver restart



3. If the FaxServer logs these events repeatedly then report it to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to MediaCenter Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs MediaCenterCall
<HOST_NAME> msgs MediaServer
<HOST_NAME> msgs "file not found"



Description:
The MediaCenter handles the WAV files from announcements and VoiceMail messages. Occasionally it may not correctly record a message or may lose a message file. An order to the MediaServer to replay a message or announcement may also fail.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • A VoiceMail Box message or announcement could not be correctly recorded or played back


Solution:
Clean up the VoiceMail message database.

Optimize the inter-working of MediaCenter and MediaServer


Action:
1. If these events are logged repeatedly then report them to the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ServiceCenter Message

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The ServiceCenter is the main component of the VoIP Switch. It processes the connection signaling and the telephony features.

The ServiceCenter has an all active redundancy scheme. If one ServiceCenter fails the remaining ServiceCenters take over the work load.


Consequences:

Warning This erroneous condition must be checked and treated within short time!


→ For the VoIP Switch telephony service:

  • As long as one ServiceCenter remains, the VoIP Switch works!

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Depends on the analyzed problem.


Action:
1. Check how acute the problem is:

a) Check if the IP network is OK


b) Check the status of the ServiceCenter component
  • Are enough ServiceCenters active to handle the work load?
→ If NO then there is a most SEVERE erroneous situation


c) Check in the ConfigCenter Menu "Components" if the active ServiceCenter is processing the connections:
  • Is the total number of connections dropping?
→ If YES then there is a most SEVERE erroneous situation:
→ There may be an IP backbone problem!


d) Check in the Xymon Column "regs" the number of registered SIP-Devices:
  • Is the number of SIP-Devices dropping?
→ If YES then there is a most SEVERE erroneous situation:
→ There may be an IP backbone problem!


e) Check the reported ServiceCenter server with the "Server Administrator (OMSA)"
  • Are problems signaled?


2. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarm"


b) If there is a ServiceCenter problem try to restart the component:
  root# servicecenter restart



c) If there is a hardware problem:
→ Actions see: "Treating Server Hardware Problems"


3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ServiceCenter Message "License Violation"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs License "License Violation"
<HOST_NAME> msgs License "grace-period remaining:"



Description:
This ServiceCenter has a license problem and will work only for the remaining grace period.


Consequences:

Warning

This erroneous condition must be checked and treated within the remaining grace period!


→ For the VoIP Switch telephony service:

  • As long as one ServiceCenter remains, the VoIP Switch works
  • The telephony service will be stopped on this ServiceCenter after passing of the grace period

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Get current licenses from the VoIP Switch Supplier.


Action:
1. Check in the ConfigCenter Menu "Components" which ServiceCenter component has a license problem and how long the grace period is.


2. Contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ServiceCenter Message "Failed Emergency Call"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs ServicePrioCallControl "Could not establish priority-call". Call from Connection/<SIP_CALL_ID>/<CALLING_NUMBER> to <CALLED_EMERGENCY_NUMBER>



Description:
A user's emergency call failed!


Consequences:

Warning Severe legal condition that must be handled!

This case can have legal consequences for the provider!


→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The emergency call did not work


Solution:
Check if the call failed due to the emergency call treatment or routing of the VoIP Switch. If yes, fix it.

Check if the PSTN provider did reject the emergency call. If yes contact the PSTN provider.


Action:
1. Archive traces for legal responsibilities:

  • Save the trace of this emergency call and all subsequent calls from this user toward emergency numbers


2. Check where the call was rejected.

  • If the call was rejected at the PSTN provider side, contact the PSTN provider and have them investigate this case.


3. Check the VoIP Switch's emergency routing:

  • Emergency numbers
  • Emergency number rewriter
  • Routing Tables toward the PSTN
  • RuleSet that may tag outgoing calls toward emergency numbers


4. Check if any IP network devices may interfere with the SIP signaling:

  • If an external Session Border Controller (SBC) or SIP-SS7 Gateway is involved, check its behavior concerning emergency calls
  • If a firewall FW is involved check that no SIP ALG or "SIP Helpers" are active


5. Treat the problem:

a) Adjust the emergency routing of the VoIP Switch if needed


b) Fix the IP network devices if needed


6. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to ServiceCenter Message "TopStop"

Indication "Xymon Event":
Monitor Log, Email or SNMP Trap may contain the following information:

Indication:
<HOST_NAME> msgs ServiceRatingControl (01) <CALLING_NUMBER> "max available charges reached for account:"
<HOST_NAME> msgs AlarmLogger "[TOPSTOP][ALARM] tenant" <TENANT> "topstop limit nearly reached for account"



Description:
A user's TopStop limit was reached!


Note

A TopStop alarm early in the month or for a lot of users indicates a possible fraud case!



Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • A TopStop alarm early in the month indicates a possible fraud case

→ For the user:

  • No outgoing calls except emergency calls will work once the TopStop limit is reached


Solution:
If it is a regular TopStop then contact the user and raise the monthly TopStop limit.

If it is a fraud situation, handle it according to the section Best Practice for Handling a "Fraud" Situation.


Action:
1. Check if it is a regular TopStop situation.


2. Check if it is a possible fraud case:

  • Reached TopStop limit early in the month?
  • Concurrently a lot of TopStop limits reached?
  • High call peak during the night or weekend?
→ Check the Xymon Column "calls_sys".


3. If it is a fraud case, treat it according to the section Best Practice for Handling a "Fraud" Situation.


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
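
The question "Concurrently a lot of TopStop limits reached?" can be answered roughly by counting alarm lines. This is a minimal sketch, assuming the monitor log text is available on standard input and that each alarm line contains the literal token [TOPSTOP][ALARM] shown in the indication above. Both function names and the threshold are assumptions; a sensible threshold depends on the size of the installation.

```shell
# count_topstop_alarms   (hypothetical helper)
# Reads monitor log text on stdin and prints the number of TopStop alarm lines.
count_topstop_alarms() {
    # grep -c exits non-zero when the count is 0; the count is still printed
    grep -cF '[TOPSTOP][ALARM]' || true
}

# topstop_fraud_suspected <threshold>   (hypothetical helper)
# Succeeds if at least <threshold> TopStop alarm lines are found on stdin.
topstop_fraud_suspected() {
    threshold=$1
    count=$(count_topstop_alarms)
    [ "$count" -ge "$threshold" ]
}
```

Example: pipe the monitor log of the current day into "topstop_fraud_suspected 10" and investigate further if it succeeds.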




→ Top

Maintenance Due to Nimbus Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs "NimbusLink (ue) Cannot subscribe"



Description:
The Nimbus component is a VoIP Switch internal bus that connects the various VoIP Switch components on the servers. If a Nimbus endpoint on one server is missing the other Nimbus endpoints start to complain.

If a Nimbus endpoint is missing then its component may be stopped, its server may be off line, or there may be an IP network problem.

→ This error is often displayed during VoIP Switch software upgrades of the servers. In this situation just wait until the upgrade is finished.


Consequences:

Warning

This erroneous condition must be checked and treated within reasonable time!


→ For the VoIP Switch telephony service:

  • Usually none

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Solve the IP network problems or server problems if needed.


Action:
1. Check if the IP network is OK


2. Check the status of the VoIP Switch components located on the server where the Nimbus is missing:

→ Is only Nimbus missing or are other components on this server missing too?


3. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarms"


b) If there is not a planned outage then try to solve the server problem


c) If there is not a planned outage then try to restart the Nimbus on this server:
  root# nimbus restart



4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from CallAgent Server


→ Top

Maintenance Due to CallAgent Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs



Description:
The CallAgent handles the message exchange with the MGCP MTAs. The CallAgents have an all-active redundancy scheme. If one CallAgent fails, the remaining CallAgents take over the workload.


Consequences:

Warning

This erroneous condition must be checked within short time!


→ For the VoIP Switch telephony service:

  • As long as one CallAgent remains, the VoIP Switch works

→ For the operations:

  • None

→ For the user:

  • Individual MGCP MTAs at users' premises may not work correctly. The telephone service may not always work for these users.


Solution:
Depends on the analyzed problem.


Action:
1. Check if the IP network is OK


2. Check the status of the CallAgent components

→ Confirm that the reported CallAgent server is affected

3. Check the reported CallAgent server with the "Server Administrator (OMSA)"


4. Treat the problem:

a) If there are IP network problems
→ Actions see: "Maintenance Due to IP Network Alarms"


b) If there is a CallAgent problem try to restart the component:
  root# callagent restart



5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to Messages from CPECenter Server


→ Top

Maintenance Due to CPECenter Message

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> msgs
<HOST_NAME> msgs "DevAdmProvider (-1) duplicated devicetype:" <DEVICE_TYPE>



Description:
During the preparation of a device configuration file, two device configuration templates were found for the same device type. A CPE that loads a device configuration file produced under these conditions may not work correctly.


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • The CPE may not work with the produced configuration file


Solution:
One device configuration template has to be deleted.


Action:
1. Contact the "VoIP Switch Supplier Support"!




→ Top

Maintenance Due to IP Network Alarms

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> conn "Host does not respond to ping" <IP_ADDRESS>



Description:
This test performs a "ping" toward the IP address of the host. If the "ping" is not answered then there is a problem with the IP network, e.g.:

  • Pinged host defective or off line
  • Layer 2 IP switch defective or off line
  • Broken IP backbone network


Consequences:

Warning

MOST SEVERE condition if several VoIP Switch servers are affected for a longer duration (approx. 15 min)!


→ For the VoIP Switch telephony service:

  • The telephone service may be interrupted

→ For the operations:

  • The MySQL databases may lose their replication

→ For the user:

  • The telephone service may be interrupted for the users!


Solution:
Solve the IP network problems!

Check the IP network devices:

  • Pinged host
  • Layer 2 IP switches
  • IP Routes
  • Firewalls

Check the VoIP Switch server IP connectivity.


Action:
1. Evaluate the severity of the IP network outage:

a) Check if it is an occasional ping failure:
  • Only one host doesn't respond
  • Only 1 or 2 poll cycles fail
→ Type "Occasional Failure":
  • In this situation the erroneous condition may be ignored.


b) Check if it is only a single host:
  • One host doesn't respond anymore
→ Type "Host Failure":
  • Check the hardware condition and IP connectivity of this device
  • Check with the VoIP Switch Administrator in the ConfigCenter Menu "Components" how the VoIP Switch is affected


c) Check if more than one VoIP Switch server is affected:
  • More than one VoIP Switch server doesn't respond anymore
→ Type "VoIP Switch Failure":
1. Check with the VoIP Switch Administrator how the VoIP Switch is affected:
a) Connect to both (*-ms-01, *-ms-02) ConfigCenter Menu "Components" and check the component status
b) Check the questions:
  • Which VoIP Switch servers are not visible?
  • Are they the same on both ConfigCenters?
  • Does one ConfigCenter see only the servers on its side? E.g.:
Side A components complain that they don't see their peers on Side B?
Side B components complain that they don't see their peers on Side A?
→ If yes => There is a heavy IP backbone problem
c) Check in the ConfigCenter Menu "Channels" if new connections were established since the IP outage
→ If yes => Some users still can make phone calls


2. Check with the VoIP Switch Administrator how the users are affected:
a) Connect to both (*-ms-01, *-ms-02) Xymon Column "regs" and check the CPE and MTA registrations status.
b) Check the questions:
  • Are the users' CPE registrations being dropped?
→ If yes => There is a heavy IP backbone problem; some users cannot use the telephony service anymore!


2. Treat the Type "Occasional Failure":

a) VoIP Switch Administrator:
In this situation the erroneous condition may be ignored. Observe if the situation remains.


3. Treat the Type "Host Failure":

a) VoIP Switch Administrator:
If possible pre-bar the VoIP Switch component on this server
b) Solve the IP or hardware issue with the failed host


4. Treat the Type "VoIP Switch Failure":

a) VoIP Switch Administrator:
Contact the "VoIP Switch Supplier Support"


5. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
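
The severity evaluation of step 1 can be sketched as a small shell function. This is a minimal sketch, not part of the VoIP Switch tooling: the function name classify_ping_failure and its two arguments are assumptions. The text does not cover the case where several servers fail for only one or two poll cycles; the sketch treats it as "VoIP Switch Failure" to stay on the safe side.

```shell
# classify_ping_failure <servers not responding> <failed poll cycles>
# Prints the failure type used in the evaluation above.
classify_ping_failure() {
    hosts=$1
    cycles=$2
    if [ "$hosts" -le 0 ] || [ "$cycles" -le 0 ]; then
        echo "No Failure"
    elif [ "$hosts" -gt 1 ]; then
        # More than one VoIP Switch server is affected
        echo "VoIP Switch Failure"
    elif [ "$cycles" -le 2 ]; then
        # Only one host, only 1 or 2 poll cycles failed
        echo "Occasional Failure"
    else
        # One host does not respond anymore
        echo "Host Failure"
    fi
}
```

Example: "classify_ping_failure 1 5" prints "Host Failure".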




→ Top

Maintenance Due to Operating System Alarms

The VoIP Switch Administrator and/or server service personnel find here instructions for managing problems indicated by the operating system supervision.




→ Top

Maintenance Due to Supervised Processes Missing

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> procs "Processes not OK" <MISSING_PROCESS>



Description:
One or more supervised processes of a Linux service or VoIP Switch component are missing.


Consequences:

Warning

SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • If a VoIP Switch component is missing then the VoIP Switch loses redundancy capability
  • If a Linux service is missing the VoIP Switch may be hampered or the server is not working correctly

→ For the operations:

  • Depends on the VoIP Switch components or Linux service

→ For the user:

  • Depends on the VoIP Switch components or Linux service


Solution:
Restart the VoIP Switch component or Linux service.


Action:
1. Check with the VoIP Switch Administrator if it is possible to restart the component or service without endangering the VoIP Switch telephony service.

→ If possible pre-bar the VoIP Switch component via the ConfigCenter!


2. Restart the VoIP Switch component or Linux service:

a) Restart the VoIP Switch component
  root# <COMPONENT> restart


  • Example:
  root# servicecenter restart



b) Restart the service:
  root# /etc/init.d/<SERVICE> restart


  • Example:
  root# /etc/init.d/monit restart



3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
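
Before and after a restart it can help to verify which supervised processes are actually missing. This is a minimal sketch, assuming the process names to check are known; the function name missing_processes is hypothetical, and how a VoIP Switch component appears in the process list (for example under its own name or under the name of a runtime) is not covered by this article and has to be checked on the server.

```shell
# missing_processes <name>...   (hypothetical helper)
# Reads one running process name per line on stdin and prints every
# expected name that does not appear in that list.
missing_processes() {
    running=$(cat)
    for name in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
    done
}
```

Example: "ps -e -o comm= | missing_processes monit mysqld" prints the names that are not running. Note that Linux truncates the command name in this ps column to 15 characters.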




→ Top

Maintenance Due to Supervised IP Ports

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> ports "Ports not OK" <MISSING_PROCESS_PORTS>



Description:
One or more supervised IP ports of a Linux service or VoIP Switch component are missing.


Consequences:

Warning

SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • If a VoIP Switch component is missing then the VoIP Switch loses redundancy capability
  • If a Linux service is missing the VoIP Switch may be hampered or the server is not working correctly

→ For the operations:

  • Depends on the VoIP Switch components or Linux service

→ For the user:

  • Depends on the VoIP Switch components or Linux service


Solution:
Restart the VoIP Switch component or Linux service.


Action:
1. Check with the VoIP Switch Administrator if it is possible to restart the component or service without endangering the VoIP Switch telephony service.

→ If possible pre-bar the VoIP Switch component via the ConfigCenter!


2. Restart the VoIP Switch component or Linux service:

a) Restart the VoIP Switch component
  root# <COMPONENT> restart


  • Example:
  root# servicecenter restart



b) Restart the service:
  root# /etc/init.d/<SERVICE> restart


  • Example:
  root# /etc/init.d/monit restart



3. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
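
After the restart the supervised port should be listening again. This is a minimal sketch, assuming the "ss" tool from iproute2 is installed and that the port in question is a TCP port; the function name port_is_listening is hypothetical. It parses the output of "ss -ltn", where the fourth column holds the local address and port.

```shell
# port_is_listening <port>   (hypothetical helper)
# Reads `ss -ltn` output on stdin and succeeds if a TCP socket
# is listening on <port>.
port_is_listening() {
    awk -v port="$1" '
        NR > 1 {
            # Local address is "addr:port"; IPv6 addresses contain more
            # colons, so take the last colon-separated part as the port.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

Example: "ss -ltn | port_is_listening 3306 && echo listening".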




→ Top

Maintenance Due to Supervised Hard-Disk Usage

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> disk "File systems not OK"



Description:
A hard-disk or hard-disk partition is full. If a hard-disk is full then the Linux operating system behaves unpredictably and the server will most probably crash.


Consequences:

Warning

SEVERE erroneous condition that must be handled!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server

→ For the operations:

  • Depends on the VoIP Switch components running on the server

→ For the user:

  • Depends on the VoIP Switch components running on the server


Solution:
Identify big files or directories. Delete or archive files externally.


Action:
1. Check hard-disk usage:

  root# df -h



2. Find large files:

  root# find / -type f -size +100k -exec ls -lahS {} +



  • Example: find files larger than 60 MByte:
  root# find /opt/backup/ -type f -size +60000k -exec ls -lahS {} +



  • Check for large files in the following suspicious directories:
    /opt/backup/
  • Do not touch big files in:
    /var/lib/mysql/


3. Find big directories:

  root# du -hs <DIRECTORY>/*



Example of a more specific search → find directories larger than 1 GByte:
  root# du -hs /home/ratingcenter/* | grep -E '^[0-9.,]+G'
  root# du -hs /home/*/* | grep -E '^[0-9.,]+G'



  • Check the following suspicious directories:
    /opt/backup/
    /home/mediacenter/messages
    /home/ratingcenter/cdrs


4. Before deleting files or directories, check with the VoIP Switch Administrator that they are no longer needed!

→ If you are suspicious but not sure if it is wise to delete a certain file or directory then contact the "VoIP Switch Supplier Support"!
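
The check of step 1 can be narrowed to the file systems that are actually close to full. This is a minimal sketch, assuming the POSIX output format of "df -P" and mount points without spaces; the function name filesystems_over and the threshold are assumptions and are independent of the limits configured in Xymon.

```shell
# filesystems_over <percent>   (hypothetical helper)
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for
# every file system whose usage is at or above <percent>.
filesystems_over() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%$/, "", use)          # strip the trailing percent sign
            if (use + 0 >= limit + 0) print $6, $5
        }
    '
}
```

Example: "df -P | filesystems_over 90" lists all file systems with 90% usage or more.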




→ Top

Maintenance Due to Supervised Memory Usage

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> memory "Memory low"



Description:
One or more processes consume a lot of memory. If memory becomes low, the Linux operating system starts to swap memory to and from the hard-disk. This reduces the performance of the server.


Consequences:

Warning

This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • Depends on the VoIP Switch components running on the server

→ For the operations:

  • None

→ For the user:

  • None


Solution:
Identify which process consumes the memory. Restart the process in order to free memory. Stop and restart the swapping on the server.


Action:

1. If a LoadBalancer *-lb-* or ServiceCenter *-sc-* server is affected:

→ Contact the "VoIP Switch Supplier Support"!


2. Find which processes use the memory:

  • This is a difficult task!
  root# top



3. Stop and restart the swapping:

Preconditions:
  • Choose a time of day when the server is not under high load.
  • If possible pre-bar the VoIP Switch components on this server via the ConfigCenter
  • Make sure that the redundant VoIP Switch component is running


a) Restart the responsible process:
  root# /etc/init.d/<PROCESS_NAME> restart



b) Stop the swapping:
  • Don't do this during high load!
  • It will take some time until accomplished!
  root# swapoff -a



c) Restart the swapping:
  root# swapon -a



d) Check if the swap is working regularly:
  root# swapon -s
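
Stopping the swap moves all swapped-out pages back into RAM, so it should only be attempted when enough memory is available. This is a minimal sketch of such a pre-check, assuming a kernel that reports the MemAvailable field in /proc/meminfo (older kernels do not); the function name swapoff_is_safe is hypothetical and the comparison is a rough rule of thumb, not a guarantee.

```shell
# swapoff_is_safe   (hypothetical helper)
# Reads /proc/meminfo text on stdin. Prints the swap in use and the
# available memory, and succeeds only if the available memory is
# larger than the swap currently in use.
swapoff_is_safe() {
    awk '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { free  = $2 }
        END {
            used = total - free
            printf "swap used: %d kB, memory available: %d kB\n", used, avail
            exit !(avail > used)
        }
    '
}
```

Example: "swapoff_is_safe < /proc/meminfo && swapoff -a".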





→ Top

Maintenance Due to Supervised CPU Load

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> cpu "Load is High"



Description:
One or more processes consume excessive CPU power. This may reduce the performance of the server.


Consequences:

Warning

This erroneous condition must be handled within reasonable time!


→ For the VoIP Switch telephony service:

  • Reduced performance on the affected server and VoIP Switch component

→ For the operations:

  • None

→ For the user:

  • None


Solution:
The CPU consuming process has to be identified. If a process is identified it has to be checked if it is a regular or erroneous situation.

If it is a regular situation then it has to be investigated if the server's computing power is still sufficient for this VoIP Switch. If the server hosts a VoIP Switch component which offers a configurable load acceptance via the ConfigCenter then it is worth a try to reduce the component's workload.

An erroneous situation can mostly be solved by restarting the process.


Action:
1. Identify the responsible process:

a) Check the process situation with:
  root# top
  root# ps aux



b) If a process is suspicious check for multiple processes of the same name:
  root# ps -aef | grep <PROCESS_NAME>



c) If a process is suspicious check for zombie processes (lists the zombie process id):
  root# ps aux | awk '$8 ~ /^Z/'



d) Evaluate with the VoIP Switch Administrator if the suspicious process is in a regular or erroneous state.


2. Handle an erroneous Linux process state.

a) Restart a Linux process:
  root# /etc/init.d/<PROCESS_NAME> restart



b) Kill a process, e.g. double started process, zombie:

  root# kill -9 <PROCESS_ID>



3. Handle an erroneous VoIP Switch component:

a) Restart an erroneous VoIP Switch component:
  root# <COMPONENT_NAME> restart



b) If the VoIP Switch components ServiceCenter or MediaServer produce high load then the VoIP Switch Administrator may reduce their accepted workload via the ConfigCenter.


4. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!
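
Whether a reported load is really high depends on the number of CPU cores of the server. This is a minimal sketch, assuming the standard format of /proc/loadavg (1, 5 and 15 minute load averages in the first three fields); the function name load_is_high and the default limit of 1.0 per core are assumptions and are independent of the thresholds configured in Xymon.

```shell
# load_is_high <cpu count> [limit per core]   (hypothetical helper)
# Reads /proc/loadavg text on stdin, prints the 5-minute load average
# per CPU core and succeeds if it exceeds the limit (default 1.0).
load_is_high() {
    awk -v cpus="$1" -v limit="${2:-1.0}" '
        {
            per_core = $2 / cpus        # $2 is the 5-minute load average
            printf "5-min load per core: %.2f\n", per_core
            exit !(per_core > limit + 0)
        }
    '
}
```

Example: "load_is_high "$(nproc)" < /proc/loadavg" on a server where nproc from GNU coreutils is available.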




→ Top

Maintenance Due to Supervised Files Missing or Too Big

Indication "Xymon Event":
Monitor Log, Email or SMTP Trap may contain the following information:

Indication:
<HOST_NAME> ????



Description:


Consequences:

→ For the VoIP Switch telephony service:

  • None

→ For the operations:

  • None

→ For the user:

  • None


Solution:


Action:
1. If the erroneous condition remains then contact the "VoIP Switch Supplier Support"!




→ Top

VoIP System Maintenance


→ Top

Best Practice for Handling a "Fraud" Situation  

The Aarenet VoIP Switch Administrator finds here instructions for managing fraud problems.


1. Immediate action:

  • Block call routing to the destination (usually somewhere in the Caribbean, west or central Africa)
  • If the fraudulent calls come from only one source IP address then block this IP address on the FW


2. Investigate if the fraud is due to "Direct Registrations" with correct SIP credentials on the VoIP Switch:

  • Check if the calling number has multiple SIP registrations from a suspicious source IP range or user agent!
→ If YES then:
→ The SIP credentials were not kept secret or were hacked from the user's CPE
Action:
  • Block this user account for outgoing calls (blocking international calls is usually sufficient)
  • Change the SIP credential in the user account and the user's CPE.
  • Change the CPE administration login credentials


3. Investigate if the fraud is due to a "Hacked User CPE":

a) Analyze the traces of some fraud connections.
Check if the source IP address remains that of a registered user CPE!
→ If YES then:
→ The fraudulent calls originate from the user's own CPE, i.e. the CPE was hacked
Action:
  • Block this user account for outgoing calls (blocking international calls is usually sufficient)
  • Inform the user about the fraud and its reason
  • Change the SIP credential in the user account and the user's CPE.
  • Change the CPE administration login credentials


4. Post Work:

  • Undo the "immediate action"
  • Enable the customer account when the SIP credentials and CPE administration login credentials are changed
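
The check for "Direct Registrations" in step 2 can be supported by a small filter. This is a minimal sketch, assuming the current registrations have been exported as text lines of the form "<calling number> <source IP>"; this export format and the function name numbers_with_multiple_sources are hypothetical and not a VoIP Switch feature. Several source IP addresses for one number are not proof of fraud, since a user may legitimately register several devices.

```shell
# numbers_with_multiple_sources   (hypothetical helper)
# Reads lines "<calling number> <source IP>" on stdin and prints each
# number registered from more than one distinct source IP address,
# followed by the count of distinct addresses.
numbers_with_multiple_sources() {
    # sort -u drops repeated number/IP pairs so only distinct IPs count
    sort -u | awk '
        { count[$1]++ }
        END { for (n in count) if (count[n] > 1) print n, count[n] }
    ' | sort
}
```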




→ Top






© Aarenet Inc 2018

Version: 3.0     Author:  Aarenet     Date: July 2017