Issue 1.0
Date 2012-12-05
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied. The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.
Issue 01 (2012-12-05)
Change History
Date        Revision Version  Change Description           Author
2012-10-25  0.9               Completed the initial draft  Zeng Wei (employee ID: 00161296)
2012-10-30  1.0               Completed the modification   Zeng Wei (employee ID: 00161296)
Keywords
Paging, PCH congestion, paging delay
Abstract
This document describes how to troubleshoot paging failures through all-round checking and analysis in different network operation scenarios. It guides frontline personnel in analyzing networks and solving paging failure problems, separates the responsibilities of network optimization personnel, technical service personnel, and Research & Development (R&D) personnel, and provides fixed actions for discovering the factors that affect paging success rates. The primary goal is to help frontline personnel handle simple problems quickly and feed back difficult problems to R&D personnel effectively, improving troubleshooting efficiency.
Contents
About This Document
Overview
This document focuses on analyzing and resolving paging failures during the initial optimization of new and swapped networks. It guides frontline personnel in analyzing and optimizing networks so that paging problems can be located and solved quickly. Tools, traffic statistics, parameters, and versions may differ between networks. However, the analysis methods described in this document are applicable to various base station controller (BSC) versions.
Source of Paging Failures: Unqualified paging success rate of swapped or new networks
Data Source Analysis:
  KPI mapping relationships between peer vendors and Huawei before and after swapping
  Traffic statistics KPIs between peer vendors and Huawei before and after swapping
  Paging success rates on the CN side before swapping
Criteria:
  1. Analyze KPI mapping accuracy (including denominators and numerators) and check that KPIs are proper.
  2. Compare traffic statistics for at least one week before and after swapping and check whether the traffic statistics meet related requirements.
Criteria:
  1. Analyze impacts from upgrading and feature enabling scenarios on paging success rates.
  2. Determine deterioration by comparing the corresponding days.
  3. Compare KPIs three days before and after deterioration and exclude impacts of traffic periods.
Signaling analysis shows that paging failures occur, which is due to causes on the wireless network rather than the CN or UEs.
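The before/after comparison named in the criteria can be sketched as follows. This is a minimal illustration, assuming that daily paging success rates (in percent) have already been exported as plain lists. The function name `compare_before_after` and the 0.5-point tolerance are hypothetical and are not values defined in this document.

```python
from statistics import mean

def compare_before_after(before, after, min_days=7, tolerance=0.5):
    """Compare mean daily paging success rates (percent) before and after swapping.

    min_days mirrors the "at least one week" criterion; tolerance is an
    illustrative assumption for how much decrease is still acceptable.
    """
    if len(before) < min_days or len(after) < min_days:
        raise ValueError(f"need at least {min_days} days on each side")
    delta = mean(after) - mean(before)
    return {
        "before": mean(before),
        "after": mean(after),
        "delta": delta,
        "meets_requirement": delta >= -tolerance,
    }
```

Comparing the same weekdays on both sides helps exclude the impact of traffic periods, as the criteria require.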
Dimension Application scenario
3.1.2 Categorizing Common Causes and Corresponding Actions of Paging Failures by Application Scopes
Paging Success Rates on Swapped Networks Do Not Meet Related Requirements.
Paging success rate KPIs may not be available on the UTRAN side of peer vendors' networks. Therefore, use paging success rates on the CN side to verify whether they meet the standard. After swapping, paging success rates are lower than those on the original networks. Common reasons for this problem are as follows:
1. Algorithm/Parameter mapping problems: including incorrect mappings of RNC-level and site-level parameters. In addition, some algorithms and parameters of Huawei devices are not accurately mapped with those of original networks. This may result in changes of traffic models, calling control processes, resource allocation, wireless coverage, and network traffic distribution, affecting paging performance. (action 2)
2. Hardware and transmission faults: alarms of device board exceptions and transmission faults. (action 3)
3. NE versions: NE versions are not properly selected (an early version has many problems) or version defects result in low paging success rates. (action 5)
4. Network planning: neighboring cell inheritance errors, LAC/RAC area planning differences, and insufficient paging channels over air interfaces result in low paging success rates. (action 4)
5. RF channel problems: Antenna system faults, RF connection faults, antenna match, antenna tilt adjustment, and internal and external interference of the antenna system result in low paging success rates. (action 6)
6. Long-term impact: When swapping takes a long period of time, seasonal changes in traffic models and KPIs take place. In this case, increased traffic volumes result in air interface capacity limitation, causing paging success rate deterioration. (action 9)
7. Impacts of external unexpected factors: For example, during or after swapping, user actions and traffic distribution greatly change due to external factors such as charging adjustments of operators, holidays, activities, and climates. Therefore, paging success rates after swapping are affected. (action 10)
8. Exceptions on the CN side: In a short period of time, paging success rates may deteriorate suddenly or fluctuate dramatically. After the RAN-side impacts are eliminated, this problem may be caused by exceptions on the CN side, including exceptions in operations and terminal servers (for example, faults of iTunes and BlackBerry servers may increase the number of paging users), and IP or port scanning in networks in the case of allocating IP addresses to UEs by the CN. (action 11)
9. Abnormal UEs and users: Poor compatibility of UEs, business users, and malicious users result in paging failures. (action 11)
policy planning, and improper neighboring cell and frequency planning result in low paging success rates. (action 2)
11. Hardware and link faults: Device, board, and transmission faults result in low paging success rates. (action 3)
12. NE version problems: NE versions are not properly selected (an early version has many problems) or precautionary actions are not performed, which results in low paging success rates. (action 4)
13. Network planning optimization problems: Inappropriate site planning, networking inheritance policy planning, LAC/RAC planning, air interface resources, and board resources cause coverage problems, affecting paging success rates. (action 5)
14. RF channel problems: Internal and external interference and RF channel connection faults cause uplink and downlink coverage problems, affecting paging success rates. (action 6)
15. Impacts of external unexpected factors: For example, during or after swapping, user actions and traffic distribution greatly change due to external factors such as charging adjustments of operators, holidays, activities, and climates. Therefore, paging success rates on new networks are affected. (action 9)
16. Exceptions on the CN side: In a short period of time, paging success rates may deteriorate suddenly or fluctuate dramatically. After the RAN-side impacts are eliminated, this problem may be caused by exceptions on the CN side, including exceptions in operations and terminal servers (for example, faults of iTunes and BlackBerry servers may increase the number of paging users), and IP or port scanning in networks in the case of allocating IP addresses to UEs by the CN. (action 10)
17. Long-term trend impact: Seasonal changes in traffic models and KPIs affect paging success rates to a certain extent. For example, historical experience may show that paging success rates are at their lowest of the year at the time when new networks are deployed. (action 8)
18. Abnormal UEs and users: Poor compatibility of UEs, business users, and malicious users result in paging failures. (action 11)
user actions and traffic distribution greatly change due to external factors such as charging adjustments of operators, holidays, activities, and climates. Therefore, paging success rates deteriorate after swapping.
25. Exceptions on the CN side: In a short period of time, paging success rates may deteriorate suddenly or fluctuate dramatically. After the UTRAN-side impacts are eliminated, this problem may be caused by exceptions on the CN side, including exceptions in operations and terminal servers (for example, faults of iTunes and BlackBerry servers may increase the number of paging users), and IP or port scanning in networks in the case of allocating IP addresses to UEs by the CN. (action 10)
26. NE and version changes: NE replacement and version upgrades result in sudden paging success rate deterioration. (action 4)
27. Abnormal UEs and users: Poor compatibility of UEs, business users, and malicious users result in paging failures. (action 11)
Type 1: Paging Success Rates Drop Dramatically on the CN Side But Those in Idle Mode on the RNC Side Remain the Same.
Common causes are as follows:
1. Paging messages are discarded over the Iu interface.
2. RRC setup success rates of the called party decrease. RRC is successfully set up but no Initial Direct Transfer message is received on the CN.
Type 2: Paging Success Rates on the CN Side and Those in Idle Mode on the RNC Side Drop Dramatically.
Common causes are as follows:
3. Paging messages are not sent over air interfaces due to internal problems within the RNC.
4. Paging congestion occurs because of air interface capacity limitation (VS.RRC.Paging1.Loss.PCHCong.Cell). A decrease in paging success rates may also result from weak coverage, paging black hole cells, NodeB faults, and cell out-of-service problems.
5. Registered RRC setup success rates decrease.
6. RRC is successfully set up but the LAU/RAU procedure fails.
7. Location update of UEs in idle mode or inter-RAT cell reselection is frequent.
8. UEs in CELL-PCH mode frequently reselect cells.
9. Some UEs receive no paging response messages from the air interfaces in shared networks.
10. Some parameters are improperly set. Therefore, UEs cannot respond to paging messages in a certain period.
Type 3: Paging Success Rates are Normal on the CN Side But Those on the UTRAN Side Deteriorate.
Common causes are as follows:
11. UEs are not located in areas under the local RNC.
12. The first paging success rate on the CN side decreases but the repeated paging success rate increases.
13. The CELL-PCH/FD/EFD feature is enabled on the RNC.
Provide the analysis report to PSEs and R&D personnel for review and specify analysis conclusion, handling measures and optimization suggestions. Analysis and checking in 11 required actions are described in the following several chapters.
4.1.3 Results
Verify whether the current paging problem is one of the following:
1. Type 1: If paging success rates on the CN side decrease but hardly change on the UTRAN side, it is an Iu interface paging problem.
2. Type 2: If paging success rates decrease on both the CN and UTRAN sides, it is a Uu interface paging problem.
3. Type 3: If paging success rates hardly change on the CN side but decrease dramatically on the UTRAN side, it is an improper parameter setting problem.
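The three-way classification above can be expressed as a small decision function. This is a minimal sketch, assuming the decrease on each side is measured in percentage points. The function name and the 1.0-point threshold for "hardly changes" are hypothetical and must be tuned for each network.

```python
def classify_paging_problem(cn_drop, utran_drop, threshold=1.0):
    """Map CN-side and UTRAN-side decreases (percentage points) to a problem type.

    threshold is an illustrative assumption for what counts as a real decrease.
    """
    cn_bad = cn_drop >= threshold
    utran_bad = utran_drop >= threshold
    if cn_bad and not utran_bad:
        return "type 1"  # Iu interface paging problem
    if cn_bad and utran_bad:
        return "type 2"  # Uu interface paging problem
    if utran_bad:
        return "type 3"  # improper parameter setting problem
    return "no deterioration"
```

For example, a 5-point decrease on the CN side with a 0.2-point decrease on the UTRAN side is classified as type 1.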
problems are preceding type 1 and type 2 problems. Criteria: Check parameter mapping if paging success rates on the CN or UTRAN side after swapping do not meet the requirements.
Results
3. Provide the parameter mapping review result and list parameters that cannot be accurately mapped.
4. Provide the consistency check result between the parameter mapping review and actually configured parameters and list inconsistent parameters.
Closed Actions
5. Analyze the impact of parameters that cannot be accurately mapped and adjust parameters to observe the effect.
6. Analyze the impact of actual parameters that are not consistent with mapping parameters and adjust parameters to observe the effect.
Results
Based on the rule for checking parameter reasonableness, provide the check result of paging-related parameters and list the current parameter values and the values specified in the check rule. After the FMA function for checking parameters is enabled, a file with the .csv extension is created in the target folder. The comparison results between parameter values of live networks and baseline values are listed in column I, and the comparison results between parameter values of live networks and parameter reasonableness results are listed in column J.
Closed Actions
Modify parameters based on check results of improper paging-related parameters in the provided .csv file.
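Reviewing columns I and J of the exported .csv file can be partly automated. The sketch below is an illustration only. It assumes the file has one header row, that the parameter name is in the first column, and that a matching comparison is written as an empty cell or the text `Consistent`. None of these details are confirmed by this document, so check the real file layout before using it.

```python
import csv

COL_I, COL_J = 8, 9  # zero-based indexes of spreadsheet columns I and J

def rows_needing_review(lines, ok_values=("", "Consistent")):
    """Return (first_column, column_I, column_J) for rows whose comparison is not OK.

    lines: an iterable of CSV text lines (an open file works as well).
    ok_values: assumed cell contents that mean "no difference" (hypothetical).
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row (assumed to exist)
    flagged = []
    for row in reader:
        row = row + [""] * (COL_J + 1 - len(row))  # pad short rows
        baseline = row[COL_I].strip()
        reasonableness = row[COL_J].strip()
        if baseline not in ok_values or reasonableness not in ok_values:
            flagged.append((row[0], baseline, reasonableness))
    return flagged
```

The returned rows are the candidates for the parameter modification described in the closed actions.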
Results
3. Provide a check list of operation logs and list non-query operations and their impacts.
4. Provide comparison results of all-parameter differences.
Check list columns: CMD Name, OP CMD, OP Object, OpCmd & Message, Impact
The following table lists examples:
OP Time: 2012-10-20 11:05
CMD Name: Set the DPU configuration data.
OP CMD: SET UDPUCFGDATA
OP Object: RNC
OpCmd & Message: SET UDPUCFGDATA: MaccPageRepeatTimes=2;
Impact: Paging messages over air interfaces may increase and paging congestion may occur.
Closed Actions
5. Analyze the impact of operation and parameter differences and provide corresponding solutions.
6. Perform action 3 if no obvious operation causes paging success rate deterioration.
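Listing the non-query operations in an operation log can be sketched as a simple filter. The example assumes that log entries are available as (time, command) pairs and that query-only commands start with the verbs `LST` or `DSP`. Both the input format and the verb list are assumptions made for illustration.

```python
QUERY_PREFIXES = ("LST ", "DSP ")  # assumed query-only command verbs

def non_query_operations(log_entries):
    """Keep only the entries whose command is not a query.

    log_entries: iterable of (op_time, command) tuples (assumed format).
    """
    return [
        (op_time, command)
        for op_time, command in log_entries
        if not command.strip().upper().startswith(QUERY_PREFIXES)
    ]
```

Each remaining entry, such as a SET command executed shortly before the deterioration, is then reviewed for its impact on paging.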
4.3 Action 3: Checking Device and Transmission Faults (Performed by Frontline Engineers)
4.3.1 Checking Alarms
Descriptions
Scenario: Check all scenarios where paging success rates decrease, including deterioration or optimization scenarios. The problems are the preceding type 1 and type 2 problems.
Criteria: According to the problem type confirmed in action 1, analyze site alarms in LAC areas (including site alarms and site-related RNC alarms) and entire RNC alarms (such as interface board alarms and SPU alarms).
optimization engineers need only to submit device and link alarms to device maintenance engineers.
Results
Provide analysis results of device faults and alarms, including alarm lists, cell lists, and alarm checking results.
Closed Actions
Rectify device faults and clear alarms. Observe KPI recovery status.
Confirm configuration problems onsite as soon as possible. Check the transmission link status. Check link status of the control plane by using the link status checking function of the maintenance SOP. Link status of the user plane needs manual checking. The following table lists transmission link status by using the maintenance SOP.
Confirm whether the link status is normal onsite as soon as possible. Check the transmission link load. Evaluate and analyze the link load by using the link load evaluation function of the maintenance SOP. The following table lists the evaluation results of transmission link load.
The evaluation result shows that transmission congestion exists in many Iub interfaces.
Confirm the link congestion status onsite as soon as possible. Check transmission link quality. The maintenance SOP supports IP transmission quality check and exports the corresponding results. If transmission is in IP mode, enable the ping checking function of the IPPATH and evaluate transmission link quality using the IP transmission QoS checking function of the maintenance SOP. For ATM transmission, only the Iub interface has the PM function, and the quality of the Iu/Iur interface cannot be checked. The following table lists the link load evaluation result of the maintenance SOP.
The following table shows that the IPPATH transmission quality of some NodeBs is poor.
Confirm whether the intermediate transmission network has problems onsite. The following document provides the guide to manually checking transmission. See RAN12 IP QoS Transmission Trouble Shooting Guide.
Results
Provide transmission checking items and results.
Closed Actions
Rectify transmission faults and observe KPIs.
Results
Provide values of preceding counters in all cells.
Closed Actions
Take optimization measures by referring to UMTS Access and Paging Principles + Troubleshooting Methods + Cases + Deliverables. Some measures and actions have both positive and negative gains. Therefore, confirm the impact with R&D personnel before taking actions.
Results
3. Provide impact analysis results of version incorporation problems, algorithm changes, and new features.
4. Provide analysis results of version parameter difference impacts.
Closed Actions
5. Provide a solution to version problems affecting the paging success rate and observe the results after implementing measures.
6. If you cannot directly draw any conclusion from version differences, analyze the specific failure range and cause.
Huawei Proprietary and Confidential Copyright Huawei Technologies Co., Ltd.
4.5.2 Checking RNC and NodeB Versions for Known Problems (Performed by Frontline Engineers)
Descriptions
Scenario: deterioration and optimization scenarios
Criteria: Confirm the problem range and cause and eliminate impacts from parameter changes based on actions 1 and 2. Eliminate the impacts of device and link faults. Check the known version problems.
Results
4. Provide the version RN checking result: List the names of the checked RNs and the check result. (List the RN defect descriptions if related defects exist.)
5. Provide case database searching results: List the keywords used for searching and the check result. (List the link and content of cases if there are any related cases.)
Closed Actions
6. Confirm related RN defects, and provide and perform corresponding solutions. Record results.
7. Confirm associated cases and perform solutions. Record results.
CMA scripts and methods for running the scripts are described in the following attachment:
4.6.4 Results
Provide evaluation results in Excel using related tools. SPU loads are described in the following figure:
4.7 Action 7: Checking RRC Setup Success Rates of Called Parties and Registered Users
4.7.1 Descriptions
Scenario: Paging failures of type 1 are due to low RRC success rates of the called parties and failures of type 2 are caused by low RRC success rates of the registered users. Criteria: Traffic statistics show that RRC success rates of the called parties or registered users are low.
4.8.1 Descriptions
Scenario: all deterioration and optimization scenarios where paging failures of type 2 occur. Criteria: According to action 1, it is a paging problem of type 2. No exceptions are found in previous checking. In respect of poor uplink and downlink coverage, check the coverage after checking parameters, devices, and RF channels.
Checking Parameters
Trigger conditions: mandatory. Check coverage problems caused by missing neighboring cell configuration and improper parameter settings (delayed handover). Identify areas with pilot pollution and high RTWP. Solve pilot pollution problems in combination with parameter checking. For high RTWP problems, check RF quality, external interference, and parameters by combining channel check and capacity check.
Trigger conditions: This action is required after network swapping or engineering adjustments, or when problems must be precisely located. Coverage problems can basically be identified. The prerequisite for identifying coverage problems is to check RF quality, external interference, and parameters by combining channel check and capacity check.
Trigger conditions: This action is required when the MR function is enabled on live networks. Combine checking for insufficient sites and parameter problems (pilot pollution) to identify areas with high traffic volume.
Trigger conditions: This action is required in engineering adjustment. Check coverage problems caused by parameter problems, for example, parameter adjustment.
Analyzing MRs
4.8.4 Results
Provide an analysis report based on the coverage-related analysis in UMTS Network Planning and Optimization Principles + Troubleshooting Methods + Cases + Deliverables.
4.9 Action 9: Comparing Long-Term Change Trends of Traffic Volumes (Seasonal Changes and Increase in the Number of Users)
Paging success rates may show similar fluctuation or long-term deterioration over one year or two years. If the deterioration trends of the paging success rate are similar in both years, the deterioration is related to seasons, grand festivals, gatherings, and traveling seasons, which can be clarified by analysis. If paging success rate deterioration persists for a long period of time, analyze whether it is related to a persistent increase in the number of users. If yes, provide corresponding measures.
4.9.1 Descriptions
Scenario: Paging success rates slowly deteriorate or fluctuate in a long term.
Criteria: Paging success rates deteriorate slowly over a long period rather than showing an obvious, sudden deterioration.
4.9.4 Results
Provide long-term change trend results and an association analysis result of the preceding counters. Specifically, provide the relationship between paging success rates and the following information:
Number of RRC attempts of the called party
VS.RRC.Paging1.Loss.PCHCong.Cell
VS.RANAP.CsPaging.Loss
VS.RANAP.PsPaging.Loss
Season change trend
Network adjustment time
Time when the number of abnormal terminals increases
Network access time of new users
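One simple way to quantify the association between the paging success rate and a counter such as VS.RRC.Paging1.Loss.PCHCong.Cell is a Pearson correlation over the same time periods. The sketch below is an illustration under that assumption. This document does not prescribe a specific statistic, and the function name is hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (dev_x * dev_y)
```

A value close to -1 between the daily paging success rate and the daily PCH congestion count suggests that congestion grows as the success rate falls. Correlation alone does not prove the cause, so confirm it with the other checks in this action.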
4.10 Action 10: Checking Unexpected Factors (Charging, Weather, and Gatherings)
In a short period of time, the paging success rate may deteriorate suddenly or fluctuate obviously. If the changes are not caused by network adjustments, they may be caused by unexpected factors, such as charging policy adjustments, bad weather, festivals, and gathering activities. Provide measures based on whether these unexpected factors can be predicted or controlled on networks.
4.10.1 Descriptions
Scenario: Paging success rate deteriorates suddenly in a short period of time. Criteria: Network adjustment impacts, device faults and channel faults are eliminated in actions 2, 3, and 8 respectively.
4.10.4 Results
Provide relationships between the trend of paging success rates before and after deterioration and time of network adjustment and unexpected changes.
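The relationship between the deterioration time and the time of network adjustments or unexpected changes can be checked with a date-window match. This is a minimal sketch, assuming the events have been collected manually as (date, description) pairs. The function name and the 3-day window are hypothetical.

```python
from datetime import date, timedelta

def events_near(deterioration_day, events, window_days=3):
    """Return descriptions of events within window_days of the deterioration day.

    events: iterable of (date, description) tuples (assumed format).
    """
    window = timedelta(days=window_days)
    return [
        description
        for day, description in events
        if abs(day - deterioration_day) <= window
    ]
```

An empty result suggests that neither a recorded network adjustment nor a recorded external event coincides with the deterioration, so the analysis continues with the next action.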
4.11.1 Descriptions
Scenario: Paging success rate deteriorates suddenly in a short period of time. Criteria: Eliminate network adjustment impacts in action 2, and device faults, low RRC success rates, and unexpected factors in actions 3, 7, and 10 respectively.
4.11.4 Results
1. Provide the relationship between the paging success rate change trend before and after deterioration and the time points of network operations on the CN side.
2. Provide distribution results of UEs contributing to the sudden increase in paging failures. If a top UE exists, obtain terminal server information.
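The distribution of UEs contributing to paging failures can be obtained by counting failures per IMSI. The sketch below assumes that the failed paging records have already been reduced to a list of IMSIs, one entry per failure. How these records are extracted is outside this sketch, and the function name is hypothetical.

```python
from collections import Counter

def top_paged_ues(failed_paging_imsis, n=10):
    """Return (imsi, failure_count, share_of_all_failures) for the top n UEs."""
    counts = Counter(failed_paging_imsis)
    total = sum(counts.values())
    return [
        (imsi, count, count / total)
        for imsi, count in counts.most_common(n)
    ]
```

If a single IMSI accounts for a large share of all failures, it is a top UE candidate for the terminal server check.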
Export the data of online users from the GU HLR, and save the data in a board of the DTL. The data file will be automatically uploaded to an FTP or SFTP server.
The exported file is compressed into a .tar.gz package, saved in /opt/uscdb/MemUserExp, and uploaded to an FTP or SFTP server. The exported table contains the mapping between IMSIs and MSISDNs. The data of W_COMMON_SER_DATA (table ID 12) is shown in the following table:
SID  IMSI             MSISDN
1    460009900088001  8613590208001
2    460009900088002  8613590208002
3    460009900088003  8613590208003
For details, see USCDB V100R002C06 Guide to Exporting Subscriber Data Online.
http://support.huawei.com/support/pages/kbcenter/view/product.do?actionFlag=detailProductSimple&web_doc_id=SC0000692445&doc_type=VersionDoc&doc_type=VersionDoc
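The IMSI-to-MSISDN mapping in the exported table can be loaded into a lookup dictionary so that top IMSIs found during paging analysis can be translated into subscriber numbers. The sketch assumes whitespace-separated rows in the order SID, IMSI, MSISDN, as in the sample rows of the W_COMMON_SER_DATA table. The real export file format may differ and is not specified here, and the function name is hypothetical.

```python
def load_imsi_to_msisdn(lines):
    """Build an IMSI -> MSISDN dictionary from rows of 'SID IMSI MSISDN'.

    Header lines and malformed rows are skipped (assumed layout).
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # not a data row
        _sid, imsi, msisdn = fields
        mapping[imsi] = msisdn
    return mapping
```

With the sample rows, looking up IMSI 460009900088002 returns MSISDN 8613590208002.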
This operation comes with high risks. Therefore, confirm whether to perform this operation based on specific sites and number of users. For a small number of users, export data through the PGW.
This operation is performed with coordination between R&D personnel and frontline personnel.
4.12.4 Results
Provide records of top user paging analysis.
Perform action 2 to check whether parameter configurations are proper. The results are shown in the following figure.
Closed actions and result evaluation
Enable the SPU load sharing function by running the following command:
SET URRCTRLSWITCH: PROCESSSWITCH=RNC_SHARE_SWITCH-1;
After the function is enabled, flow control on the paging messages is eliminated and paging success rates on the CN side are normal again.
5.3 Cases of Paging Success Rate Deterioration Caused by Incorrect Configurations on the CN Side
Problem
In country I, paging success rates on one of the two RNCs are low, only 50%. By checking paging counters on the CN side, it is found that the paging success rate in LAC areas is about 90%.
Analysis process
According to action 1, the problem is identified to be a paging problem of type 3. This problem may be caused by incorrect configurations on the CN side. Check the paging-related configurations on the MSC. The result shows that the number of configured LACs for the RNC is greater than the number of actual LACs. Therefore, paging objects are not within the LACs under this RNC and local paging messages fail to be received, which results in low paging success rates.
Closed actions and result evaluation
Solve this problem by using the following solution: Delete LACs not used by the RNC on the MSC. After the invalid LACs are deleted, paging success rates on the RNC side gradually return to normal and are basically equal to those on the CN side.
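The configuration check in this case amounts to a set difference between the LACs configured for the RNC on the MSC and the LACs actually used by the RNC. The sketch below is a minimal illustration, assuming both LAC lists have already been exported by other means. The function name is hypothetical.

```python
def unused_lacs(msc_configured_lacs, rnc_actual_lacs):
    """Return LACs configured on the MSC for this RNC but not used by the RNC."""
    return sorted(set(msc_configured_lacs) - set(rnc_actual_lacs))
```

Any LAC returned is a candidate for deletion on the MSC, as described in the closed actions.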