The Problem
Communications service providers (CSPs), network equipment providers (NEPs) and telecom-focused independent software vendors (ISVs) host and operate dedicated network labs to validate new services, evaluate and test the readiness of new technologies and solutions and integrate vendor solutions and devices. Network labs are constantly evolving, driven by network transformation, 5G, cloud and edge computing technologies.
To remain competitive in such a dynamic environment, labs need to be more efficient, deliver services faster and take advantage of automation, all while managing costs. Altran internal research has identified lab operations and management as a critical area of focus for CSPs, NEPs and ISVs for five reasons:
- Challenges managing both short- and long-term costs and resources. For example, higher OPEX is driven primarily by manual tasks and by power and cooling for unused equipment. Higher CAPEX is driven by improper capacity planning, which often results in unused or idle equipment, tools, associated spares and physical cabling.
- Loss of both staff and equipment productivity due to manual test environment setup and inefficient inventory tracking. For example, most of our customers still use a manual approach for test environment setup and teardown tasks. In one customer's network lab, the lab team spent an average of 45 to 60 minutes on manual test environment setup, out of an overall planned test execution cycle of 4 to 5 hours, setup included. This translated to an estimated 20% loss in productivity.
- Unnecessary downtime due to reactive maintenance and incomplete condition monitoring. For example, our root cause analyses across multiple labs point to causes such as insufficient power backup systems, overloaded air conditioning units, improper cabling and a lack of load capacity planning. All these factors lead to system and equipment shutdowns, which affect overall productivity.
- Inefficient asset and inventory tracking, which strains productivity and results in cost overruns. For example, one CSP customer had multiple collocated labs with no asset tagging, and assets were moved freely across labs with no physical checks or tracking. As a result, a significant amount of time was spent tracking down lab equipment and tools, which further delayed the test environment setup and the overall execution cycle.
- Non-optimized utilization of equipment and tools. Improper capacity planning, untagged assets, lack of reservation and unplanned test setup are just some of the factors leading to lab utilization issues.
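The productivity-loss estimate in the setup example above can be checked with quick arithmetic. The sketch below uses only the figures quoted there (45 to 60 minutes of setup within a 4- to 5-hour cycle); the function and variable names are illustrative, not part of any Altran tool.

```python
# Bounds taken from the example above: 45-60 minutes of manual setup
# within a 4-5 hour test execution cycle (setup time included).
SETUP_MINUTES = (45, 60)
CYCLE_HOURS = (4, 5)


def setup_share(setup_minutes: float, cycle_hours: float) -> float:
    """Fraction of the execution cycle spent on manual environment setup."""
    return setup_minutes / (cycle_hours * 60)


best_case = setup_share(SETUP_MINUTES[0], CYCLE_HOURS[1])   # 45 min of a 5 h cycle
worst_case = setup_share(SETUP_MINUTES[1], CYCLE_HOURS[0])  # 60 min of a 4 h cycle
midpoint = setup_share(sum(SETUP_MINUTES) / 2, sum(CYCLE_HOURS) / 2)

print(f"best case:  {best_case:.0%}")
print(f"worst case: {worst_case:.0%}")
print(f"midpoint:   {midpoint:.0%}")
```

The share ranges from 15% to 25%, with the midpoint close to the 20% estimate quoted above.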
To address these pain points, companies need to increase their investment in automation, implement X-as-a-Service models, apply data analytics and leverage artificial intelligence (AI) and machine learning (ML) technologies. In the last few years, AI and ML have evolved significantly and are now being used in countless applications across many industries.
In the telecom sector, a significant number of CSPs and NEPs have been deploying AI and ML as part of network management, operations and infrastructure. According to a Heavy Reading report[1], some leading CSPs have started implementing AI-based use cases, including network optimization, preventive maintenance, fraud and security, battery CAPEX optimization, trouble ticket prioritization and virtual assistants.
There is an opportunity for CSPs, NEPs and ISVs to apply insights from already-deployed AI use cases to improve and optimize network lab operations. For example, preventive maintenance use cases implemented for network cell locations can be adapted to perform preventive maintenance in the lab environment.
Recently, Altran performed an AI-driven root cause analysis for a European Tier-1 CSP that included three different fixed networks. The goal was to identify issues in the network, describe the reason and map critical alarms and tickets for the part of the network affected and the end-users impacted. The analysis predicted over 700 issues every five minutes using an AI algorithm.
Key Use Cases
While AI and ML are gaining traction in core network operations, product development and testing, their application in the lab environment has been limited. Working with CSPs, NEPs and ISVs, Altran has identified many high-impact use cases for labs. We detail four below and will cover additional use cases in the next blog post.
Lab Service Intelligence: End-to-end automation of service management lifecycle workflows and lab processes is necessary to scale user request workloads, ensure high productivity and improve user experience. Critical use cases include incident, problem, knowledge and change management. Robotic process automation helps automate repetitive tasks such as the auto-assignment of incidents, while virtual agents, such as voice-activated chatbots, can process user requests more quickly and efficiently.
For example, with an AI/ML approach, user workload requests for the test environment can be predicted and change requests can be auto-generated and assigned to respective service teams with dynamic SLAs.
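As a minimal sketch of this idea, the snippet below forecasts next week's test environment requests with a plain moving average and drafts a change request when the forecast exceeds capacity. The moving average stands in for a trained model, and the `ChangeRequest` fields, the `lab-ops` team name and the SLA values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class ChangeRequest:
    team: str
    summary: str
    sla_hours: int


def forecast_next(requests_per_week: List[int], window: int = 4) -> float:
    """Naive forecast: mean of the most recent `window` weeks."""
    if not requests_per_week:
        raise ValueError("need at least one week of history")
    return mean(requests_per_week[-window:])


def plan_change(history: List[int], capacity: int,
                team: str = "lab-ops") -> Optional[ChangeRequest]:
    """Draft a change request when forecast demand exceeds lab capacity."""
    forecast = forecast_next(history)
    if forecast <= capacity:
        return None
    overload = forecast / capacity
    # Hypothetical "dynamic SLA": the larger the overload, the tighter the deadline.
    sla_hours = 24 if overload >= 1.5 else 72
    summary = f"Provision capacity for ~{forecast:.0f} test environment requests"
    return ChangeRequest(team=team, summary=summary, sla_hours=sla_hours)


weekly_requests = [10, 12, 20, 22, 26, 28]
print(plan_change(weekly_requests, capacity=15))
```

In practice the forecast would come from a model trained on real request history, and the request would be filed through the lab's service management tool rather than printed.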
Lab Environment Monitoring: With changing workloads, dynamic test environment requirements and constant upgrades, ensuring high availability and zero downtime is essential. With proactive monitoring and an AI/ML approach to automated issue identification and root cause analysis, smart filtering of alarms and events, and event-based and alarm-based correlation and anomaly detection, potential issues can be predicted and addressed before they cause downtime.
For example, analyzing event patterns generated by the power distribution unit connected to the rack and validating events against historical data helps identify potential issues and speed corrective action.
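One simple way to validate current events against historical data is a z-score check, sketched below. It assumes the power distribution unit's events have already been aggregated into hourly counts; the sample numbers and the three-sigma threshold are illustrative, and a production system would likely use a richer model.

```python
from statistics import mean, stdev
from typing import List


def is_anomalous(history: List[float], current: float,
                 threshold: float = 3.0) -> bool:
    """Flag `current` if it deviates from the historical mean by more
    than `threshold` sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical hourly counts of warning events from one rack's PDU.
hourly_event_counts = [2, 3, 1, 2, 4, 3, 2, 3]

for observed in (3, 9):
    status = "investigate" if is_anomalous(hourly_event_counts, observed) else "normal"
    print(f"{observed} events this hour: {status}")
```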
Predictive Lab Maintenance: An often overlooked area in lab management is predictive maintenance, which reduces both the impact and cost of downtime. Using AI and ML allows operations teams to analyze maintenance issues over time, enable just-in-time maintenance, schedule inspections more effectively and plan repairs and replacements at the optimal time.
For example, applying an AI/ML model to historical data for a UPS backup system, such as total uptime and downtime and the occurrence and resolution of past issues, faults and events, would help predict problems and optimize maintenance planning.
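A very rough statistical stand-in for such a model is to estimate the mean time between failures (MTBF) from the fault history and schedule the next inspection before the next fault is expected. The sketch below assumes fault timestamps are available as `datetime` values; the `safety_factor` parameter and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List


def mean_time_between_failures(fault_times: List[datetime]) -> timedelta:
    """Average gap between consecutive recorded faults."""
    if len(fault_times) < 2:
        raise ValueError("need at least two faults to estimate MTBF")
    ordered = sorted(fault_times)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return timedelta(seconds=mean(gaps))


def next_inspection(fault_times: List[datetime],
                    safety_factor: float = 0.8) -> datetime:
    """Schedule an inspection at a fraction of the MTBF after the last fault,
    so that it lands before the next fault is expected."""
    mtbf = mean_time_between_failures(fault_times)
    return max(fault_times) + mtbf * safety_factor


# Hypothetical fault history for one UPS unit.
first_fault = datetime(2024, 1, 1)
ups_faults = [first_fault,
              first_fault + timedelta(days=30),
              first_fault + timedelta(days=60)]

print("Estimated MTBF:", mean_time_between_failures(ups_faults))
print("Inspect by:", next_inspection(ups_faults))
```

A trained model could replace the MTBF estimate with a prediction that also accounts for load, temperature and repair history.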
Lab Incident Management: Proactive identification and prevention of incidents associated with problem resolution is an important area where AI and ML can substantially improve efficiency, user experience and system reliability. With actionable insights into historical data generated by human intervention during the handling of incidents, associated problem resolution, SLA/KPIs and other areas, deep-learning models can be trained to take appropriate actions based on reported incidents.
For example, a router or switch that continuously generates alarms would generate automated incidents that trigger the AI/ML model to generate a response for the operations team to fix the issue proactively. A collaborative human and ML approach can also be used to speed up repair while the operations team manages other tasks.
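The alarm-to-incident step can be sketched as a sliding-window counter per device: once a device raises too many alarms within a short period, an incident is opened for the operations team. The class name, thresholds and message format below are hypothetical, and timestamps are plain seconds for simplicity.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class AlarmWatcher:
    """Opens an incident when a device raises too many alarms in a short window."""

    def __init__(self, max_alarms: int = 5, window_seconds: float = 60.0):
        self.max_alarms = max_alarms
        self.window_seconds = window_seconds
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, device: str, timestamp: float) -> Optional[str]:
        """Record one alarm; return an incident summary if the limit is reached."""
        recent = self._recent[device]
        recent.append(timestamp)
        # Drop alarms that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_alarms:
            # Reset so the same burst does not reopen an incident on every alarm.
            recent.clear()
            return (f"Incident: {device} raised {self.max_alarms} alarms "
                    f"within {self.window_seconds:.0f}s")
        return None


watcher = AlarmWatcher(max_alarms=3, window_seconds=10)
for t in (0, 4, 8):
    incident = watcher.record("switch-01", t)
    if incident:
        print(incident)
```

The returned summary is where a trained model would attach its recommended response before the incident reaches the operations team.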
Maximizing the value of AI/ML technologies requires access to large data sets—the more historical data, the better—and implementation of the right infrastructure. Altran provides powerful AI/ML software frameworks for CSP, NEP and ISV customers that deliver tangible, real-world value. Altran believes AI/ML use-case implementation in the lab is the key to cost and resource optimization and lower financial risk.
The first step is to perform an initial gap analysis that identifies areas for improvement and optimization, maps relevant AI/ML use cases and defines the roadmap. Conducting a proof-of-concept exercise for selective use cases is a good starting point to gain confidence and showcase the benefits.
Altran Solutions
Altran offers four AI/ML solutions to help telecommunications companies improve efficiency and reduce CAPEX and OPEX.
TANTEM: The TANTEM framework provides an end-to-end solution for CSPs and NEPs to deliver test-as-a-service to local and remote users by automatically building and deploying test environments and test automation. Virtual environments can be generated as needed and do not need to remain in place after tests have been run. TANTEM uses AI/ML test analytics for continuous optimization of the testing ecosystem.
ATLAS: The ATLAS framework is an intelligent testing solution that uses AI/ML techniques to prioritize test cases, select test cases based on code changes and perform automated root cause analysis to reduce the time needed for regression testing and speed up software delivery while improving software quality.
AVERT: The AVERT security software framework future-proofs your business with state-of-the-art security. The shift to software-defined networking and virtualization, the increased use of mobile devices and other factors demand a new network security approach to manage the ever-evolving cyber-threat landscape.
NetAnticipate: NetAnticipate is the ultimate network AI platform that enables the artificial intelligence software security of tomorrow. It is an intent-based prescriptive AI platform to realize self-learning networks for zero human touch network operation that predicts network anomalies and takes preventive measures in real-time using a cognitive feedback loop.
Connect with an Altran expert for more information about the advantages of a cloud-native architecture for 5G implementations.
[1] Crawshaw, James. “AI in Telecom Operations: Opportunities and Obstacles,” Heavy Reading, Sep. 2018.