The Magazine for Developers of Open Communication, Industrial, and Rugged Systems

High availability

Make my platform continuously available

By
GoAhead Software

While acknowledging that standardization has come a long way, Dr. Naseem points out there are still hurdles to overcome to achieve successful integration and testing of a system whose primary ingredients are Commercial Off-the-Shelf (COTS) building blocks.

Standardization of various building blocks - hardware, operating system, middleware, and application services - is greatly facilitating the ability for telecom systems developers to put together carrier-grade, application-ready platforms using COTS components. The emergence of multiple COTS suppliers for each of the building blocks is helping create a viable and vibrant ecosystem that provides a compelling alternative for the Telecom Equipment Manufacturers (TEMs) who have traditionally developed such vertically integrated systems in-house using proprietary, often expensive, technologies. A slow but steady shift has been occurring in the telecommunications industry toward creating application-ready platforms using best-of-class COTS building blocks and focusing TEMs’ increasingly shrinking, precious resources on their core expertise, that is, revenue-generating applications and services.

Commercial adoption of standards and requirements published by industry consortia such as PICMG, the Linux Foundation, and the Service Availability Forum (SA Forum) addresses the key layers of functionality in a standards-based, application-ready platform and is helping to catalyze such a shift. In addition, the proliferation of commercial building blocks that address other important areas, such as protocol stacks, configuration management, and element and network management, minimizes, if not eliminates, the need for in-house development of such functionality.

While all this is encouraging, significant challenges remain in the integration and testing of a system that is primarily put together using COTS building blocks. Often TEMs have to assume the system integration responsibility, which in turn results in significant effort on their part to integrate components they had little or no involvement in developing. This not only increases their R&D costs, but also lengthens their time to revenue. So, increasingly, TEMs are looking for someone among their suppliers to take responsibility for delivering an integrated and tested vertical stack. In other words, TEMs would prefer a single point of contact for the delivery and support of a carrier-grade, application-ready platform. The industry is recognizing this need, and key alliances are being formed to address this challenge.

GoAhead Software and Continuous Computing have joined hands to put together a unique and compelling system that integrates their respective hardware and software components into a carrier-grade, application-ready platform that can be readily employed in a variety of network applications (more on that later). Let us first look at various building blocks and functionality required in delivering such a system.

HA: Seven ingredients

Consider the anatomy of a highly available and manageable application-ready platform (Figure 1). The wide-angle picture reveals seven categories of services needed to build such a system.

Figure 1

Availability management services

The centerpiece of any high availability middleware is its availability management services. Commercially available service availability middleware products have begun to support the Availability Management Framework (AMF) as defined by the SA Forum Application Interface Specification (AIS), including the full set of AMF logical entities, the AMF state model, component life-cycle management, automated error recovery/repair actions, and administrative operations for AMF logical entities.

Systems management services

Systems management services enable the creation of both external and internal management functionality. External management includes such things as configuration management; performance; and Operations, Administration, Maintenance, and Provisioning (OAM&P). Internal management includes a system configuration repository that can be used by other services and the application, X.73x-style notification generation and reception, system-level logging, and alarm management services.

Management interfaces

Management interfaces are used to provision, monitor, manage, troubleshoot, and upgrade the entire system. Ecosystem choices vary, ranging from loose integration, where one interface is used for each layer of the system stack, to tight integration, where a single, Unified Management Interface (UMI) is used to manage all stack layers from one source. From an operator standpoint, the UMI approach is preferable in that it allows one common portal through which to access all the elements of the system while eliminating the need for multiple portals that must be compared and synchronized constantly.

Application services

Application services are targeted primarily at developers to simplify the development of applications for highly available systems. Functionality provided includes communication services (Event and Message) for high-speed intra-node and inter-node communication between applications, cluster node membership notifications, and services that provide application checkpoint facilities between associated applications distributed across a system. In COTS middleware, most of these services are based on the related SA Forum AIS service definitions.

Platform management services

Platform management services interface with a particular hardware platform’s management capabilities (for example, those provided by the SA Forum Hardware Platform Interface (HPI) services) to provide automatic hardware resource discovery and population in the availability management system model, propagation of HPI events, access to HPI sensors and controls, and management of hot swap sequences for Field Replaceable Units (FRUs).

Kernel and foundational services

Kernel and foundational services provide a variety of functionality that system developers can use to build highly available systems. Features include embedded scripting, an embedded WebServer, a data service that can provide information about underlying operating system parameters, and a service that enables starting and running of programs using operating system-agnostic API functions. A small, reliable, cross-platform foundation that abstracts platform-specific capabilities into generic, platform-independent characteristics is often very useful as well.

These services must come together in a coherent way on a hardware platform that is preferably based on industry standards such as AdvancedTCA, running a standard operating system such as Carrier Grade Linux. Gluing these pieces together such that the resulting platform is network-ready is nontrivial and requires specialized expertise. Each of these sets of services must address a variety of functional needs, described as follows.

Availability management services

This set of services is the basic workhorse of a highly available system. The AMF automatically directs fault management cycle activities - fault detection, isolation, diagnosis, repair, and recovery; enables component modeling on any node within the system; manages the core behaviors driven by the system model configuration such as instantiation and life-cycle management of objects; implements various redundancy models; and implements error recovery and repair policies.
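
To make the cycle concrete, here is a minimal Python sketch of an active/standby pair going through detection, isolation, recovery, and repair. It is an illustration only: `Component`, `ToyAvailabilityManager`, and their methods are hypothetical names invented for this example and are not part of the AMF or of any middleware product.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    role: str = "unassigned"  # "active", "standby", or "unassigned"
    healthy: bool = True


class ToyAvailabilityManager:
    """Hypothetical, simplified manager for one active/standby pair."""

    def __init__(self, components):
        self.components = list(components)
        self.steps = []  # record of the fault-management steps taken

    def _by_role(self, role):
        return next((c for c in self.components if c.role == role), None)

    def start(self):
        """Instantiate the pair: first component active, second standby."""
        self.components[0].role = "active"
        self.components[1].role = "standby"

    def report_fault(self, name):
        """Walk one fault through detection, isolation, recovery, repair."""
        failed = next(c for c in self.components if c.name == name)
        self.steps.append(f"detected fault on {name}")
        was_active = failed.role == "active"
        failed.healthy = False
        failed.role = "unassigned"  # isolation: take it out of service
        self.steps.append(f"isolated {name}")
        if was_active:
            standby = self._by_role("standby")
            if standby is not None:
                standby.role = "active"  # recovery: fail over to the standby
                self.steps.append(f"failed over to {standby.name}")
        failed.healthy = True  # repair: assume a restart succeeds
        if self._by_role("standby") is None:
            failed.role = "standby"
        self.steps.append(f"repaired {name} as {failed.role}")
```

In this sketch a fault on the active component promotes the standby and brings the repaired component back as the new standby. Real redundancy models and repair policies are driven by the system model configuration rather than hard-coded as they are here.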

Systems management services


Information Model Management (IMM)

The Information Model Management (IMM) service provides access to the information model so that its objects can be manipulated dynamically. This includes creation and deletion of object instances, administrative operations on such objects, object search using predefined criteria, and management of the replication of all information model updates to the standby manager node.

Notification (NTF) service

The Notification (NTF) service provides cluster-wide reliable notifications for the numerous events occurring at any given time in the system.

Log (LOG) service

The Log (LOG) service provides reliable logging services to the cluster. Logged information is high-level, cluster-significant, function-based information suited primarily to network or system administrators, or to automated tools that review current and historical log records.

These services must include various Management Information Bases (MIBs) required to support effective management of the system.

Alarm Management Service (ALMS)

A system-wide Alarm Management Service allows detection, annunciation, and acknowledgement of alarms from a variety of different sources within the system. This service needs to be configurable such that predefined and custom alarm management policies can be implemented.

SNMP Agent

SNMP Agent supports the MIBs required for the system by processing incoming requests related to the supported MIBs, as well as generating traps and notifications described in the MIBs.

Management interface services

The methods by which a user may interact with the system are just as important as the system services themselves. Having a choice of Command Line Interface (CLI), SNMP, and/or XML is important for appealing to a broad base of deployment scenarios. Ideally the ability to provision, monitor, manage, troubleshoot, and upgrade the entire system exists through a single UMI rather than a collection of separate interfaces for separate functions or stack layers.

Application services

Cluster membership service (CLM) is used to determine when cluster nodes have entered or left the cluster. When a node leaves the cluster, the applications could use the node left notification as a trigger to clean up any switching configuration related to the applications that are no longer available due to the node exiting the cluster.
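
The node-left trigger described above can be sketched as a callback. This is a minimal Python illustration under stated assumptions: `ToyClusterMembership`, its `track` method, the `make_route_cleaner` helper, and the route table layout are all hypothetical and do not reflect the actual CLM API.

```python
class ToyClusterMembership:
    """Hypothetical stand-in for a cluster membership service."""

    def __init__(self):
        self.members = set()
        self._listeners = []

    def track(self, callback):
        """Register callback(node, change); change is 'joined' or 'left'."""
        self._listeners.append(callback)

    def _notify(self, node, change):
        for callback in self._listeners:
            callback(node, change)

    def join(self, node):
        self.members.add(node)
        self._notify(node, "joined")

    def leave(self, node):
        self.members.discard(node)
        self._notify(node, "left")


def make_route_cleaner(routes):
    """Build a callback that drops routes owned by a node that has left.

    routes is a hypothetical switching table mapping prefix -> owning node.
    """
    def on_membership_change(node, change):
        if change == "left":
            stale = [prefix for prefix, owner in routes.items() if owner == node]
            for prefix in stale:
                del routes[prefix]
    return on_membership_change
```

An application would register the cleaner once with `track` and then rely on the membership notifications, rather than polling, to keep its switching configuration consistent.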

Checkpoint service (CKPT) is used to checkpoint and preserve the state of the application between the active and standby nodes.
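
As a rough illustration of checkpointing, the Python sketch below has an active instance write its state after every change and a standby instance restore that state when it takes over. `ToyCheckpoint` and `SessionTracker` are hypothetical names, and the single in-memory dictionary merely stands in for replication between nodes.

```python
class ToyCheckpoint:
    """In-memory stand-in for a checkpoint replicated between nodes."""

    def __init__(self):
        self._sections = {}

    def write(self, section_id, state):
        self._sections[section_id] = dict(state)  # store a copy

    def read(self, section_id):
        return dict(self._sections.get(section_id, {}))


class SessionTracker:
    """Hypothetical application that counts the sessions it has admitted."""

    SECTION = "session-tracker"

    def __init__(self, checkpoint):
        self.checkpoint = checkpoint
        self.sessions = 0

    def admit_session(self):
        self.sessions += 1
        # Checkpoint after every state change so the standby stays current.
        self.checkpoint.write(self.SECTION, {"sessions": self.sessions})

    def take_over(self):
        """Called on the standby when it becomes active."""
        saved = self.checkpoint.read(self.SECTION)
        self.sessions = saved.get("sessions", 0)
```

The design choice worth noting is that the active side writes eagerly; a standby that only learned the state at failover time would have nothing to recover from.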

Event service (EVT) is used to send and receive management requests/responses to and from the various nodes within the system. The event service can also be used to checkpoint the state of the relevant applications between the nodes.

Message service (MSG) is used to send and receive management requests/responses to and from various nodes, where incoming management requests would be retained in a designated message queue even if the application(s) are currently unavailable to receive the request.
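
The retention behavior can be pictured with a small Python sketch: requests sent while no receiver is attached wait in the queue and are delivered in order once the application returns. `ToyMessageQueue` and its `attach`/`detach` methods are assumptions made for this example, not the MSG service interface.

```python
from collections import deque


class ToyMessageQueue:
    """Hypothetical queue that retains messages while no receiver is attached."""

    def __init__(self, name):
        self.name = name
        self._pending = deque()
        self._receiver = None

    def send(self, message):
        if self._receiver is not None:
            self._receiver(message)
        else:
            self._pending.append(message)  # retain until a receiver attaches

    def attach(self, receiver):
        """Attach the application and drain retained messages in order."""
        self._receiver = receiver
        while self._pending:
            receiver(self._pending.popleft())

    def detach(self):
        self._receiver = None

    def pending_count(self):
        return len(self._pending)
```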

Management datastore provides an efficient, small-footprint, performance-optimized in-memory data repository for management information.

A virtual IP capability allows the external world to communicate with the active resources within the system without having to know their physical addresses.

Platform management services

Platform Resource Management Service (PRMS) represents the state of the system’s hardware resources - blades, shelf managers, switches, power supplies, and fans - in the AMF system model and propagates hardware events by way of the NTF service. PRMS also facilitates the implementation of custom hot swap and alarm management policies for the system. PRMS relies on an HPI client library to access the HPI APIs. This service is required so that the system OA&M functions can receive hardware-resource-related events and so that the representation of the hardware resources in the system model can drive the availability management policies for the system.

Hot Swap Management Service (HSMS) implements default and custom policies governing the insertion and extraction sequences for FRUs in the system.
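
One way to picture a policy-governed insertion and extraction sequence is as a small state machine with a policy hook. The Python sketch below is an assumption-laden illustration: the state names, the `ALLOWED` transition table, and the `ToyFru` class are simplified inventions for this example and are not taken from HSMS or the HPI specification.

```python
# Hypothetical, simplified hot swap states and the transitions allowed between them.
ALLOWED = {
    "not_present": {"insertion_pending"},
    "insertion_pending": {"active", "inactive"},
    "active": {"extraction_pending"},
    "extraction_pending": {"inactive", "active"},
    "inactive": {"insertion_pending", "not_present"},
}


class ToyFru:
    """Hypothetical FRU whose hot swap sequence is gated by a policy callback."""

    def __init__(self, name, policy=None):
        self.name = name
        self.state = "not_present"
        # policy(fru, target_state) -> bool; the default policy approves everything.
        self.policy = policy or (lambda fru, target: True)

    def _move(self, target):
        if target not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: {self.state} -> {target} not allowed")
        self.state = target

    def insert(self):
        self._move("insertion_pending")
        self._move("active" if self.policy(self, "active") else "inactive")

    def request_extraction(self):
        self._move("extraction_pending")
        # A custom policy may veto extraction, e.g. while the FRU hosts active work.
        self._move("inactive" if self.policy(self, "inactive") else "active")

    def remove(self):
        self._move("not_present")
```

A custom policy is just a different callback, for example one that refuses the move to `inactive` until the workload on the blade has been switched over.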

Kernel and foundational services

A variety of key services are required to facilitate the design and integration of various building blocks to construct an application-ready platform. Some key examples include a distributed message engine (DMS) used as the infrastructure for both intra-node and inter-node communication, as well as a Cluster Management Service (CMS) that enables functionality such as auto-node discovery, health monitoring, custom cluster policies, and virtual IP address management. Other services include a development and operational console, memory management, a loader, and the like.

A COTS-based, vertically integrated system

GoAhead Software and Continuous Computing have worked together and put in significant effort to build an application-ready platform for TEMs who must ensure carrier-grade - 99.999% and higher - service availability. This integration brings together Continuous Computing’s carrier-class FlexTCA system and GoAhead’s SAFfire software platform that provides SA Forum compliant high availability middleware services. The FlexTCA system targets a variety of demanding network applications including 10 GbE Deep Packet Inspection (DPI), security, and IPTV and includes fault-tolerant Trillium control and data plane protocols to make it truly application-ready.
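
To put the 99.999% figure in perspective, the short Python calculation below converts an availability percentage into allowable downtime per year. The second function is a textbook approximation added for illustration, not a claim about the FlexTCA system: it assumes two nodes that fail independently and a failover that always succeeds instantly.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Allowable downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def redundant_pair_availability(node_availability_percent):
    """Availability of two redundant nodes, assuming independent failures
    and perfect failover (an idealization)."""
    unavailable = 1 - node_availability_percent / 100
    return 100 * (1 - unavailable ** 2)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% availability allows {minutes:.2f} minutes of downtime per year")
```

Five nines works out to roughly 5.26 minutes of downtime per year, which is why automated recovery, rather than manual intervention, is needed to reach it.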

This integration includes a sophisticated UMI to provision, monitor, manage, troubleshoot, and upgrade the entire system, which is based on the suite of high availability and platform management services available in GoAhead SAFfire (Figure 2). The critical integration elements to create this platform are depicted in Figure 3.

Figure 2

Figure 3

PRMS (along with HPI), HSMS, and ALMS form key components of this integration of SAFfire within FlexTCA. PRMS has been integrated with the HPI libraries implemented in the hardware. Additionally, HPI resource discovery capabilities enumerate all the resources to create an initial system model and to dynamically update this model as the various resources’ roles and states change while different system events take place. These resources - or managed objects - can either be local to the cluster or remote, in which case they are represented by a remote adapter. A Remote Adapter Factory (RAF) is employed to maintain the current state of such managed objects. A state engine - depicted as a role - establishes whether a PRMS instance is active, standby, or unassigned. A Hardware Abstraction Layer (HAL) translates standard HPI calls into actions that are specific to the underlying hardware. Conversely, it translates hardware events such as hot swap events into appropriate HPI notifications that are sent upstream to the management application. HAL also uses an error-handling component to identify and recover from errors that may occur during HPI calls. These error-handling policies are customizable through an XML file.
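
The article does not show the policy file itself, so the following Python sketch is purely illustrative of how XML-driven error handling might look. The XML element and attribute names, the error codes, `HalError`, `load_policies`, and `call_with_policy` are all hypothetical and should not be read as the SAFfire or FlexTCA format.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy file: one <policy> element per error code.
POLICY_XML = """
<error-policies>
  <policy error="TIMEOUT" action="retry" max-retries="3"/>
  <policy error="NO_RESPONSE" action="retry" max-retries="1"/>
  <policy error="INVALID_RESOURCE" action="fail"/>
</error-policies>
"""


class HalError(Exception):
    """Hypothetical error raised by a hardware call, carrying an error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def load_policies(xml_text):
    """Parse the policy XML into {error_code: (action, max_retries)}."""
    root = ET.fromstring(xml_text)
    policies = {}
    for node in root.findall("policy"):
        retries = int(node.get("max-retries", "0"))
        policies[node.get("error")] = (node.get("action"), retries)
    return policies


def call_with_policy(operation, policies):
    """Run operation(), retrying on errors as the loaded policies allow."""
    attempts = 0
    while True:
        try:
            return operation()
        except HalError as err:
            # Unknown error codes default to failing immediately.
            action, max_retries = policies.get(err.code, ("fail", 0))
            if action != "retry" or attempts >= max_retries:
                raise
            attempts += 1
```

Keeping the policy in a data file rather than in code means an integrator can tune recovery behavior for a particular hardware platform without rebuilding the abstraction layer.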

A final word

The ecosystem of COTS suppliers of hardware and software components for the telecom market is steadily maturing and becoming more vibrant. TEMs now have choices to build network gear based on best-of-breed components that are commercially available. Preintegration and testing is still a significant task that can mean the difference between success and failure for a network element put together using COTS components. TEMs, both large and small, are looking to the industry to provide integrated, tested, and quality-assured systems that are field-proven - and the industry is responding. The FlexTCA system is a proof point that it is possible to put together a carrier-grade, application-ready platform - using commercially available building blocks - that is preintegrated, field tested, and ready for network deployment.

Dr. Asif Naseem is President and Chief Operating Officer, GoAhead Software, Inc. in Seattle, WA. He has more than 20 years of experience in the computer and communications industry. Asif is a veteran speaker and frequent author of papers published in several technical journals and magazines. He has an MS in Electrical Engineering and a PhD in Computer Engineering from Michigan State University.


GoAhead Software
www.goahead.com
anaseem@goahead.com


©MMX CompactPCI AdvancedTCA & MicroTCA Systems. An OpenSystems Media, LLC publication.
Last updated: 07/29/10 10:01 America/Phoenix