Comparing Ethernet and RapidIO
The amount of bandwidth available in embedded applications has grown as multigigabit interfaces become more commonplace. In embedded systems, two of the most common interconnects to take advantage of this bandwidth explosion are Ethernet and RapidIO. Here Barry compares the advantages offered by each technology in the context of chip-to-chip, board-to-board, backplane, and inter-chassis connections up to 100 meters long. These connections are commonly found in applications that have adopted RapidIO, such as wireless baseband signal processing, military compute platforms, video processing, and industrial control. They are also found in applications that have adopted other interconnects, such as data centers.
RapidIO technology overview
RapidIO, similar to Ethernet, exchanges packets over a variety of media. RapidIO packets have a maximum payload of 256 bytes and a maximum overall packet size of 276 bytes. The most efficient RapidIO packet formats transfer 256 bytes of data using 12 bytes of overhead for a protocol efficiency of 95 percent. RapidIO supports messaging and read/write semantics, enabling control and data plane operations to use the same physical interconnect – a feature that improves efficiency and simplifies system design. Designed from the ground up to minimize latency, RapidIO technology has been optimized to be implemented in hardware.
RapidIO links guarantee the delivery of packets, even in the presence of congestion and transmission errors, through simple flow control and error-recovery mechanisms. These mechanisms are implemented with short, link-specific quantities called “control symbols.” Most control symbols can be embedded within packets, ensuring that the control loop latency for reliable packet exchange is minimized.
The RapidIO Trade Association released the RapidIO Gen2 specification in 2008. RapidIO Gen2 doubled the speed of the RapidIO links to 6.25 gigabaud (Gbaud) and added support for eight new virtual channels to its physical layer. A four-lane port operating at 20 gigabits per second (Gbps) is the fastest common port in the RapidIO Gen2 ecosystem. RapidIO Gen2 also enhanced the logical layer with support for rate- and credit-based flow control for data streaming packets and introduced a Virtual Output Queue backpressure mechanism.
RapidIO technology dominates in telecom, especially wireless baseband signal processing. Additionally, RapidIO has significant market share in military, video processing, and industrial control applications. The large and growing RapidIO ecosystem includes a number of switch vendors (IDT, PMC-Sierra, and Mercury, for example) whose products clock latency at around 100 nanoseconds (nsec). We’ll look more closely at the value of those small latency figures later in this article.
Ethernet technology overview
Ethernet links support transmission of packets up to 9 kilobytes (KB) in size, though the most common packet has a payload of slightly more than 1500 bytes. The minimum packet size in Ethernet is 64 bytes. Ethernet has traditionally been a “lossy” medium, requiring software support of protocols, such as Transmission Control Protocol (TCP), to provide reliable transport services. Even though Ethernet packets support a much larger payload compared to RapidIO, their maximum protocol efficiency is only slightly more than 97 percent due to the large headers required for the software-based reliable delivery protocols. The “PAUSE” packet flow control mechanism has been the primary means of managing congestion between Ethernet terminals. Ethernet’s usual flow control mechanisms have been network layer protocols that control transmission rates based on the detection of packet loss, as this is the most effective way to deliver reasonable throughput in a planet-wide network.
Ethernet is currently working to specify 40 Gbps and 100 Gbps links. This work builds on the existing 10 Gbps, single-lane Ethernet standards.
Recently, the concept of “lossless Ethernet” has been publicized. Lossless Ethernet is aimed at storage area networks, specifically Fibre Channel over Ethernet (FCoE), so it is also known as “Data Center Ethernet” (DCE). Lossless Ethernet encompasses several enhancements currently under development, including IEEE 802.1Qbb Priority-Based Flow Control (PFC) and IEEE 802.1Qau Congestion Notification. These enhancements leverage the packet priority information specified in established protocols, such as Multiprotocol Label Switching (MPLS) and IEEE 802.1Q VLAN.
Lossless Ethernet reduces packet loss due to congestion in the Ethernet network by establishing link-level flow control. This drastically increases the efficiency of the overall network since the amount of packet discarding is greatly diminished. Lossless Ethernet under congestion can perform in a manner similar to RapidIO.
Although the Ethernet ecosystem is much larger than that of RapidIO, Ethernet still has just three suppliers of switch devices – Marvell, Fulcrum, and Broadcom. These manufacturers have a wide range of products for the Internet. In support of the Lossless Ethernet initiative, these vendors have announced switches with sub-microsecond latency.
System designers who find the capabilities of DCE attractive can also consider the established RapidIO ecosystem as a source of components. Similarly, the new capabilities of DCE have allowed system designers to consider it as an alternative to RapidIO in some applications.
Logical layer comparison
Table 1 shows a comparison of RapidIO and Ethernet technology and was adapted from the “System Interconnect Fabrics: Ethernet Versus RapidIO Technology” whitepaper, available from www.rapidio.org.
Table 1: RapidIO versus Ethernet Comparison
Packet format comparison
Although it is obvious that the DCE header is much larger than a RapidIO header, what may not be apparent is the growing number of similarities between the two packet formats.
Figure 1: Data Center Ethernet and RapidIO Packet Formats
Traditional “Internet” Ethernet packets do not include the Multiprotocol Label Switching (MPLS) or Virtual Local Area Network (VLAN) headers, though many modern LANs and WANs make use of both. Internet Ethernet packet routing is based on the IP address found in the TCP/IP header. However, IP routing is compute-intensive, so the simpler 20-bit MPLS label, in conjunction with the 12-bit VLAN Identifier, can be used for routing purposes. MPLS and VLAN routing operations cost much less to implement in hardware than IP lookups because they work by indexing into a table, which avoids the multiple sequential comparison operations that IP routing requires.
Table indexing is the only mechanism a RapidIO switch needs for operating a routing table. The routing quantity for RapidIO, called a device ID, can be either 8 or 16 bits in size. Many switches in the RapidIO ecosystem have one routing table per port. Per port routing tables give RapidIO switches capabilities similar to VLAN.
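The table-indexed routing described above can be sketched in a few lines. This is only an illustration: the port count, the function names, and the use of `None` for an unprogrammed entry are assumptions made for the example, not details taken from the RapidIO specification or from any switch.

```python
# Illustrative only: table sizes, port count, and function names are assumptions.
DEVICE_ID_BITS = 8          # the article states device IDs are 8 or 16 bits
NUM_PORTS = 4               # hypothetical switch size
NO_ROUTE = None             # marker for an entry that has not been programmed

# One routing table per ingress port, indexed directly by destination device ID.
routing_tables = [[NO_ROUTE] * (1 << DEVICE_ID_BITS) for _ in range(NUM_PORTS)]


def add_route(ingress_port, dest_id, egress_port):
    """Program one entry: packets arriving on ingress_port for dest_id leave on egress_port."""
    routing_tables[ingress_port][dest_id] = egress_port


def route(ingress_port, dest_id):
    """Single indexed lookup; no sequential comparisons are needed."""
    return routing_tables[ingress_port][dest_id]


add_route(ingress_port=0, dest_id=0x2A, egress_port=3)
print(route(0, 0x2A))   # 3
print(route(1, 0x2A))   # None: port 1 has no route to this device
```

Because each ingress port has its own table, a device can be reachable from one port and unreachable from another, which is the VLAN-like behavior the text mentions.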
The largest difference between the packet formats is their maximum size. Common Internet Ethernet packets are about 1500 bytes in size. Jumbo packets can be more than 9000 bytes. The largest RapidIO packet, however, is only 276 bytes. Despite the differences in packet size, overall protocol efficiency is approximately the same: Internet Ethernet is around 97 percent while RapidIO is approximately 95 percent.
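The efficiency figures can be checked with simple arithmetic. The RapidIO numbers (256 bytes of payload, 12 bytes of overhead) come from the text. The Ethernet overhead used below is an assumption: it counts only preamble, MAC header, frame check sequence, and inter-frame gap around a 1500-byte payload, which is one way to arrive at a figure in the same range as the 97 percent quoted above.

```python
def protocol_efficiency(payload_bytes, overhead_bytes):
    """Fraction of transmitted bytes that carry payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


# RapidIO figures as given in the article.
rapidio = protocol_efficiency(payload_bytes=256, overhead_bytes=12)

# Assumed Ethernet per-frame overhead: preamble (8) + MAC header (14) + FCS (4)
# + inter-frame gap (12). Higher-layer headers are not modeled here.
ETHERNET_WIRE_OVERHEAD = 8 + 14 + 4 + 12
ethernet = protocol_efficiency(payload_bytes=1500, overhead_bytes=ETHERNET_WIRE_OVERHEAD)

print(f"RapidIO:  {rapidio:.1%}")    # 95.5%
print(f"Ethernet: {ethernet:.1%}")   # 97.5%
```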
As shown in Table 1, Ethernet and RapidIO support similar capabilities. Both support read/write and messaging semantics, though RapidIO supports read/write semantics much more efficiently than existing Ethernet-based protocols. Messaging capabilities enable both technologies to support encapsulation of other protocols. For example, RapidIO has specified a standard method for encapsulating Ethernet packets over a RapidIO fabric using Message and/or Data Streaming packet types.
Lossless?
RapidIO guarantees packet delivery on each link. In the case of a 100m 10 Gbps fiber connection, recovery can be accomplished with the exchange of three control symbols, which can occur in approximately 2.5 microseconds. For a chip-to-chip link, the exchange can be completed in less than 300 nsec.
Lossless Ethernet is a partial misnomer, because packets can still be lost due to transmission error. Data Center Ethernet, therefore, continues to require offload engines and/or software stacks, which add latency to message delivery. For example, the Fibre Channel protocol enables error recovery in FCoE. Error recovery, therefore, has much higher latency than in RapidIO. This high latency has a significant impact on system resource usage and performance.
Two Ethernet markets
Making a large change in direction, the Ethernet community effectively split the Ethernet market into Internet and Data Center halves. In the context of the planet-spanning network called the Internet, network-layer flow control, rather than link-level flow control, achieves higher throughput. Latencies in the planet-spanning Internet Ethernet half, and the complexity and unpredictability of the network topology, make network-level flow control the most effective type of flow control mechanism. The vast majority of Ethernet technology is Internet Ethernet.
Internet Ethernet technology depends on significant processing power in the form of switch and router platforms. Within switch and router platforms, Network Processor Units (NPUs) shape traffic to meet Service Level Agreements (SLAs) and implement network protocols, such as MPLS. Figure 2 shows a typical Internet Ethernet switch box, which can also be a router.
Figure 2: Internet Router Block Diagram
The Internet router consists of a number of line cards, each connected to a redundant switch fabric. Local control plane processors are in charge of line card behavior, with each local processor connected to the master control plane processor. Each line card contains an NPU, which is responsible for classifying and scheduling Ethernet packets as they are received to enforce security, meter traffic for service-level agreements, and shape traffic to optimize Internet bandwidth. Packet headers may be modified for protocol tagging, such as in MPLS and VLAN. If a packet grows too large because of these header modifications, the packet may be split into multiple packets. Packets are then sent through the switch fabric to another line card, where they are transmitted. All of these functions are necessary to make throughput reliable in an Internet Ethernet-based network. Although modern NPUs can perform sophisticated analysis of the incoming packet stream at line rate, this obviously adds latency to packet transfers.
Packet transfer without loss
Note that within an Internet router or switch platform, there are a number of “short range” connections between chips. The low latency of these short-range connections can be leveraged by simple flow control mechanisms to minimize packet loss. However, even on 10 Gbps links, an Ethernet PAUSE packet is too long to deal with these short latency periods, since a PAUSE packet takes more than 60 nsec to be transmitted, not counting PHY and processing latency. It is for this reason that the leaders in Internet Ethernet switches and routers do not use Ethernet as the exclusive interconnect technology within their own platforms. Instead, they use interconnect standards, such as System Packet Interface (SPI), Interlaken, and RapidIO, that incorporate low-latency flow control mechanisms to ensure efficient packet transfer without loss.
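The PAUSE transmission time quoted above can be approximated from frame size and link rate. The byte counts below are assumptions for illustration: a minimum-size 64-byte frame, an 8-byte preamble, and a 12-byte inter-frame gap. Depending on which of these are counted, the result lands just under or just over 60 nsec.

```python
def serialization_time_ns(frame_bytes, link_gbps):
    """Time to clock frame_bytes onto a link running at link_gbps (1 Gbps = 1 bit per nsec)."""
    return frame_bytes * 8 / link_gbps


PAUSE_FRAME_BYTES = 64   # assumed: PAUSE sent as a minimum-size Ethernet frame
PREAMBLE_BYTES = 8       # assumed: preamble plus start-of-frame delimiter
INTER_FRAME_GAP = 12     # assumed: minimum gap before the next frame

frame_only = serialization_time_ns(PAUSE_FRAME_BYTES + PREAMBLE_BYTES, 10)
with_gap = serialization_time_ns(PAUSE_FRAME_BYTES + PREAMBLE_BYTES + INTER_FRAME_GAP, 10)

print(f"{frame_only:.1f} ns")   # 57.6 ns
print(f"{with_gap:.1f} ns")     # 67.2 ns
```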
Within the data center network, network latency for control functions, such as committing file changes and confirming data transmission, can represent the real limit to system capacity. Similarly, control system latency for load balancing functions in servers can determine the utility and efficiency of the overall system.
In many data center platforms, boards are connected using switch chips, not a switch or router platform. Processors, Digital Signal Processors (DSPs), Field-Programmable Gate Arrays (FPGAs), and NPUs are connected to each other using a switch fabric made up of multiple switch chips. The switch fabric efficiently connects multiple chips, boards, and chassis with a single interconnect protocol. This approach leads to lower latencies, higher performance, less power consumption, and lower costs within the Data Center platform. These reasons, among others, led EMC to use RapidIO as the fabric for its V-Max high-performance storage products.
Many Internet Ethernet switch chips can support the functions found in the switch and router platforms. However, the lower-latency Data Center Ethernet switches do not. Devices connected directly to the Data Center Ethernet switches are therefore responsible for these functions. This represents a significant shift from the simple transmit and receive architectures that characterize most consumer Internet Ethernet technology.
As noted earlier, the advent of Lossless Ethernet signals the fracturing of the Ethernet market into Internet Ethernet and Data Center Ethernet devices. The implication is that Data Center Ethernet devices do not enjoy the same economies of scale that Internet Ethernet devices do.
Throughput, latency, and flow control
Increasing bandwidth can improve transmission latency. Ethernet and RapidIO technology have shown this as lane speeds have increased.
Transmission latency forms only part of the overall fabric latency. Overall latency includes the time taken to transmit the packet, time for the packet to be transferred through the switches of the fabric, and the time taken to receive the packet and pass it to a software entity. Flow control can affect each of these latency components.
If there is no contention for resources in a system’s fabric, then no flow control is necessary. Flow control mechanisms make decisions about which packet is sent next, as well as how to allocate system resources to different packet flows. In a system with complex and dynamic traffic patterns, flow control mechanisms resolve congestion in a manner that maximizes system performance. As previously discussed, system performance can depend on the latency of message transfer, as well as the overall throughput in the fabric. Complementary flow control mechanisms for short-, medium- and long-term congestion allow system designers to optimize the performance characteristics of their application.
Generally, a flow control mechanism should enable packets to make forward progress at the earliest opportunity. Under contention, buffers should be fully utilized to create more scheduling options to increase throughput and reduce latency. A comparison of the RapidIO and DCE flow control strategies follows.
A comprehensive flow control strategy
RapidIO implements a comprehensive flow control strategy to resolve short-, medium-, and long-term system congestion. Short-term congestion, lasting 10 µsec or less, is managed with link-level flow control. Medium-term congestion, lasting up to 200 µsec, and load balancing are managed with the Virtual Output Queue Backpressure mechanism. Long-term congestion is managed and/or avoided through the use of Virtual Channel bandwidth guarantees, as well as the rate- and credit-based flow control supported by Data Streaming packets.
RapidIO link-level flow control is implemented using short “control symbols” that can be embedded within packets, so the control path latency of a control symbol is minimal. The flow control supports a “prioritized” channel with eight priorities, and up to eight more “virtual” channels, which are not prioritized. In the worst case, over a 100m optical link operating at 10 Gbps with a transmission latency of 500 nsec, the control path latency is approximately two and a half maximum-sized RapidIO packets in each direction.
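A rough check of the worst-case figure: the amount of data in flight on a link is the one-way latency multiplied by the data rate. The sketch below uses only the numbers in the text (500 nsec, 10 Gbps, 276-byte maximum packet) and ignores line encoding and processing delays, so it comes out slightly lower than the article's estimate.

```python
MAX_RAPIDIO_PACKET_BYTES = 276   # maximum RapidIO packet size given in the article


def bytes_in_flight(one_way_latency_ns, link_gbps):
    """Data carried by the link during one one-way latency period (1 Gbps = 1 bit per nsec)."""
    return one_way_latency_ns * link_gbps / 8


in_flight = bytes_in_flight(one_way_latency_ns=500, link_gbps=10)
packets = in_flight / MAX_RAPIDIO_PACKET_BYTES

print(f"{in_flight:.0f} bytes")            # 625 bytes
print(f"{packets:.2f} max-size packets")   # 2.26 max-size packets
```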
RapidIO flow control mechanisms allow a transmitter to pack the receiver buffers completely full. This is important for several reasons:
· The more packets a switch chip has in its buffers, the more options it has to increase throughput by routing packets to output ports that are not congested. This should lead to maximum possible throughput with minimal latency.
· The transmitter and the receiver have agreed on how many packets of each priority/virtual channel can be transferred. The transmitter can use this information when scheduling packet transmission to optimize fabric latency performance.
· The low-latency control mechanism allows RapidIO buffer size to be minimized while guaranteeing packet delivery and achieving line rate throughput.
IEEE 802.1Qbb Priority-Based Flow Control (PFC) supports eight separate flows, the same as the eight RapidIO priorities. The proposed PAUSE packet, however, uses a timed XON/XOFF mechanism. The XON/XOFF mechanism must be activated before it is possible for the receiver to run out of buffer space; otherwise packets will be lost. The latency for acting on flow control is key to the efficient use of buffers. The amount of information that can be received between the time that a device decides that a PAUSE must be issued and the time that the station receiving the PAUSE acts on it is known as “skid.”
From Figure 3, it is clear why the ability to embed link-level flow control information within packets is such a key advantage for RapidIO. At point 1, the Receiver decides that it must turn off a particular priority. However, the yellow Max Frame (point 2) has started transmission, preventing transmission of the PAUSE frame. This allows the corresponding yellow Max Frame to be received by the Receiver. Once the PAUSE frame has been received by the Transmitter, a similar problem can occur at point 3, where a Max Frame has already started transmission. The minimum skid size that Ethernet links must account for is therefore two maximum-sized packets.
Figure 3: Ethernet Skid Causes
For a 10 Gbps link 100m in length, one computation of the skid length is approximately 10.1 KB (http://www.ieee802.org/1/files/public/docs2008/bb-pelissier-pfc-proposal-0508.pdf). This computation assumes a maximum packet length of 2000 bytes. If other storage-related protocols, such as FCoE with a 2.5 KB Maximum Transmission Unit (MTU) or iSCSI with a 9 KB MTU, are in use, the skid length will increase. Other Ethernet initiatives, such as Energy Efficient Ethernet, may increase the skid length even more.
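To see why skid grows with packet size, consider a simplified lower bound built only from terms described in this article: two maximum-sized frames (points 2 and 3 in Figure 3), the data in flight over the cable round trip, and the PAUSE frame itself. The function name and the 64-byte PAUSE size are assumptions for illustration. The cited 10.1 KB computation includes additional delay terms that are not modeled here, so this sketch comes out lower.

```python
def skid_lower_bound_bytes(max_frame_bytes, link_gbps, one_way_delay_ns, pause_frame_bytes=64):
    """Simplified skid estimate: two max frames + cable round trip + the PAUSE frame."""
    round_trip_bytes = 2 * one_way_delay_ns * link_gbps / 8   # 1 Gbps = 1 bit per nsec
    return 2 * max_frame_bytes + round_trip_bytes + pause_frame_bytes


# 10 Gbps, 100m link with the 500 nsec one-way latency used earlier in the article.
for mtu in (2000, 2500, 9000):
    skid = skid_lower_bound_bytes(mtu, link_gbps=10, one_way_delay_ns=500)
    print(f"max frame {mtu} bytes -> at least {skid:.0f} bytes of skid")
```

For the 2000-byte case this gives 5314 bytes. The dominant term is twice the maximum frame size, which is why larger MTUs increase the skid so quickly.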
Skid length significance
If an Ethernet switch supports eight separate priorities, as a Gen2 RapidIO switch does, the implication is that a given priority must be XOFFed while there is still enough free buffer space to cover the skid. For NICs and processors with gigabytes of DRAM, the skid length is not worth worrying about. However, for the embedded memories found within switches, the skid length is significant. Large skid length has important implications for an Ethernet switch designer and Data Center Ethernet users:
· In order to avoid sending PAUSE frames back to back, more than the skid length of buffer space should be available before turning on a flow. This implies that packet transfers for given flows will be bursty, resulting in uneven network behavior.
· There is no guarantee that the skid packets sent match the priority that is being turned off. Therefore, some part of the buffer reserved for skid will not contain packets, increasing latency and decreasing throughput.
· To reduce the amount of memory required for an Ethernet port, one design optimization is to group multiple priorities together for flow control purposes. This can lead to head-of-line blocking, with commensurate reductions in throughput and increases in latency.
· Scheduling algorithms for packet transmission can be rendered ineffective by flow control.
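The threshold behavior behind these points can be sketched as a small state machine for one priority. The class name, the buffer sizes, and the extra XON margin are hypothetical values chosen for illustration; they are not taken from any Ethernet switch or standard.

```python
class PriorityBuffer:
    """Hypothetical receive buffer for one priority with skid-aware XOFF/XON thresholds."""

    def __init__(self, capacity_bytes, skid_bytes, xon_margin_bytes):
        self.capacity = capacity_bytes
        self.skid = skid_bytes
        self.xon_margin = xon_margin_bytes   # extra headroom required before resuming
        self.used = 0
        self.paused = False

    def _update(self):
        free = self.capacity - self.used
        if not self.paused and free <= self.skid:
            self.paused = True    # XOFF while the skid still fits in the free space
        elif self.paused and free >= self.skid + self.xon_margin:
            self.paused = False   # XON only once more than the skid length is free

    def enqueue(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise OverflowError("packet would be lost: buffer exhausted")
        self.used += nbytes
        self._update()

    def dequeue(self, nbytes):
        self.used = max(0, self.used - nbytes)
        self._update()


buf = PriorityBuffer(capacity_bytes=30000, skid_bytes=10100, xon_margin_bytes=4000)
buf.enqueue(21000)
print(buf.paused)   # True: only 9000 bytes free, less than the skid
buf.dequeue(2000)
print(buf.paused)   # True: 11000 bytes free is still below skid + margin
buf.dequeue(4000)
print(buf.paused)   # False: 15000 bytes free, the flow is turned back on
```

With these example numbers, roughly a third of the buffer is held in reserve for skid at all times, and the flow alternates between stopped and running, which is the bursty behavior described in the first bullet.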
While short-term flow control has not traditionally been Ethernet’s strength, long-term flow control has been. There are many protocols designed to determine what a maximum transmission level can be. These software-based protocols generally depend on detecting packet loss when Internet Ethernet devices discard packets due to congestion. With the advent of DCE, packets are no longer discarded, rendering these long-term congestion management/transmission rate control protocols ineffective. Instead, DCE requires support for the 802.1Q Virtual LAN standard. The flow control implications of this standard are discussed later in this article.
The RapidIO Gen2 specifications define XON/XOFF, rate-based, and credit-based flow control mechanisms for flows based on Data Streaming packets. These mechanisms were designed for systems with multiple senders for a single destination. The protocol allows senders to communicate how much information they have available, and allows the destination to manage and schedule which senders are active and at what rate. While the flow control mechanisms can be supported in software, the packet format is simple enough that hardware can support them. When used correctly, these flow control mechanisms can avoid long-term congestion and allow applications to optimize their performance.
Edging out congestion
One approach to the problems of network congestion and flow control is to communicate the location of the network congestion. This action allows packet sources to schedule transmission of packets to uncongested areas of the network. RapidIO supports mechanisms for communicating the location of congestion in the network. RapidIO Gen2 defines a low-latency, control symbol-based mechanism, named Virtual Output Queue (VoQ) Backpressure. VoQ Backpressure defines a hierarchical method, implementable in hardware, of pushing congestion back to the edge of the network. This has two benefits:
· It slows flows that are contributing to congestion points, freeing system resources.
· It encourages transmission of other flows that are not contributing to congestion, increasing the throughput and balancing system load.
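The effect of these two benefits at a traffic source can be illustrated with a toy model. This is not the VoQ Backpressure protocol from the RapidIO Gen2 specification. The class, its method names, and the first-available scheduling policy are assumptions, meant only to show how per-destination queues let uncongested flows continue.

```python
from collections import deque


class VoqSource:
    """Hypothetical traffic source keeping one queue per destination (a virtual output queue)."""

    def __init__(self):
        self.queues = {}        # destination ID -> deque of pending packets
        self.congested = set()  # destinations currently under backpressure

    def enqueue(self, dest, packet):
        self.queues.setdefault(dest, deque()).append(packet)

    def set_backpressure(self, dest, active):
        """Record a backpressure indication for one destination."""
        if active:
            self.congested.add(dest)
        else:
            self.congested.discard(dest)

    def next_packet(self):
        """Return (dest, packet) for the first uncongested destination with work, else None."""
        for dest, queue in self.queues.items():
            if queue and dest not in self.congested:
                return dest, queue.popleft()
        return None


src = VoqSource()
src.enqueue(1, "a1")
src.enqueue(2, "b1")
src.set_backpressure(1, True)
print(src.next_packet())   # (2, 'b1'): destination 1 is held back, destination 2 proceeds
print(src.next_packet())   # None: only congested traffic remains
src.set_backpressure(1, False)
print(src.next_packet())   # (1, 'a1')
```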
In 2010, an IEEE task group began defining a similar mechanism, known as IEEE 802.1Qau Congestion Notification for Local and Metropolitan Area Networks. It is unclear if this mechanism will be simple enough to be supported by devices that are part of the DCE ecosystem.
In many systems, some traffic is less sensitive to latency but does require bandwidth guarantees. The RapidIO Gen2 standard supports nine virtual channels, denoted VC0 through VC8. VC0 is backwards compatible with RapidIO Gen1, supporting up to eight priorities. VC1 through VC8 have dedicated resources and specific flow control associated with them to ensure that bandwidth guarantees can be met. Each VC has a minimum bandwidth guarantee. This ensures that packets assigned to that VC are able to make a guaranteed amount of progress over time, though latency cannot be guaranteed for VC1 through VC8. VC0 is special in that it can optionally have as much bandwidth as it wants. In this way, VC0 traffic can have guaranteed latency, while the remaining traffic gets its fair share of the remaining bandwidth. The combination of VCs and the traffic management supported by the Data Streaming packet types can be used to engineer traffic admission to the fabric to guarantee latency.
As previously mentioned, since the software bandwidth management protocols are no longer effective, DCE requires support for the 802.1Q Virtual LAN standard, which allows the configuration of latency and bandwidth guarantees for specific VLAN traffic by assigning one of eight priority values. These priority values align with the MPLS priorities assigned. The lowest-possible latency guarantee available in the standard is 10 milliseconds for voice traffic. As previously noted, it is not certain that all DCE devices will support all eight priority values due to the memory requirements induced by skid.
Flow control in 802.1Q is much different from established Ethernet Internet protocols. The typical method for Ethernet Internet protocols to modify flow transmission rates is to increase transmission rates until packet loss is detected, and then back off transmission until packet loss is no longer detected. Although the latencies in the network mean that the flow transmission rates may require several seconds to stabilize, eventually, the optimum transmission rate is determined. 802.1Q specifies no such rate negotiation mechanism.
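The loss-driven rate adjustment described above can be illustrated with a toy simulation. This is a simplified model and not an implementation of TCP or any specific protocol. The function name, the additive step, and the multiplicative backoff factor are assumptions chosen for the example.

```python
def probe_rate(capacity, start_rate, step, backoff, rounds):
    """Toy model: raise the rate until loss is detected, then back off, and repeat."""
    rate = start_rate
    history = []
    for _ in range(rounds):
        loss_detected = rate > capacity   # packets are dropped once capacity is exceeded
        if loss_detected:
            rate = rate * backoff         # back off transmission after loss
        else:
            rate = rate + step            # otherwise keep increasing
        history.append(rate)
    return history


history = probe_rate(capacity=100, start_rate=10, step=10, backoff=0.5, rounds=30)
print(history[:12])
```

In this model the rate keeps oscillating around the capacity and never exceeds it by more than one step. The mechanism only works because loss is observable, which is the signal that a lossless fabric removes.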
Conclusion
While researching this article, I came across a rant by Stuart Cheshire originally published in 1996, with the provocative title “It’s the Latency, Stupid” (http://www.stuartcheshire.org/rants/Latency.html). While his rant addresses latency in the consumer Internet Ethernet world, his key points ring true now:
• Making more bandwidth is easy.
• Making limited bandwidth go further is easy.
• Once you have bad latency, you’re stuck with it.
Mass-market Ethernet, the Ethernet of the Internet, focuses on delivering more bandwidth for less money. The latency and throughput of consumer devices have improved, thanks to the application of additional processing power and bandwidth. Although some Internet applications are dependent on latency, most consumers are more concerned about throughput. Mass-market Ethernet, therefore, continues to focus on delivering more bits for fewer dollars.
Data Center Ethernet has features that will increase throughput and improve the behavior of Internet Ethernet within the data center. However, these features have negative impacts on Ethernet chip design, and still require software and/or offload engines for transmission error recovery. The market for Data Center Ethernet is much smaller than that of Internet Ethernet, so the economies of scale that result in inexpensive Internet Ethernet technology do not apply to Data Center Ethernet.
The applications that have adopted RapidIO (telecommunications/baseband processing, military, medical imaging, industrial control) need low and predictable latency for chip-to-chip, board-to-board, backplane and box-to-box communication. The leading companies in these applications (for example, Ericsson, EMC, and Mercury) rely on RapidIO’s continued focus on low latency combined with high throughput to ensure that the performance characteristics of these applications are met. RapidIO will continue to maintain bandwidth parity with other interconnects, while remaining the lowest latency, most efficient fabric available for chip-to-chip, board-to-board, backplane and up to 100m cable/optical connections.
Barry Wood is an Expert Applications Engineer with IDT Canada. Barry joined IDT in 2001 after spending 11 years with Nortel Networks. Barry has extensive experience in fault tolerant systems design, software development, and hardware/software integration. Barry is currently the Chair of the RapidIO Technical Working Group.
IDT




