4/15/2009
Contents
Introduction
Terminology
References
Link Aggregation Types
Topologies
Direct Connect
Private Network
Local Network
Remote Network
Data Domain Link Aggregation and Failover
Bond Functions Available in Linux Distribution
Hash Methods Used
Link Failures
Other Link Aggregation
Cisco
Sun
Windows
AIX
HPUX
Data Domain Link Aggregation and Failover in the Customer's Environment
Normal Link Aggregation
Failover of NICs
Failover Associated with Link Aggregation
Recommended Link Aggregation
Switch Information
Introduction
This document describes the use of link aggregation and failover techniques to maximize throughput on
networks with Data Domain systems installed. The basic topologies are described with notes on the usefulness
of different aggregation methods, so the right method can be chosen for the site.
The goal of link aggregation is to split the network traffic evenly across all the links or ports in the
aggregation group. This maximizes the network throughput on the LAN or LANs, up to the limit of the
system's processing speed. Normally the aggregation is between the local system and the network device
or system to which it is connected, usually a switch or router. In theory, aggregation allows the system to
send data on both links at the same time and can therefore achieve up to double the throughput.
There are a few things that can impact how well the aggregation actually performs.
1. Speed of the switch
2. The processing rate of the DDR system
3. The inherent overhead of the network programs
4. Out-of-order packets
5. Whether a single client can drive data fast enough to utilize multiple links
6. The number of streams (separate connections)
7. The effectiveness of the aggregation method used
For impact 1, the switch can normally handle the speed of each link connected to it, but it may lose some
packets when all the packets coming from several ports, all running at maximum speed, are concentrated on
one uplink. Note: this implies that only one switch can be used for port aggregation coming out of a system.
For most implementations this is true, but some network topologies allow for link aggregation across
multiple switches.
Impact 2 addresses the DDR systems. The processing rate of DDR systems and programs is limited. As the
hardware gets faster and the use of parallel processing improves, DDR systems will support a higher network
throughput, but as the processing speed increases the network link speed will also increase. For example, with
the current systems it makes sense to aggregate 1 GbE links but not 10 GbE links, because one 10 GbE link
can provide enough data to saturate the processing power of the current DDR systems. As system speed
improves it will make sense to aggregate 10 GbE links.
Impact 3 addresses the inherent overhead of the network programs. This overhead guarantees that the
transfer speed will never reach 100% of the link rate. The throughput is always reduced by the overhead it
takes to create and send a packet of data through the system until it is put onto the wire, and there is an
inherent delay separating the sending of packets on Ethernet.
Impact 4 deals with packets arriving out of order. The network program must coalesce out-of-order packets
back into the original order. If the link aggregation mode allows packets to be sent out of order and the
protocol requires that they be put back into the original order, this added overhead may reduce the
throughput enough that the specific aggregation mode causing the reordering should not be used.
5. Can a single client drive data fast enough to fully utilize multiple aggregated links? In most cases, either the
physical or OS resources cannot drive data at multiple Gbps. Also, due to hashing limitations, multiple clients
would be required to push data at those speeds.
6. The number of streams, which translates to separate connections, can play a significant role in link utilization
depending on the hashing that is used.
A final impact deals with the effectiveness of the aggregation method used. If two systems are connected
together by direct connect cables, the use of Layer 2 (MAC) hashing or Layer 3 (IP) hashing would not provide
any aggregation at all: all the packets would go over the same link. In general, the number of systems
communicating with the Data Domain system will be small, so the aggregation method used will need to
work for a limited number of target systems.
The number of links that are aggregated will depend on the switch performance, the DDR system and
application performance and the link aggregation mode used.
Terminology:
DDR
Data Domain appliance, a Linux system used to perform only Data Domain operations.
EtherChannel
This is the term used by Cisco for the bundling of network links as described under Ethernet
Channel. With Cisco there are three ways to form an EtherChannel: manually, automatically using
PAgP, and automatically using LACP. If it is done manually, both sides have to be set up by the
administrator. If one of the protocols is used, protocol packets are sent to the other side, where the
EtherChannel is set up based on the information in the packets.
Ethernet Channel
Multiple individual Ethernet links bundled into a single logical link between systems. This provides
higher throughput than a single link. The term used by Cisco for this is EtherChannel. The actual
throughput depends on the number of links bundled together, the speed of the individual links, and
the switch or router being used. If a link within the Ethernet Channel fails, the traffic normally carried
over the failed link is sent over the remaining links within the bundle.
LACP
Link Aggregation Control Protocol (LACP) provides dynamic link aggregation as defined in the IEEE
802.3ad standard. It is not available in DDOS 4.9 and before.
Link Aggregation
Using multiple Ethernet network cables or ports in parallel, link aggregation increases the link speed
beyond the limits of any single cable or port. Link aggregation is usually limited to links connected
to the same switch. Other terms used are EtherChannel (from Cisco), Trunking, Port Trunking, Port
Aggregation, NIC Bonding, and Load Balancing. There are proprietary methods in use, but the
main standard method is IEEE 802.3ad. Link aggregation can also be used as a type of failover.
Load Balancing
Aggregation methods used to try to distribute loads across all available links or ports.
Round Robin
Each new packet is sent to the least busy link or port. This usually means that no further packets are
sent to the first link or port until packets have been sent to all the other links or ports, but the
distribution may also take packet size into account.
RSTP
Rapid Spanning Tree Protocol (RSTP), IEEE 802.1w, allows a network topology with bridges to provide
redundant paths. This allows for failover of network traffic among systems. It is an extension of the
Spanning Tree Protocol (STP), and the two names are used interchangeably.
TOE
TCP Offload Engine. Network cards (NICs) with a TOE have the full TCP/IP stack on the card.
Trunking
Trunking is the use of multiple communication links to provide aggregated data transfer among
systems. For computers this may be referred to as port trunking to distinguish it from other types of
trunking, such as frequency sharing.
Note: Cisco uses the term trunking to refer to VLAN tagging, not link aggregation, whereas other
vendors use the term in reference to link aggregation.
References:
Catalyst 4500 Series Switch Cisco IOS Software Configuration Guide (also used for the 4900 Series Switch too)
Release 12.2(44)SG, available from the Cisco Documentation site.
Cisco Documentation, http://www.cisco.com/univercd/home/home.htm
IEEE 802.3 Standard http://standards.ieee.org/getieee802/802.3.html
Also available under: http://iweb.datadomain.com/eweb/technical_library/Vendor/Cisco/
IEEE 802.3ad Standard is Clause 43 under IEEE802_3-sec3.pdf of the standards documents listed.
Linux distribution documentation, http://www.kernel.org/
Linux Ethernet Bonding Driver HOWTO, http://www.cyberciti.biz/howto/question/static/linux-ethernet-bonding-driver-howto.php, http://www.cyberciti.biz/tips/linux-bond-or-team-multiple-network-interfaces-nic-into-single-interface.html
Wikipedia, http://en.wikipedia.org/wiki/Main_Page
Various links on the web as noted within the document by hotlinks
Link Aggregation Types
Each part of this information will have an impact on the type of link aggregation that is used.
Consider which systems will be doing the link aggregation. Normally, link aggregation configuration requires
coordination between the DDR system and the switch. Another type of link aggregation configuration can be
handled from the DDR system only (both transmit and receive). There is at least one network topology where a
switch may not be part of the configuration, i.e. direct connect; this requires the link aggregation to be
configured between the DDR and the media servers. If the DDR is on the local network and is communicating
with many systems, then using Layer 2 (MAC address) hashing could be acceptable. If the connection path
goes through a router/gateway, then Layer 3 (IP address only) or Layer 3+4 (IP address and port number)
hashing may be needed.
The link aggregation will need to address the use of different speed links, for example using both 1 GbE and
10 GbE. The 10 GbE TOE cards may have aggregation on the cards and not support aggregation off the card.
Most aggregation methods do not support links running at different speeds, so this should be avoided.
There is also the question of the use of failover. Failover can be considered to be part of aggregation. Most
link aggregation modes include a failover component by allowing data transfer to continue in a degraded state.
For example, if one of the links goes down, the link aggregation can recognize this, drop that link from the
aggregation list, and continue with one less link. The customer may feel full failover is more important than link
aggregation. Instead of aggregating over multiple links, the links can be configured in full failover mode,
where idle spares that carry no data are kept in standby until the active link fails. This way there is no
degradation of throughput if one link fails and data is sent over the other.
An administration network interface is also needed with DDRs. For direct connections and one-to-one server
connections there is a separate Ethernet interface for this, but it could also be part of the link aggregation
unless physical separation between the links is needed.
Topologies
The basic types of network topologies are described below, along with their differing suitability for various types
of aggregation methods.
Direct Connect
The Data Domain system is directly connected to one or more backup servers. To be able to provide link
aggregation within this topology will require multiple links between each backup server and the Data Domain
system. Usually link aggregation is not done with this topology, especially with multiple backup servers,
because of the limited number of links available on the Data Domain system.
[Diagram: direct connect topology — Data Domain system, backup/media server, business servers, network switch, and tape library]
Private Network
This topology is the same as direct connect except that the connections are through a switch rather than
direct. It would normally be used to connect multiple media servers to multiple DDRs. The link aggregation
would be between a DDR and the switch, or between a media server and the switch; the aggregation gets the
data to and from the switch. In this case the aggregation between the DDR and the switch is independent of
the aggregation used between the media server and the switch. Note: there is a possible special case where
the switch is only a pass-through and is transparent to the aggregation. That is not the norm and is discussed
in further detail later.
[Diagram: private network topology — Data Domain system and backup/media server connected through a network switch, with business servers and tape library]
Local Network
The Data Domain system is connected to the backup server through a common switch. In the previous network
topologies shown, the Data Domain system may have a connection through the common switch to handle
administration and maintenance tasks, which need not be part of the aggregation. In this example the data is
also sent through the shared network.
[Diagram: local network topology — backup/media servers, business servers, and tape library sharing a common network switch]
Remote Network
This is similar to the local network except that the connection goes through a router before it gets to the media
server. There will normally be a switch between the DDR and the router unless the router also provides switch
functionality. What is important to note in this diagram is that a gateway function is involved in the network
data flow. It is important to maximize the data throughput between the DDR and the media servers, so
normally the DDR will be located on the same LAN and use the same switch as the media server. With
multiple media servers there may be cases where some of them are on separate VLANs, and the DDR would
need to go through at least one gateway to reach them. It is not expected that the remote network will go
across a WAN. A WAN topology is more likely for a DDR with replication. Normally the data flow in replication
is low enough that it does not need aggregation, and the WAN would tend to make aggregation ineffective.
Yet there has been one customer that has asked about it and may be pursuing it.
[Diagram: remote network topology — Data Domain system connected through network routers to backup/media servers, business servers, and tape library]
Data Domain Link Aggregation and Failover
The balanced-xor aggregation is selected by choosing the specific hash that is supported: Layer 2 or
Layer 3+4. There are four virtual interfaces that can be used to define the aggregation or failover: veth0,
veth1, veth2, and veth3.
Any of the physical links available on the system can be included: eth0, eth1, eth2, eth3, etc. The onboard
links (eth0 and eth1) have only recently been allowed in an aggregation group, so older installations of the
Data Domain software may not allow those two links to be aggregated.
To specify aggregation of eth2 and eth3 in the virtual interface veth0 one of the following commands would be
used:
Net aggregate add veth0 mode roundrobin interfaces eth2 eth3
The first network packet sent to veth0 will be forwarded to one of the interfaces and the next packet will
be forwarded to the other. Sending of packets will continue to alternate between the interfaces until there
are no more packets or a link fails. If eth3 loses physical connection, all packets are sent through eth2 until
the eth3 link is brought back up. To make this effective, the other side of the network will also need to be
set up to do round robin. For direct connect (the only topology recommended for round robin) the
media server will have to be able to set up and support round robin.
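The alternation described above, including the degraded case where a link drops out of the rotation, can be sketched as follows (a minimal Python illustration, not the actual bonding driver; the packet and interface names are invented):

```python
from itertools import cycle

# Sketch of round-robin transmit scheduling over the aggregated
# interfaces, as described above. Illustrative only.

def round_robin_schedule(packets, interfaces):
    """Assign each packet to the next interface in strict rotation."""
    rotation = cycle(interfaces)
    return [(pkt, next(rotation)) for pkt in packets]

# Packets alternate between eth2 and eth3.
sched = round_robin_schedule(["p1", "p2", "p3", "p4"], ["eth2", "eth3"])
print(sched)

# If eth3 loses carrier, it is dropped from the rotation and all traffic
# flows over eth2 until the link is restored.
sched_degraded = round_robin_schedule(["p5", "p6"], ["eth2"])
print(sched_degraded)
```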
Net aggregate add veth0 mode xor-L2 interfaces eth2 eth3
The aggregation used would be balanced-xor. The packets are distributed across eth2 and eth3 based on
the XOR of the source and destination MAC addresses. Because there are only 2 links to be aggregated,
the lowest bit of the result is used to determine the interface for the packet: if the result is 0 one interface
is chosen, and if the result is 1 the other interface is used. To get the packets spread across the two links,
data must be sent to more than one destination, and the destination MAC addresses must differ in such a
way that the XOR results differ; in the two-link case one address needs to be odd and the other even. If
three links are aggregated, the XOR result is taken modulo 3. There must be at least two media servers,
with odd and even MAC addresses, to get any aggregation at all. In general, this aggregation should not
be used with fewer than 4 media servers.
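The odd/even requirement on destination MAC addresses can be seen in a small sketch (illustrative Python, not DDOS code; the MAC addresses are invented):

```python
# Sketch of the xor-L2 link selection described above: the low bits of
# (src MAC XOR dst MAC), modulo the number of links, pick the interface.

def xor_l2_link(src_mac: bytes, dst_mac: bytes, num_links: int) -> int:
    """Select an outgoing link from the XOR of the low MAC bytes."""
    return (src_mac[-1] ^ dst_mac[-1]) % num_links

ddr = bytes.fromhex("001b21aabb10")       # hypothetical DDR MAC, even low byte
server_a = bytes.fromhex("001b21ccdd21")  # media server MAC, odd low byte
server_b = bytes.fromhex("001b21ccdd22")  # media server MAC, even low byte

# An odd and an even destination land on different links; a single fixed
# MAC pair would always use the same link, giving no aggregation.
print(xor_l2_link(ddr, server_a, 2))  # one link
print(xor_l2_link(ddr, server_b, 2))  # the other link
```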
Net aggregate add veth0 mode xor-L3L4 interfaces eth2 eth3
The aggregation used with this command will also be balanced-xor. The packets are distributed across eth2
and eth3 based on the XOR of the source IP address, destination IP address, source port number, and
destination port number. The lowest bit of the result determines which link is used to send the packet: an
even result goes over one link and an odd result over the other. With three links the result is divided by 3,
with the remainder determining which interface to use. This aggregation would be used when there are many
connections (there is one connection per stream), many media servers, or both. This is the mode of choice
for Data Domain, but some switches do not support this type of hashing.
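A sketch of how a Layer 3+4 hash spreads separate streams even from a single media server (illustrative Python, not the actual driver hash; the addresses and ports are invented):

```python
from ipaddress import IPv4Address

# Sketch of the xor-L3L4 selection described above: XOR the IP addresses
# and TCP ports of a connection, then take the result modulo the link
# count. Each connection maps consistently to one link.

def xor_l3l4_link(src_ip, dst_ip, src_port, dst_port, num_links):
    """Select an outgoing link from the XOR of IPs and ports."""
    ip_bits = int(IPv4Address(src_ip)) ^ int(IPv4Address(dst_ip))
    return (ip_bits ^ src_port ^ dst_port) % num_links

# Four streams (separate connections) from one media server still spread
# across both links because the source ports differ.
links = [xor_l3l4_link("192.168.1.20", "192.168.1.10", port, 2049, 2)
         for port in (50000, 50001, 50002, 50003)]
print(links)
```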
Net failover add veth0 interfaces eth2 eth3
This is not aggregation, but the command will group interfaces eth2 and eth3 together for failover. There is
only one failover type supported. If the active physical link goes away, the data is sent to the second
physical link. The active interface is determined by which link comes up first when the group is set up. This
is nondeterministic: it depends on several factors such as switch activity, network activity, and which
interface is brought up first when they are enabled. The active link can be fixed by specifying one of the
links as primary; the primary interface will always be set as active if it is UP and RUNNING.
Mode Options
1. balance-rr or BOND_MODE_ROUNDROBIN (0)
Aggregation using round robin
Failover with degradation
Normally a good type to use with direct connect or something equivalent
To get full aggregation, both ends of the link need to be set up to use round robin
2. active-backup or BOND_MODE_ACTIVEBACKUP (1)
Failover method used by Data Domain
Works only when one or more standby links are in the group
There is one active link and all others in the group are standby
The active link is non-deterministic unless a primary is specified
3. balance-xor or BOND_MODE_XOR (2)
Transmits on a specific NIC based on the specified hash method
Default hash: (source MAC address XOR destination MAC address) modulo the size of the
aggregation group
Note: this only aggregates transmissions; the receive side needs to be aggregated on the other end
This mode is referred to as static because of the manual setup that is needed.
There are some switches that do not support port number hashing; in this case mode xor-L3L4 will not work.
Consider also that the best aggregation may be to have each media server use a different link instead of
grouping the links together. Consider the following example: assign a different IP address to each link and
set up each media server to send data to one unique IP address. That way the throughput will approach 4
times a single link's speed, versus around 2.5 times if aggregation is used. This is very dependent on the
expected traffic pattern from the media servers.
Link failures
A link can fail at several places: in the driver, the wire, the switch, the router, or the remote system.
For failover to work, the program (the bonding module in the Data Domain case) must be able to
determine that a link to the other side is down. This information is normally provided by the hardware driver.
For a simple case, consider a direct connect where the wire is disconnected. The driver can sense that the
carrier is down and will report this back to the bonding module. The bonding module will mark the link as
down and switch to a different link. The bonding module will continue to monitor the link, and when it comes
back up it will mark it as up. If the restored link is marked as the primary, the data will be switched back to
that link; otherwise the data flow will stay on the current link.
Note: the failover method currently supported is for directly attached hardware. The driver can sense
when the directly attached link is no longer functioning, but beyond that it gets harder. Consider the case
where there is a switch, or maybe two, in the middle. Can the driver determine that the connection to the
remote system has failed and that it therefore needs to switch to the backup? This is possible if the switch
provides link fault signaling similar to what is defined in IEEE 802.3ae. This is supported by the Fujitsu
10 GbE switch, and something similar is supported by Cisco. This is a rather limited network topology, where
the systems are directly connected via switches and there are no other routes available; it would be an
extension of the direct connect to the media server. Currently the driver and the bonding module do not
support link fault signaling because it is not widely available and applies to too limited a network topology.
For a more complex case, consider the local network but with a switch and a router in the network path. There
are at least two distinct paths that can be followed to get to the router. Failures have to be detectable on any
part of each network path. For example, if there is a failure at the router port to which the DDR link is
connected via the switch, the driver would have to be able to determine that the remote link is down and mark
that link as down. In this case the switch itself would be able to switch the signal to the other path between
the switch and the router, so a failover at the DDR is not needed. Once again, the DDR need only determine
that there is a failure between its NIC and the switch or router to which it is attached.
There are two types of failover. One is failover to a standby: the standby is not used until a failure
happens, and then the traffic is redirected to the standby link. This is a waste of resources if there is never a
failure. This is the method used by Data Domain when the bonding method failover is specified:
Net failover add veth1 interfaces eth3
The other type is failover with degradation. In this method there is no standby; all the links in the group are
being used. If there is a failure, the failed link is removed and the network traffic from that link is redirected
to the other links in the group. This is the failover associated with link aggregation, but it can become
complex if the bonding driver has to determine that a path to the target system no longer exists and stop
sending data on that link.
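The two failover styles can be contrasted in a small sketch (illustrative Python only; the mode names and interface lists are assumptions, not Data Domain code):

```python
# Sketch contrasting failover to a standby with failover with degradation:
# a hot-standby bond idles its backup link, while a degrading aggregation
# group keeps using every surviving member.

def active_links(mode, members, failed):
    """Return the links that carry traffic for the given bond mode."""
    up = [m for m in members if m not in failed]
    if mode == "standby":
        return up[:1]  # only one link ever carries traffic
    return up          # degradation: all surviving links carry traffic

members = ["eth2", "eth3"]
print(active_links("standby", members, failed=set()))         # ['eth2']
print(active_links("standby", members, failed={"eth2"}))      # ['eth3']
print(active_links("degradation", members, failed=set()))     # ['eth2', 'eth3']
print(active_links("degradation", members, failed={"eth3"}))  # ['eth2']
```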
Other Link Aggregation
Most switch vendors support LACP link aggregation (the IEEE 802.3ad standard), and some offer proprietary
aggregation types. If they offer aggregation, they support the XOR of Layer 2 to define which packet goes to
which port.
Cisco
Some of the older Cisco switches and routers only support the older proprietary protocol, PAgP. The Data
Domain system does not support this type of aggregation. Fortunately, the newer switches and routers support
the IEEE 802.3ad standard. When using Cisco switches and routers, IEEE 802.3ad should be used with
Layer 3 and 4 hashing. It may be possible in some cases to set the aggregation with PAgP to round robin, but
that is not currently supported for the DDR when connected to a switch or a router because of throughput
delays from potential packet ordering issues. At high speeds with fast retransmissions, out-of-order packets
can generate many more packets, which would decrease the overall performance.
Nortel
Nortel supports an aggregation called Split Multi-Link Trunking, which uses LACP_AUTO mode link aggregation.
Sun
The initial release of Solaris 10 and earlier versions supported Sun Trunking. Later releases of Solaris 10 and
beyond support the IEEE 802.3ad standard for communicating with switches. Back-to-back link aggregation is
supported, in which two systems are directly connected over multiple ports. The balancing of the load can be
done with L2 (MAC address), L3 (IP address), L4 (TCP port number), or any combination of these. Note that
the DDR currently only supports L2 or L3+L4. Link aggregation can run in either passive mode or active mode;
at least one side must be in active mode. The DDR always uses active mode.
Sun trunking supports round robin type of aggregation. This type of aggregation could be used if the DDR is
connected directly to a Sun system.
For more information on Sun Aggregation refer to the following:
http://docs.sun.com/app/docs/doc/816-4554/fpjvl?l=en&q=%22link+aggregation%22&a=view
For more information on Sun Trunking refer to the following:
http://docs.sun.com/source/817-3374-11/preface.html
Windows
Microsoft's view of link aggregation is that it is a switch or hardware problem, so Microsoft feels it should be
handled by the switch/router and the NIC card. There is nothing in the OS that directly supports it. Rather, if
customers want it, they should get NIC cards that support it and either have a special driver to initiate it or
use the switch to drive it. In the current documentation for Windows Server 2008, Microsoft refers to support
of PAgP, an old proprietary Cisco aggregation protocol:
http://blogs.technet.com/winserverperformance/
They also refer to Receive-Side Scaling (RSS):
http://www.microsoft.com/whdc/device/network/NDIS_RSS.mspx
This refers to a way to allocate a program to handle packets across NIC cards, which are normally tied to
specific CPUs. There are drivers from outside of Microsoft that provide at least passive IEEE 802.3ad support,
if not active. Passive support means that the Windows system will respond to the IEEE 802.3ad protocol
packets but will not generate them. For direct connect this may be the only way to have a directly connected
aggregated link. The following link provides Microsoft's view of servers for 2008:
http://technet2.microsoft.com/windowsserver2008/en/library/59e1e955-3159-41a1-b8fd047defcbd3f41033.mspx?mfr=true
If the Windows server is not directly connected, then it is not important to the DDR system if or how link
aggregation is provided by Windows; that would be between the Windows server and the switch/router.
More specific information on which NIC cards support link aggregation is still TBD.
AIX
According to the RSCT administration guide, AIX supports EtherChannel and IEEE 802.3ad types of link
aggregation:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5a
dm05/bl5adm0559.html
With a DDR, the round robin available through EtherChannel can be used when directly connected.
IEEE 802.3ad can be used if Layer 4 hashing is included. If the DDR is not directly connected, then it is
dependent on the switch or router being used.
AIX uses a variant of EtherChannel for backup, referred to as EtherChannel Backup. This is similar to the
active-backup mode supported by the Linux bonding driver and does not need any handshake from the
equipment connected to the links, except to have multiple links available.
HPUX
The HP link aggregation product is referred to as HP Auto Port Aggregation (APA). As with Linux bonding,
this product provides either a full standby failover or a degradation failover by overloading the other links
within an aggregation group. The aggregation can use Layer 2, Layer 3, and/or Layer 4 hashing for
aggregating across the links. It also supports the IEEE 802.3ad standard. A summary of the product is given here:
http://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=J4240AA
The administration guide can be found here:
http://docs.hp.com/en/J4240-90039/index.html
According to the administration guide, direct connect server to server is supported, but a round robin type of
aggregation does not seem to be. This is further brought out in figure 3-4 of the document, where for direct
connect it is recommended to have many connections for load balancing to be effective. With round robin,
multiple connections are not required for effective aggregation. With this understanding, HPUX systems
would not support round robin with a directly connected system.
Data Domain Link Aggregation and Failover in the Customer's Environment
Normal Link Aggregation
If the DDR is connected to a switch the link aggregation is between the DDR and the switch. All the links in the
aggregation bond would be directly connected to this switch and the switch would need to be setup to handle
the aggregation chosen. What type of aggregation is done on the target system which may be connected to
this same switch or a different switch is independent of what method is used by the DDR.
There is a case where there may be one or more switches between the DDR and the target system, but it is
still considered a direct connect to the target server. The following diagram shows the network topology of
this setup. Notice that there is a separate switch for each link and the switches do not communicate with each
other. This is important because the IP address for each link on the DDR is the same. The target server would
also have to have a similar setup, with the IP at the media server being shared.
[Diagram: Data Domain appliance with eth2 connected through network switch A and eth3 connected through network switch B to interfaces en5 and en4 on the backup/media server]
This setup may be done to handle distances that are too long for direct connect when the user still wants to
directly connect the two systems. In this case the aggregation handshake would be between the two end
systems. It would be expected that round robin would be used and would have to be set up on both sides.
There are some concerns with this setup in dealing with failures. If the link between the target system and the
switch goes down, the local system would have to be able to detect this and send everything over the other
link(s). For example, suppose the link between switch B and en4 is broken. The media server would sense
that the carrier is lost and route the traffic to en5, but the driver for eth3 on the DDR would also have to be
able to sense this and indicate a carrier-down condition to the bonding module so the DDR would route all the
traffic through eth2. With the current software and switch hardware this is not done, and since the switches
are isolated the packets would just get dropped.
Failover of NICs
The case of pure failover is different. In this case the bonded links do not necessarily need to be connected
to the same switch or router, as long as all the links in the bond can transfer data to the target system. With
failover the data is not split among the links; it is sent over one link only, referred to as the active link. A
single virtual IP is shared across all links in the failover bond, but the MAC does not necessarily need to be
the same. While the active link is up, the other links are idle. When a failure occurs, the DDR sends packets
out another link and redirects the received packets to that link through the use of ARP. To get the received
packets to go to the new active link, ARP is turned off for all the links except the active link and a gratuitous
ARP is sent on the new active link. This updates the ARP caches in the associated switches and routers.
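The ARP-based redirection can be sketched as follows (an illustrative Python model of a switch's ARP cache, not Data Domain code; the IP and MAC values are invented):

```python
# Sketch of the failover mechanism described above: the bond keeps one
# active link, and on failure it sends a gratuitous ARP so switches
# re-learn which MAC/port owns the shared virtual IP.

arp_cache = {}  # switch/router view: IP -> (MAC, port)

def gratuitous_arp(ip, mac, port):
    """A gratuitous ARP overwrites any existing cache entry for the IP."""
    arp_cache[ip] = (mac, port)

virtual_ip = "192.168.1.10"                              # hypothetical
gratuitous_arp(virtual_ip, "00:1b:21:aa:bb:01", "eth2")  # eth2 is active

# eth2 fails: the bond activates eth3 and announces its MAC for the IP,
# so receive traffic for the virtual IP now flows to eth3.
gratuitous_arp(virtual_ip, "00:1b:21:aa:bb:02", "eth3")
print(arp_cache[virtual_ip])
```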
Switch information
Link aggregation is set up on both sides of a link. The link aggregation does not necessarily have to match on
both sides. For example, the DDR may be set to xor-L3L4 while the switch is set to src-ip. A good rule of
thumb is to keep the aggregations close, such as xor-L3L4 on the DDR and src-dst-port on the switch. The
reason is that if an aggregation is good enough for one direction, it is good enough for the other direction.
Aggregation on the switch is used to distribute traffic being received by the DDR. If the main operation being
performed is backup, the switch aggregation is very important, because backup network traffic is mostly data
being received by the DDR.
Because of the limited number of clients communicating with the DDR, the recommended aggregation method
is balance-xor with Layer 3+4 hashing. To support this, the device directly connected to the DDR, e.g. the
switch or router (see Normal Link Aggregation), needs to support src-dst-port or at least src-port load
balancing. This section uses the vendors' documentation to identify switches that may work with Layer 3+4
hashing and some that may not. There are no plans to validate or certify these; the final authority on whether
a switch supports the desired aggregation is to physically try it. For example, there is at least one case where
round robin was desired, tried, and worked satisfactorily even though it is listed as not supported. Note again
that even where round robin is supported by a switch, the aggregation performance can be poor or even worse
than not having it, mostly due to out-of-order packets.
Note: few switches support Layer 3+4 aggregation; the supported aggregation may be Layer 3 only or Layer 4
only. Matching Layer 4 (port) aggregation with Layer 3+4 (IP address and port) aggregation is not a problem,
but be aware that it may cause data to be sent on one link and received on a different link. The concern of
out-of-order packets should still not occur: which link the data is sent on is not important as long as all the
data associated with a connection is sent on the same link.
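To illustrate why per-connection hashing keeps packets in order, here is a hedged Python sketch of the Layer 3+4 transmit hash described in the Linux bonding documentation: ((src port XOR dst port) XOR ((src IP XOR dst IP) AND 0xffff)) modulo the number of slaves. The function name, addresses, and ports are invented for the example.

```python
import ipaddress

def layer3_4_slave(src_ip, dst_ip, src_port, dst_port, n_slaves):
    """Pick the slave index for a flow, per the layer3+4 xmit hash policy
    documented for the Linux bonding driver (sketch, not DDR code)."""
    s = int(ipaddress.IPv4Address(src_ip))
    d = int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ ((s ^ d) & 0xFFFF)) % n_slaves

# Every packet of one TCP connection hashes to the same slave, so packets
# within a connection cannot be reordered; different connections can land
# on different links and be aggregated.
link_a = layer3_4_slave("10.0.0.1", "10.0.0.2", 40000, 2049, 2)
link_b = layer3_4_slave("10.0.0.1", "10.0.0.2", 40001, 2049, 2)
```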
Definitions:
Dest := Destination
IP := IP address
L4 := Layer 4 of the network stack, i.e. TCP
MAC := MAC or hardware address
Port := TCP port number
Src := Source
SW := software
Switch brand & model          | Vendor SW  | Src | Dest | SrcDest | Src | Dest | SrcDest | Src L4 | Dest L4 | SrcDest L4 | Round
                              | release    | MAC | MAC  | MAC     | IP  | IP   | IP      | port   | port    | port       | robin
------------------------------+------------+-----+------+---------+-----+------+---------+--------+---------+------------+------
Cisco Catalyst 6500 CatOS     | 8.6        | Yes | Yes  | Yes     | Yes | Yes  | Yes     | Yes    | Yes     | Yes        | No
Cisco Catalyst 6500 IOS       | 12.2SXF    | Yes | Yes  | Yes     | Yes | Yes  | Yes     | Yes    | Yes     | Yes        | No
Cisco Catalyst 3560           | 12.2(44)SE | Yes | Yes  | Yes     | Yes | Yes  | Yes     | No     | No      | No         | No
Cisco Catalyst 2960           | 12.2(44)SE | Yes | Yes  | Yes     | Yes | Yes  | Yes     | No     | No      | No         | No
Cisco Catalyst 3750           | 12.2(44)SE | Yes | Yes  | Yes     | Yes | Yes  | Yes     | No     | No      | No         | No
Cisco Catalyst 4500/4948/4924 | 12.2(37)SG | Yes | Yes  | Yes     | Yes | Yes  | Yes     | Yes    | Yes     | Yes        | No
For directly connected systems, support for round robin is as follows:
Sun - yes
AIX - yes
HPUX - no
Windows - maybe; it depends on the NIC software, but don't count on it
Cisco Configuration
Set the EtherChannel mode to on.
Manually set the ports to participate in the channel group.
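As a hedged sketch, the two steps above might look as follows on a Catalyst switch running IOS. The interface names and the channel-group number are placeholders and should be adapted to the site; verify the commands against the switch's own documentation.

```
! Use src-dst-port load balancing to match xor-l3l4 on the DDR
port-channel load-balance src-dst-port
!
! Place the two DDR-facing ports in the same static channel group
interface GigabitEthernet0/1
 channel-group 1 mode on
interface GigabitEthernet0/2
 channel-group 1 mode on
```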
DDR Configuration
Set the aggregation mode to xor-l3l4 (the recommended method), or xor-l2 if the switch cannot hash on TCP ports.
Appendix
This appendix gives more details about the aggregation modes offered by the Linux system being used by
Data Domain. The other options are not made available because they do not provide better aggregation or
failover than what is already available. This section is expected to be used mainly by developers.
Data Domain uses the link aggregation and failover provided by the bonding module available in the Linux
distribution. The bonding module was developed separately from the Linux OS but is now provided with each
distribution under drivers/net/bonding. For each mode used on the system, a separate bonding module instance
is loaded; each instance is tied to a specific virtual interface. The names used by Data Domain are veth0,
veth1, veth2, and veth3. You can see these, along with all the physical interfaces available and the
aggregation or failover mode in use, with the command:
net show settings
Mode Options
1. balance-rr or BOND_MODE_ROUNDROBIN (0)
Aggregation using Round Robin
Failover with degradation
Normally a good type to use with direct connect or something equivalent
To get full aggregation, both ends of the link need to be set up to use round robin
2. active-backup or BOND_MODE_ACTIVEBACKUP (1)
Failover method used by Data Domain
Works only when one or more standby links are in the group
The active link is non-deterministic unless a primary is specified
3. balance-xor or BOND_MODE_XOR (2)
Note: this only aggregates transmissions; receives need to be aggregated on the other end
Sends each transmission to a specific NIC based on the specified hash method
Default: (source MAC address XOR destination MAC address) modulo the size of the aggregation
group
This mode is used when mode XOR is specified in the CLI
4. broadcast or BOND_MODE_BROADCAST (3)
Failover; sends everything on all links in the group
This mode is not available when using the Data Domain shell CLI
5. 802.3ad or BOND_MODE_8023AD (4)
IEEE dynamic link aggregation using the same hash methods as balance-xor (item 3)
The aggregation is determined by the hash method chosen, layer 2 is the default
Requires the driver to support ethtool
Requires the switch to support the IEEE 802.3ad standard, specifically the protocol
Requires the same IP address and MAC address across all the slaves
All the slaves must run at the same speed and be connected to the same switch
This mode is not available when using the Data Domain shell CLI
6. balance-tlb or BOND_MODE_TLB (5)
Aggregation and failover; does not require switch support, but handles transmit only
Traffic is distributed according to the current load and the speed of each link
Links of different speeds can be used
7. balance-alb or BOND_MODE_ALB (6)
Like balance-tlb, but also balances receive traffic through ARP negotiation
The MAC addresses across the links do not have to be the same
The associated drivers must support the ethtool interface
This is not currently supported by the Data Domain shell CLI
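The default balance-xor hash (item 3 above) can be sketched in Python. This mirrors the last-byte MAC XOR used by the Linux bonding driver's layer 2 policy; the function name and MAC values are invented for the example.

```python
def layer2_slave(src_mac, dst_mac, n_slaves):
    """Pick the slave index for a frame using the layer 2 policy:
    (source MAC XOR destination MAC) modulo the number of slaves.
    The bonding driver uses only the low byte of each 6-byte MAC."""
    return (src_mac[5] ^ dst_mac[5]) % n_slaves

# All frames between the same pair of MACs take the same link, so a
# single host-to-host path is never aggregated across links in this mode.
slave = layer2_slave(bytes.fromhex("001122334455"),
                     bytes.fromhex("66778899aabb"), 2)
```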