I have the chance to participate in the current Early Shipment Program (ESP) for Power Systems, especially the software part. One of my tasks is to test a new feature called SRIOV vNIC. For those who do not know anything about SRIOV, this technology is comparable to LHEA, except that it is based on an industry standard (and has a couple of other features). By using an SRIOV adapter you can divide a physical port into what we call Virtual Functions (or Logical Ports) and map such a Virtual Function to a partition. You can also set Quality of Service on these Virtual Functions: at creation time you configure the Virtual Function to take a certain percentage of the physical port. This can be very useful if you want to be sure that your production server will always have a guaranteed bandwidth, instead of using a Shared Ethernet Adapter where all client partitions compete for the bandwidth. Customers also use SRIOV adapters for performance purposes; as nothing goes through the Virtual I/O Server, the latency added by that extra hop is eliminated and CPU cycles are saved on the Virtual I/O Server side (a Shared Ethernet Adapter consumes a lot of CPU cycles). If you are not aware of what SRIOV is, I encourage you to check the IBM Redbook about it (http://www.redbooks.ibm.com/abstracts/redp5065.html?Open). Unfortunately you can't move a partition with Live Partition Mobility if it has a Virtual Function assigned to it. Using vNICs allows you to use SRIOV through the Virtual I/O Servers and keeps the possibility to move your partition even though you are using an SRIOV logical port. The best of both worlds: performance/QoS and virtualization. Is this the end of the Shared Ethernet Adapter ?
SRIOV vNIC, what’s this ?
Before talking about the technical details it is important to understand what vNICs are. When I explain this to newbies I often refer to NPIV: imagine something similar to NPIV, but for the network side. By using an SRIOV vNIC:
- A Virtual Function (SRIOV Logical Port) is created and assigned to the Virtual I/O Server.
- A vNIC adapter is created in the client partition.
- The Virtual Function and the vNIC adapter are linked (mapped) together.
- This is a one-to-one relationship between a Virtual Function and a vNIC (similar to NPIV, where each client virtual fibre channel adapter is mapped through the Virtual I/O Server to a physical fibre channel port).
On the image below, the vNIC lpars are the "yellow" ones. You can see that the SRIOV adapter is divided into different Virtual Functions, some of which are mapped to the Virtual I/O Servers. The relationship between the Virtual Function and the vNIC is handled by a vnicserver (a special Virtual I/O Server device).
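To make this a bit more concrete, here is roughly how these pieces show up as devices once a vNIC is configured (the device names are just examples taken from the outputs shown later in this post); the first two commands are run on the Virtual I/O Server, the last one on the client partition:
# lsdev | grep VF
ent4         Available   PCIe2 100/1000 Base-TX 4-port Converged Network Adapter VF (df1028e214103c04)
# lsdev | grep vnicserver
vnicserver0  Available   Virtual NIC Server Device (vnicserver)
# lsdev -c adapter -s vdevice -t IBM,vnic
ent0         Available   Virtual NIC Client Adapter (vnic)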
One of the major advantages of using vNICs is that you remove the Virtual I/O Server from the data path:
- The network data flows directly between the partition memory and the SRIOV adapter; there is no data copy passing through the Virtual I/O Server, which eliminates the CPU cost and the latency of that copy. This is achieved by LRDMA. Pretty cool !
- The vNIC inherits the bandwidth allocation (QoS) of the Virtual Function. If the VF is configured with a capacity of 2%, the vNIC will also have this capacity.
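As a quick check of this capacity inheritance (a minimal sketch, using the 2% vNIC from the examples later in this post), the capacity is visible both from the HMC and in the vnicstat output on the Virtual I/O Server, where it shows up as the VF minimum bandwidth; the VF can still use up to 100% of the port when it is free:
# lshwres -r virtualio -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1" -F desired_capacity
2.0
# vnicstat -b vnicserver0 | grep -i bandwidth
VF Minimum Bandwidth: 2%
VF Maximum Bandwidth: 100%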
vNIC Configuration
Before going into the details of how to configure an SRIOV vNIC adapter, check the prerequisites. As this is a new feature you will need the latest level of ... everything. My advice is to stay as up to date as possible.
vNIC Prerequisites
These outputs were taken during the early shipment program; all of this may change at the GA release:
- Hardware Management Console v840:
# lshmc -V
"version= Version: 8
 Release: 8.4.0
 Service Pack: 0
HMC Build level 20150803.3
","base_version=V8R8.4.0
"
- AIX 7.2:
# oslevel -s
7200-00-00-0000
# cat /proc/version
Oct 20 2015
06:57:03
1543A_720
@(#) _kdb_buildinfo unix_64 Oct 20 2015 06:57:03 1543A_720
Using the HMC GUI
The configuration of a vNIC is done at the partition level and is only available in the enhanced version of the GUI. Select the virtual machine on which you want to add the vNIC; in the Virtual I/O tab you will see a new Virtual NICs section. Click on "Virtual NICs" and a new panel opens with a button called "Add Virtual NIC"; just click it to add a Virtual NIC:
All the SRIOV-capable ports are displayed on the next screen. Choose the SRIOV port you want; a Virtual Function will be created on it (you don't have to do anything more: the creation of a vNIC automatically creates the Virtual Function, assigns it to the Virtual I/O Server and does the mapping to the vNIC for you). Then choose the Virtual I/O Server on which the vnicserver will be created (don't worry, we will talk about vNIC redundancy later in this post) and the Virtual NIC Capacity, which is the percentage of the physical SRIOV port dedicated to this vNIC. The capacity has to be a multiple of 2, and be careful with it: it can't be changed afterwards, so you would have to delete and recreate the vNIC to change it:
The "Advanced Virtual NIC Settings" section lets you choose the Virtual NIC Adapter ID, set a MAC address, and configure VLAN restrictions and VLAN tagging. In the example below I'm configuring my Virtual NIC in VLAN 310:
Using the HMC Command Line
As always, the same configuration can be done from the HMC command line: lshwres to list the vNICs and chhwres to create one.
List SRIOV adapters to get the adapter_id needed by the chhwres command:
# lshwres -r sriov --rsubtype adapter -m blade-8286-41A-21AFFFF
adapter_id=1,slot_id=21020014,adapter_max_logical_ports=48,config_state=sriov,functional_state=1,logical_ports=48,phys_loc=U78C9.001.WZS06RN-P1-C12,phys_ports=4,sriov_status=running,alternate_config=0
# lshwres -r virtualio -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
Create the vNIC:
# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72vm1 --rsubtype vnic -v -a "port_vlan_id=310,backing_devices=sriov/vios2/1/1/1/2"
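For reference, the backing_devices attribute used above appears to follow the format sriov/<vios-lpar-name>/<vios-lpar-id>/<sriov-adapter-id>/<physical-port-id>/<capacity> (this is my reading of the command and its outputs, not official documentation). Decoded for the example above:
sriov / vios2 / 1 / 1 / 1 / 2
  sriov : the backing device is an SRIOV logical port
  vios2 : name of the Virtual I/O Server that will host the vnicserver
  1     : partition id of that Virtual I/O Server
  1     : SRIOV adapter id (as listed by lshwres -r sriov --rsubtype adapter)
  1     : physical port id on that adapter
  2     : capacity in percent (a multiple of 2)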
List the vNICs after the creation:
# lshwres -r virtualio -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
lpar_name=72vm1,lpar_id=9,slot_num=2,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87702,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios2/1/1/1/2700400a/2.0
System and Virtual I/O Server Side:
- On the Virtual I/O Server you can use two commands to check your vNIC configuration. First use the lsmap command to check the one-to-one relationship between the VF and the vNIC (the output below shows that a VF and a vnicserver device were created; you can also see the name of the vNIC device on the client partition side):
# lsdev | grep VF
ent4         Available   PCIe2 100/1000 Base-TX 4-port Converged Network Adapter VF (df1028e214103c04)
# lsdev | grep vnicserver
vnicserver0  Available   Virtual NIC Server Device (vnicserver)
# lsmap -vadapter vnicserver0 -vnic
Name          Physloc                            ClntID ClntName       ClntOS
------------- ---------------------------------- ------ -------------- -------
vnicserver0   U8286.41A.21FFFFF-V2-C32897             6 72nim1         AIX

Backing device:ent4
Status:Available
Physloc:U78C9.001.WZS06RN-P1-C12-T4-S16
Client device name:ent1
Client device physloc:U8286.41A.21FFFFF-V6-C3
- The vnicstat command gives detailed statistics about the vnicserver device (state, client partition, link status, and the QoS minimum/maximum bandwidth of the backing VF):
# vnicstat -b vnicserver0
[..]
--------------------------------------------------------------------------------
VNIC Server Statistics: vnicserver0
--------------------------------------------------------------------------------
Device Statistics:
------------------
State: active
Backing Device Name: ent4

Client Partition ID: 6
Client Partition Name: 72nim1
Client Operating System: AIX
Client Device Name: ent1
Client Device Location Code: U8286.41A.21FFFFF-V6-C3
[..]
Device ID: df1028e214103c04
Version: 1
Physical Port Link Status: Up
Logical Port Link Status: Up
Physical Port Speed: 1Gbps Full Duplex
[..]
Port VLAN (Priority:ID): 0:3331
[..]
VF Minimum Bandwidth: 2%
VF Maximum Bandwidth: 100%
- On the client partition side, the vNIC adapters are seen as ent devices and entstat shows the link states and the port VLAN:
# lsdev -c adapter -s vdevice -t IBM,vnic
ent0  Available   Virtual NIC Client Adapter (vnic)
ent1  Available   Virtual NIC Client Adapter (vnic)
ent3  Available   Virtual NIC Client Adapter (vnic)
ent4  Available   Virtual NIC Client Adapter (vnic)
# entstat -d ent0 | more
[..]
ETHERNET STATISTICS (ent0) :
Device Type: Virtual NIC Client Adapter (vnic)
[..]
Virtual NIC Client Adapter (vnic) Specific Statistics:
------------------------------------------------------
Current Link State: Up
Logical Port State: Up
Physical Port State: Up

Speed Running: 1 Gbps Full Duplex
Jumbo Frames: Disabled
[..]
Port VLAN ID Status: Enabled
        Port VLAN ID: 3331
        Port VLAN Priority: 0
Redundancy
You will certainly agree that having such a cool new feature without anything fully redundant behind it would be a shame. Fortunately we have a solution here, with the return, with great fanfare, of the Network Interface Backup (NIB). As I said before, each time a vNIC is created a vnicserver is created on one of the Virtual I/O Servers (at vNIC creation time you choose on which Virtual I/O Server it will be created). So the only way to be fully redundant and to have a failover capability is to create two vNIC adapters (one backed by the first Virtual I/O Server, the second one backed by the second Virtual I/O Server) and to build a Network Interface Backup on top of them, like in the old times. Here are a couple of things and best practices to know before doing this (a command-line sketch of creating the two vNICs follows the list below).
- You can't use two VFs coming from the same SRIOV adapter physical port (the NIB creation will succeed, but any configuration on top of this NIB will fail).
- You can use two VFs coming from the same SRIOV adapter but from two different physical ports (this is the example I will show below).
- The best practice is to use two VFs coming from two different SRIOV adapters (you can then afford to lose one of the two SRIOV adapters).
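Here is a sketch of what the creation of the two vNICs looks like from the HMC command line, one backed by each Virtual I/O Server and each using a different physical port of the adapter (the machine, partition and VLAN values are the ones from my examples; adapt the adapter id, port ids and Virtual I/O Server ids to your own configuration):
# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72nim1 --rsubtype vnic -v -a "port_vlan_id=3331,backing_devices=sriov/vios1/1/1/2/2"
# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72nim1 --rsubtype vnic -v -a "port_vlan_id=3331,backing_devices=sriov/vios2/2/1/3/2"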
Verify on your partition that you have two vNIC adapters and check that their status is ok using the entstat command:
- Both vNICs are available on the client partition:
# lsdev -c adapter -s vdevice -t IBM,vnic
ent0  Available   Virtual NIC Client Adapter (vnic)
ent1  Available   Virtual NIC Client Adapter (vnic)
# lsdev -c adapter -s vdevice -t IBM,vnic -F physloc
U8286.41A.21FFFFF-V6-C2
U8286.41A.21FFFFF-V6-C3
# entstat -d ent0 | grep -p vnic
-------------------------------------------------------------
ETHERNET STATISTICS (ent0) :
Device Type: Virtual NIC Client Adapter (vnic)
Hardware Address: ee:3b:86:f6:45:02
Elapsed Time: 0 days 0 hours 0 minutes 0 seconds

Virtual NIC Client Adapter (vnic) Specific Statistics:
------------------------------------------------------
Current Link State: Up
Logical Port State: Up
Physical Port State: Up
# entstat -d ent1 | grep -p vnic
-------------------------------------------------------------
ETHERNET STATISTICS (ent1) :
Device Type: Virtual NIC Client Adapter (vnic)
Hardware Address: ee:3b:86:f6:45:03
Elapsed Time: 0 days 0 hours 0 minutes 0 seconds

Virtual NIC Client Adapter (vnic) Specific Statistics:
------------------------------------------------------
Current Link State: Up
Logical Port State: Up
Physical Port State: Up
Verify on both Virtual I/O Servers that the two vNICs are backed by two different SRIOV adapters (for the purpose of this test I'm using two different ports on the same SRIOV adapter, but it works the same way with two different adapters). You can see in the outputs below that on Virtual I/O Server 1 the vNIC is backed by the physical port in position 3 (T3) and that on Virtual I/O Server 2 the vNIC is backed by the physical port in position 4 (T4):
- Once again use the lsmap command on the first Virtual I/O Server to check this (note that you can also see the client partition name and the client device name):
# lsmap -vadapter vnicserver0 -vnic
Name          Physloc                            ClntID ClntName       ClntOS
------------- ---------------------------------- ------ -------------- -------
vnicserver0   U8286.41A.21AFF8V-V1-C32897             6 72nim1         AIX

Backing device:ent4
Status:Available
Physloc:U78C9.001.WZS06RN-P1-C12-T3-S13
Client device name:ent0
Client device physloc:U8286.41A.21AFF8V-V6-C2
- And on the second Virtual I/O Server (here with the -fmt option for a condensed output):
# lsmap -vadapter vnicserver0 -vnic -fmt :
vnicserver0:U8286.41A.21AFF8V-V2-C32897:6:72nim1:AIX:ent4:Available:U78C9.001.WZS06RN-P1-C12-T4-S14:ent1:U8286.41A.21AFF8V-V6-C3
Finally create the Network Interface Backup and put an IP address on top of it:
# mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1
ent2 Available
# mktcpip -h 72nim1 -a 10.14.33.223 -i en2 -g 10.14.33.254 -m 255.255.255.0 -s
en2
72nim1
inet0 changed
en2 changed
inet0 changed
[..]
# echo "vnic" | kdb
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A0032880000 |  ent0  |  Up  |     Open     |
|------------------+--------+------+--------------|
| F1000A00329B0000 |  ent1  |  Up  |     Open     |
+-------------------------------------------------+
Let's now try different things to check that the redundancy is working. First let's shut down one of the Virtual I/O Servers and ping our machine from another host:
# ping 10.14.33.223
PING 10.14.33.223 (10.14.33.223) 56(84) bytes of data.
64 bytes from 10.14.33.223: icmp_seq=1 ttl=255 time=0.496 ms
64 bytes from 10.14.33.223: icmp_seq=2 ttl=255 time=0.528 ms
64 bytes from 10.14.33.223: icmp_seq=3 ttl=255 time=0.513 ms
[..]
64 bytes from 10.14.33.223: icmp_seq=40 ttl=255 time=0.542 ms
64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.514 ms
64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.550 ms
64 bytes from 10.14.33.223: icmp_seq=48 ttl=255 time=0.596 ms
[..]
--- 10.14.33.223 ping statistics ---
50 packets transmitted, 45 received, 10% packet loss, time 49052ms
rtt min/avg/max/mdev = 0.457/0.525/0.596/0.043 ms
# errpt | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
59224136   1120200815 P H ent2           ETHERCHANNEL FAILOVER
F655DA07   1120200815 I S ent0           VNIC Link Down
3DEA4C5F   1120200815 T S ent0           VNIC Error CRQ
81453EE1   1120200815 T S vscsi1         Underlying transport error
DE3B8540   1120200815 P H hdisk0         PATH HAS FAILED
# echo "vnic" | kdb
(0)> vnic
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A0032880000 |  ent0  | Down |   Unknown    |
|------------------+--------+------+--------------|
| F1000A00329B0000 |  ent1  |  Up  |     Open     |
+-------------------------------------------------+
Same test, this time with an additional address to ping, and I'm only losing 4 packets:
# ping 10.14.33.223
[..]
64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.627 ms
64 bytes from 10.14.33.223: icmp_seq=42 ttl=255 time=0.548 ms
64 bytes from 10.14.33.223: icmp_seq=46 ttl=255 time=0.629 ms
64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.492 ms
[..]
# errpt | more
59224136   1120203215 P H ent2           ETHERCHANNEL FAILOVER
F655DA07   1120203215 I S ent0           VNIC Link Down
3DEA4C5F   1120203215 T S ent0           VNIC Error CRQ
vNIC Live Partition Mobility
You can use Live Partition Mobility with SRIOV vNICs by default; it is super simple and fully supported by IBM. As always I'll show you how to do it with the HMC GUI and with the command line:
Using the GUI
First validate the mobility operation; the validation lets you choose the destination SRIOV adapter/port on which to map your current vNIC. You have to choose:
- The adapter (if you have more than one SRIOV adapter).
- The Physical port on which the vNIC will be mapped.
- The Virtual I/O Server on which the vnicserver will be created.
New options are now available in the mobility validation panel:
Modify each vNIC to match your destination SRIOV adapter and ports (choose the destination Virtual I/O Server here):
Then migrate. During the move the errpt on the client shows the migration events, the vNIC links going down and up again, and the EtherChannel failing over and recovering:
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
A5E6DB96   1120205915 I S pmig           Client Partition Migration Completed
4FB9389C   1120205915 I S ent1           VNIC Link Up
F655DA07   1120205915 I S ent1           VNIC Link Down
11FDF493   1120205915 I H ent2           ETHERCHANNEL RECOVERY
4FB9389C   1120205915 I S ent1           VNIC Link Up
4FB9389C   1120205915 I S ent0           VNIC Link Up
[..]
59224136   1120205915 P H ent2           ETHERCHANNEL FAILOVER
B50A3F81   1120205915 P H ent2           TOTAL ETHERCHANNEL FAILURE
F655DA07   1120205915 I S ent1           VNIC Link Down
3DEA4C5F   1120205915 T S ent1           VNIC Error CRQ
F655DA07   1120205915 I S ent0           VNIC Link Down
3DEA4C5F   1120205915 T S ent0           VNIC Error CRQ
08917DC6   1120205915 I S pmig           Client Partition Migration Started
The ping test during the LPM shows only 9 pings lost, due to an EtherChannel failover (one of my ports was down on the destination server):
# ping 10.14.33.223
64 bytes from 10.14.33.223: icmp_seq=23 ttl=255 time=0.504 ms
64 bytes from 10.14.33.223: icmp_seq=31 ttl=255 time=0.607 ms
Using the command line
I'm moving the partition back using the HMC command line; check the man page for all the details. Here is the format of the vnic_mappings attribute: slot_num/ded/[vios_lpar_name]/[vios_lpar_id]/[adapter_id]/[physical_port_id]/[capacity] (the values needed for each field can be gathered with lshwres; see the short sketch after the commands below):
- Validate:
# migrlpar -o v -m blade-8286-41A-21AFFFF -t runner-8286-41A-21AEEEE -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'

Warnings:
HSCLA291 The selected partition may have an open virtual terminal session. The management console will force termination of the partition's open virtual terminal session when the migration has completed.
# migrlpar -o m -m blade-8286-41A-21AFFFF -t runner-8286-41A-21AEEEE -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
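If you need to build the vnic_mappings by hand, the slot numbers of the vNICs can be read from the source system with lshwres, and the destination adapter and physical port ids can be checked on the target system the same way (the output for the destination box below is illustrative, assuming it has a similar SRIOV adapter):
# lshwres -r virtualio -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72nim1" -F slot_num
2
3
# lshwres -r sriov --rsubtype adapter -m runner-8286-41A-21AEEEE -F adapter_id,phys_ports,sriov_status
1,4,running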
Port Labelling
One very annoying thing when using LPM with vNICs is that you have to redo the mapping of your vNICs every time you move. The default choices are never right: the GUI always shows you the first port of the first adapter and you have to do the job yourself. It is even worse with the command line, where the vnic_mappings attribute can give you some headaches. Fortunately there is a feature called port labelling. You can put a label on each SRIOV physical port on all of your machines. My advice is to tag the ports serving the same network and the same VLANs with the same label on all your machines. During the mobility operation, if labels match between the two machines, the adapter/port combination carrying that label is automatically chosen for the mobility and you have nothing to map on your own. Super useful. The outputs below show you how to label your SRIOV ports:
# chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=3,phys_port_label=adapter1port3"
# chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=2,phys_port_label=adapter1port2"
# lshwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport --level eth -F adapter_id,phys_port_label
1,adapter1port2
1,adapter1port3
At validation time, source and destination ports are automatically matched:
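With matching labels on both machines the vnic_mappings attribute should no longer be necessary on the command line either; here is a minimal sketch of the validate and migrate commands, assuming the HMC picks the destination ports by label:
# migrlpar -o v -m blade-8286-41A-21AFFFF -t runner-8286-41A-21AEEEE -p 72nim1
# migrlpar -o m -m blade-8286-41A-21AFFFF -t runner-8286-41A-21AEEEE -p 72nim1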
What about performance ?
One of the main reasons I'm looking at SRIOV vNIC adapters is performance. As our whole design is based on the fact that we need to be able to move all of our virtual machines from one host to another, we need a solution allowing both mobility and performance. If you have ever tried to run a TSM server in a virtualized environment you probably understand what I mean about performance and virtualization: in the case of TSM you need a lot of network bandwidth. My current customer and my previous one tried to do it with Shared Ethernet Adapters, and of course this solution did not work, because a classic Virtual Ethernet Adapter is not able to provide enough bandwidth to a single Virtual I/O client. I'm not an expert in network performance, but the results below are pretty obvious to understand and show the power of vNIC and SRIOV (I know some optimization can be done on the SEA side, but this is just a super simple test).
Methodology
I compare here a classic Virtual Ethernet Adapter with a vNIC in the same configuration; both environments are identical, using the same machines, the same switches, and so on:
- Two machines are used for the test. In the vNIC case both use a single vNIC backed by a 10Gb adapter; in the Virtual Ethernet Adapter case both are backed by a SEA built on top of a 10Gb adapter.
- The two machines are running on two different S814s.
- Entitlement and memory are the same for source and destination machines.
- In the vNIC case the capacity of the VF is set to 100% and the physical port of the SRIOV adapter is dedicated to this vNIC.
- In the Virtual Ethernet Adapter case the SEA is dedicated to the test virtual machine.
- In both cases an MTU of 1500 is used.
- The tool used for the performance test is iperf (MTU 1500, window size 64K, and 10 TCP threads); a sample invocation follows this list.
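For reference, here is roughly how iperf was invoked: the first command runs on the receiving LPAR, the second one on the sending LPAR (the address is the test machine used earlier in this post, and the 60 second duration is just an example):
# iperf -s -w 64K
# iperf -c 10.14.33.223 -P 10 -w 64K -t 60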
SEA test for reference only
vNIC SRIOV test
Here we run the exact same test:
By using a vNIC I get 300% of the bandwidth I get with a Virtual Ethernet Adapter. Just awesome, with no tuning (out-of-the-box configuration). Nothing more to add; it's pretty obvious that using vNICs for performance will be a must.
Conclusion
Are SRIOV vNICs the end of the SEAs ? Maybe, but not yet ! For use cases like performance and QoS they will be very useful and widely adopted (I'm pretty sure I will use them at my current customer to virtualize the TSM servers). But today, in my opinion, SRIOV still lacks a real redundancy feature at the adapter level. What I want is a heartbeat communication between the two SRIOV adapters. Having such a feature on SRIOV adapters would finally convince customers to move from SEA to SRIOV vNIC. I know nothing about the future, but I hope something like that will be available in the next few years. To sum up, SRIOV vNICs are powerful, easy to use, and simplify the configuration and management of your Power servers. Wait for the GA and try this new killer functionality. As always, I hope it helps.