Configuration of a Remote Restart Capable partition


How can we move a partition to another machine if the machine or the data center hosting it is totally unavailable? This question is often asked by managers and technical people. Live Partition Mobility can't answer it because the source machine needs to be running to initiate the mobility operation. I'm sure most of you have implemented a manual solution based on a bunch of scripts recreating the partition profile by hand, but this is hard to maintain, not fully automated and not supported by IBM. A solution to this problem is to set up your partitions as Remote Restart Capable partitions. This PowerVM feature has been available since the release of VMcontrol (the IBM Systems Director plugin). Unfortunately this powerful feature is not well documented, but it will probably become, in the coming months or years, a must-have on your newly deployed AIX machines. One last word: with the new Power8 machines things are going to change about remote restart; the functionality will be easier to use and a lot of prerequisites are going to disappear. Just to be clear, this post has been written using Power7+ 9117-MMD machines; the only thing you can't do with these machines (compared to Power8 ones) is change an existing partition to be remote restart capable without having to delete and recreate its profile.

Prerequisites

To create and use a remote restart partition on Power7+/Power8 machines you'll need these prerequisites:

  • A PowerVM Enterprise license (the capability "PowerVM remote restart capable" must be set to true; be careful, there is another capability named "Remote restart capable" which was used by VMcontrol only, so double check you are looking at the right one; a quick check from the HMC command line is sketched just after this list).
  • A firmware level of 780 or later (all Power8 firmware levels are OK; on Power7, any level >= 780 is OK).
  • Your source and destination machines must be connected to the same Hardware Management Console; you can't remote restart between two HMCs at the moment.
  • The minimum HMC version is V8R8.0.0. Check that you have the rrstartlpar command (not the rrlpar command, which is used by VMcontrol only).
  • Better than a long post, check this video (don't laugh at me, I'm trying to do my best but this is one of my first videos …. hope it is good) :
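
As promised, here is a minimal sketch of these checks from the HMC command line ("source-machine" is a placeholder for your managed system name; look for the "PowerVM remote restart capable" capability, not only "Remote restart capable", in the returned list):

    $ # list the capabilities of the managed system
    $ lssyscfg -r sys -m source-machine -F capabilities
    $ # check the HMC version and that the rrstartlpar command is present
    $ lshmc -V
    $ rrstartlpar --help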

What is a remote restart capable virtual machine?

Rather than a long text explaining what it is, check the picture below and follow each number from 1 to 4 to understand what a remote restart partition is:

remote_restart_explanation

Create the profile of your remote restart capable partition: Power7 vs Power8

A good reason to move to Power8 faster than you planned is that you can change a virtual machine to be remote restart capable without having to recreate the whole profile. I don't know why, but at the time of writing this post changing a non remote restart capable LPAR into a remote restart capable one is only possible on Power8 systems. If you are using a Power7 machine (like me in all the examples below) be careful to check this option while creating the machine. Keep in mind that if you forget to check the option you will not be able to enable the remote restart capability afterwards and you will unfortunately have to remove your profile and recreate it, sad but true … :

  • Don't forget to check the check box allowing the partition to be remote restart capable :
  • remote_restart_capable_enabled1

  • After the partition is created you can notice in the I/O tab that remote restart capable partitions cannot own any physical I/O adapters :
  • rr2_nophys

  • You can check in the properties that the remote restart capable feature is activated :
  • remote_restart_capable_activated

  • If you try to modify an existing profile on a Power7 machine you'll get the error message below. On a Power8 machine there will be no problem :
# chsyscfg -r lpar -m XXXX-9117-MMD-658B2AD -p test_lpar -i remote_restart_capable=1
An error occurred while changing the partition named test_lpar.
The managed system does not support changing the remote restart capability of a partition. You must delete the partition and recreate it with the desired remote restart capability.
  • You can verify which of your LPARs are remote restart capable :
  • lssyscfg -r lpar -m source-machine -F name,remote_restart_capable
    [..]
    lpar1,1
    lpar2,1
    lpar3,1
    remote-restart,1
    [..]
    
  • On a Power7 machine the best way to enable remote restart on an already created machine is to delete the profile and recreate it by hand, adding the remote restart attribute :
  • Get the current partition profile :
  • $ lssyscfg -r prof -m s00ka9927558-9117-MMD-658B2AD --filter "lpar_names=temp3-b642c120-00000133"
    name=default_profile,lpar_name=temp3-b642c120-00000133,lpar_id=11,lpar_env=aixlinux,all_resources=0,min_mem=8192,desired_mem=8192,max_mem=8192,min_num_huge_pages=0,desired_num_huge_pages=0,max_num_huge_pages=0,mem_mode=ded,mem_expansion=0.0,hpt_ratio=1:128,proc_mode=shared,min_proc_units=2.0,desired_proc_units=2.0,max_proc_units=2.0,min_procs=4,desired_procs=4,max_procs=4,sharing_mode=uncap,uncap_weight=128,shared_proc_pool_id=0,shared_proc_pool_name=DefaultPool,affinity_group_id=none,io_slots=none,lpar_io_pool_ids=none,max_virtual_slots=64,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1",virtual_scsi_adapters=3/client/2/s00ia9927560/32/0,virtual_eth_adapters=32/0/1659//0/0/vdct/facc157c3e20/all/0,virtual_eth_vsi_profiles=none,"virtual_fc_adapters=""2/client/1/s00ia9927559/32/c050760727c5007a,c050760727c5007b/0"",""4/client/1/s00ia9927559/35/c050760727c5007c,c050760727c5007d/0"",""5/client/2/s00ia9927560/34/c050760727c5007e,c050760727c5007f/0"",""6/client/2/s00ia9927560/35/c050760727c50080,c050760727c50081/0""",vtpm_adapters=none,hca_adapters=none,boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,redundant_err_path_reporting=0,bsr_arrays=0,lpar_proc_compat_mode=default,electronic_err_reporting=null,sriov_eth_logical_ports=none
    
  • Remove the partition :
  • $ chsysstate -r lpar -o shutdown --immed -m source-server -n temp3-b642c120-00000133
    $ rmsyscfg -r lpar -m source-server -n temp3-b642c120-00000133
    
  • Recreate the partition with the remote restart attribute enabled :
  • mksyscfg -r lpar -m s00ka9927558-9117-MMD-658B2AD -i 'name=temp3-b642c120-00000133,profile_name=default_profile,remote_restart_capable=1,lpar_id=11,lpar_env=aixlinux,all_resources=0,min_mem=8192,desired_mem=8192,max_mem=8192,min_num_huge_pages=0,desired_num_huge_pages=0,max_num_huge_pages=0,mem_mode=ded,mem_expansion=0.0,hpt_ratio=1:128,proc_mode=shared,min_proc_units=2.0,desired_proc_units=2.0,max_proc_units=2.0,min_procs=4,desired_procs=4,max_procs=4,sharing_mode=uncap,uncap_weight=128,shared_proc_pool_name=DefaultPool,affinity_group_id=none,io_slots=none,lpar_io_pool_ids=none,max_virtual_slots=64,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1",virtual_scsi_adapters=3/client/2/s00ia9927560/32/0,virtual_eth_adapters=32/0/1659//0/0/vdct/facc157c3e20/all/0,virtual_eth_vsi_profiles=none,"virtual_fc_adapters=""2/client/1/s00ia9927559/32/c050760727c5007a,c050760727c5007b/0"",""4/client/1/s00ia9927559/35/c050760727c5007c,c050760727c5007d/0"",""5/client/2/s00ia9927560/34/c050760727c5007e,c050760727c5007f/0"",""6/client/2/s00ia9927560/35/c050760727c50080,c050760727c50081/0""",vtpm_adapters=none,hca_adapters=none,boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,redundant_err_path_reporting=0,bsr_arrays=0,lpar_proc_compat_mode=default,sriov_eth_logical_ports=none'
    

    Creating a reserved storage device

    The reserved storage device pool is used to store the configuration data of the remote restart partitions. At the time of writing this post those devices are mandatory and, as far as I know, they are used only to store the configuration and not the state (memory state) of the virtual machines themselves (maybe in the future, who knows?). You can't create or boot any remote restart partition if you do not have a reserved storage device pool, so do this before doing anything else :

    • You first have to find, on the Virtual I/O Servers of both machines (the source and destination machines used for the remote restart operation), a bunch of devices. These have to be the same on all the Virtual I/O Servers used for the remote restart operation. The lsmemdev command is used to find those devices :
    vios1$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
    hdisk988         00ced82ce999d6f3                     None
    hdisk989         00ced82ce999d960                     None
    hdisk990         00ced82ce999dbec                     None
    vios2$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
    hdisk988         00ced82ce999d6f3                     None
    hdisk989         00ced82ce999d960                     None
    hdisk990         00ced82ce999dbec                     None
    vios3$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
    hdisk988         00ced82ce999d6f3                     None
    hdisk989         00ced82ce999d960                     None
    hdisk990         00ced82ce999dbec                     None
    vios4$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
    hdisk988         00ced82ce999d6f3                     None
    hdisk989         00ced82ce999d960                     None
    hdisk990         00ced82ce999dbec                     None
    
    $ lsmemdev -r avail -m source-machine -p vios1,vios2
    [..]
    device_name=hdisk988,redundant_device_name=hdisk988,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,redundant_capable=1
    device_name=hdisk989,redundant_device_name=hdisk989,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E6000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E6000000000000,redundant_capable=1
    device_name=hdisk990,redundant_device_name=hdisk990,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E7000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E7000000000000,redundant_capable=1
    [..]
    $ lsmemdev -r avail -m dest-machine -p vios3,vios4
    [..]
    device_name=hdisk988,redundant_device_name=hdisk988,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E5000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E5000000000000,redundant_capable=1
    device_name=hdisk989,redundant_device_name=hdisk989,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E6000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E6000000000000,redundant_capable=1
    device_name=hdisk990,redundant_device_name=hdisk990,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E7000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E7000000000000,redundant_capable=1
    [..]
    
  • Create the reserved storage device pool using the chhwres command on the Hardware Management Console (create it on all the machines used by the remote restart operation) :
  • $ chhwres -r rspool -m source-machine -o a -a vios_names=\"vios1,vios2\"
    $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk988 --manual
    $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk989 --manual
    $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk990 --manual
    $ lshwres -r rspool -m source-machine --rsubtype rsdev
    device_name=hdisk988,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,is_redundant=1,redundant_device_name=hdisk988,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,lpar_id=none,device_selection_type=manual
    device_name=hdisk989,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E6000000000000,is_redundant=1,redundant_device_name=hdisk989,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E6000000000000,lpar_id=none,device_selection_type=manual
    device_name=hdisk990,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E7000000000000,is_redundant=1,redundant_device_name=hdisk990,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E7000000000000,lpar_id=none,device_selection_type=manual
    $ lshwres -r rspool -m source-machine
    "vios_names=vios1,vios2","vios_ids=1,2"
    
  • You can also create the reserved storage device pool from the Hardware Management Console GUI :
  • After selecting the Virtual I/O Server, click select devices :
  • rr_rsd_pool_p

  • Choose the maximum and minimum size to filter the devices you can select for the creation of the reserved storage device :
  • rr_rsd_pool2_p

  • Choose the disks you want to put in your reserved storage device pool (put all the devices used by remote restart partitions in manual mode; automatic devices are used by suspend/resume operations or AMS pools. One device cannot be shared by two remote restart partitions) :
  • rr_rsd_pool_waiting_3_p
    rr_pool_create_7_p

  • You can check afterwards that your reserved storage device pool is created and is composed of three devices :
  • rr_pool_create_9
    rr_pool_create_8_p

    Select a storage device for each remote restart partition before starting it :

    After creating the reserved storage device pool you have to select, for every partition, a device from the pool. This device will be used to store the configuration data of the partition :

    • You can see that you cannot start the partition if no device was selected!
    • To select a device of the correct size you first have to calculate the space needed by each partition using the lsrsdevsize command. This size is roughly the max memory value set in the partition profile (don't ask me why):
    $ lsrsdevsize -m source-machine -p temp3-b642c120-00000133
    size=8498
    
  • Select the device you want to assign to your machine (in my case there was already a device selected for this machine) :
  • rr_rsd_pool_assign_p

  • Then select the machine to which you want to assign the device :
  • rr_rsd_pool_assign2_p

  • Or do this in command line :
  • $ chsyscfg -r lpar -m source-machine -i "name=temp3-b642c120-00000133,primary_rs_vios_name=vios1,secondary_rs_vios_name=vios2,rs_device_name=hdisk988"
    $ lssyscfg -r lpar -m source-machine --filter "lpar_names=temp3-b642c120-00000133" -F primary_rs_vios_name,secondary_rs_vios_name,curr_rs_vios_name
    vios1,vios2,vios1
    $ lshwres -r rspool -m source-machine --rsubtype rsdev
    device_name=hdisk988,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Active,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,is_redundant=1,redundant_device_name=hdisk988,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Active,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,lpar_name=temp3-b642c120-00000133,lpar_id=11,device_selection_type=manual
    

    Launch the remote restart operation

    All the remote restart operations are launched from the Hardware Management Console with the rrstartlpar command. At the time of writing this post there is no GUI function to remote restart a machine; you can only do it from the command line :

    Validation

    As with a Live Partition Mobility move, you can validate a remote restart operation before running it. You can only perform the remote restart operation if the machine hosting the remote restart partition is shut down or in an error state, so the validation is very useful and mandatory to check that your remote restart machines are well configured, without having to stop the source machine :

    $ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar
    $ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar -d 5
    $ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar --redundantvios 2 -d 5 -v
    

    Execution

    As I said before, the remote restart operation can only be performed if the source machine is in a particular state; the states that allow a remote restart operation are :

    • Power Off.
    • Error.
    • Error – Dump in progress state.

    So the only way to test a remote restart operation today is to shut down your source machine :

    • Shutdown the source machine :
    • step1

    $ chsysstate -m source-machine -r sys  -o off --immed
    

    rr_step2_mod

  • You can then check on the Hardware Management Console that the Virtual I/O Servers and the remote restart LPAR are in the "Not available" state. You're now ready to remote restart the LPAR (if the partition ID is already used on the destination machine the next available one will be used) (you have to wait a little before remote restarting the partition, check below; a quick way to check the status afterwards is sketched after the screenshots below) :
  • $ rrstartlpar -o restart -m source-machine -t dest-machine -p rrlpar -d 5 -v
    HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
    $ rrstartlpar -o restart -m source-machine -t dest-machine -p rrlpar -d 5 -v
    Warnings:
    HSCLA32F The specified partition ID is no longer valid. The next available partition ID will be used.
    

    step3
    rr_step4_mod
    step5
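
    Once the restart has been launched you can follow the partition from the HMC command line on the destination machine. A minimal sketch, using the names of this example (the remote_restart_status field may not exist on every HMC level; in that case just look at the state field):

    $ lssyscfg -r lpar -m dest-machine --filter "lpar_names=rrlpar" -F name,state,remote_restart_status
    $ # once the partition is Running you can open a console on it from the HMC
    $ mkvterm -m dest-machine -p rrlpar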

    Cleanup

    When the source machine is ready to come back up (after an outage for instance) just boot the machine and its Virtual I/O Servers. After the machine is up you can notice that the rrlpar profile is still there, and this can be a huge problem if somebody tries to boot this partition, because it is already running on the other machine after the remote restart operation. To prevent such an error you have to clean up the remote restart partition by using the rrstartlpar command again. Be careful not to check the option to boot the partitions when the machine is started :

    • Restart the source machine and its Virtual I/O Servers :
    $ chsysstate -m source-machine -r sys -o on
    $ chsysstate -r lpar -m source-machine -n vios1 -o on -f default_profile
    $ chsysstate -r lpar -m source-machine -n vios2 -o on -f default_profile
    

    rr_step6_mod

  • Perform the cleanup operation to remove the profile of the remote restart partition (if you later want to LPM the machine back you have to keep the device of the reserved storage device pool in the pool; if you do not use the --retaindev option the device will be automatically removed from the pool) :
  • $ rrstartlpar -o cleanup -m source-machine -p rrlpar --retaindev -d 5 -v --force
    

    rr_step7_mod

    Refresh the partition and profile data

    During my tests I encountered a problem: the configuration was not correctly synced between the device used in the reserved storage device pool and the current partition profile. I had to use a command named refdev (for refresh device) to synchronize the partition and profile data to the storage device.

    $ refdev -m source-machine -p temp3-b642c120-00000133 -v
    

    What’s in the reserved storage device ?

    I'm a curious guy. After playing with remote restart I asked myself a question: what is really stored in the reserved storage device assigned to the remote restart partition? Looking at the documentation on the internet did not answer my question, so I had to look at it on my own. By 'dd-ing' the reserved storage device assigned to a partition I realized that the profile is stored in XML format. Maybe this format is the same as the one used by the HMC 8 template library. For the moment, and during my tests on Power7+ machines, the memory state of the partition is not transferred to the destination machine, maybe because I had to shut down the whole source machine to test. Maybe the memory state is transferred to the destination machine if the source is in error state or is dumping; I had no chance to test this (a quick decode of one of the base64 blobs found in the dump is sketched after the output below) :

    root@vios1:/home/padmin# dd if=/dev/hdisk17 of=/tmp/hdisk17.out bs=1024 count=10
    10+0 records in
    10+0 records out
    root@vios1:/home/padmin# more hdisk17.out
    [..]
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    BwEAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACgDIAZAAAQAEAAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" Profile="H4sIAAAAA
    98VjxbxEAhNaZEqpEptPS/iMJO4cTJBdHVj38zcYvu619fTGQlQVmxY0AUICSH4A5XYorJgA1I3sGMBCx5Vs4RNd2zgXI89tpNMxslIiRzPufec853zfefk/t/osMfRBYPZRbpuF9ueUTQsShxR1NSl9dvEEPPMMgnfvPnVk
    a2ixplLuOiVCHaUKn/yYMv/PY/ydTRuv016TbgOzdVv4w6+KM0vyheMX62jgq0L7hsCXtxBH6J814WoZqRh/96+4a+ff3Br8+o3uTE0pqJZA7vYoKKnOgYnNoSsoiPECp7KzHfELTQV/lnBAgt0/Fbfs4Wd1sV+ble7Lup/c
    be0LQj01FJpoVpecaNP15MhHxpcJP8al6b7fg8hxCnPY68t8LpFjn83/eKFhcffjqF8DRUshs0almioaFK0OfHaUKCue/1GcN0ndyfg9/fwsyzQ6SblellXK6RDDaIIwem6L4iXCiCfCuBZxltFz6G4eHed2EWD2sVVx6Mth
    eEOtnzSjQoVwLbo2+uEf3T/s2emPv3z4xA16eD0AC6oRN3FXNnYoA6U7y3OfFc1g5hOIiTQsVUHSusSc43QVluEX2wKdKJZq4q2YmJXEF7hhuqYJA0+inNx3YTDab2m6T7vEGpBlAaJnU0qjWofTkj+uT2Tv3Rl69prZx/9s
    thQTBMK42WK7XSzrizqFhPL5E6FeHGVhnSJQLlKKreab1l6z9MwF0C/jTi3OfmKCsoczcJGwITgy+f74Z4Lu2OU70SDyIdXg1+JAApBWZoAbLaEj4InyonZIDbjvZGwv3H5+tb7C5tPThQA9oUdsCN0HsnWoLxWLjPHAdJSp
    Ja45pBarVb3JDyUJOn3aemXcIqtUfgPi3wCuiw76tMh6mVtNVDHOB+BxqEUDWZGtPgPrFc9oBgBhhJzEdsEVI9zC1gr0JTexhwgThzIwYEG7lLbt3dcPyHQLKQqfGzVsSNzVSvenkDJU/lUoiXGRNrdxLy2soyhtcNX47INZ
    nHKOCjYfsoeR3kpm58GdYDVxipIZXDgSmhfCDCPlKZm4dZoVFORzEX0J6CLvK4py6N7Pz94yiXlPBAArd3zqIEtjXFZ4izJzQ44sCv7hh3bTnY5TbKdnOtHGtatTjrEynTuWFNXV3ouaUKIIKfDgE5XrrpWb/SHWyWCbXMM5
    DkaHNzXVJws6csK57jnpToLopiQLZdgHJJh9wm+M+wbof7GzSRJBYvAAaV0RvE8ZlA5yxSob4fAiJiNNwwQAwu2y5/O881fvvz3HxgK70ZDwc1FS8JezBgKR0e/S4XR3ta8OwmdS56akXJITAmYBpElF5lZOdlXuO+8N0opU
    m0HeJTw76oiD8PS9QfRECUYqk0B1KGkZ+pRGQPUhPFEb12XIoe7u4WXuwdVqTAnZT8gyYrvAPlL/sYG4RkDmAx5HFZpFIVnAz9Lrlyh9tFIc4nZAColOLNGdFRKmE8GJd5zZx++zMiAoTOWNrJvBjODNo1UOGuXngzcHWjrn
    LgmkxjBXLj+6Fjy1DHFF0zV6lVH/p+VYO6pbZzYD9/ORFLouy6MwvlGuRz8Qz10ugawprAdtJ4GxWAOtmQjZXJ+Lg58T/fDy4K74bYWr9CyLIVdQiplHPLbjinZRu4BZuAENE6jxTP2zNkBVgfiWiFcv7f3xYjFqxs/7vb0P
     lpar_name="rrlpar" lpar_uuid="0D80582A44F64B43B2981D632743A6C8" lpar_uuid_gen_method="0"><SourceLparConfig additional_mac_addr_bases="" ame_capability="0" auto_start_e
    rmal" conn_monitoring="0" desired_proc_compat_mode="default" effective_proc_compat_mode="POWER7" hardware_mem_encryption="10" hardware_mem_expansion="5" keylock="normal
    "4" lpar_placement="0" lpar_power_mgmt="0" lpar_rr_dev_desc="	<cpage>		<P>1</P>
    		<S>51</S>
    		<VIOS_descri
    00010E0000000000003FB04214503IBMfcp</VIOS_descriptor>
    	</cpage>
    " lpar_rr_status="6" lpar_tcc_slot_id="65535" lpar_vtpm_status="65535" mac_addres
    x_virtual_slots="10" partition_type="rpa" processor_compatibility_mode="default" processor_mode="shared" shared_pool_util_authority="0" sharing_mode="uncapped" slb_mig_
    ofile="1" time_reference="0" uncapped_weight="128"><VirtualScsiAdapter is_required="false" remote_lpar_id="2" src_vios_slot_number="4" virtual_slot_number="4"/><Virtual
    "false" remote_lpar_id="1" src_vios_slot_number="3" virtual_slot_number="3"/><Processors desired="4" max="8" min="1"/><VirtualFibreChannelAdapter/><VirtualEthernetAdapt
    " filter_mac_address="" is_ieee="0" is_required="false" mac_address="82776CE63602" mac_address_flags="0" qos_priority="0" qos_priority_control="false" virtual_slot_numb
    witch_id="1" vswitch_name="vdct"/><Memory desired="8192" hpt_ratio="7" max="16384" memory_mode="ded" min="256" mode="ded" psp_usage="3"><IoEntitledMem usage="auto"/></M
     desired="200" max="400" min="10"/></SourceLparConfig></SourceLparInfo></SourceInfo><FileInfo modification="0" version="1"/><SriovEthMappings><SriovEthVFInfo/></SriovEt
    VirtualFibreChannelAdapterInfo/></VfcMappings><ProcPools capacity="0"/><TargetInfo concurr_mig_in_prog="-1" max_msp_concur_mig_limit_dynamic="-1" max_msp_concur_mig_lim
    concur_mig_limit="-1" mpio_override="1" state="nonexitent" uuid_override="1" vlan_override="1" vsi_override="1"><ManagerInfo/><TargetMspInfo port_number="-1"/><TargetLp
    ar_name="rrlpar" processor_pool_id="-1" target_profile_name="mig3_9117_MMD_10C94CC141109224549"><SharedMemoryConfig pool_id="-1" primary_paging_vios_id="0"/></TargetLpa
    argetInfo><VlanMappings><VlanInfo description="VkVSU0lPTj0xClZJT19UWVBFPVZFVEgKVkxBTl9JRD0zMzMxClZTV0lUQ0g9dmRjdApCUklER0VEPXllcwo=" vlan_id="3331" vswitch_mode="VEB" v
    ibleTargetVios/></VlanInfo></VlanMappings><MspMappings><MspInfo/></MspMappings><VscsiMappings><VirtualScsiAdapterInfo description="PHYtc2NzaS1ob3N0PgoJPGdlbmVyYWxJbmZvP
    mVyc2lvbj4KCQk8bWF4VHJhbmZlcj4yNjIxNDQ8L21heFRyYW5mZXI+CgkJPGNsdXN0ZXJJRD4wPC9jbHVzdGVySUQ+CgkJPHNyY0RyY05hbWU+VTkxMTcuTU1ELjEwQzk0Q0MtVjItQzQ8L3NyY0RyY05hbWU+CgkJPG1pb
    U9TcGF0Y2g+CgkJPG1pblZJT1Njb21wYXRhYmlsaXR5PjE8L21pblZJT1Njb21wYXRhYmlsaXR5PgoJCTxlZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4xPC9lZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4KCTwvZ2VuZ
    TxwYXJ0aXRpb25JRD4yPC9wYXJ0aXRpb25JRD4KCTwvcmFzPgoJPHZpcnREZXY+CgkJPHZEZXZOYW1lPnJybHBhcl9yb290dmc8L3ZEZXZOYW1lPgoJCTx2TFVOPgoJCQk8TFVBPjB4ODEwMDAwMDAwMDAwMDAwMDwvTFVBP
    FVOU3RhdGU+CgkJCTxjbGllbnRSZXNlcnZlPm5vPC9jbGllbnRSZXNlcnZlPgoJCQk8QUlYPgoJCQkJPHR5cGU+dmRhc2Q8L3R5cGU+CgkJCQk8Y29ubldoZXJlPjE8L2Nvbm5XaGVyZT4KCQkJPC9BSVg+CgkJPC92TFVOP
    gkJCTxyZXNlcnZlVHlwZT5OT19SRVNFUlZFPC9yZXNlcnZlVHlwZT4KCQkJPGJkZXZUeXBlPjE8L2JkZXZUeXBlPgoJCQk8cmVzdG9yZTUyMD50cnVlPC9yZXN0b3JlNTIwPgoJCQk8QUlYPgoJCQkJPHVkaWQ+MzMyMTM2M
    DAwMDAwMDAwMDNGQTA0MjE0NTAzSUJNZmNwPC91ZGlkPgoJCQkJPHR5cGU+VURJRDwvdHlwZT4KCQkJPC9BSVg+CgkJPC9ibG9ja1N0b3JhZ2U+Cgk8L3ZpcnREZXY+Cjwvdi1zY3NpLWhvc3Q+" slot_number="4" sou
    _slot_number="4"><PossibleTargetVios/></VirtualScsiAdapterInfo><VirtualScsiAdapterInfo description="PHYtc2NzaS1ob3N0PgoJPGdlbmVyYWxJbmZvPgoJCTx2ZXJzaW9uPjIuNDwvdmVyc2lv
    NjIxNDQ8L21heFRyYW5mZXI+CgkJPGNsdXN0ZXJJRD4wPC9jbHVzdGVySUQ+CgkJPHNyY0RyY05hbWU+VTkxMTcuTU1ELjEwQzk0Q0MtVjEtQzM8L3NyY0RyY05hbWU+CgkJPG1pblZJT1NwYXRjaD4wPC9taW5WSU9TcGF0
    YXRhYmlsaXR5PjE8L21pblZJT1Njb21wYXRhYmlsaXR5PgoJCTxlZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4xPC9lZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4KCTwvZ2VuZXJhbEluZm8+Cgk8cmFzPgoJCTxwYXJ0
    b25JRD4KCTwvcmFzPgoJPHZpcnREZXY+CgkJPHZEZXZOYW1lPnJybHBhcl9yb290dmc8L3ZEZXZOYW1lPgoJCTx2TFVOPgoJCQk8TFVBPjB4ODEwMDAwMDAwMDAwMDAwMDwvTFVBPgoJCQk8TFVOU3RhdGU+MDwvTFVOU3Rh
    cnZlPm5vPC9jbGllbnRSZXNlcnZlPgoJCQk8QUlYPgoJCQkJPHR5cGU+dmRhc2Q8L3R5cGU+CgkJCQk8Y29ubldoZXJlPjE8L2Nvbm5XaGVyZT4KCQkJPC9BSVg+CgkJPC92TFVOPgoJCTxibG9ja1N0b3JhZ2U+CgkJCTxy
    UlZFPC9yZXNlcnZlVHlwZT4KCQkJPGJkZXZUeXBlPjE8L2JkZXZUeXBlPgoJCQk8cmVzdG9yZTUyMD50cnVlPC9yZXN0b3JlNTIwPgoJCQk8QUlYPgoJCQkJPHVkaWQ+MzMyMTM2MDA1MDc2ODBDODAwMDEwRTAwMDAwMDAw
    ZmNwPC91ZGlkPgoJCQkJPHR5cGU+VURJRDwvdHlwZT4KCQkJPC9BSVg+CgkJPC9ibG9ja1N0b3JhZ2U+Cgk8L3ZpcnREZXY+Cjwvdi1zY3NpLWhvc3Q+" slot_number="3" source_vios_id="1" src_vios_slot_n
    tVios/></VirtualScsiAdapterInfo></VscsiMappings><SharedMemPools find_devices="false" max_mem="16384"><SharedMemPool/></SharedMemPools><MigrationSession optional_capabil
    les" recover="na" required_capabilities="veth_switch,hmc_compatibilty,proc_compat_modes,remote_restart_capability,lpar_uuid" stream_id="9988047026654530562" stream_id_p
    on>
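
    Some of the description attributes in this XML are just base64-encoded text. As an illustration, here is the VlanInfo description seen above being decoded (a sketch run on a Linux box; on AIX you can use openssl base64 or perl instead of the GNU base64 command):

    $ echo "VkVSU0lPTj0xClZJT19UWVBFPVZFVEgKVkxBTl9JRD0zMzMxClZTV0lUQ0g9dmRjdApCUklER0VEPXllcwo=" | base64 -d
    VERSION=1
    VIO_TYPE=VETH
    VLAN_ID=3331
    VSWITCH=vdct
    BRIDGED=yes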
    

    About the state of the source machine ?

    You have to know this before using remote restart: at the time of writing this post the remote restart feature is still young and has to evolve before being usable in real life. I'm saying this because the FSP of the source machine has to be up to perform a remote restart operation. To be clear, the remote restart feature does not answer the total loss of one of your sites. It's just useful to restart the partitions of a system with a problem that is not an FSP problem (a problem with memory DIMMs or CPUs for instance). It can be used in your DRP exercises, but not if your whole site is totally down, which is -in my humble opinion- one of the key cases that remote restart needs to address. Don't be afraid, read the conclusion ….

    Conclusion

    This post has been written using Power7+ machines; my goal was to give you an example of remote restart operations: a summary of what it is, how it works, and where and when to use it. I'm pretty sure that a lot of things are going to change about remote restart. First, on Power8 machines you don't have to recreate the partitions to make them remote restart aware. Second, I know that changes are on the way for remote restart on Power8 machines, especially about reserved storage devices and about the state of the source machine. I'm sure this feature has a bright future and, used with PowerVC, it can be a killer feature. Hope to see all these changes in the near future ;-). Once again I hope this post helps you.


    Automating systems deployment & other new features : HMC8, IBM Provisioning Toolkit for PowerVM and LPM Automation Tool


    I am involved in a project where we are going to deploy dozens of Power Systems (still Power7 for the moment, and Power8 in the near future). All the systems will be the same: same models, with the same slot placements and the same Virtual I/O Server configuration. To be sure that all my machines are identical, and to allow other people (who are not aware of the design or not skilled enough to do it by themselves) to deploy them, I had to find a solution to automate the deployment of the new machines. For the virtual machines the solution is now to use PowerVC, but what about the Virtual I/O Servers, what about the configuration of the Shared Ethernet Adapters? In other words, what about the infrastructure deployment? I spent a week with an IBM US STG Lab Services consultant (Bonnie Lebarron) for a PowerCare (you now have a PowerCare included with every high-end machine you buy) about the IBM Provisioning Toolkit for PowerVM (a very powerful tool that allows you to deploy your Virtual I/O Servers and your virtual machines automatically) and the Live Partition Mobility Automation Tool. With the new Hardware Management Console (8R8.2.0) you now have the possibility to create templates, not just for new virtual machine creation, but also to deploy, create and configure your Virtual I/O Servers. The goal of this post is to show that there are different ways to do that, but also to show you the new features embedded in the new Hardware Management Console and to spread the word about those two wonderful STG Lab Services tools that are well known in the US but not so much in Europe. So it's a HUGE post, just take what is useful for you from it. Here we go :

    Hardware Management Console 8 : System templates

    The goal of the system templates is to deploy a new server in minutes without having to log on to different servers to do some tasks; you now just have to connect to the HMC to do all the work. The system templates will deploy the Virtual I/O Server image by using your NIM server or by using the images stored in the Hardware Management Console media repository. Please note a few points :

    • You CAN'T deploy a "gold" mksysb of your Virtual I/O Server using the Hardware Management Console repository. I've tried this myself and it is for the moment impossible (if someone has a solution …). I've tried two different ways: creating a backupios image without the mksysb flag (it produces a tar file impossible to upload to the image repository, but usable by the installios command), and creating a backupios image with the mksysb flag and using the mkcd/mkdvd command to create iso images. Both methods failed at the installation process.
    • The current Virtual I/O Server images provided in the Electronic Software Delivery (2.2.3.4 at the moment) come in the .udf format and not the .iso format. This is not a huge problem, just rename both files to .iso before uploading them to the Hardware Management Console.
    • If you want to deploy your own mksysb you can still choose to use your NIM server, but you will have to manually create the NIM objects and configure a bosinst installation (in my humble opinion what we are trying to do is to reduce manual interventions, but you can still do that for the moment; that's what I do because I don't have the choice, a sketch of the NIM objects involved follows this list). You'll have to give the IP address of the NIM server and the HMC will boot the Virtual I/O Servers with the network settings already configured.
    • The Hardware Management Console installation with the media repository is based on the old well-known installios command. You still need to have the NIM port opened between your HMC and the Virtual I/O Server management network (the one you choose to install both Virtual I/O Servers on) (installios is based on NIMOL). You may experience some problems if you have already installed your Virtual I/O Servers this way and you may have to reset a few things. My advice is to always run these three commands before deploying a system template :
    # installios -F -e -R default1
    # installios -u 
    # installios -q
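
    If you go the NIM way with your own gold mksysb, the NIM objects to prepare on the master look roughly like this. This is only a sketch: the resource names, paths and the labvios1 machine object are placeholders, and the machine and bosinst_data definitions have to match your own environment:

    # define the gold mksysb resource and build a SPOT from it
    # nim -o define -t mksysb -a server=master -a location=/export/mksysb/golden-vios-2234.mksysb golden-vios-2234-mksysb
    # nim -o define -t spot -a server=master -a source=golden-vios-2234-mksysb -a location=/export/spot golden-vios-2234-spot
    # enable the installation for the Virtual I/O Server machine object (labvios1 must already be defined)
    # nim -o bos_inst -a source=mksysb -a mksysb=golden-vios-2234-mksysb -a spot=golden-vios-2234-spot -a accept_licenses=yes -a boot_client=no labvios1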
    

    Uploading an iso file on the Hardware Management Console

    Upload the images to the Hardware Management Console, I'll not explain this in detail … :

    hmc_add_virtual_io_server_to_repo
    hmc_add_virtual_io_server_to_repo2

    Creating a system template

    To create a system template you first have to copy an existing predefined template provided by the Hardware Management Console (1) and then edit this template to fit your own needs (2) :

    create_template_1

    • You can't edit the physical I/O part when editing a new template; you first have to deploy a system with this template, choose the physical I/O for each Virtual I/O Server, and then capture the deployed system as an HMC template. Change the properties of your Virtual I/O Servers :
    • create_template_2

    • Create your Shared Ethernet Adapters : let’s say we want to create one Shared Ethernet Adapter in sharing mode with four virtual adapters :
    • Adapter 1 : PVID10, vlans=1024;1025
    • Adapter 2 : PVID11, vlans=1028;1029
    • Adapter 3 : PVID12, vlans=1032;1033
    • Adapter 4 : PVID13, vlans=1036;1037
    • In the new HMC8 the terminology changes: Virtual Network Bridge = Shared Ethernet Adapter; Load (Balance) Group = a pair of virtual adapters with the same PVID on both Virtual I/O Servers.
    • Create the Shared Ethernet Adapter with the first (with PVID10) and the second (with PVID11) adapters and the first vlan (vlan 1024 has to be added on the adapter with PVID 10) :
    • create_sea1
      create_sea2
      create_sea3

    • Add the second vlan (the vlan 1028) in our Shared Ethernet Adapter (Virtual Network Bridge) and choose to put it on the adapter with PVID 11 (Load Balance Group 11) :
    • create_sea4
      create_sea5
      create_sea6

    • Repeat this operation for the next vlan (1032), but this time we have to create new virtual adapters with PVID 12 (Load Balance Group 12) :
    • create_sea7

    • Repeat this operation for the next vlan (1036), but this time we have to create new virtual adapters with PVID 13 (Load Balance Group 13).
    • You can check on this picture our 4 virtual adapters with two vlans on each one :
    • create_sea8
      create_sea9

    • I'll not detail the other parts, which are very simple to understand. You can check at the end that our template is created with 2 Virtual I/O Servers and 8 virtual networks (see the command-line check sketched after this list).
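
    Once a system has been deployed from the template, the result can also be checked from the HMC command line. A minimal sketch (deployed-machine is a placeholder; the second command lists the virtual Ethernet adapters, their vlans and whether they are trunk adapters):

    $ lshwres -r virtualio --rsubtype vswitch -m deployed-machine
    $ lshwres -r virtualio --rsubtype eth --level lpar -m deployed-machine -F lpar_name,slot_num,port_vlan_id,addl_vlan_ids,is_trunk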

    The Shared Ethernet Adapter problem: are you deploying a Power8/Power7 with a 780 firmware, or a Power6/7 server ?

    When creating a system template you have probably noticed that when you are defining your Shared Ethernet Adapters … sorry, your Virtual Network Bridges, there is no possibility to create any control channel adapter or to assign a vlan id to this control channel. If you create the system template by hand with the HMC, the template will be usable by all Power8 systems and by all Power7 systems with a firmware that allows you to create a Shared Ethernet Adapter without any control channel (780 firmwares). I've tried this myself and we will check that later. If you deploy such a system template on an older Power7 system the deployment will fail for this reason. You have two solutions to this problem: create your first system "by hand", create your Shared Ethernet Adapters with control channels on your own and then capture the system to redeploy it on other machines, or edit the XML of your current template to add the control channel adapter in it … no comment.

    failed_sea_ctl_chan

    If you choose to edit the template to add the control channel on your own, export your template as an XML file, edit it by hand (here is an example in the picture below), and then re-import the modified XML file :

    sea_control_channel_template

    Capture an already deployed system

    As you can see, creating a system template from scratch can be hard and cannot match all your needs, especially with this Shared Ethernet Adapter problem. My advice is to deploy your first system by hand or by using the toolkit, and then capture that system to create a Hardware Management Console template based on it. By doing this all the Shared Ethernet Adapters will be captured as configured, the ones with control channels and the ones without. This can match all the cases without having to edit the XML file by hand (a quick way to check the existing Shared Ethernet Adapters on the source Virtual I/O Servers before capturing is sketched below).
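
    Before capturing, it can be worth checking on each source Virtual I/O Server what will actually be captured. A minimal sketch from the padmin shell (ent10 is a placeholder for one of your Shared Ethernet Adapters):

    $ # list the Shared Ethernet Adapters with their virtual and physical adapters
    $ lsmap -all -net
    $ # display the attributes of a given SEA (look at ctl_chan, ha_mode and pvid)
    $ lsdev -dev ent10 -attr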

    • Click “Capture configuration as template with physical I/O” :
    • capture_template_with_physical_io

    • The whole system will be captured, and if you always put your physical I/O in the same slots (as we do in my team), each time you deploy a new server you will not have to choose which physical I/O belongs to which Virtual I/O Server :
    • capture_template_with_physical_io_capturing

    • In the system template library you can check that the physical I/O has been captured and that we do not have to define our Shared Ethernet Adapters (the screenshot below shows you 49 vlans ready to be deployed) :
    • capture_template_library_with_physical_io_and_vlan

  • To do this don’t forget to edit the template and check the box “Use captured I/O information” :
  • use_captured_io_informations

    Deploying a system template

    BE VERY CAREFUL BEFORE DEPLOYING A SYSTEM TEMPLATE: ALL THE EXISTING VIRTUAL I/O SERVERS AND PARTITIONS WILL BE REMOVED BY DOING THIS. THE HMC WILL PROMPT YOU WITH A WARNING MESSAGE. Go to the template library and right click on the template you want to deploy, then click deploy :

    reset_before_deploy1
    reset_before_deploy2

    • If you are deploying a "non captured template", choose the physical I/O for each Virtual I/O Server :
    • choose_io1

    • If you are deploying a "captured template" the physical I/O will be automatically chosen for each Virtual I/O Server :
    • choose_io2

    • The Virtual I/O Server profiles are created here :
    • craving_virtual_io_servers

    • You next have the choice to use a NIM server or the HMC image repository to deploy the Virtual I/O Servers; in both cases you have to choose the adapter used to deploy the image :
    • NIM way :
    • nim_way

    • HMC way (check the tip at the beginning of the post about installios if you are choosing this method) :
    • hmc_way

    • Click start when you are ready. The start button will invoke the lpar_netboot command with the settings you put in the previous screen :
    • start_dep

    • You can monitor the installation process by clicking monitoring vterm (on the images below you can check that the ping is successful, the bootp is ok, the tftp is downloading, and the mksysb is being restored) :
    • monitor1
      monitor2
      monitor3

    • The RMC connection has to be up on both Virtual I/O Servers to build the Shared Ethernet Adapters, and the Virtual I/O Server license must be accepted. Check that both are ok (a command-line check is sketched after the screenshots below).
    • RMCok
      licenseok
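
    A sketch of how both points can be checked from the command line (deployed-machine is a placeholder; the rmc_state field should be available on recent HMC levels, and the license command is run on each Virtual I/O Server):

    $ lssyscfg -r lpar -m deployed-machine -F name,rmc_state,rmc_ipaddr
    $ # on each Virtual I/O Server
    $ license -accept
    $ license -view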

    • Choose where the Shared Ethernet Adapters will be created and create the link aggregation devices here (i.e. choose on which network adapters and network ports your Shared Ethernet Adapters will be created) :
    • choose_adapter

    • Click start on the next screen to create the Shared Ethernet Adapter automatically :
    • sea_creation_ok

    • After a successful deployment of a system template a summary will be displayed on the screen :
    • template_ok

    IBM Provisioning Toolkit for PowerVM : A tool created by the Admins for the Admins

    As you now know, the HMC templates are ok, but there are some drawbacks to this method. In my humble opinion the HMC templates are good for a beginner: the user is guided step by step and it is much simpler for someone who doesn't know anything about PowerVM to build a server from scratch, without knowing and understanding all the features of PowerVM (Virtual I/O Server, Shared Ethernet Adapter). But the deployment is not fully automated: the HMC will not mirror your rootvg, will not set any attributes on your Fibre Channel adapters, and will never run a custom script after the installation to fit your needs. Last point, I'm sure that as a system administrator you probably prefer using command-line tools to a "crappy" GUI, and a template can neither be created nor deployed from the command line (change this please). There is another way to build your servers and it's called the IBM PowerVM Provisioning Toolkit. This tool is developed by STG Lab Services US and is not well known in Europe, but I can assure you that a lot of US customers are using it (US guys, raise your voice in the comments). This tool can help you in many ways :

    • Carving Virtual I/O Server profiles.
    • Building and deploying Virtual I/O Servers with a NIM server without having to create anything by hand.
    • Creating your SEAs with or without control channel, failover/sharing, tagged/non-tagged.
    • Setting attributes on your Fibre Channel adapters.
    • Building and deploying Virtual I/O clients in NPIV and vscsi.
    • Mirroring your rootvg.
    • Capturing a whole frame and redeploying it on another server.
    • A lot of other things.

    Just to let you understand the approach of the tool, let's begin with an example. I want to deploy a new machine with two Virtual I/O Servers :

    • 1 (white) – I write a profile file: in this one I put all the information that is the same on all machines (virtual switches, shared processor pools, Virtual I/O Server profiles, Shared Ethernet Adapter definitions, image chosen to deploy the Virtual I/O Servers, physical I/O adapters for each Virtual I/O Server).
    • 2 (white) – I write a config file: in this one I put all the information that is unique to each machine (name, IP, HMC name used to deploy, CEC serial number, and so on).
    • 3 (yellow) – I launch the provisioning toolkit to build my machine: the NIM objects are created (networks, standalone machines) and the bosinst operation is launched from the NIM server.
    • 4 (red) – The Virtual I/O Server profiles are created and the lpar_netboot command is launched; an ssh key has to be shared between the NIM server and the Hardware Management Console (see the sketch after this list).
    • 5 (blue) – The Shared Ethernet Adapters are created and the post configuration is launched on the Virtual I/O Servers (mirror creation, vfc attributes …).
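
    The key exchange mentioned in step 4 can be done with the HMC mkauthkeys command. A minimal sketch, assuming an HMC named myhmc and that the toolkit runs as root on the NIM master:

    # KEY=$(cat $HOME/.ssh/id_rsa.pub)
    # ssh hscroot@myhmc "mkauthkeys -a \"$KEY\""
    # ssh hscroot@myhmc lshmc -V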

    toolkit

    Let me show you a detailed example of a new machine deployment :

    • On the NIM server the toolkit is located in /export/nim/provision. You can see the main script, called buildframe.ksh.v3.24.2, and two directories: one for the profiles (build_profiles) and one for the configuration files (config_files). The work_area directory is the log directory :
    # cd /export/nim/provision
    # ls
    build_profiles          buildframe.ksh.v3.24.2  config_files       lost+found              work_area
    
  • Let's check the profile file for a new Power720 deployment :
  • # vi build_profiles/p720.conf
    
  • Some variables will be set in the configuration file; put the NA value for these ones :
  • VARIABLES      (SERVERNAME)=NA
    VARIABLES      (BUILDHMC)=NA
    [..]
    VARIABLES      (BUILDUSER)=hscroot
    VARIABLES      (VIO1_LPARNAME)=NA
    VARIABLES      (vio1_hostname)=(VIO1_LPARNAME)
    VARIABLES      (VIO1_PROFILE)=default_profile
    
    VARIABLES      (VIO2_LPARNAME)=NA
    VARIABLES      (vio2_hostname)=(VIO2_LPARNAME)
    VARIABLES      (VIO2_PROFILE)=default_profile
    
    VARIABLES      (VIO1_IP)=NA
    VARIABLES      (VIO2_IP)=NA
    
  • Choose the ports that will be used to restore the Virtual I/O Server mksysb :
  • VARIABLES      (NIMPORT_VIO1)=(CEC1)-P1-C6-T1
    VARIABLES      (NIMPORT_VIO2)=(CEC1)-P1-C7-T1
    
  • In this example I'm building the Virtual I/O Servers with 3 Shared Ethernet Adapters, and I'm not creating any LACP aggregation :
  • # SEA1
    VARIABLES      (SEA1VLAN1)=401
    VARIABLES      (SEA1VLAN2)=402
    VARIABLES      (SEA1VLAN3)=403
    VARIABLES      (SEA1VLAN4)=404
    VARIABLES      (SEA1VLANS)=(SEA1VLAN1),(SEA1VLAN2),(SEA1VLAN3),(SEA1VLAN4)
    # SEA2
    VARIABLES      (SEA2VLAN1)=100,101,102
    VARIABLES      (SEA2VLAN2)=103,104,105
    VARIABLES      (SEA2VLAN3)=106,107,108
    VARIABLES      (SEA2VLAN4)=109,110
    VARIABLES      (SEA2VLANS)=(SEA2VLAN1),(SEA2VLAN2),(SEA2VLAN3),(SEA2VLAN4)
    # SEA3
    VARIABLES      (SEA3VLAN1)=200,201,202,203,204,309
    VARIABLES      (SEA3VLAN2)=205,206,207,208,209,310
    VARIABLES      (SEA3VLAN3)=210,300,301,302,303
    VARIABLES      (SEA3VLAN4)=304,305,306,307,308
    VARIABLES      (SEA3VLANS)=(SEA3VLAN1),(SEA3VLAN2),(SEA3VLAN3),(SEA3VLAN4)
    # SEA DEF (I'm putting adapter ID and PVID here)
    SEADEF         seadefid=SEA1,networkpriority=S,vswitch=vdct,seavirtid=10,10,(SEA1VLAN1):11,11,(SEA1VLAN2):12,12,(SEA1VLAN3):13,13,(SEA1VLAN4),seactlchnlid=14,99,vlans=(SEA1VLANS),netmask=(SEA1NETMASK),gateway=(SEA1GATEWAY),etherchannel=NO,lacp8023ad=NO,vlan8021q=YES,seaat
    trid=nojumbo
    SEADEF         seadefid=SEA2,networkpriority=S,vswitch=vdca,seavirtid=15,15,(SEA2VLAN1):16,16,(SEA2VLAN2):17,17,(SEA2VLAN3):18,18,(SEA2VLAN4),seactlchnlid=19,98,vlans=(SEA2VLANS),netmask=(SEA2NETMASK),gateway=(SEA2GATEWAY),etherchannel=NO,lacp8023ad=NO,vlan8021q=YES,seaat
    trid=nojumbo
    SEADEF         seadefid=SEA3,networkpriority=S,vswitch=vdcb,seavirtid=20,20,(SEA3VLAN1):21,21,(SEA3VLAN2):22,22,(SEA3VLAN3):23,23,(SEA3VLAN4),seactlchnlid=24,97,vlans=(SEA3VLANS),netmask=(SEA3NETMASK),gateway=(SEA3GATEWAY),etherchannel=NO,lacp8023ad=NO,vlan8021q=YES,seaat
    trid=nojumbo
    # SEA PHYSICAL PORTS 
    VARIABLES      (SEA1AGGPORTS_VIO1)=(CEC1)-P1-C6-T2
    VARIABLES      (SEA1AGGPORTS_VIO2)=(CEC1)-P1-C7-T2
    VARIABLES      (SEA2AGGPORTS_VIO1)=(CEC1)-P1-C1-C3-T1
    VARIABLES      (SEA2AGGPORTS_VIO2)=(CEC1)-P1-C1-C4-T1
    VARIABLES      (SEA3AGGPORTS_VIO1)=(CEC1)-P1-C4-T1
    VARIABLES      (SEA3AGGPORTS_VIO2)=(CEC1)-P1-C5-T1
    # SEA ATTR 
    SEAATTR        seaattrid=nojumbo,ha_mode=sharing,largesend=1,large_receive=yes
    
  • I'm defining each physical I/O adapter for each Virtual I/O Server :
  • VARIABLES      (HBASLOTS_VIO1)=(CEC1)-P1-C1-C1,(CEC1)-P1-C2
    VARIABLES      (HBASLOTS_VIO2)=(CEC1)-P1-C1-C2,(CEC1)-P1-C3
    VARIABLES      (ETHSLOTS_VIO1)=(CEC1)-P1-C6,(CEC1)-P1-C1-C3,(CEC1)-P1-C4
    VARIABLES      (ETHSLOTS_VIO2)=(CEC1)-P1-C7,(CEC1)-P1-C1-C4,(CEC1)-P1-C5
    VARIABLES      (SASSLOTS_VIO1)=(CEC1)-P1-T9
    VARIABLES      (SASSLOTS_VIO2)=(CEC1)-P1-C19-T1
    VARIABLES      (NPIVFCPORTS_VIO1)=(CEC1)-P1-C1-C1-T1,(CEC1)-P1-C1-C1-T2,(CEC1)-P1-C1-C1-T3,(CEC1)-P1-C1-C1-T4,(CEC1)-P1-C2-T1,(CEC1)-P1-C2-T2,(CEC1)-P1-C2-T3,(CEC1)-P1-C2-T4
    VARIABLES      (NPIVFCPORTS_VIO2)=(CEC1)-P1-C1-C2-T1,(CEC1)-P1-C1-C2-T2,(CEC1)-P1-C1-C2-T3,(CEC1)-P1-C1-C2-T4,(CEC1)-P1-C3-T1,(CEC1)-P1-C3-T2,(CEC1)-P1-C3-T3,(CEC1)-P1-C3-T4
    
  • I’m defining the mksysb image to use and the Virtual I/O Server profiles :
  • BOSINST        bosinstid=viogold,source=mksysb,mksysb=golden-vios-2234-29122014-mksysb,spot=golden-vios-2234-29122014-spot,bosinst_data=no_prompt_hdisk0-bosinst_data,accept_licenses=yes,boot_client=no
    
    PARTITIONDEF   partitiondefid=vioPartition,bosinstid=viogold,lpar_env=vioserver,proc_mode=shared,min_proc_units=0.4,desired_proc_units=1,max_proc_units=16,min_procs=1,desired_procs=4,max_procs=16,sharing_mode=uncap,uncap_weight=255,min_mem=1024,desired_mem=8192,max_mem=12
    288,mem_mode=ded,max_virtual_slots=500,all_resources=0,msp=1,allow_perf_collection=1
    PARTITION      name=(VIO1_LPARNAME),profile_name=(VIO1_PROFILE),partitiondefid=vioPartition,lpar_netboot=(NIM_IP),(vio1_hostname),(VIO1_IP),(NIMNETMASK),(NIMGATEWAY),(NIMPORT_VIO1),(NIM_SPEED),(NIM_DUPLEX),NA,YES,NO,NA,NA
    PARTITION      name=(VIO2_LPARNAME),profile_name=(VIO2_PROFILE),partitiondefid=vioPartition,lpar_netboot=(NIM_IP),(vio2_hostname),(VIO2_IP),(NIMNETMASK),(NIMGATEWAY),(NIMPORT_VIO2),(NIM_SPEED),(NIM_DUPLEX),NA,YES,NO,NA,NA
    
    • Let's now check a configuration file for a specific machine (as you can see, I put the Virtual I/O Server names here, the IP addresses and everything that is specific to the new machine (CEC serial number and so on)) :
    # cat P720-8202-E4D-1.conf
    (BUILDHMC)=myhmc
    (SERVERNAME)=P720-8202-E4D-1
    (CEC1)=WZSKM8U
    (VIO1_LPARNAME)=labvios1
    (VIO2_LPARNAME)=labvios2
    (VIO1_IP)=10.14.14.1
    (VIO2_IP)=10.14.14.2
    (NIMGATEWAY)=10.14.14.254
    (VIODNS)=10.10.10.1,10.10.10.2
    (VIOSEARCH)=lab.chmod66.org,prod.chmod666.org
    (VIODOMAIN)=chmod666.org
    
  • We are now ready to build the new machine. The first thing to do is to create the vswitches on the machine (you have to confirm all operations):
  • ./buildframe.ksh.v3.24.2 -p p720 -c P720-8202-E4D-1.conf -f vswitch
    150121162625 Start of buildframe DATE: (150121162625) VERSION: v3.24.2
    150121162625        profile: p720.conf
    150121162625      operation: FRAMEvswitch
    150121162625 partition list:
    150121162625   program name: buildframe.ksh.v3.24.2
    150121162625    install dir: /export/nim/provision
    150121162625    post script:
    150121162625          DEBUG: 0
    150121162625         run ID: 150121162625
    150121162625       log file: work_area/150121162625_p720.conf.log
    150121162625 loading configuration file: config_files/P720-8202-E4D-1.conf
    [..]
    Do you want to continue?
    Please enter Y or N Y
    150121162917 buildframe is done with return code 0
    
  • Let’s now build the Virtual I/O Servers, create the Shared Ethernet Adapters and let’s have a coffee ;-)
  • # ./buildframe.ksh.v3.24.2 -p p720 -c P720-8202-E4D-1.conf -f build
    [..]
    150121172320 Creating partitions
    150121172320                 --> labvios1
    150121172322                 --> labvios2
    150121172325 Updating partition profiles
    150121172325   updating VETH adapters in partition: labvios1 profile: default_profile
    150121172329   updating VETH adapters in partition: labvios1 profile: default_profile
    150121172331   updating VETH adapters in partition: labvios1 profile: default_profile
    150121172342   updating VETH adapters in partition: labvios2 profile: default_profile
    150121172343   updating VETH adapters in partition: labvios2 profile: default_profile
    150121172344   updating VETH adapters in partition: labvios2 profile: default_profile
    150121172345   updating IOSLOTS in partition: labvios1 profile: default_profile
    150121172347   updating IOSLOTS in partition: labvios2 profile: default_profile
    150121172403 Configuring NIM for partitions
    150121172459 Executing--> lpar_netboot   -K 255.255.255.0 -f -t ent -l U78AA.001.WZSKM8U-P1-C6-T1 -T off -D -s auto -d auto -S 10.20.20.1 -G 10.14.14.254 -C 10.14.14.1 labvios1 default_profile s00ka9936774-8202-E4D-845B2CV
    150121173247 Executing--> lpar_netboot   -K 255.255.255.0 -f -t ent -l U78AA.001.WZSKM8U-P1-C7-T1 -T off -D -s auto -d auto -S 10.20.20.1 -G 10.14.14.254 -C 10.14.14.2 labvios2 default_profile s00ka9936774-8202-E4D-845B2CV
    150121174028 buildframe is done with return code 0
    
  • After the mksysb is deployed you can tail the logs on each Virtual I/O Server to check what is going on :
  • [..]
    150121180520 creating SEA for virtID: ent4,ent5,ent6,ent7
    ent21 Available
    en21
    et21
    150121180521 Success: running /usr/ios/cli/ioscli mkvdev -sea ent1 -vadapter ent4,ent5,ent6,ent7 -default ent4 -defaultid 10 -attr ctl_chan=ent8  ha_mode=sharing largesend=1 large_receive=yes, rc=0
    150121180521 found SEA ent device: ent21
    150121180521 creating SEA for virtID: ent9,ent10,ent11,ent12
    [..]
    ent22 Available
    en22
    et22
    150121180523 Success: running /usr/ios/cli/ioscli mkvdev -sea ent20 -vadapter ent9,ent10,ent11,ent12 -default ent9 -defaultid 15 -attr ctl_chan=ent13  ha_mode=sharing largesend=1 large_receive=yes, rc=0
    150121180523 found SEA ent device: ent22
    150121180523 creating SEA for virtID: ent14,ent15,ent16,ent17
    [..]
    ent23 Available
    en23
    et23
    [..]
    150121180540 Success: /usr/ios/cli/ioscli cfgnamesrv -add -ipaddr 10.10.10.1, rc=0
    150121180540 adding DNS: 10.10.10.1
    150121180540 Success: /usr/ios/cli/ioscli cfgnamesrv -add -ipaddr 10.10.10.2, rc=0
    150121180540 adding DNS: 159.50.203.10
    150121180540 adding DOMAIN: lab.chmod666.org
    150121180541 Success: /usr/ios/cli/ioscli cfgnamesrv -add -dname fr.net.intra, rc=0
    150121180541 adding SEARCH: lab.chmod666.org prod.chmod666.org
    150121180541 Success: /usr/ios/cli/ioscli cfgnamesrv -add -slist lab.chmod666.org prod.chmod666.org, rc=0
    [..]
    150121180542 Success: found fcs device for physical location WZSKM8U-P1-C2-T4: fcs3
    150121180542 Processed the following FCS attributes: fcsdevice=fcs4,fcs5,fcs6,fcs7,fcs0,fcs1,fcs2,fcs3,fcsattrid=fcsAttributes,port=WZSKM8U-P1-C1-C1-T1,WZSKM8U-P1-C1-C1-T2,WZSKM8U-P1-C1-C1-T3,WZSKM8U-P1-C1-C1-T4,WZSKM8U-P1-C2-T1,WZSKM8U-P1-C2-T2,WZSKM8U-P1-C2-T3,WZSKM8U-P
    1-C2-T4,max_xfer_size=0x100000,num_cmd_elems=2048
    150121180544 Processed the following FSCSI attributes: fcsdevice=fcs4,fcs5,fcs6,fcs7,fcs0,fcs1,fcs2,fcs3,fscsiattrid=fscsiAttributes,port=WZSKM8U-P1-C1-C1-T1,WZSKM8U-P1-C1-C1-T2,WZSKM8U-P1-C1-C1-T3,WZSKM8U-P1-C1-C1-T4,WZSKM8U-P1-C2-T1,WZSKM8U-P1-C2-T2,WZSKM8U-P1-C2-T3,WZS
    KM8U-P1-C2-T4,fc_err_recov=fast_fail,dyntrk=yes
    [..]
    150121180546 Success: found device U78AA.001.WZSKM8U-P2-D4: hdisk0
    150121180546 Success: found device U78AA.001.WZSKM8U-P2-D5: hdisk1
    150121180546 Mirror hdisk0 -->  hdisk1
    150121180547 Success: extendvg -f rootvg hdisk1, rc=0
    150121181638 Success: mirrorvg rootvg hdisk1, rc=0
    150121181655 Success: bosboot -ad hdisk0, rc=0
    150121181709 Success: bosboot -ad hdisk1, rc=0
    150121181709 Success: bootlist -m normal hdisk0 hdisk1, rc=0
    150121181709 VIOmirror <- rc=0
    150121181709 VIObuild <- rc=0
    150121181709 Preparing to reboot in 10 seconds, press control-C to abort
    

    The new server was deployed with one command and you avoid any manual mistakes by using the toolkit. The example above is just one of the many ways to use the toolkit. This is a very powerful and simple tool and I really want to see other European customers using it, so ask your IBM pre-sales, ask for PowerCare and take control of your deployments by using the toolkit. The toolkit can also be used to capture and redeploy a whole frame for a disaster recovery plan.

    Live Partition Mobility Automation Tool

    Because understanding the provisioning toolkit didn't take me a full week, we still had plenty of time with Bonnie from STG Lab Services and we decided to give a try to another tool called the Live Partition Mobility Automation Tool. I'll not talk about it in detail, but this tool allows you to automate your Live Partition Mobility moves. It's a web interface coming with a Tomcat server that you can run on a Linux machine or directly on your laptop. This web application takes control of your Hardware Management Console and allows you to do a lot of LPM-related things :

    • You can run a validation on every partition of a system.
    • You can move your partitions by spreading or packing them on the destination server.
    • You can "record" a move to replay it later (very very very useful for my previous customer, for instance: we were making our moves per client, and all clients were hosted on two big P795s).
    • You can run a Dynamic Platform Optimizer after the moves.
    • You have an option to move the partitions back to their original location, and this is (in my humble opinion) what makes this tool so powerful.

    lpm_toolkit

    Since I got this tool I now run a validation of all my partitions on a weekly basis to check if there are any errors. I also use it to move and move back the partitions when I have to, so I really recommend the Live Partition Mobility Automation tool.
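    If you just want to script a similar weekly check by hand without the tool, a minimal sketch of an HMC-side validation loop could look like the following (p814-1 and p814-2 are illustrative managed system names, adapt them to your environment):

    # validate every AIX/Linux partition of p814-1 against p814-2 (illustrative system names)
    for lpar in $(lssyscfg -r lpar -m p814-1 -F name,lpar_env | grep aixlinux | cut -d, -f1) ; do
      echo "validating $lpar"
      migrlpar -o v -m p814-1 -t p814-2 -p $lpar
    done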

    Hardware Management Console 8 : Other new features

    Adding a VLAN to an already existing Shared Ethernet Adapter

    With the new Hardware Management Console you can easily add a new vlan to an already existing Shared Ethernet Adapter (failover and shared, with and without control channel: no restriction) without having to perform a dlpar operation on each Virtual I/O Server and then modify your profiles (if you do not have the synchronization enabled). Even better, by using this method to add your new vlans you will avoid any misconfiguration, for instance forgetting to add the vlan on one of the Virtual I/O Servers or not choosing the same adapter on both sides.

    • Open the Virtual Network page in the HMC and click "Add a Virtual Network". You have to remember that a Virtual Network Bridge is a Shared Ethernet Adapter, and a Load Balance Group is a pair of virtual adapters with the same PVID, one on each Virtual I/O Server :
    • add_vlan5

    • Choose the name of your vlan (in my case VLAN3331), then choose bridged network (bridged network is the new name for Shared Ethernet Adapters ...), choose "yes" for vlan tagging, and put the vlan id (in my case 3331). By choosing the virtual switch, the HMC will only let you choose a Shared Ethernet Adapter configured in that virtual switch (no mistake possible). DO NOT forget to check the box "Add new virtual network to all Virtual I/O servers" to add the vlan on both sides :
    • add_vlan

    • On the next page you have to choose the Shared Ethernet Adapter on which the vlan will be added (in my case this is super easy, I ALWAYS create one Shared Ethernet Adapter per virtual switch to avoid misconfiguration and network loops created by adding the same vlan id on two different Shared Ethernet Adapters) :
    • add_vlan2

    • At last choose or create a new "Load Sharing Group". A load sharing group is one of the virtual adapters of your Shared Ethernet Adapter. In my case my Shared Ethernet Adapter was created with two virtual adapters with ids 10 and 11. On this screenshot I'm telling the HMC to add the new vlan on the adapter with the id 10 on both Virtual I/O Servers. You can also create a new virtual adapter to be included in the Shared Ethernet Adapter by choosing "Create a new load sharing group" :
    • add_vlan3

    • Before applying the configuration a summary is prompted to the user to check the changes :
    • add_vlan4
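    For comparison, here is roughly what the GUI saves you from: a sketch of the manual dlpar operation adding vlan 3331 on the trunk adapter of each Virtual I/O Server, run from the HMC command line (the managed system p814-1, the Virtual I/O Servers vios1/vios2 and the virtual slot 10 are illustrative names; if profile synchronization is off you would still have to update the profiles afterwards):

    # add vlan 3331 to the trunk adapter in virtual slot 10 of each Virtual I/O Server
    chhwres -r virtualio --rsubtype eth -m p814-1 -o s -p vios1 -s 10 -a "addl_vlan_ids+=3331"
    chhwres -r virtualio --rsubtype eth -m p814-1 -o s -p vios2 -s 10 -a "addl_vlan_ids+=3331"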

    Partition Templates

    You can also use templates to capture and create partitions, not just systems. I'll not give you all the details because the HMC is well documented for this part and there are no tricky things to do, just follow the GUI. One more time the HMC8 is for the noobs \o/. Here are a few screenshots of partition templates (capture and deploy) :

    create_part2
    create_part6

    A new and nice look and feel for the new Hardware Management Console

    Everybody knows that the HMC GUI is not very nice but it's working great. One of the major new things of HMC 8r8.2.0 is the new GUI. In my opinion the new GUI is awesome, the design is nice and I love it. Look at the pictures below :

    hmc8
    virtual_network_diagram

    Conclusion

    The Hardware Management Console 8 is still young but offers a lot of new cool features like system and partition templates, the performance dashboard and a new GUI. In my opinion the new GUI is still slow and there are a lot of bugs for the moment, so my advice is to use it when you have the time, not in a rush. Learn the new HMC on your own by trying to do all the common tasks with the new GUI (there are still impossible things to do ;-)). I can assure you that you will need more than a few hours to be familiarized with all those new features. And don't forget to call your pre-sales to have a demonstration of the STG toolkits, both provisioning and LPM are awesome. Use them !

    What is going on in this world

    This blog is not and will never be the place for political things but with the darkest days we had in France two weeks ago with these insane and inhuman terrorist attacks I had to say a few words about it (because even if my whole life is about AIX for the moment, I'm also a human being .... if you doubt about it). Since the tragic death of 17 men and women in France everybody is raising their voice to tell us (me ?) what is right and what is wrong without thinking seriously about it. Things like this terrorist attack should never happen again. I just wanted to say that I'm for liberty, not only for the "liberty of expression", but just liberty. By defending this liberty we have to be very careful because, in the name of this defense, things done by our governments may take away what we call liberty forever. Are the phone and the internet going to be tapped and logged in the name of liberty ? Is this liberty ? Think about it and resist.

    Using Chef and cloud-init with PowerVC 1.2.2.2 | What’s new in version 1.2.2.2


    I’ve been busy; very busy and I apologize for that … almost two months since the last update on the blog, but I’m still alive and I love AIX more than ever ;-). There is no blog post about it but I’ve developed a tool called “lsseas” which can be useful to all PowerVM administrators (you can find the script on github at this address https://github.com/chmod666org/lsseas). I’ll not talk too much about it but I thought sharing the information with all my readers who are not following me on twitter was the best way to promote the tool. Have a look at it, submit your own changes on github, code and share !

    This said we can talk about this new blog post. PowerVC 1.2.2.2 has been released for a few months now and there are a few things I wanted to talk about. The new version includes new features making the product more powerful than ever (export/import images, activation input, vscsi lun management). PowerVC only builds “empty” machines; it’s a good start but we can do better. The activation engine can customize the virtual machines but is limited and in my humble opinion not really usable for post-installation tasks. With the recent release of cloud-init and Chef for AIX, PowerVC can be utilized to build your machines from nothing … and finally get your application running in minutes. Using cloud-init and Chef can help you make your infrastructure repeatable, “versionable” and testable; this is what we call infrastructure as code and it is damn powerful.

    A big thank you to Jay Kruemcke (@chromeaix), Philippe Hermes (@phhermes) and S.Tran (https://github.com/transt) , they gave me very useful help about the cloud-init support on AIX. Follow them on twitter !

    PowerVC 1.2.2.1 mandatory fixes

    Before starting please note that I strongly recommend having the latest ifixes installed on your Virtual I/O Servers. These ones are mandatory for PowerVC, install these ifixes no matter what :

    • On Virtual I/O Servers install IV66758m4c, rsctvios2:
    # emgr -X -e /mnt/VIOS_2.2.3.4_IV66758m4c.150112.epkg.Z
    # emgr -l
    [..]
    ID  STATE LABEL      INSTALL TIME      UPDATED BY ABSTRACT
    === ===== ========== ================= ========== ======================================
    1    S    rsctvios2  03/03/15 12:13:42            RSCT fixes for VIOS
    2    S    IV66758m4c 03/03/15 12:16:04            Multiple PowerVC fixes VIOS 2.2.3.4
    3    S    IV67568s4a 03/03/15 14:12:45            man fails in VIOS shell
    [..]
    
  • Check you have the latest version of the Hardware Management Console (I strongly recommend V8R8.2.0 Service Pack 1):
  • hscroot@myhmc:~> lshmc -V
    "version= Version: 8
     Release: 8.2.0
     Service Pack: 1
    HMC Build level 20150216.1
    ","base_version=V8R8.2.0
    "
    

    Exporting and importing image from another PowerVC

    The latest PowerVC version allows you to export and import images. It’s a good thing ! Let’s say that like me you have a few PowerVC hosts, on different SAN networks with different storage arrays; you probably do not want to create your images on each one and you prefer to be sure to use the same image for each PowerVC. Just create one image and use the export/import feature to copy/move this image to a different storage array or PowerVC host:

    • To do so map your current image disk on the PowerVC host itself (in my case by using the SVC); you can’t attach a volume used for an image directly from PowerVC so you have to do it on the storage side by hand:
    • maptohost
      maptohost2

    • On the PowerVC host, rescan the volume and copy the whole new discovered lun with a dd:
    powervc_source# rescan-scsi-bus.sh
    [..]
    powervc_source# multipath -ll
    mpathe (3600507680c810010f800000000000097) dm-10 IBM,2145
    [..]
    powervc_source# dd if=/dev/mapper/mpathe of=/data/download/aix7100-03-04-cloudinit-chef-ohai bs=4M
    16384+0 records in
    16384+0 records out
    68719476736 bytes (69 GB) copied, 314.429 s, 219 MB/s                                         
    
  • Map a new volume to the new PowerVC server and upload this newly created file on the new PowerVC server, then dd the file back to the new volume:
  • mapnewlun

    powervc_dest# scp /data/download/aix7100-03-04-cloudinit-chef-ohai new_powervc:/data/download
    aix7100-03-04-cloudinit-chef-ohai          100%   64GB  25.7MB/s   42:28.
    powervc_dest# dd if=/data/download/aix7100-03-04-cloudinit-chef-ohai of=/dev/mapper/mpathc bs=4M
    16384+0 records in
    16384+0 records out
    68719476736 bytes (69 GB) copied, 159.028 s, 432 MB/s
    
  • Unmap the volume from the new PowerVC after the dd operation, and import it with the PowerVC graphical interface.
  • Manage the existing volume you just created (note that the current PowerVC code does not allow you to choose cloud-init as an activation engine even if it is working great) :
  • manage_ex1
    manage_ex2

  • Import the image:
  • import1
    import2
    import3
    import4

    You can also use the command powervc-volume-image-import to import the new volume by using the command line instead of the graphical user interface. Here is an example with a Red Hat Enterprise Linux 6.4 image:

    powervc_source# dd if=/dev/hdisk4 of=/apps/images/rhel-6.4.raw bs=4M
    15360+0 records in
    15360+0 records out
    powervc_dest# scp 10.255.248.38:/apps/images/rhel-6.4.raw .
    powervc_dest# dd if=/home/rhel-6.4.raw of=/dev/mapper/mpathe
    30720+0 records in
    30720+0 records out
    64424509440 bytes (64 GB) copied, 124.799 s, 516 MB/s
    powervc_dest# powervc-volume-image-import --name rhel64 --os rhel --volume volume_capture2 --activation-type ae
    Password:
    Image creation complete for image id: e3a4ece1-c0cd-4d44-b197-4bbbc2984a34
    

    Activation input (cloud-init and ae)

    Instead of doing post-installation tasks by hand after the deployment of the machine you can now use the activation input recently added to PowerVC. The activation input can be utilized to run any script you want or even better things (such as cloud-config syntax) if you are using cloud-init instead of the old activation engine. You have to remember that cloud-init is not yet officially supported by PowerVC; for this reason I think most customers will still use the old activation engine. The latest activation engine version also works with the activation input. In the examples below I’m of course using cloud-init :-). Don’t worry I’ll detail later in this post how to install and use cloud-init on AIX:

    • If you are using the activation engine please be sure to use the latest version. The current version of the activation engine in PowerVC 1.2.2.* is vmc-vsae-ext-2.4.5-1; the only way to be sure you are using this version is to check the size of /opt/ibm/ae/AS/vmc-sys-net/activate.py. The size of this file is 21127 bytes for the latest version. Check this before trying to do anything with the activation input. More information can be found here: Activation input documentation.
    • A simple shebang script can be used; in the example below it just writes a file, but it can be anything you want:
    • ai1

    # cat /tmp/activation_input
    Activation input was used on this server
    
  • If you are using cloud-init you can directly put cloud-config “script” in the activation input. The first line is always mandatory to tell the format of the activation input. If you forget to put this first line the activation input can not determine the format and the script will not be executed. Check the next point for more information about activation input:
  • ai2

    # cat /tmp/activation_input
    cloud-config activation input
    
  • There are additional fields called “server meta data key/value pairs”, just do not use them. They are used by images provided by IBM with customization of the activation engine. Forget about this, it is useless; use these fields only if IBM tells you to do so.
  • Valid cloud-init activation input formats can be found here: http://cloudinit.readthedocs.org/en/latest/topics/format.html. As you can see in the two examples above shell scripts and the cloud-config format can be utilized, but you can also upload a gzip archive, or use a part handler format. Go to the url above for more information.
    vscsi and mixed NPIV/vscsi machine creation

    This is one of the major enhancements: PowerVC is now able to create and map vscsi disks, and even better you can create mixed NPIV/vscsi machines. To do so create a storage connectivity group for each technology you want to use. You can choose a different way to create disks for boot volumes and for data volumes. Here are three examples, full NPIV, full vscsi, and a mixed vscsi (boot) and NPIV (data):

    connectivitygroup1
    connectivitygroup2
    connectivitygroup3

    What is really cool about this new feature is that PowerVC can use existing mapped luns on the Virtual I/O Server; please note that PowerVC will only use SAN backed devices and cannot use iSCSI or local disks (local disks can be used in the express version). You obviously have to make the zoning of your Virtual I/O Server by yourself. Here is an example where I have 69 devices mapped to my Virtual I/O Server, you can see that PowerVC is using one of the existing devices for its deployment. This can be very useful if you have different teams working on the SAN and the system side: the storage guys will not change their habits and can still map you a bunch of luns on the Virtual I/O Server. This can be used as a transition if you did not succeed in convincing the guys from your storage team:

    $ lspv | wc -l
          69
    

    connectivitygroup_deploy1

    $ lspv | wc -l
          69
    $ lsmap -all -fmt :
    vhost1:U8202.E4D.845B2DV-V2-C28:0x00000009:vtopt0:Available:0x8100000000000000:/var/vio/VMLibrary/vopt_c1309be1ed244a5c91829e1a5dfd281c: :N/A:vtscsi1:Available:0x8200000000000000:hdisk66:U78AA.001.WZSKM6P-P1-C3-T1-W500507680C11021F-L41000000000000:false
    

    Please note that you still need to add fabrics and storage on PowerVC even if you have pre-mapped luns on your Virtual I/O Servers. This is mandatory for PowerVC image management and creation.

    Maintenance Mode

    This last feature is probably the one I like the most. You can now put your host in maintenance mode; this means that when you put a host in maintenance mode all the virtual machines hosted on it are migrated with live partition mobility (remember the migrlpar --all option, I’m pretty sure this option is utilized for the PowerVC maintenance mode). A host in maintenance mode is no longer available for new machine deployments and for mobility operations. The host can then be shut down, for instance for a firmware upgrade.
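    As a side note, if you had to evacuate a host by hand without PowerVC, a minimal sketch from the HMC command line could look like this (p814-1 and p814-2 are illustrative system names, and I am assuming here that the migrlpar --all server evacuation option behaves as documented; always validate first):

    # validate then migrate every migratable partition from p814-1 to p814-2
    migrlpar -o v -m p814-1 -t p814-2 --all
    migrlpar -o m -m p814-1 -t p814-2 --all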

    • Select a host and click the "Enter maintenance mode" button:
    • maintenance1

    • Choose where you want to move the virtual machines, or let PowerVC decide for you (packing or striping placement policy):
    • maintenance2

    • The host is entering maintenance mode:
    • maintenance3

    • Once the host is in maintenance mode this one is ready to be shutdown:
    • maintenance4

    • Leave the maintenance mode when you are ready:
    • maintenance5

    An overview of Chef and cloud-init

    With PowerVC you are now able to deploy new AIX virtual machines in a few minutes but there is still some work to do. What about post-installation tasks ? I’m sure that most of you are using NIM post-install scripts for post installation tasks. PowerVC does not use NIM and even if you can run your own shell scripts after a PowerVC deployment the goal of this tool is to automate a full installation… post-install included.

    If the activation engine does the job of changing the hostname and ip address of the machine, it is pretty hard to customize it to do other tasks. Documentation is hard to find and I can assure you that it is not easy at all to customize and maintain. PowerVC Linux users are probably already aware of cloud-init. cloud-init is a tool (like the activation engine) in charge of the reconfiguration of your machine after its deployment; as the activation engine does today, cloud-init changes the hostname and the ip address of the machine but it can do way more than that (create users, add ssh keys, mount filesystems, …). The good news is that cloud-init has been available on AIX for a few days now, and you can use it with PowerVC. Awesome \o/.

    If cloud-init can do one part of this job, it can’t do everything and is not designed for that! It is not a configuration management tool, configurations are not centralized on a server, there is no way to create cookbooks, runbooks (or whatever you call them), you can’t pull product sources from a git server, there are a lot of things missing. cloud-init is a light tool designed for a simple job. I recently (at work and in my spare time) played a lot with configuration management tools. I’m a huge fan of Saltstack but unfortunately salt-minion (which is the Saltstack client) is not available on AIX… I had to find another tool. A few months ago Chef (by Opscode) announced the support of AIX and a release of chef-client for AIX; I decided to give it a try and I can assure you that this is damn powerful, let me explain this further.

    Instead of creating shell scripts to do your post installation, Chef allows you to create cookbooks. Cookbooks are composed of recipes and each recipe does a task, for instance installing an Oracle client, creating the home directory of the root user and its profile file, enabling or disabling a service on the system. The recipes are coded in the Chef language, and you can directly put Ruby code inside a recipe. Chef recipes are idempotent: it means that if something has already been done, it will not be done again. The advantage of using a solution like this is that you don’t have to maintain shell scripts which are difficult to change/rewrite. Your infrastructure is repeatable and changeable in minutes (after Chef is installed you can for instance tell it to change /etc/resolv.conf on all your Websphere servers). This is called “infrastructure as code”. Give it a try and you’ll see that the first thing you’ll think will be “waaaaaaaaaaaaaooooooooooo”.

    Trying to explain how PowerVC, cloud-init and Chef can work together is not really easy, a nice diagram is probably better than a long text:

    chef

    1. You have built an AIX virtual machine. On this machine cloud-init is installed, Chef client 12 is installed. cloud-init is configured to register the chef-client on the chef-server, and to run a cookbook for a specific role. This server has been captured with PowerVC and is now ready to be deployed.
    2. Virtual machines are created with PowerVC.
    3. When the machine is built cloud-init runs on first boot. The ip address and the hostname of this machine are changed with the values provided in PowerVC. cloud-init creates the chef-client configuration (client.rb, validation.pem). Finally chef-client is called.
    4. chef-client registers on the chef-server. The machine is now known by the chef-server.
    5. chef-client resolves and downloads the cookbooks for a specific role. Cookbooks and recipes are executed on the machine. After the cookbooks execution the machine is ready and configured.
    6. The administrator creates and uploads cookbooks and recipes from his knife workstation. (knife is the tool to interact with the chef-server; it can be hosted anywhere you want, your laptop, a server …)

    In a few steps here is what you need to do to use PowerVC, cloud-init, and Chef together:

    1. Create a virtual machine with PowerVC.
    2. Download cloud-init, and install cloud-init in this virtual machine.
    3. Download chef-client, and install chef-client in this virtual machine.
    4. Configure cloud-init, modify /opt/freeware/etc/cloud/cloud.cfg. In this file put the Chef configuration of the cc_chef cloud-init module.
    5. Create the mandatory files, such as the /etc/chef directory, and put your ohai plugins in the /etc/chef/ohai_plugins directory.
    6. Stop the virtual machine.
    7. Capture the virtual machine with PowerVC.
    8. Obviously as prerequisites a chef-server is up and running, cookbooks, recipes, roles, environments are ok in this chef-server.

    cloud-init installation

    cloud-init is now available on AIX, but you have to build the rpm by yourself. Sources can be found on github at this address : https://github.com/transt/cloud-init-0.7.5. There are a lot of prerequisites, most of them can be found on the github page, some of them on the famous perzl site; download and install these prerequisites, it is mandatory (links to download the prerequisites are on the github page, the zip file containing cloud-init can be downloaded here : https://github.com/transt/cloud-init-0.7.5/archive/master.zip).

    # rpm -ivh --nodeps gettext-0.17-8.aix6.1.ppc.rpm
    [..]
    gettext                     ##################################################
    # for rpm in bzip2-1.0.6-2.aix6.1.ppc.rpm db-4.8.24-4.aix6.1.ppc.rpm expat-2.1.0-1.aix6.1.ppc.rpm gmp-5.1.3-1.aix6.1.ppc.rpm libffi-3.0.11-1.aix6.1.ppc.rpm openssl-1.0.1g-1.aix6.1.ppc.rpm zlib-1.2.5-6.aix6.1.ppc.rpm gdbm-1.10-1.aix6.1.ppc.rpm libiconv-1.14-1.aix6.1.ppc.rpm bash-4.2-9.aix6.1.ppc.rpm info-5.0-2.aix6.1.ppc.rpm readline-6.2-3.aix6.1.ppc.rpm ncurses-5.9-3.aix6.1.ppc.rpm sqlite-3.7.15.2-2.aix6.1.ppc.rpm python-2.7.6-1.aix6.1.ppc.rpm python-2.7.6-1.aix6.1.ppc.rpm python-devel-2.7.6-1.aix6.1.ppc.rpm python-xml-0.8.4-1.aix6.1.ppc.rpm python-boto-2.34.0-1.aix6.1.noarch.rpm python-argparse-1.2.1-1.aix6.1.noarch.rpm python-cheetah-2.4.4-2.aix6.1.ppc.rpm python-configobj-5.0.5-1.aix6.1.noarch.rpm python-jsonpointer-1.0.c1ec3df-1.aix6.1.noarch.rpm python-jsonpatch-1.8-1.aix6.1.noarch.rpm python-oauth-1.0.1-1.aix6.1.noarch.rpm python-pyserial-2.7-1.aix6.1.ppc.rpm python-prettytable-0.7.2-1.aix6.1.noarch.rpm python-requests-2.4.3-1.aix6.1.noarch.rpm libyaml-0.1.4-1.aix6.1.ppc.rpm python-setuptools-0.9.8-2.aix6.1.noarch.rpm fdupes-1.51-1.aix5.1.ppc.rpm ; do rpm -ivh $rpm ;done
    [..]
    python-oauth                ##################################################
    python-pyserial             ##################################################
    python-prettytable          ##################################################
    python-requests             ##################################################
    libyaml                     ##################################################
    

    Build the rpm by following the commands below. You can reuse this rpm on every AIX system on which you want to install the cloud-init package:

    # jar -xvf cloud-init-0.7.5-master.zip
    inflated: cloud-init-0.7.5-master/upstart/cloud-log-shutdown.conf
    # mv cloud-init-0.7.5-master  cloud-init-0.7.5
    # chmod -Rf +x cloud-init-0.7.5/bin
    # chmod -Rf +x cloud-init-0.7.5/tools
    # cp cloud-init-0.7.5/packages/aix/cloud-init.spec.in /opt/freeware/src/packages/SPECS/cloud-init.spec
    # tar -cvf cloud-init-0.7.5.tar cloud-init-0.7.5
    [..]
    a cloud-init-0.7.5/upstart/cloud-init.conf 1 blocks
    a cloud-init-0.7.5/upstart/cloud-log-shutdown.conf 2 blocks
    # gzip cloud-init-0.7.5.tar
    # cp cloud-init-0.7.5.tar.gz /opt/freeware/src/packages/SOURCES/cloud-init-0.7.5.tar.gz
    # rpm -v -bb /opt/freeware/src/packages/SPECS/cloud-init.spec
    [..]
    Requires: cloud-init = 0.7.5
    Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-0.7.5-4.1.aix7.1.ppc.rpm
    Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-doc-0.7.5-4.1.aix7.1.ppc.rpm
    Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-test-0.7.5-4.1.aix7.1.ppc.rpm
    

    Finally install the rpm:

    # rpm -ivh /opt/freeware/src/packages/RPMS/ppc/cloud-init-0.7.5-4.1.aix7.1.ppc.rpm
    cloud-init                  ##################################################
    # rpm -qa | grep cloud-init
    cloud-init-0.7.5-4.1
    

    cloud-init configuration

    By installing the cloud-init package on AIX some entries have been added to /etc/rc.d/rc2.d:

    ls -l /etc/rc.d/rc2.d | grep cloud
    lrwxrwxrwx    1 root     system           33 Apr 26 15:13 S01cloud-init-local -> /etc/rc.d/init.d/cloud-init-local
    lrwxrwxrwx    1 root     system           27 Apr 26 15:13 S02cloud-init -> /etc/rc.d/init.d/cloud-init
    lrwxrwxrwx    1 root     system           29 Apr 26 15:13 S03cloud-config -> /etc/rc.d/init.d/cloud-config
    lrwxrwxrwx    1 root     system           28 Apr 26 15:13 S04cloud-final -> /etc/rc.d/init.d/cloud-final
    

    The default configuration file is located in /opt/freeware/etc/cloud/cloud.cfg, and this configuration file is split into three parts. The first one, called cloud_init_modules, tells cloud-init to run specific modules when the cloud-init script is started at boot time, for instance set the hostname of the machine (set_hostname), reset the rmc (reset_rmc) and so on. In our case this part will automatically change the hostname and the ip address of the machine to the values provided in PowerVC at deployment time. This cloud_init_modules part is split in two, the local one and the normal one. The local one uses information provided by the cdrom built by PowerVC at deployment time; this cdrom provides the ip and hostname of the machine, the activation input script and nameserver information. The datasource_list stanza tells cloud-init to use the "ConfigDrive" (in our case the virtual cdrom) to get the ip and hostname needed by some cloud_init_modules. The second part, called cloud_config_modules, tells cloud-init to run specific modules when the cloud-config script is called; at this stage the minimal requirements have already been configured by the previous cloud_init_modules stage (dns, ip address, hostname are ok). We will configure and set up the chef-client in this stage. The last part, called cloud_final_modules, tells cloud-init to run specific modules when the cloud-final script is called. You can at this step print a final message, reboot the host and so on (in my case a host reboot is needed by my install_sddpcm Chef recipe). Here is an overview of the cloud.cfg configuration file:

    cloud-init

    • The datasource_list stanza tells cloud-init to use the virtual cdrom as a source of information:
    datasource_list: ['ConfigDrive']
    
  • cloud_init_module:
  • cloud_init_modules:
    [..]
     - set-multipath-hcheck-interval
     - update-bootlist
     - reset-rmc
     - set_hostname
     - update_hostname
     - update_etc_host
    
  • cloud_config_module:
  • cloud_config_modules:
    [..]
      - mounts
      - chef
      - runcmd
    
  • cloud_final_module:
  • cloud_final_modules:
      [..]
      - final-message
    

    If you do not want to use Chef at all you can modify the cloud.cfg file to fit your needs (running homemade scripts, mounting filesystems …), but my goal here is to do the job with Chef. We will try to do the minimal job with cloud-init, so the goal here is to configure cloud-init to configure the chef-client. Anyway I also wanted to play with cloud-init and see its capabilities. The full documentation of cloud-init can be found here https://cloudinit.readthedocs.org/en/latest/. Here are a few things I just added (the Chef part will be detailed later), but keep in mind you can just use cloud-init without Chef if you want (set up your ssh keys, mount or create filesystems, create files and so on):

    write_files:
      - path: /tmp/cloud-init-started
        content: |
          cloud-init was started on this server
        permissions: '0755'
      - path: /var/log/cloud-init-sub.log
        content: |
          starting chef logging
        permissions: '0755'
    
    final_message: "The system is up, cloud-init is finished"
    

    EDIT : The IBM developer of cloud-init for AIX sent me a mail yesterday about the new support of cc_power_state. As I need to reboot my host at the end of the build I can, with the latest version of cloud-init for AIX, use the power_state stanza. I use poweroff here as an example, use mode: reboot … for a reboot:

    power_state:
     delay: "+5"
     mode: poweroff
     message: cloud-init mandatory reboot for sddpcm
     timeout: 5
    

    power_state1

    Rerun cloud-init for testing purpose

    You probably want to test your cloud-init configuration before or after capturing the machine. When cloud-init is launched by the startup script a check is performed to be sure that cloud-init has not already been run. Some "semaphore" files are created in /opt/freeware/var/lib/cloud/instance/sem to tell which modules have already been executed. If you want to re-run cloud-init by hand without having to rebuild a machine, just remove these files from this directory :

    # rm -rf /opt/freeware/var/lib/cloud/instance/sem
    

    Let’s say we just want to re-run the Chef part:

    # rm /opt/freeware/var/lib/cloud/instance/sem/config_chef
    

    To sum up here is what I want to do with cloud-init:

    1. Use the cdrom as datasource.
    2. Set the hostname and ip.
    3. Setup my chef-client.
    4. Print a final message.
    5. Do a mandatory reboot at the end of the installation.
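    Before capturing the image, it can also be worth checking that your hand-edited cloud.cfg is still valid YAML, since a broken file would silently break the whole boot-time configuration. A quick sanity check, assuming the Python installed as a cloud-init prerequisite (typically under /opt/freeware/bin) and PyYAML, which cloud-init itself depends on, could be:

    # check that the edited configuration file is still parsable YAML
    /opt/freeware/bin/python -c "import yaml; yaml.safe_load(open('/opt/freeware/etc/cloud/cloud.cfg')); print 'cloud.cfg: OK'"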

    chef-client installation and configuration

    Before modifying the cloud.cfg file to tell cloud-init to set up the Chef client we first have to download and install the chef-client on the AIX host we will capture later. Download the Chef client bff file at this address: https://opscode-omnibus-packages.s3.amazonaws.com/aix/6.1/powerpc/chef-12.1.2-1.powerpc.bff and install it:

    # installp -aXYgd . chef
    [..]
    +-----------------------------------------------------------------------------+
                             Installing Software...
    +-----------------------------------------------------------------------------+
    
    installp: APPLYING software for:
            chef 12.1.2.1
    [..]
    Installation Summary
    --------------------
    Name                        Level           Part        Event       Result
    -------------------------------------------------------------------------------
    chef                        12.1.2.1        USR         APPLY       SUCCESS
    chef                        12.1.2.1        ROOT        APPLY       SUCCESS
    # lslpp -l | grep -i chef
      chef                      12.1.2.1    C     F    The full stack of chef
    # which chef-client
    /usr/bin/chef-client
    

    The chef-client configuration files created by cloud-init will be placed in the /etc/chef directory; by default the /etc/chef directory does not exist, so you'll have to create it:

    # mkdir -p /etc/chef
    # mkdir -p /etc/chef/ohai_plugins
    

    If -like me- you are using custom ohai plugins, you have two things to do. cloud-init uses template files to build the configuration files needed by Chef. These template files are located in /opt/freeware/etc/cloud/templates. Modify the chef_client.rb.tmpl file to add a configuration line for the ohai plugin_path. Copy your ohai plugins in /etc/chef/ohai_plugins:

    # tail -1 /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
    Ohai::Config[:plugin_path] << '/etc/chef/ohai_plugins'
    # ls /etc/chef/ohai_plugins
    aixcustom.rb
    

    Add the chef stanza in /opt/freeware/etc/cloud/cloud.cfg. After this step the image is ready to be captured (check the ohai plugin configuration if you need one); the chef-client is already installed. Set the force_install stanza to false, put the server_url and the validation_name of your Chef server and organization, and finally put the validation RSA private key provided by your Chef server (in the example below the key has been truncated for obvious purposes; server_url and validation_name have also been replaced). As you can see below, I here tell Chef to run the aix7 role, which contains the recipes of the aix7 cookbook; we'll see later how to create a cookbook and recipes :

    chef:
      force_install: false
      server_url: "https://chefserver.lab.chmod666.org/organizations/chmod666"
      validation_name: "chmod666-validator"
      validation_key: |
        -----BEGIN RSA PRIVATE KEY-----
        MIIEpQIBAAKCAQEApj/Qqb+zppWZP+G3e/OA/2FXukNXskV8Z7ygEI9027XC3Jg8
        [..]
        XCEHzpaBXQbQyLshS4wAIVGxnPtyqXkdDIN5bJwIgLaMTLRSTtjH/WY=
        -----END RSA PRIVATE KEY-----
      run_list:
        - "role[aix7]"
    
    runcmd:
      - /usr/bin/chef-client
    

    EDIT: With the latest build of cloud-init for AIX there is no need to run chef-client with the runcmd stanza. Just add exec: 1 in the chef stanza.

    To sum up: cloud-init is installed and configured to run a few actions at boot time, but mainly to configure the chef-client and run it with a specific role; the chef-client is installed. The machine can now be shut down and is ready to be deployed. At deployment time cloud-init will do the job of changing the ip address and hostname and configuring Chef. Chef will retrieve the cookbooks and recipes and run them on the machine.

    If you want to use custom ohai plugins read the ohai part before capturing your machine.

    capture
    capture2

    Use chef-solo for testing

    You will have to create your own recipes. My advice is to use chef-solo to debug. The chef-solo binary is provided with the chef-client package. It can be used without a Chef server to run and execute Chef recipes:

    • Create a test recipe:
    # mkdir -p ~/chef/cookbooks/testing/recipes
    # cat  ~/chef/cookbooks/testing/recipes/test.rb
    file "/tmp/helloworld.txt" do
      owner "root"
      group "system"
      mode "0755"
      action :create
      content "Hello world !"
    end
    
  • Create a run_list with your test recipe:
  • # cat ~/chef/node.json
    {
      "run_list": [ "recipe[testing::test]" ]
    }
    
  • Create attribute file for chef-solo execution:
  • # cat  ~/chef/solo.rb
    file_cache_path "/root/chef"
    cookbook_path "/root/chef/cookbooks"
    json_attribs "/root/chef/node.json"
    
  • Run chef-solo:
  • # chef-solo -c /root/chef/solo.rb
    

    chef-solo

    cookbooks and recipes example on AIX

    Let's say you have written all your recipes using chef-solo on a test server. On the Chef server you now want to put all these recipes in a cookbook. From the workstation, create a cookbook :

    # knife cookbook create aix7
    ** Creating cookbook aix7 in /home/kadmin/.chef/cookbooks
    ** Creating README for cookbook: aix7
    ** Creating CHANGELOG for cookbook: aix7
    ** Creating metadata for cookbook: aix7
    

    In the .chef directory you can now find a directory for the aix7 cookbook. In this one you will find a directory for each type of Chef object : recipes, templates, files, and so on. This place is called the chef-repo. I strongly recommend using this place as a git repository (by doing this you will save all modifications of any object in the cookbook).
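    A minimal sketch of turning the chef-repo into a git repository, assuming git is installed on the workstation and using the path shown below:

    # version the chef-repo so every cookbook change is tracked
    cd /home/kadmin/.chef
    git init
    git add cookbooks
    git commit -m "initial import of the aix7 cookbook"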

    # ls /home/kadmin/.chef/cookbooks/aix7/recipes
    create_fs_rootvg.rb  create_profile_root.rb  create_user_group.rb  delete_group.rb  delete_user.rb  dns.rb  install_sddpcm.rb  install_ssh.rb  ntp.rb  ohai_custom.rb  test_ohai.rb
    # ls /home/kadmin/.chef/cookbooks/aix7/templates/default
    aixcustom.rb.erb  ntp.conf.erb  ohai_test.erb  resolv.conf.erb
    

    Recipes

    Here are a few examples of my own recipes:

    • install_ssh: the recipe mounts an nfs filesystem (the nim server). The nim_server is an attribute coming from the role default attributes (we will check that later), the oslevel is an ohai attribute coming from an ohai custom plugin (we will check that later too). The openssh.license and openssh.base filesets are installed, the filesystem is unmounted, and finally the ssh service is started:
    # creating temporary directory
    directory "/var/mnttmp" do
      action :create
    end
    # mouting nim server
    mount "/var/mnttmp" do
      device "#{node[:nim_server]}:/export/nim/lppsource/#{node['aixcustom']['oslevel']}"
      fstype "nfs"
      action :mount
    end
    # installing ssh packages (openssh.license, openssh.base)
    bff_package "openssh.license" do
      source "/var/mnttmp"
      action :install
    end
    bff_package "openssh.base" do
      source "/var/mnttmp"
      action :install
    end
    # umount the /var/mnttmp directory
    mount "/var/mnttmp" do
      fstype "nfs"
      action :umount
    end
    # deleting temporary directory
    directory "/var/mnttmp" do
      action :delete
    end
    # start and enable ssh service
    service "sshd" do
      action :start
    end
    
  • install_sddpcm: the recipe mounts an nfs filesystem (the nim server). The nim_server is an attribute coming from the role default attributes (we will check that later), the platform_version is coming from ohai. The devices.fcp.disk.ibm.mpio and devices.sddpcm.71.rte filesets are installed, then the filesystem is unmounted:
  • # creating temporary directory
    directory "/var/mnttmp" do
      action :create
    end
    # mouting nim server
    mount "/var/mnttmp" do
      device "#{node[:nim_server]}:/export/nim/lpp_source/#{node['platform_version']}/sddpcm-71-2660"
      fstype "nfs"
      action :mount
    end
    # installing sddpcm packages (devices.fcp.disk.ibm.mpio, devices.sddpcm.71.rte)
    bff_package "devices.fcp.disk.ibm.mpio" do
      source "/var/mnttmp"
      action :install
    end
    bff_package "devices.sddpcm.71.rte" do
      source "/var/mnttmp"
      action :install
    end
    # umount the /var/mnttmp directory
    mount "/var/mnttmp" do
      fstype "nfs"
      action :umount
    end
    # deleting temporary directory
    directory "/var/mnttmp" do
      action :delete
    end
    
  • create_fs_rootvg: some filesystems are extended, an /apps filesystem is created and mounted. Please note that there is no cookbook for the AIX lvm at the moment and you have to use the execute resource here, which is the only one that is not idempotent (hence the not_if guard):
  • execute "hd3" do
      command "chfs -a size=1024M /tmp"
    end
    execute "hd9var" do
      command "chfs -a size=512M /var"
    end
    execute "/apps" do
      command "crfs -v jfs2 -g rootvg -m /apps -Ay -a size=1M ; chlv -n appslv fslv00"
      not_if { ::File.exists?("/dev/appslv")}
    end
    mount "/apps" do
      device "/dev/appslv"
      fstype "jfs2"
    end
    
  • ntp, ntp.conf.erb located in the template directory is copied to /etc/ntp.conf:
  • template "/etc/ntp.conf" do
      source "ntp.conf.erb"
    end
    
  • dns, resolv.conf.erb located in the template directory is copied to /etc/resolv.conf:
  • template "/etc/resolv.conf" do
      source "resolv.conf.erb"
    end
    
  • create_user_group: a user for TADDM is created:
  • user "taddmux" do
      gid 'sys'
      uid 421
      home '/home/taddmux'
      comment 'user TADDM connect SSH'
    end
    

    Templates

    In the recipes above templates are used for the ntp and dns configuration. Template files are files in which some strings are replaced by Chef attributes found in the roles, the environments, in ohai, or even directly in recipes. Here are the two files I used for dns and ntp:

    • ntp.conf.erb: the ntpserver1,2,3 attributes are found in environments (let's say I have siteA and siteB and the ntp servers are different for each site, I can define an environment for siteA and siteB):
    [..]
    server <%= node['ntpserver1'] %>
    server <%= node['ntpserver2'] %>
    server <%= node['ntpserver3'] %>
    driftfile /etc/ntp.drift
    tracefile /etc/ntp.trace
    
  • resolv.conf.erb, nameserver1,2,3 and namesearch are found in environments:
  • search  <%= node['namesearch'] %>
    nameserver      <%= node['nameserver1'] %>
    nameserver      <%= node['nameserver2'] %>
    nameserver      <%= node['nameserver3'] %>
    

    role assignment

    Chef roles can be used to run different Chef recipes depending on the type of server you want to post-install. You can for instance create a role for web servers in which the Websphere recipe will be executed and a role for database servers in which the Oracle recipe will be executed. In my case, and for the simplicity of this example, I just create one role called aix7:

    # knife role create aix7
    Created role[aix7]
    # knife role edit aix7
    {
      "name": "aix7",
      "description": "",
      "json_class": "Chef::Role",
      "default_attributes": {
        "nim_server": "nimsrv01"
      },
      "override_attributes": {
    
      },
      "chef_type": "role",
      "run_list": [
        "recipe[aix7::ohai_custom]",
        "recipe[aix7::create_fs_rootvg]",
        "recipe[aix7::create_profile_root]",
        "recipe[aix7::test_ohai]",
        "recipe[aix7::install_ssh]",
        "recipe[aix7::install_sddpcm]",
        "recipe[aix7::ntp]",
        "recipe[aix7::dns]"
      ],
      "env_run_lists": {
    
      }
    }
    

    What we can see here are two important things. We created an attribute specific to this role called nim_server: in all recipes and templates "node['nim_server']" will be replaced by nimsrv01 (remember the recipes above, and remember we told chef-client to run the aix7 role). We created a run_list telling that the recipes coming from the aix7 cookbook (install_ssh, install_sddpcm, ...) should be executed on a server calling chef-client with the aix7 role.
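    If you want to check how an attribute resolves for a given node, knife can show it directly; a quick example, reusing the node name registered later in this post:

    # display the resolved nim_server attribute for one node
    knife node show a8b8fe0d-34c1-4bdb-821c-777fca1c391f -a nim_server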

    environments

    Chef environments can be used to separate your environments, for instance production, development, backup, or in my example sites. In my company, depending on the site on which you are building a machine, nameservers and ntp servers will differ. Remember that we are using template files for the resolv.conf and ntp.conf files :

    knife environment show siteA
    chef_type:           environment
    cookbook_versions:
    default_attributes:
      namesearch:  lab.chmod666.org chmod666.org
      nameserver1: 10.10.10.10
      nameserver2: 10.10.10.11
      nameserver3: 10.10.10.12
      ntpserver1:  11.10.10.10
      ntpserver2:  11.10.10.11
      ntpserver3:  11.10.10.12
    description:         production site
    json_class:          Chef::Environment
    name:                siteA
    override_attributes:
    

    When chef-client is called with the -E siteA option it will replace node['namesearch'] with "lab.chmod666.org chmod666.org" in all recipes and template files.
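    For instance, a run forcing the siteA environment could look like the sketch below; whether you pin the environment per run like this or set it once on the node is a design choice:

    # run the chef-client against the siteA environment
    chef-client -E siteA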

    A Chef run

    When you are ok with your cookbook upload it to the Chef server:

    # knife cookbook upload aix7
    Uploading aix7           [0.1.0]
    Uploaded 1 cookbook.
    

    When chef-client is not executed by cloud-init you can run it by hand. I thought it would be interesting to put an output of a chef-client run here, you can see that files are modified, packages installed and so on ;-) :

    chef-clientrun1
    chef-clientrun2

    Ohai

    ohai is a command delivered with chef-client. Its purpose is to gather information about the machine on which chef-client is executed. Each time chef-client runs, a call to ohai is launched. By default ohai gathers a lot of information such as the ip address of the machine, the lpar id, the lpar name, and so on. A call to ohai returns a json tree. Each element of this json tree can be accessed in Chef recipes or in Chef templates, for instance node['virtualization']['lpar_name'] to get the lpar name. Here is an example of a single call to ohai:

    # ohai | more
      "ipaddress": "10.244.248.56",
      "macaddress": "FA:A3:6A:5C:82:20",
      "os": "aix",
      "os_version": "1",
      "platform": "aix",
      "platform_version": "7.1",
      "platform_family": "aix",
      "uptime_seconds": 14165,
      "uptime": "3 hours 56 minutes 05 seconds",
      "virtualization": {
        "lpar_no": "7",
        "lpar_name": "s00va9940866-ada56a6e-0000004d"
      },
    

    At the time of writing this blog post there are -in my humble opinion- some attributes missing in ohai. For instance if you want to install a specific package from an lpp_source you first need to know what your current oslevel is (I mean the output of oslevel -s). Fortunately ohai can be extended by custom plugins and you can add whatever attributes you need.

    • In ohai 7 (the one shipped with chef-client 12) an attribute needs to be added to the Chef client.rb configuration to tell where the ohai plugins are located. Remember that the chef-client is configured by cloud-init; to do so you need to modify the template used by cloud-init to build the client.rb file. This one is located in /opt/freeware/etc/cloud/templates:
    # tail -1 /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
    Ohai::Config[:plugin_path] << '/etc/chef/ohai_plugins'
    # mkdir -p /etc/chef/ohai_plugins
    
  • After this modification the machine is ready to be captured.
  • You want your custom ohai plugins to be uploaded to the chef-client machine at chef-client execution time. Here is an example of a custom ohai plugin used as a template. This one gathers the oslevel (oslevel -s), the node name, the partition name and the memory mode of the machine. These attributes are gathered with the lparstat command:
  • Ohai.plugin(:Aixcustom) do
      provides "aixcustom"
    
      collect_data(:aix) do
        aixcustom Mash.new
    
        oslevel = shell_out("oslevel -s").stdout.split($/)[0]
        nodename = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Node Name\" {print $2}'").stdout.split($/)[0]
        partitionname = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Partition Name\" {print $2}'").stdout.split($/)[0]
        memorymode = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Memory Mode\" {print $2}'").stdout.split($/)[0]
    
        aixcustom[:oslevel] = oslevel
        aixcustom[:nodename] = nodename
        aixcustom[:partitionname] = partitionname
        aixcustom[:memorymode] = memorymode
      end
    end
    
  • The custom ohai plugin is written. Remember that you want this one to be uploaded on the machine at chef-client execution. The new attributes created by this plugin need to be added to ohai. Here is a recipe uploading the custom ohai plugin; at the time the plugin is uploaded ohai is reloaded and the new attributes can be utilized in any further templates (for recipes you have no other choice than putting the custom ohai plugin in the directory before the capture):
  • cat ~/.chef/cookbooks/aix7/recipes/ohai_custom.rb
    ohai "reload" do
      action :reload
    end
    
    template "/etc/chef/ohai_plugins/aixcustom.rb" do
      notifies :reload, "ohai[reload]", :immediately
    end
    

    chef-server, chef workstation, knife

    I'll not detail here how to set up a Chef server, and how to configure your Chef workstation (knife). There are plenty of good tutorials about that on the internet to get you started. Please just note that you need to use Chef server 12 if you are using Chef client 12.

    I had some difficulties during the configuration; here are a few tricks to know :

    • The cacert can be found here: /opt/opscode/embedded/ssl/cert/cacert.pem
    • The Chef validation key can be found in /etc/chef/chef-validator.pem
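    Once knife is configured, a quick way to check that the workstation can actually talk to the Chef server (and that the certificate mentioned above is trusted) is a sketch like this:

    # verify the SSL setup and the connectivity from the knife workstation to the Chef server
    knife ssl check
    knife client list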

    Building the machine, checking the logs

    • The write_files part was executed, the file is present in the /tmp filesystem:
    # cat /tmp/cloud-init-started
    cloud-init was started on this server
    
  • The chef-client was configured, the files are present in the /etc/chef directory; looking at the log file you can see that these files were created by cloud-init:
  • # ls -l /etc/chef
    total 32
    -rw-------    1 root     system         1679 Apr 26 23:46 client.pem
    -rw-r--r--    1 root     system          646 Apr 26 23:46 client.rb
    -rw-r--r--    1 root     system           38 Apr 26 23:46 firstboot.json
    -rw-r--r--    1 root     system         1679 Apr 26 23:46 validation.pem
    
    # grep chef /var/log/cloud-init-output.log
    2015-04-26 23:46:22,463 - importer.py[DEBUG]: Found cc_chef with attributes ['handle'] in ['cloudinit.config.cc_chef']
    2015-04-26 23:46:22,879 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/sem/config_chef - wb: [420] 23 bytes
    2015-04-26 23:46:22,882 - helpers.py[DEBUG]: Running config-chef using lock ()
    2015-04-26 23:46:22,884 - util.py[DEBUG]: Writing to /etc/chef/validation.pem - wb: [420] 1679 bytes
    2015-04-26 23:46:22,887 - util.py[DEBUG]: Reading from /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl (quiet=False)
    2015-04-26 23:46:22,889 - util.py[DEBUG]: Read 892 bytes from /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
    2015-04-26 23:46:22,954 - util.py[DEBUG]: Writing to /etc/chef/client.rb - wb: [420] 646 bytes
    2015-04-26 23:46:22,958 - util.py[DEBUG]: Writing to /etc/chef/firstboot.json - wb: [420] 38 bytes
    
  • The runcmd part was executed:
  • # cat /opt/freeware/var/lib/cloud/instance/scripts/runcmd
    #!/bin/sh
    /usr/bin/chef-client
    
    2015-04-26 23:46:22,488 - importer.py[DEBUG]: Found cc_runcmd with attributes ['handle'] in ['cloudinit.config.cc_runcmd']
    2015-04-26 23:46:22,983 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/sem/config_runcmd - wb: [420] 23 bytes
    2015-04-26 23:46:22,986 - helpers.py[DEBUG]: Running config-runcmd using lock ()
    2015-04-26 23:46:22,987 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/scripts/runcmd - wb: [448] 31 bytes
    2015-04-26 23:46:25,868 - util.py[DEBUG]: Running command ['/opt/freeware/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0] (shell=False, capture=False)
    
  • The final message was printed in the output of the cloud-init log file
  • 2015-04-26 23:06:01,203 - helpers.py[DEBUG]: Running config-final-message using lock ()
    The system is up, cloud-init is finished
    2015-04-26 23:06:01,240 - util.py[DEBUG]: The system is up, cloud-init is finished
    2015-04-26 23:06:01,242 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instance/boot-finished - wb: [420] 57 bytes
    

    On the Chef server you can check that the client registered itself and get details about it:

    # knife node list | grep a8b8fe0d-34c1-4bdb-821c-777fca1c391f
    a8b8fe0d-34c1-4bdb-821c-777fca1c391f
    # knife node show a8b8fe0d-34c1-4bdb-821c-777fca1c391f
    Node Name:   a8b8fe0d-34c1-4bdb-821c-777fca1c391f
    Environment: _default
    FQDN:
    IP:          10.10.208.61
    Run List:    role[aix7]
    Roles:       france_testing
    Recipes:     aix7::create_fs_rootvg, aix7::create_profile_root
    Platform:    aix 7.1
    Tags:
    

    What's next ?

    If you have a look at the Chef supermarket (the place where you can download Chef cookbooks written by the community and validated by Opscode) you'll see that there are not a lot of cookbooks for AIX. I'm currently writing my own cookbook for AIX logical volume manager and filesystem creation, but there is still a lot of work to do on cookbook creation for AIX. Here is a list of cookbooks that need to be written by the community : chdev, multibos, mksysb, nim client, wpar, update_all, ldap_client .... I could continue this list but I'm sure that you have a lot of ideas. Last word: learn Ruby and write cookbooks, they will be used by the community and we can finally have a good configuration management tool on AIX. With PowerVC, cloud-init and Chef support, AIX will have a full "DevOps" stack and can finally fight against Linux. As always I hope this blog post helps you to understand PowerVC, cloud-init and Chef !

    Using the Simplified Remote Restart capability on Power8 Scale Out Servers


    A few weeks ago I had to work on simplified remote restart. I’m not lucky enough yet -because of some political decisions in my company- to have access to any E880 or E870. We just have a few scale-out machines to play with (S814). For some critical applications we need in the future to be able to reboot the virtual machines if the system hosting them has failed (hardware problem). We decided a couple of months ago not to use remote restart because it was mandatory to use a reserved storage pool device and it was too hard to manage because of this mandatory storage. We now have enough P8 boxes to try and understand the new version of remote restart called simplified remote restart, which does not need any reserved storage pool device. For those who want to understand what remote restart is I strongly recommend you to check my previous blog post about remote restart on two P7 boxes: Configuration of a remote restart partition. For the others here is what I learned about the simplified version of this awesome feature.

    Please keep in mind that the FSP of the machine must be up to perform a simplified remote restart operation. It means that if, for instance, you lose one of your datacenters or the link between your two datacenters, you cannot use simplified remote restart to restart your partitions on the main/backup site. Simplified remote restart only protects you from a hardware failure of your machine. Maybe this will change in a near future but for the moment it is the most important thing to understand about simplified remote restart.

    Updating to the latest version of firmware

    I was very surprised when I got my Power8 machines. After deploying these boxes I decided to give a try to simplified remote restart but it was just not possible. Since the Power8 Scale Out servers were released they were NOT simplified remote restart capable. The release of the SV830 firmware now enables Simplified Remote Restart on Power8 Scale Out machines. Please note that there is nothing about it in the patch note, so chmod666.org is the only place where you can get this information :-). Here is the patch note: here. Last word: you will find on the internet that you need a Power8 to use simplified remote restart. It’s true, but only partially true. YOU NEED A P8 MACHINE WITH AT LEAST AN 820 FIRMWARE.

    The first thing to do is to update your firmware to the SV830 version (on both systems participating in the simplified remote restart operation):

    # updlic -o u -t sys -l latest -m p814-1 -r mountpoint -d /home/hscroot/SV830_048 -v
    [..]
    # lslic -m p814-1 -F activated_spname,installed_level,ecnumber
    FW830.00,48,01SV830
    # lslic -m p814-2 -F activated_spname,installed_level,ecnumber
    FW830.00,48,01SV830
    

    You can check the firmware version directly from the Hardware Management Console or in the ASMI:

    fw1
    fw3

    After the firmware upgrade verify that you now have the Simplified Remote Restart capability set to true.

    fw2

    # lssyscfg -r sys -F name,powervm_lpar_simplified_remote_restart_capable
    p720-1,0
    p814-1,1
    p720-2,0
    p814-2,1
    

    Prerequisites

    These prerequisites are true ONLY for Scale out systems:

    • To update to the firmware SV830_048 you need the latest Hardware Management Console release which is v8r8.3.0 plus MH01514 PTF.
    • Obviously on Scale out system SV830_048 is the minimum firmware requirement.
    • Minimum level of Virtual I/O Servers is 2.2.3.4 (for both source and destination systems).
    • A PowerVM Enterprise license (to be confirmed).

    Enabling simplified remote restart of an existing partition

    You probably want to enable simplified remote restart after an LPM migration/evacuation. After migrating your virtual machine(s) to a Power8 with the Simplified Remote Restart capability you have to enable this capability on each virtual machine. This can only be done when the machine is shut down, so you first have to stop the virtual machines (after the live partition mobility move) if you want to enable SRR. It can't be done without rebooting the virtual machine:

    • List the current partitions running on the system and check which ones are “simplified remote restart capable” (here only one is):
    # lssyscfg -r lpar -m p814-1 -F name,simplified_remote_restart_capable
    vios1,0
    vios2,0
    lpar1,1
    lpar2,0
    lpar3,0
    lpar4,0
    lpar5,0
    lpar6,0
    lpar7,0
    
  • For each LPAR that is not simplified remote restart capable, change the simplified_remote_restart_capable attribute using the chsyscfg command. Please note that you can't do this with the Hardware Management Console GUI: in the latest 8r8.3.0 release, when enabling it through the GUI, the console tells you that you need a reserved storage device, which is required by the classic Remote Restart capability and not by the simplified version. You have to use the command line! (check the screenshots below)
  • You can’t change this attribute while the machine is running:
  • gui_change_to_srr

  • You can’t do it with the GUI after the machine is shutdown:
  • gui_change_to_srr2
    gui_change_to_srr3

  • The only way to enable this attribute is by using the Hardware Management Console command line (please note in the output below that running LPARs cannot be changed):
  • # for i in lpar2 lpar3 lpar4 lpar5 lpar6 lpar7 ; do chsyscfg -r lpar -m p824-2 -i "name=$i,simplified_remote_restart_capable=1" ; done
    An error occurred while changing the partition named lpar6.
    HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
    An error occurred while changing the partition named lpar7.
    HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
    # lssyscfg -r lpar -m p824-1 -F name,simplified_remote_restart_capable,lpar_env | grep -v vioserver
    lpar1,1,aixlinux
    lpar2,1,aixlinux
    lpar3,1,aixlinux
    lpar4,1,aixlinux
    lpar5,1,aixlinux
    lpar6,0,aixlinux
    lpar7,0,aixlinux
    

    Remote restarting

    If you try to do a live partition mobility operation back to a P7 box, or to a P8 box without the simplified remote restart capability, it will not be possible. Enabling simplified remote restart forces the virtual machine to stay on P8 boxes with the simplified remote restart capability. This is one of the reasons why most customers are not doing it:

    # migrlpar -o v -m p814-1 -t p720-1 -p lpar2
    Errors:
    HSCLB909 This operation is not allowed because managed system p720-1 does not support PowerVM Simplified Partition Remote Restart.
    

    lpm_not_capable_anymore

    On the Hardware Management Console you can see that the virtual machine is simplified remote restart capable by checking its properties:

    gui_change_to_srr4

    You can now try to remote restart your virtual machines to another server. The status of the source server has to be different from Operating (Power Off, Error, Error – Dump in progress, Initializing). As always my advice is to validate before restarting:

    # rrstartlpar -o validate -m p824-1 -t p824-2 -p lpar1
    # echo $?
    0
    # rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1
    HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
    
    # lssyscfg -r sys -F name,state
    p824-2,Operating
    p824-1,Power Off
    # rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1
    

    When you run a remote restart operation the machine boots automatically. You can check in the errpt that in most cases the partition ID has changed (proving that you are on another machine):

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A6DF45AA   0618170615 I O RMCdaemon      The daemon is started.
    1BA7DF4E   0618170615 P S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
    D872C399   0618170615 I O sys0           Partition ID changed and devices recreat
    

    Be very careful with the ghostdev sys0 attribute. Every remote restarted VM needs to have ghostdev set to 0 to avoid an ODM wipe (if you remote restart an LPAR with ghostdev set to 1 you will lose all your ODM customizations).

    # lsattr -El sys0 -a ghostdev
    ghostdev 0 Recreate ODM devices on system change / modify PVID True
    
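    If you want to quickly check and fix this attribute on a machine, here is a minimal ksh sketch (an assumption on my side, adapt it to your own change process; the attribute is only evaluated at boot or move time, so chdev can be run on a live system):

    # read the current ghostdev value and reset it to 0 if needed
    current=$(lsattr -El sys0 -a ghostdev -F value)
    if [ "$current" != "0" ]; then
      # set ghostdev back to 0 so a remote restart does not wipe the ODM
      chdev -l sys0 -a ghostdev=0
    fi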

    When the source machine is up and running again you have to clean the old definition of the remote restarted LPAR by launching a cleanup operation. This will wipe the old LPAR definition:

    # rrstartlpar -o cleanup -m p814-1 -p lpar1
    

    The RRmonitor (modified version)

    There is a script delivered by IBM called rrMonitor. It watches the Power System's state and, if the machine is in a particular state, restarts one specific virtual machine. This script is not really usable as is because it has to be executed directly on the HMC (you need a pesh password to put the script on the HMC) and it only checks one particular virtual machine. I modified the script to ssh to the HMC and then check every LPAR on the machine, not just one in particular. You can download my modified version here: rrMonitor. Here is what the script is doing (a simplified sketch of this logic is shown after the example output below):

    • It checks the state of the source machine.
    • If this state is not “Operating”, the script searches for every remote restartable LPAR on the machine.
    • It launches remote restart operations to restart all those partitions on the target machine.
    • It tells the user which command to run to clean up the old LPAR definitions once the source machine is running again.
    # ./rrMonitor p814-1 p814-2 all 60 myhmc
    Getting remote restartable lpars
    lpar1 is rr simplified capable
    lpar1 rr status is Remote Restartable
    lpar2 is rr simplified capable
    lpar2 rr status is Remote Restartable
    lpar3 is rr simplified capable
    lpar3 rr status is Remote Restartable
    lpar4 is rr simplified capable
    lpar4 rr status is Remote Restartable
    Checking for source server state....
    Source server state is Operating
    Checking for source server state....
    Source server state is Operating
    Checking for source server state....
    Source server state is Power Off In Progress
    Checking for source server state....
    Source server state is Power Off
    It's time to remote restart
    Remote restarting lpar1
    Remote restarting lpar2
    Remote restarting lpar3
    Remote restarting lpar4
    Thu Jun 18 20:20:40 CEST 2015
    Source server p814-1 state is Power Off
    Source server has crashed and hence attempting a remote restart of the partition lpar1 in the destination server p814-2
    Thu Jun 18 20:23:12 CEST 2015
    The remote restart operation was successful
    The cleanup operation has to be executed on the source server once the server is back to operating state
    The following command can be used to execute the cleanup operation,
    rrstartlpar -m p814-1 -p lpar1 -o cleanup
    Thu Jun 18 20:23:12 CEST 2015
    Source server p814-1 state is Power Off
    Source server has crashed and hence attempting a remote restart of the partition lpar2 in the destination server p814-2
    Thu Jun 18 20:25:42 CEST 2015
    The remote restart operation was successful
    The cleanup operation has to be executed on the source server once the server is back to operating state
    The following command can be used to execute the cleanup operation,
    rrstartlpar -m sp814-1 -p lpar2 -o cleanup
    Thu Jun 18 20:25:42 CEST 2015
    [..]
    
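    For those who just want the idea without reading the whole script, here is a very simplified ksh sketch of the logic (this is not the actual rrMonitor code; the hostnames, the hscroot user and the polling interval are examples, and the real script is more careful about which states allow a remote restart):

    src=p814-1 ; dst=p814-2 ; hmc=myhmc ; interval=60
    # build the list of simplified remote restart capable partitions on the source box
    lpars=$(ssh hscroot@$hmc "lssyscfg -r lpar -m $src -F name,simplified_remote_restart_capable" | awk -F, '$2 == 1 {print $1}')
    while true; do
      state=$(ssh hscroot@$hmc "lssyscfg -r sys -m $src -F state")
      if [ "$state" != "Operating" ]; then
        for lpar in $lpars; do
          # restart the partition on the destination server
          ssh hscroot@$hmc "rrstartlpar -o restart -m $src -t $dst -p $lpar"
          echo "run later: rrstartlpar -o cleanup -m $src -p $lpar"
        done
        break
      fi
      sleep $interval
    done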

    Conclusion

    As you can see the simplified version of the remote restart feature is simpler than the normal one. My advice is to create all your LPARs with the simplified remote restart attribute, it's that easy :). If you plan to LPM back to a P6 or P7 box, don't use simplified remote restart. I think this functionality will become more popular when all the old P7 and P6 machines are replaced by P8 ones. As always I hope it helps.

    Here are a couple of links with great documentation about Simplified Remote Restart:

    • Simplified Remote Restart Whitepaper: here
    • Original rrMonitor: here
    • Materials about the latest HMC release and a couple of videos related to Simplified Remote Restart: here

    Tips and tricks for PowerVC 1.2.3 (PVID, ghostdev, clouddev, rest API, growing volumes, deleting boot volume) | PowerVC 1.2.3 Redbook


    Writing a Redbook was one of my main goals. After working days and nights for more than 6 years on Power Systems, IBM gave me the opportunity to write a Redbook. I had been looking at the Redbook residencies page for a very long time to find the right one. As there was nothing new on AIX and PowerVM (which are my favorite topics) I decided to give the latest PowerVC Redbook a try (this Redbook is an update, but a huge one; PowerVC is moving fast). I have been a Redbook reader since I started working on AIX. Almost all Redbooks are good, and most of them are the best source of information for AIX and Power administrators. I'm sure that like me, you saw that part about becoming an author every time you read a Redbook. I can now say THAT IT IS POSSIBLE (for everyone). I'm now one of those guys and you can also become one. Just find the Redbook that fits you and apply on the Redbook webpage (http://www.redbooks.ibm.com/residents.nsf/ResIndex). I wanted to say a BIG thank you to all the people who gave me this opportunity, especially Philippe Hermes, Jay Kruemcke, Eddie Shvartsman, Scott Vetter, Thomas R Bosthworth. In addition to these people I also wanted to thank my teammates on this Redbook: Guillermo Corti, Marco Barboni and Liang Xu. They are all true professionals, very skilled and open ... this was a great team! One more time thank you guys. Last, I take the opportunity here to thank the people who believed in me since the very beginning of my AIX career: Julien Gabel, Christophe Rousseau, and JL Guyot. Thank you guys! You deserve it, stay like you are. I'm not an anonymous guy anymore.

    redbook

    You can download the Redbook at this address: http://www.redbooks.ibm.com/redpieces/pdfs/sg248199.pdf. I learned something during the writing of the Redbook by talking to the members of the team. Redbooks are not there to tell and explain what's "behind the scenes". A Redbook cannot be too long and needs to be written in almost 3 weeks, so there is no place for everything. Some topics are better integrated in a blog post than in a Redbook, and Scott told me that a couple of times during the writing session. I totally agree with him. So here is this long awaited blog post. These are advanced topics about PowerVC, so read the Redbook before reading this post.

    Last one: thanks to IBM (and just IBM) for believing in me :-). THANK YOU SO MUCH.

    ghostdev, clouddev and cloud-init (ODM wipe if using inactive live partition mobility or remote restart)

    Everybody who is using cloud-init should be aware of this. Cloud-init is only supported on AIX versions that have the clouddev attribute available on sys0. To be totally clear, at the time of writing this blog post you will be supported by IBM only if you use AIX 7.1 TL3 SP5 or AIX 6.1 TL9 SP5. All other versions are not supported by IBM. Let me explain why, and how you can still use cloud-init on older versions with a little trick. But let's first explain what the problem is:

    Let's say you have different machines, some of them running AIX 7100-03-05 and some of them running 7100-03-04, and both use cloud-init for the activation. By looking at the cloud-init code at this address here we can say that:

    • After the cloud-init installation cloud-init is:
    • Changing clouddev to 1 if sys0 has a clouddev attribute:
    # oslevel -s
    7100-03-05-1524
    # lsattr -El sys0 -a ghostdev
    ghostdev 0 Recreate ODM devices on system change / modify PVID True
    # lsattr -El sys0 -a clouddev
    clouddev 1 N/A True
    
  • Changing ghostdev to 1 if sys0 doesn't have a clouddev attribute:
  • # oslevel -s
    7100-03-04-1441
    # lsattr -El sys0 -a ghostdev
    ghostdev 1 Recreate ODM devices on system change / modify PVID True
    # lsattr -El sys0 -a clouddev
    lsattr: 0514-528 The "clouddev" attribute does not exist in the predefined
            device configuration database.
    

    This behavior can directly be observed in the cloud-init code:

    ghostdev_clouddev_cloudinit

    Now that we are aware of that, let's make a remote restart test between two P8 boxes. I take the opportunity here to present one of the coolest features of PowerVC 1.2.3: you can now remote restart your virtual machines directly from the PowerVC GUI if one of your hosts is in a failure state. I highly encourage you to check my previous post about this subject if you don't know how to set up remote restartable partitions http://chmod666.org/index.php/using-the-simplified-remote-restart-capability-on-power8-scale-out-servers/:

    • Only simplified remote restart can be managed by PowerVC 1.2.3; the “normal” version of remote restart is not handled by PowerVC 1.2.3.
    • In the compute template configuration there is now a checkbox allowing you to create remote restartable partitions. Be careful: you can't go back to a P7 box without having to reboot the machine, so be sure your Virtual Machines will stay on a P8 box if you check this option.
    • remote_restart_compute_template

    • When the machine is shutdown or there is a problem on it you can click the “Remotely Restart Virtual Machines” button:
    • rr1

    • Select the machines you want to remote restart:
    • rr2
      rr3

    • While the Virtual Machines are remote restarting, you can check the states of the VM and the state of the host:
    • rr4
      rr5

    • After the evacuation the host is in “Remote Restart Evacuated State”:

    rr6

    Let’s now check the state of our two Virtual Machines:

    • The ghostdev one (the sys0 message in the errpt indicates that the partition ID has changed AND DEVICES ARE RECREATED (ODM wipe)) (no more IP address set on en0):
    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A6DF45AA   0803171115 I O RMCdaemon      The daemon is started.
    1BA7DF4E   0803171015 P S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0803171015 I S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0803171015 I S SRC            SOFTWARE PROGRAM ERROR
    D872C399   0803171015 I O sys0           Partition ID changed and devices recreat
    # ifconfig -a
    lo0: flags=e08084b,c0
            inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
            inet6 ::1%1/0
             tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
    
  • The clouddev one (the sys0 message in the errpt indicates that the partition ID has changed; note that the message does not say that the devices are recreated):
  • # errpt |more
    60AFC9E5   0803232015 I O sys0           Partition ID changed since last boot.
    # ifconfig -a
    en0: flags=1e084863,480
            inet 10.10.10.20 netmask 0xffffff00 broadcast 10.244.248.63
             tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    lo0: flags=e08084b,c0
            inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
            inet6 ::1%1/0
             tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
    

    VSAE is designed to manage ghostdev-only operating systems; on the other hand cloud-init is designed to manage clouddev operating systems. To be perfectly clear, here is how ghostdev and clouddev work. But we first need to answer a question: why do we need to set clouddev or ghostdev to 1? The answer is pretty obvious: one of these attributes needs to be set to 1 before capturing the Virtual Machine. When you deploy a new Virtual Machine, this flag is needed to wipe the ODM before reconfiguring the virtual machine with the parameters set in the PowerVC GUI (IP, hostname). Then VSAE or cloud-init (using the config drive datasource) sets the hostname and IP address previously wiped because of the clouddev or ghostdev attribute. This works well for a new deployment because we need to wipe the ODM in that case, but what about an inactive live partition mobility or a remote restart operation? The Virtual Machine has moved (not on the same host, and not with the same LPAR ID) and we need to keep the ODM as it is. Here is how it works:

    • If you are using VSAE, it manages the ghostdev attribute for you. At capture time ghostdev is set to 1 by VSAE (when you run the pre-capture script). When deploying a new VM, at activation time, VSAE sets ghostdev back to 0. Inactive live partition mobility and remote restart operations will work fine with ghostdev set to 0.
    • If you are using cloud-init on a supported system, clouddev is set to 1 at cloud-init installation time. As cloud-init does nothing with either attribute at activation time, IBM needed a way to avoid wiping the ODM after a remote restart operation. This is why the clouddev device was introduced: it writes a flag in the NVRAM. When a new VM is built there is no flag in the NVRAM for it, so the ODM is wiped. When an already existing VM is remote restarted, the flag exists in the NVRAM and the ODM is not wiped. By using clouddev there is no post-deploy action needed.
    • If you are using cloud-init on an unsupported system, ghostdev is set to 1 at cloud-init installation time. As cloud-init does nothing at post-deploy time, ghostdev will remain set to 1 in all cases and the ODM will always be wiped.

    cloudghost
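
    To see which case applies to a given machine, here is a minimal ksh sketch (my own assumption, not the actual cloud-init code) checking which attribute the AIX level offers:

    # report which activation flag this AIX level supports
    if lsattr -El sys0 -a clouddev >/dev/null 2>&1; then
      echo "clouddev available, value: $(lsattr -El sys0 -a clouddev -F value)"
    else
      echo "no clouddev attribute, ghostdev value: $(lsattr -El sys0 -a ghostdev -F value)"
    fi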

    There is a way to use cloud-init on unsupported systems. Keep in mind that in this case you will not be supported by IBM, so do this at your own risk. To be totally honest I'm using this method in production to use the same activation engine for all my AIX versions:

    1. Pre-capture, set ghostdev to 1. Whatever happens THIS IS MANDATORY.
    2. Post-capture, reboot the captured VM and set ghostdev to 0.
    3. Post-deploy, set ghostdev to 0 on every Virtual Machine. You can put this in the activation input to do the job:
    4. #cloud-config
      runcmd:
       - chdev -l sys0 -a ghostdev=0
      

    The PVID problem

    I realized I had this problem after using PowerVC for a while. As PowerVC images for the rootvg and other volume groups are created using Storage Volume Controller FlashCopy (in the case of an SVC configuration, but there are similar mechanisms for other storage providers), the PVIDs will always be the same for each new virtual machine (all new virtual machines will have the same PVID for their rootvg, and the same PVID for each captured volume group). I contacted IBM about this and the PowerVC team told me that this behavior is totally normal and has been observed since the release of VMcontrol. They didn't have any issues related to it, so if you don't care about it, just do nothing and keep this behavior as it is. I recommend doing nothing about this!

    It's a shame, but most AIX administrators like to keep things as they are and don't want any changes. (In my humble opinion this is one of the reasons AIX is so outdated compared to Linux; we need a community, not narrow-minded people keeping their knowledge to themselves just to stay in their daily job routine without having anything to learn.) If you are in this case, facing angry colleagues about this particular point, you can use the solution proposed below to calm the passions of the few who do not want to change! :-) This is my rant: CHANGE!

    By default if you build two virtual machines and check the PVID of each one, you will notice that the PVIDs are the same:

    • Machine A:
    root@machinea:/root# lspv
    hdisk0          00c7102d2534adac                    rootvg          active
    hdisk1          00c7102d00d14660                    appsvg          active
    
  • Machine B:
  • root@machineb:root# lspv
    hdisk0          00c7102d2534adac                    rootvg          active
    hdisk1          00c7102d00d14660                    appsvg         active
    

    For the rootvg the PVID is always set to 00c7102d2534adac and for the appsvg the PVID is always set to 00c7102d00d14660.

    For the rootvg the solution is to change the ghostdev (only the ghostdev) to 2, and to reboot the machine. Setting ghostdev to 2 will change the PVID of the rootvg at reboot time (after the PVID is changed, ghostdev is automatically set back to 0):

    # lsattr -El sys0 -a ghostdev
    ghostdev 2 Recreate ODM devices on system change / modify PVID True
    # lsattr -l sys0 -R -a ghostdev
    0...3 (+1)
    
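    Here is a minimal sketch of the whole sequence (my assumption of how to run it, during a maintenance window):

    # the rootvg PVID will be regenerated at the next boot, then ghostdev goes back to 0
    chdev -l sys0 -a ghostdev=2
    shutdown -Fr now
    # after the reboot, check the attribute and the new PVID
    lsattr -El sys0 -a ghostdev
    lspv | grep rootvg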

    For the non-rootvg volume groups this is a little bit tricky but still possible: the solution is to use the recreatevg command (-d option) to change the PVID of all the physical volumes of your volume group. Before rebooting the server ensure that you:

    • Unmount all the filesystems in the volume group on which you want to change the PVIDs.
    • Varyoff the volume group.
    • Get the names of the physical volumes composing the volume group.
    • Export the volume group.
    • Recreate the volume group (this action will change the PVIDs).
    • Re-import the volume group.

    Here are the shell commands doing the trick:

    # vg=appsvg
    # lsvg -l $vg | awk '$6 == "open/syncd" && $7 != "N/A" { print "fuser -k " $NF }' | sh
    # lsvg -l $vg | awk '$6 == "open/syncd" && $7 != "N/A" { print "umount " $NF }' | sh
    # varyoffvg $vg
    # pvs=$(lspv | awk -v my_vg=$vg '$3 == my_vg {print $1}')
    # recreatevg -y $vg -d $pvs
    # importvg -y $vg $(echo ${pvs} | awk '{print $1}')
    

    We now agree that you want to do this, but as you are a smart person you want to do it automatically using cloud-init and the activation input. There are two ways to do it: the silly way (using shell) and the noble way (using the cloud-init syntax):

    PowerVC activation engine (shell way)

    Use this short ksh script in the activation input. This is not my recommendation, but you can do it for simplicity:

    activation_input_shell
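
    As the screenshot may be hard to read, here is the kind of snippet you could paste in the activation input (a sketch based on the procedure above, not the exact content of the image):

    #!/usr/bin/ksh
    # regenerate the rootvg PVID on first boot: set ghostdev to 2 and reboot once
    chdev -l sys0 -a ghostdev=2
    shutdown -Fr now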

    PowerVC activation engine (cloudinit way)

    Here is the cloud-init way. Important note: use the latest version of cloud-init; the first one I used had a problem with cc_power_state_change.py not using the right parameters for AIX:

    activation_input_ci
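
    Again, here is a minimal cloud-config sketch of the same idea (this is an assumption on my side, the exact keys accepted by cc_power_state_change on AIX may differ):

    #cloud-config
    runcmd:
     - chdev -l sys0 -a ghostdev=2
    power_state:
     mode: reboot
     message: regenerating the rootvg PVID
     timeout: 30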

    Working with REST Api

    I will not show you here how to work with the PowerVC RESTful API. I prefer to share a couple of scripts on my github account; nice examples are often better than how-to tutorials. So check the scripts on the github if you want a detailed how-to ... the scripts are well commented. Just a couple of things to say before closing this topic: the best way to work with a RESTful API is to code in Python, as there are a lot of existing Python libs for RESTful APIs (httplib2, pycurl, requests). For my own understanding I prefer to use the simple httplib in my scripts. I will put all my command line tools in a github repository called pvcmd (for PowerVC command line). You can download the scripts at this address, or just use git to clone the repo. One more time it is a community project, feel free to change and share anything: https://github.com/chmod666org/pvcmd.
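
    If you just want to poke at the API before looking at the Python scripts, a Keystone token can be requested with curl (a sketch using the standard OpenStack v3 password authentication; the host, user and password are examples, adapt them to your environment):

    # the token is returned in the X-Subject-Token response header
    curl -sk -D - -o /dev/null https://powervc.lab.chmod666.org:5000/v3/auth/tokens \
      -H "Content-Type: application/json" \
      -d '{"auth": {"identity": {"methods": ["password"],
            "password": {"user": {"name": "root", "domain": {"name": "Default"}, "password": "mysecretpassword"}}},
            "scope": {"project": {"name": "ibm-default", "domain": {"name": "Default"}}}}' | grep -i x-subject-token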

    Growing data lun

    To be totally honest, here is what I do when I'm creating a new machine with PowerVC. My customers always need one additional volume group for applications (we will call it appsvg). I've created a multi-volume image with this volume group already built (with a bunch of filesystems in it). As most customers ask for this volume group to be 100 GB large, the capture was made with this size. Unfortunately for me we often get requests for bigger volume groups, let's say 500 or 600 GB. Instead of creating a new lun and extending the volume group, PowerVC allows you to grow the lun to the desired size. For volume groups other than the boot one you must use the RESTful API to extend the volume. To do this I've created a Python script called pvcgrowlun (feel free to check the code on github) https://github.com/chmod666org/pvcmd/blob/master/pvcgrowlun. At each virtual machine creation I check if the customer needs a larger volume group and extend it using the command shown below.

    While coding this script I hit a problem using the os-extend parameter in my HTTP request. PowerVC does not use exactly the same parameters as OpenStack; if you want to code this yourself be aware of it and check in the PowerVC online documentation whether you are using “extended attributes” (thanks to Christine L Wang for this one):

    • In the Openstack documentation the attribute is “os-extend” link here:
    • os-extend

    • In the PowerVC documentation the attribute is “ibm-extend” link here:
    • ibm-extend

    • Identify the lun you want to grow (the script takes the name of the volume as a parameter) (I have an unpublished one to list all the volumes, tell me if you want it). In my case the volume name is multi-vol-bf697dfa-0000003a-828641A_XXXXXX-data-1, and I want to change its size from 60 to 80. This is not stated in the official PowerVC documentation but it works for both boot and data luns.
    • Check that the current size of the lun is smaller than the desired size:
    • before_grow

    • Run the script:
    # pvcgrowlun -v multi-vol-bf697dfa-0000003a-828641A_XXXXX-data-1 -s 80 -p localhost -u root -P mysecretpassword
    [info] growing volume multi-vol-bf697dfa-0000003a-828641A_XXXXX-data-1 with id 840d4a60-2117-4807-a2d8-d9d9f6c7d0bf
    JSON Body: {"ibm-extend": {"new_size": 80}}
    [OK] Call successful
    None
    
  • Check the size is changed after the command execution:
  • aftergrow_grow

  • Don't forget to do the job in the operating system by running a “chvg -g” (check the total PPs here):
  • # lsvg vg_apps
    VOLUME GROUP:       vg_apps                  VG IDENTIFIER:  00f9aff800004c000000014e6ee97071
    VG STATE:           active                   PP SIZE:        256 megabyte(s)
    VG PERMISSION:      read/write               TOTAL PPs:      239 (61184 megabytes)
    MAX LVs:            256                      FREE PPs:       239 (61184 megabytes)
    LVs:                0                        USED PPs:       0 (0 megabytes)
    OPEN LVs:           0                        QUORUM:         2 (Enabled)
    TOTAL PVs:          1                        VG DESCRIPTORS: 2
    STALE PVs:          0                        STALE PPs:      0
    ACTIVE PVs:         1                        AUTO ON:        yes
    MAX PPs per VG:     32768                    MAX PVs:        1024
    LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
    HOT SPARE:          no                       BB POLICY:      relocatable
    MIRROR POOL STRICT: off
    PV RESTRICTION:     none                     INFINITE RETRY: no
    DISK BLOCK SIZE:    512                      CRITICAL VG:    no
    # chvg -g appsvg
    # lsvg appsvg
    VOLUME GROUP:       appsvg                  VG IDENTIFIER:  00f9aff800004c000000014e6ee97071
    VG STATE:           active                   PP SIZE:        256 megabyte(s)
    VG PERMISSION:      read/write               TOTAL PPs:      319 (81664 megabytes)
    MAX LVs:            256                      FREE PPs:       319 (81664 megabytes)
    LVs:                0                        USED PPs:       0 (0 megabytes)
    OPEN LVs:           0                        QUORUM:         2 (Enabled)
    TOTAL PVs:          1                        VG DESCRIPTORS: 2
    STALE PVs:          0                        STALE PPs:      0
    ACTIVE PVs:         1                        AUTO ON:        yes
    MAX PPs per VG:     32768                    MAX PVs:        1024
    LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
    HOT SPARE:          no                       BB POLICY:      relocatable
    MIRROR POOL STRICT: off
    PV RESTRICTION:     none                     INFINITE RETRY: no
    DISK BLOCK SIZE:    512                      CRITICAL VG:    no
    
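    For reference, the same ibm-extend action used by pvcgrowlun can be sent by hand with curl. This is only a sketch: the $TOKEN and $CINDER_URL variables are assumptions on my side (take the volume endpoint and tenant id from the Keystone service catalog), and the volume id is the one shown in the script output above:

    # grow the data volume to 80 GB
    curl -sk -X POST "$CINDER_URL/volumes/840d4a60-2117-4807-a2d8-d9d9f6c7d0bf/action" \
      -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
      -d '{"ibm-extend": {"new_size": 80}}'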

    My own script to create VMs

    I'm creating Virtual Machines every week, sometimes just a couple and sometimes 10 Virtual Machines in a row. We are using different storage connectivity groups, and different storage templates depending on whether the machine is in production, in development, and so on. We also have to choose the primary copy on the SVC side if the machine is in production (I am using a stretched cluster between two distant sites, so I have to choose different storage templates depending on the site where the Virtual Machine is hosted). I make mistakes almost every time I use the PowerVC GUI (sometimes I forget to put the machine name, sometimes the connectivity group). I'm a lazy guy so I decided to code a script using the PowerVC REST API to create new machines based on a template file. We are planning to give the script to our outsourced teams to allow them to create machines without knowing what PowerVC is \o/. The script takes a file as a parameter and creates the virtual machine:

    • Create a file like the one below with all the information needed for your new virtual machine creation (name, ip address, vlan, host, image, storage connectivity group, ….):
    # cat test.vm
    name:test
    ip_address:10.16.66.20
    vlan:vlan666
    target_host:Default Group
    image:multi-vol
    storage_connectivity_group:npiv
    virtual_processor:1
    entitled_capacity:0.1
    memory:1024
    storage_template:storage1
    
  • Launch the script, the Virtual Machine will be created:
  • pvcmkvm -f test.vm -p localhost -u root -P mysecretpassword
    name: test
    ip_address: 10.16.66.20
    vlan: vlan666
    target_host: Default Group
    image: multi-vol
    storage_connectivity_group: npiv
    virtual_processor: 1
    entitled_capacity: 0.1
    memory: 1024
    storage_template: storage1
    [info] found image multi-vol with id 041d830c-8edf-448b-9892-560056c450d8
    [info] found network vlan666 with id 5fae84a7-b463-4a1a-b4dd-9ab24cdb66b5
    [info] found host aggregation Default Group with id 1
    [info] found storage template storage1 with id bfb4f8cc-cd68-46a2-b3a2-c715867de706
    [info] found image multi-vol with id 041d830c-8edf-448b-9892-560056c450d8
    [info] found a volume with id b3783a95-822c-4179-8c29-c7db9d060b94
    [info] found a volume with id 9f2fc777-eed3-4c1f-8a02-00c9b7c91176
    JSON Body: {"os:scheduler_hints": {"host_aggregate_id": 1}, "server": {"name": "test", "imageRef": "041d830c-8edf-448b-9892-560056c450d8", "networkRef": "5fae84a7-b463-4a1a-b4dd-9ab24cdb66b5", "max_count": 1, "flavor": {"OS-FLV-EXT-DATA:ephemeral": 10, "disk": 60, "extra_specs": {"powervm:max_proc_units": 32, "powervm:min_mem": 1024, "powervm:proc_units": 0.1, "powervm:max_vcpu": 32, "powervm:image_volume_type_b3783a95-822c-4179-8c29-c7db9d060b94": "bfb4f8cc-cd68-46a2-b3a2-c715867de706", "powervm:image_volume_type_9f2fc777-eed3-4c1f-8a02-00c9b7c91176": "bfb4f8cc-cd68-46a2-b3a2-c715867de706", "powervm:min_proc_units": 0.1, "powervm:storage_connectivity_group": "npiv", "powervm:min_vcpu": 1, "powervm:max_mem": 66560}, "ram": 1024, "vcpus": 1}, "networks": [{"fixed_ip": "10.244.248.53", "uuid": "5fae84a7-b463-4a1a-b4dd-9ab24cdb66b5"}]}}
    {u'server': {u'links': [{u'href': u'https://powervc.lab.chmod666.org:8774/v2/1471acf124a0479c8d525aa79b2582d0/servers/fc3ab837-f610-45ad-8c36-f50c04c8a7b3', u'rel': u'self'}, {u'href': u'https://powervc.lab.chmod666.org:8774/1471acf124a0479c8d525aa79b2582d0/servers/fc3ab837-f610-45ad-8c36-f50c04c8a7b3', u'rel': u'bookmark'}], u'OS-DCF:diskConfig': u'MANUAL', u'id': u'fc3ab837-f610-45ad-8c36-f50c04c8a7b3', u'security_groups': [{u'name': u'default'}], u'adminPass': u'u7rgHXKJXoLz'}}
    

    One of the major advantages of this approach is batching Virtual Machine creation. By using the script you can create one hundred Virtual Machines in a couple of minutes. Awesome!
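
    For example, batching the creation is just a matter of looping over the template files (a small sketch assuming one .vm file per machine to build):

    # create one virtual machine per description file
    for vmfile in *.vm; do
      pvcmkvm -f "$vmfile" -p localhost -u root -P mysecretpassword
    done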

    Working with Openstack commands

    PowerVC is based on OpenStack, so why not use the OpenStack commands to work with PowerVC? It is possible, but I repeat one more time that this is not supported by IBM at all; use this trick at your own risk. I was working with IBM Cloud Manager with OpenStack (ICMO), which provides a script full of shell variables to “talk” to the ICMO OpenStack. Based on the same file I created one for PowerVC. Before using any OpenStack commands, create a powervcrc file that matches your PowerVC environment:

    # cat powervcrc
    export OS_USERNAME=root
    export OS_PASSWORD=mypasswd
    export OS_TENANT_NAME=ibm-default
    export OS_AUTH_URL=https://powervc.lab.chmod666.org:5000/v3/
    export OS_IDENTITY_API_VERSION=3
    export OS_CACERT=/etc/pki/tls/certs/powervc.crt
    export OS_REGION_NAME=RegionOne
    export OS_USER_DOMAIN_NAME=Default
    export OS_PROJECT_DOMAIN_NAME=Default
    

    Then source the powervcrc file, and you are ready to play with all Openstack commands:

    # source powervcrc
    

    You can then play with the OpenStack commands; here are a few nice examples:

    • List virtual machines:
    # nova list
    +--------------------------------------+-----------------------+--------+------------+-------------+------------------------+
    | ID                                   | Name                  | Status | Task State | Power State | Networks               |
    +--------------------------------------+-----------------------+--------+------------+-------------+------------------------+
    | dc5c9fce-c839-43af-8af7-e69f823e57ca | ghostdev0clouddev1    | ACTIVE | -          | Running     | vlan666=10.16.66.56    |
    | d7d0fd7e-a580-41c8-b3d8-d7aab180d861 | ghostdevto1cloudevto1 | ACTIVE | -          | Running     | vlan666=10.16.66.57    |
    | bf697dfa-f69a-476c-8d0f-abb2fdcb44a7 | multi-vol             | ACTIVE | -          | Running     | vlan666=10.16.66.59    |
    | 394ab4d4-729e-44c7-a4d0-57bf2c121902 | deckard               | ACTIVE | -          | Running     | vlan666=10.16.66.60    |
    | cd53fb69-0530-451b-88de-557e86a2e238 | priss                 | ACTIVE | -          | Running     | vlan666=10.16.66.61    |
    | 64a3b1f8-8120-4388-9d64-6243d237aa44 | rachael               | ACTIVE | -          | Running     |                        |
    | 2679e3bd-a2fb-4a43-b817-b56ead26852d | batty                 | ACTIVE | -          | Running     |                        |
    | 5fdfff7c-fea0-431a-b99b-fe20c49e6cfd | tyrel                 | ACTIVE | -          | Running     |                        |
    +--------------------------------------+-----------------------+--------+------------+-------------+------------------------+
    
  • Reboot a machine:
  • # nova reboot multi-vol
    
  • List the hosts:
  • # nova hypervisor-list
    +----+---------------------+-------+---------+
    | ID | Hypervisor hostname | State | Status  |
    +----+---------------------+-------+---------+
    | 21 | 828641A_XXXXXXX     | up    | enabled |
    | 23 | 828641A_YYYYYYY     | up    | enabled |
    +----+---------------------+-------+---------+
    
  • Migrate a virtual machine (run a live partition mobility operation):
  • # nova live-migration ghostdevto1cloudevto1 828641A_YYYYYYY
    
  • Evacuate and set a server in maintenance mode and move all the partitions to another host:
  • # nova maintenance-enable --migrate active-only --target-host 828641A_XXXXXX 828641A_YYYYYYY
    
  • Virtual Machine creation (output truncated):
  • # nova boot --image 7100-03-04-cic2-chef --flavor powervm.tiny --nic net-id=5fae84a7-b463-4a1a-b4dd-9ab24cdb66b5,v4-fixed-ip=10.16.66.51 novacreated
    +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
    | Property                            | Value                                                                                                                                            |
    +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
    | OS-DCF:diskConfig                   | MANUAL                                                                                                                                           |
    | OS-EXT-AZ:availability_zone         | nova                                                                                                                                             |
    | OS-EXT-SRV-ATTR:host                | -                                                                                                                                                |
    | OS-EXT-SRV-ATTR:hypervisor_hostname | -                                                                                                                                                |
    | OS-EXT-SRV-ATTR:instance_name       | novacreated-bf704dc6-00000040                                                                                                                    |
    | OS-EXT-STS:power_state              | 0                                                                                                                                                |
    | OS-EXT-STS:task_state               | scheduling                                                                                                                                       |
    | OS-EXT-STS:vm_state                 | building                                                                                                                                         |
    | accessIPv4                          |                                                                                                                                                  |
    | accessIPv6                          |                                                                                                                                                  |
    | adminPass                           | PDWuY2iwwqQZ                                                                                                                                     |
    | avail_priority                      | -                                                                                                                                                |
    | compliance_status                   | [{"status": "compliant", "category": "resource.allocation"}]                                                                                     |
    | cpu_utilization                     | -                                                                                                                                                |
    | cpus                                | 1                                                                                                                                                |
    | created                             | 2015-08-05T15:56:01Z                                                                                                                             |
    | current_compatibility_mode          | -                                                                                                                                                |
    | dedicated_sharing_mode              | -                                                                                                                                                |
    | desired_compatibility_mode          | -                                                                                                                                                |
    | endianness                          | big-endian                                                                                                                                       |
    | ephemeral_gb                        | 0                                                                                                                                                |
    | flavor                              | powervm.tiny (ac01ba9b-1576-450e-a093-92d53d4f5c33)                                                                                              |
    | health_status                       | {"health_value": "PENDING", "id": "bf704dc6-f255-46a6-b81b-d95bed00301e", "value_reason": "PENDING", "updated_at": "2015-08-05T15:56:02.307259"} |
    | hostId                              |                                                                                                                                                  |
    | id                                  | bf704dc6-f255-46a6-b81b-d95bed00301e                                                                                                             |
    | image                               | 7100-03-04-cic2-chef (96f86941-8480-4222-ba51-3f0c1a3b072b)                                                                                      |
    | metadata                            | {}                                                                                                                                               |
    | name                                | novacreated                                                                                                                                      |
    | operating_system                    | -                                                                                                                                                |
    | os_distro                           | aix                                                                                                                                              |
    | progress                            | 0                                                                                                                                                |
    | root_gb                             | 60                                                                                                                                               |
    | security_groups                     | default                                                                                                                                          |
    | status                              | BUILD                                                                                                                                            |
    | storage_connectivity_group_id       | -                                                                                                                                                |
    | tenant_id                           | 1471acf124a0479c8d525aa79b2582d0                                                                                                                 |
    | uncapped                            | -                                                                                                                                                |
    | updated                             | 2015-08-05T15:56:02Z                                                                                                                             |
    | user_id                             | 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9                                                                                 |
    +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
    
    

    LUN order, remove a boot lun

    If you are moving to PowerVC you will probably need to migrate existing machines to your PowerVC environment. One of my customers is asking to move its machines from old boxes using vscsi to new PowerVC-managed boxes using NPIV. I am doing it with the help of an SVC on the storage side. Instead of creating the Virtual Machine profile on the HMC and then doing the zoning and masking on the Storage Volume Controller and on the SAN switches, I decided to let PowerVC do the job for me. Unfortunately PowerVC can't just “carve” a Virtual Machine: if you want to do so you have to build a full Virtual Machine (rootvg included). This is what I am doing. During the migration process I have to replace the PowerVC-created lun by the lun used for the migration .... and finally delete the PowerVC-created boot lun. There is a trick to know if you want to do this:

    • Let's say the lun created by PowerVC is the one named “volume-clouddev-test....” and the original rootvg is named “good_rootvg”. The Virtual Machine is booted on the “good_rootvg” lun and I want to remove the “volume-clouddev-test....” one:
    • root1

    • You first have to click the “Edit Details” button:
    • root2

    • Then toggle the boot set to “YES” for the “good_rootvg” lun and click move up (the rootvg must be at order 1, this is mandatory, and the lun at order 1 can't be deleted):
    • root3

    • Toggle the boot set to “NO” for the PowerVC created rootvg:
    • root4

    • If you try to detach the volume in first position you will get an error:
    • root5

    • When the order is ok, you can detach and delete the lun created by PowerVC:
    • root6
      root7

    Conclusion

    There are always good things to learn about PowerVC and related AIX topics. Tell me if these tricks are useful for you and I will continue to write posts like this one. You don't need to understand all these details to work with PowerVC, most customers don't, but I'm sure you prefer to understand what is going on “behind the scenes” instead of just clicking a nice GUI. I hope it helps you to better understand what PowerVC is made of. And don't be shy, share your tricks with me. Next: more to come about Chef! Up the irons!

    Updating AIX TL and SP using Chef


    Creating something to automate the update of a service pack or a technology level has always been a dream that never came true. You can trust me, almost every customer I know tried to make that dream come true. Different customers, same story everywhere: they tried to do something and then tripped up in a miserable way. A fact that is always true in those stories is that the decision is taken by someone who does not understand that AIX cannot be managed like a workstation or any other OS (who said Windows). A good example of that is an IBM tool (and you know that I'm an IBM fan) called BigFix/TEM (Tivoli Endpoint Manager): I'm not an expert on TEM (so maybe I am wrong) but you can use it to update your Windows OS, your Linux, your AIX and even your iPhones or Android devices. LET ME LAUGH! How can it be possible that someone thinks about this: updating an iPhone the same way as you update an AIX. A good joke! (To be clear I always have and always will support IBM, but my role is also to say what I think.) Another good example is the use of IBM Systems Director (unfortunately ... or fortunately this one was withdrawn a few days ago). I tried it myself a few years ago (you can check this post). Systems Director was (in my humble opinion) the least bad solution to update an AIX or a Virtual I/O Server in an automated way. So how are we going to do this in a world that is always asking to do more with fewer people? I had to find a solution a few months ago to update more than 700 hosts from AIX 6.1 to AIX 7.1; the job was to create something that anybody can launch without knowing anything about AIX (one more time, who can even think this is possible?). I tried things like writing scripts to automate nimadm and I'm pretty happy with this solution (almost 80% were ok without any errors, but there were tons of prerequisites before launching the scripts and we faced some problems that were inevitable (nimsh errors, sendmail configuration, broken filesets), forcing the AIX L3 team to fix tons of migrations). As everybody knows I have now been working on Chef for a few months and this can be the answer to what our world is asking today: replacing hundreds of people by a single man launching a magical thing that can do everything without knowing anything about anything, and save money! This is obviously ironic but unfortunately this is the reality of what happens today in France. “Money” and “resources” rule everything, without any plan for the future (to be clear I'm talking about a generality here, nothing reflects what's going on in my place). It is like it is, and as a good soldier I'm going to give you solutions to face the reality of this harsh world. But now it's action time! I don't want to be too pessimistic but this is unfortunately the reality of what is happening today, and my anger about it only reflects the fact that I'm living in fear, the fear of becoming bad or the fear of doing a job I really don't like. I think I have to find a solution to this problem. The picture below is clear enough to give you a good example of what I'm trying to do with Chef.

    CF8j9_dWgAAOuyC

    How do we update machines

    I'm not here to teach you how to update a service pack or a technology level (I'm sure everybody knows that), but to do it in an automated way we need to talk about the method and identify each step needed to perform an update. As there is always one more way to do it, I have identified three ways to update a machine (the multibos way, the nimclient way and finally the alt_disk_copy way). To be able to update using Chef we obviously need an available provider for each method (you could do this with the execute resource, but we're here to have fun and learn some new things). So we need one provider capable of managing multibos, one capable of managing nimclient, and one capable of managing alt_disk_copy. All three providers are available now and can be used to write different recipes doing what is necessary to update a machine. Obviously there are pre-update and post-update steps needed (removing efixes, checking filesets). Let's identify the required steps first:

    • Verify with lppchk the consistency of all installed packages.
    • Remove any installed efixes (using emgr provider)
    • The multibos way:
      • You don’t need to create a backup of the rootvg using the multibos way.
      • Mount the SP or TL directory from the NIM server (using Chef mount resource).
      • Create the multibos instance and update using the remote mounted directory (using multibos resource).
    • The nimclient way:
      • Create a backup of your rootvg (using the altdisk resource).
      • Use nimclient to run a cust operation (using niminit,nimclient resource).
    • The alt_disk_copy way:
      • You don't need to create a backup of the rootvg using the alt_disk_copy way.
      • Mount the SP or TL directory from the NIM server (using Chef mount).
      • Create the altinst_rootvg volume group and update it using the remote mounted directory (using altdisk provider).
    • Reboot the machine.
    • Remove any unwanted bos, old_rootvg.

    Reminder where to download the AIX Chef cookbook:

    Before trying to do all these steps in a single way let’s try to use the resources one by one to understand what each one is doing.

    Fixes installation

    This one is simple and allows you to install or remove fixes on your AIX machine. In the example below we are going to show how to do that with two Chef recipes: one for installing and the other one for removing! Super easy.

    Installing fixes

    In the recipe, provide all the fix names in an array and specify the name of the directory in which the filesets are located (this can be an NFS mount point if you want to). Please note here that I'm using the cookbook_file resource to download the fixes; this resource allows you to download a file directly from the cookbook (so from the Chef server). Imagine using this single recipe to install a fix on all your machines. Quite easy ;-)

    directory "/var/tmp/fixes" do
      action :create
    end
    
    cookbook_file "/var/tmp/fixes/IV75031s5a.150716.71TL03SP05.epkg.Z" do
      source 'IV75031s5a.150716.71TL03SP05.epkg.Z'
      action :create
    end
    
    cookbook_file "/var/tmp/fixes/IV77596s5a.150930.71TL03SP05.epkg.Z" do
      source 'IV77596s5a.150930.71TL03SP05.epkg.Z'
      action :create
    end
    
    aix_fixes "installing fixes" do
      fixes ["IV75031s5a.150716.71TL03SP05.epkg.Z", "IV77596s5a.150930.71TL03SP05.epkg.Z"]
      directory "/var/tmp/fixes"
      action :install
    end
    
    directory "/var/tmp/fixes" do
      recursive true
      action :delete
    end
    

    emgr1

    Removing fixes

    The recipe is almost the same but with the remove action instead of the install action. Please note that you can specify which fixes to remove or use the keyword all to remove all the installed fixes (in the case of our recipe to update our servers we will use “all” as we want to remove all fixes before launching the update).

    aix_fixes "remove fixes IV75031s5a and IV77596s5a" do
      fixes ["IV75031s5a", "IV77596s5a]
      action :remove
    end
    
    aix_fixes "remove all fixes" do
      fixes ["all"]
    end
    

    emgr2

    Alternate disks

    In most AIX shops I have seen, the solution to back up your system before doing anything is to create an alternate disk using the alt_disk_copy command. Sometimes, in places where sysadmins love their job, this disk is updated on the fly to do a TL or SP upgrade. The altdisk resource I've coded for Chef takes care of this. I'll not detail every available action with examples and will focus on create and customize:

    • create: This action create an alternate disk we will detail the attributes in the next section.
    • cleanup: Cleanup the alternate disk (remove it).
    • rename: Rename the alternate disk.
    • sleep: Put the alternate disk in sleep (umount every /alt_inst/* filesystem and varyoff the volume group)
    • wakeup: Wake up the alternate disk (varyon the volume group and mount every filesystems)
    • customize: Run a cust operation (the current resource is coded to update the alternate disk with all the filesets present in a given directory).

    Creation

    The alternate disk create action creates an alternate disk and helps you find an available disk for this creation. In any case only free disks will be chosen (disks with no PVID and no volume group defined). Different types are available to choose the disk on which the alternate disk will be created:

    • Size: If type is size, a disk with exactly the size given in the value attribute will be used.
    • Name: If type is name, the disk whose name matches the value attribute will be used.
    • Auto: In auto mode the available values for value are bigger and equals. If bigger is chosen, the first free disk found with a size bigger than the current rootvg will be used. If equals is chosen, the first free disk found with a size equal to the current rootvg is used.
    aix_altdisk "cloning rootvg by name" do
      type :name
      value "hdisk3"
      action :create
    end
    
    aix_altdisk "cloning rootvg by size 66560" do
      type :size
      value "66560"
    end
    
    aix_altdisk "removing old alternates" do
      action :cleanup
    end
    
    aix_altdisk "cloning rootvg" do
      type :auto
      value "bigger"
      action :create
    end
    

    altdisk1

    Customization

    The customize action will update the previously created alternate disk with the filesets present in an NFS-mounted directory (from the NIM server). Please note in the recipe below that we are mounting the directory from NFS. The node[:nim_server] attribute of the node tells which NIM server will be mounted; for instance you can define one NIM server for the production environment and another for the development environment.

    # mounting /mnt
    mount "/mnt" do
      device "#{node[:nim_server]}:/export/nim/lpp_source"
      fstype 'nfs'
      action :mount
    end
    
    # updating the current disk
    aix_altdisk "altdisk_update" do
      image_location "/mnt/7100-03-05-1524"
      action :customize
    end
    
    mount "/mnt" do
      action :umount
    end
    

    altdisk_cust

    niminit/nimclient

    The niminit and nimclient resources are used to register the nim client to the nim master and then run nimclient operations from the client. In my humble opinion this is the best way to do the update at the time of writing this blog post. One cool thing is that you can specify on which adapter the nimclient will be configured by using some ohai attributes. It's an elegant way to do it; once again this shows you the power of Chef ;-) . Let's start with some examples:

    niminit

    aix_niminit node[:hostname] do
      action :remove
    end
    
    aix_niminit node[:hostname] do 
      master "nimcloud"
      connect "nimsh"
      pif_name node[:network][:default_interface]
      action :setup
    end
    

    nimclient1

    nimclient

    nimclient can first be used to install some filesets you may need. The provider is intelligent and can choose the right lpp_source for you. Please note that you will need lpp_sources with a specific naming convention if you want to use this feature. To find the next/latest available sp/tl the provider checks the current oslevel of the machine and compares it with the available lpp_sources present on your nim server. The naming convention needed is $(oslevel -s)-lpp_source (i.e. 7100-03-05-1524-lpp_source) (the same principle applies to the spot when you need to use a spot)

    $ lsnim -t lpp_source | grep 7100
    7100-03-00-0000-lpp_source             resources       lpp_source
    7100-03-01-1341-lpp_source             resources       lpp_source
    7100-03-02-1412-lpp_source             resources       lpp_source
    7100-03-03-1415-lpp_source             resources       lpp_source
    7100-03-04-1441-lpp_source             resources       lpp_source
    7100-03-05-1524-lpp_source             resources       lpp_source
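
    To make the naming convention clearer, here is a small shell illustration of the matching principle the provider relies on (an illustration only, not the provider's actual code; the lpp_source names are the ones listed above):

    # current level of the machine
    $ oslevel -s
    7100-03-04-1441
    # "latest_sp" resolves to the highest matching lpp_source,
    # "next_sp" to the one immediately following the current level; for example:
    $ lsnim -t lpp_source | awk '{print $1}' | grep "^7100-03" | sort | tail -1
    7100-03-05-1524-lpp_source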
    

    If your nim resources name are ok the lpp_source attribute can be:

    • latest_sp: the latest available service pack.
    • next_sp: the next available service pack.
    • latest_tl: the latest available technology level.
    • next_tl: the next available technology level.
    • If you do not want to do this you can still specify the name of the lpp_source by hand.

    Here are a few examples of installing filesets:

    aix_nimclient "installing filesets" do
      installp_flags "aXYg"
      lpp_source "7100-03-04-1441-lpp_source"
      filesets ["openssh.base.client","openssh.base.server","openssh.license"]
      action :cust
    end
    
    aix_nimclient "installing filesets" do
      installp_flags "aXYg"
      lpp_source "7100-03-04-1441-lpp_source"
      filesets ["bos.compat.cmds", "bos.compat.libs"]
      action :cust
    end
    
    aix_nimclient "installing filesets" do
      installp_flags "aXYg"
      lpp_source "7100-03-04-1441-lpp_source"
      filesets ["Java6_64.samples"]
      action :cust
    end
    

    nimclient2

    Please note that some filesets were already installed and the resource did not converge because of that ;-) . Let’s now try to update to the latest service pack:

    aix_nimclient "updating to latest sp" do
      installp_flags "aXYg"
      lpp_source "latest_sp"
      fixes "update_all"
      action :cust
    end
    

    nimclient3

    Tadam! The machine was updated from 7100-03-04-1441 to 7100-03-05-1524 using a single recipe and without even knowing which service pack was available for the update!

    multibos

    I really like the multibos way and I don't know why so few people are using it today. Anyway, I know some customers who only work this way, so I thought it was worth working on a multibos resource. Here is a nice recipe creating a bos and updating it.

    # creating dir for mount
    directory "/var/tmp/mnt" do
      action :create
    end
    
    # mounting /mnt
    mount "/var/tmp/mnt" do
      device "#{node[:nim_server]}:/export/nim/lpp_source"
      fstype 'nfs'
      action :mount
    end
    
    # removing standby multibos
    aix_multibos "removing standby bos" do
      action :remove
    end
    
    # create multibos and update it
    aix_multibos "creating bos" do
      action :create
    end
    
    aix_multibos "update bos" do
      update_device "/var/tmp/mnt/7100-03-05-1524"
      action :update
    end
    
    # unmount /mnt
    mount "/var/tmp/mnt" do
      action :umount
    end
    
    # deleting temp directory
    directory "/var/tmp/mnt" do
      action :delete
    end
    

    multibos1
    multibos2

    Full recipes for updates

    Let's now write a big recipe doing all the things we need for an update. Remember that if one resource fails, the recipe stops by itself. For instance you'll see in the recipe below that I'm running an "lppchk -vm3". If it returns something other than 0, the resource fails and the recipe fails. This is obviously the expected behavior; it seems sensible not to continue if there is a problem. So to sum up, here are all the steps this recipe performs: checking fileset consistency, removing all fixes, committing filesets, creating an alternate disk, configuring the nimclient, running the update, and deallocating resources.

    # if lppchk -vm return code is different
    # than zero recipe will fail
    # no guard needed here
    execute "lppchk" do
      command 'lppchk -vm3'
    end
    
    # removing any efixes
    aix_fixes "remvoving_efixes" do
      fixes ["all"]
      action :remove
    end
    
    # committing filesets
    # no guard needed here
    execute 'commit' do
      command 'installp -c all'
    end
    
    # cleaning existing altdisk
    aix_altdisk "cleanup alternate rootvg" do
      action :cleanup
    end
    
    # creating an alternate disk using the
    # first disk bigger than the actual rootvg
    # bootlist to false as this disk is just a backup copy
    aix_altdisk "altdisk_by_auto" do
      type :auto
      value "bigger"
      change_bootlist true
      action :create
    end
    
    # nimclient configuration
    aix_niminit node[:hostname] do
      master "nimcloud"
      connect "nimsh"
      pif_name "en1"
      action :setup
    end
    
    # update to latest available tl/sp
    aix_nimclient "updating to latest sp" do
      installp_flags "aXYg"
      lpp_source "latest_sp"
      fixes "update_all"
      action :cust
    end
    
    # deallocate resources
    aix_nimclient "deallocating resources" do
      action :deallocate
    end
    

    How about a single point of management: "knife ssh" and "push jobs"

    Chef is and was designed on a pull model: the client asks the server for the recipes and cookbooks and then executes them. This is the role of the chef-client. In a Linux environment, people often run the client in daemonized mode: the client wakes up on a time interval basis and runs (so every change to the cookbooks is applied by the client). I'm almost sure that every AIX shop will be against this method because it is dangerous. If you do that, run the change first in the test environment, then in dev, and finally in production. To be honest this is not the model I want to build where I am working. For some actions (like updates) we want a push model. By default Chef is delivered with a feature called push jobs. Push jobs is a way to run jobs like "execute the chef-client" from your knife workstation; unfortunately push jobs needs a plugin on the chef-client side and this one is only available on Linux .... not yet on AIX. Anyway we have an alternative: the knife ssh plugin. This plugin, shipped with knife by default, allows you to run commands on the nodes over ssh. Even better, if you already have an ssh gateway with key sharing enabled, knife ssh can use this gateway to communicate with the clients. Using knife ssh you have the possibility to say "run chef-client on all my AIX 6.1 nodes" or "run this recipe installing this fix on all my AIX 7.1 nodes"; the possibilities are endless. One last note about knife ssh: it creates tunnels through your ssh gateway to communicate with the nodes, so if you use a shared key you have to copy the private key to the knife workstation (it took me some time to understand that). Here are some examples:
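
    First, a minimal command sketch (the search query, the root ssh user and the gateway host gw.lab.local are assumptions for the example; adapt them to your environment):

    # check the current oslevel on every AIX 7.1 node registered on the Chef server
    knife ssh "platform:aix AND platform_version:7.1" "oslevel -s" -x root

    # run the chef-client on every AIX node, tunnelling through an ssh gateway
    knife ssh "platform:aix" "chef-client" -x root -G root@gw.lab.local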

    knifessh

    • On two nodes check the current os level:
    • ssh1

    • Run the update with Chef:
    • update3

    • Alternate disks have been created:
    • update4

    • Both systems are up to date:
    • update5

    Conclusion

    I think this blog post helped you to better understand Chef and what Chef is capable of. We are still at the very beginning of the Chef cookbook and I'm sure plenty of new things (recipes, providers) will come in the next few months. Try it by yourself and I'm sure you'll like the way it works. I must admit that it is difficult to learn and to start with, but if you do this right you'll get the benefit of an automation tool working on AIX ... and honestly AIX needs an automation tool. I'm almost sure it will be Chef (in fact we have no other choice). Help us write post-install recipes, update recipes and any other recipes you can think of. We need your help and it is happening now! You have the opportunity to be a part of this, a part of something new that will help AIX in the future. We don't want a dying OS; Chef will give AIX the opportunity to be an OS with a fully working automation tool. Go give it a try now!

    IBM Technical University for PowerSystems 2015 – Cannes (both session files included)


    I've been traveling the world since my first IBM Technical University for PowerSystems in Dublin (4 years ago as far as I remember). I had the chance to be in Budapest last year and in Cannes this year (a little bit less fun for a French guy than Dublin and Budapest), but in a different way: this year I had the opportunity to be a speaker for two sessions (and two repeats) thanks to the kindness of Alex Abderrazag (thank you for trusting me Alex). My first plan was to go to Tokyo for the OpenStack summit to talk about PowerVC but unfortunately I was not able to make it because of confidentiality issues with my current company (the goal was to be a customer reference for PowerVC). I didn't realize that creating two sessions from scratch on two pretty new topics would be so hard. I thought it would take me a couple of hours for each one, but it took so many hours that I now have to be impressed by the people who do this as their daily job ;-) . Something that took me even more hours than creating the slides was the preparation of these two sessions (speaker notes, practicing (special thanks here to the people who helped me practice the sessions, especially the fantastic Bill Miller ;-) ) and so on ...). One last thing I didn't realize is that you have to manage your stress. As it was my first time at such a big event I can assure you that I was super stressed. One funny thing about the stress is that I didn't have any stress anymore just one hour before the session. Before that moment I had to find a way to deal with the stress ... and I just realized that I wasn't stressed because of the sessions but because I had to speak English in front of so many people (a super tricky thing to do for a shy French guy, trust me!). My first sessions (on both topics) were full (no more chairs available in the room) and the repeats were ok too, so I think it went well and I was not so bad at it ;-) .

    IMG_20151104_233030

    I wanted to thank here all the people who helped me to do this. Philippe Hermes (best pre-sales in France ;-) ) for believing in me and helping me do it (re-reading my PowerPoint, and taking care of me during the event). Alex Abderrazag for allowing me to do it. Nigel Griffiths for re-reading the PowerVC session and giving me a couple of tips and tricks about being a speaker. Bill Miller and Alain Lechevalier for the rehearsal of both sessions, and finally Rosa Davidson (she gave me the desire to do this). I'm not forgetting Jay Kruemcke who gave me some IBM shirts for these sessions (and also for a lot of other things). Sorry to those I may have forgotten.

    Many people asked me to share my PowerPoint files; you will find both below in this post. Here are the two presentations:

    • PowerVC for PowerVM deep dive – Tips & Tricks.
    • Using Chef Automation on AIX.

    PowerVC for PowerVM deep dive – Tips & Tricks

    This session is for advanced PowerVC users. You'll find a lot of tips and tricks allowing you to customize your PowerVC. Beyond the tips and tricks, you'll also find in this session how PowerVC works (images, activation, cloud-init, and so on). If you are not a PowerVC user this session can be a little bit difficult for you, but these tips and tricks are the lessons I learned from the field using PowerVC in a production environment:

    Using Chef Automation on AIX

    This session will give you all the basics to understand what Chef is and what you can do with this tool. You'll also find examples of how to update service packs and technology levels on AIX using Chef. Good examples of using Chef for post-installation tasks and of how to use it with PowerVC are also provided in this session.

    Conclusion

    I hope you enjoyed the sessions if you were at Cannes this year. On my side I really enjoyed doing this, it was a very good experience for me. I hope I'll have the opportunity to do it again. Feel free to tell me if you want to see me at future technical events like this one. The next step is now to do something at Edge ... not so sure this dream will come true any time soon ;-) .

    A first look at SRIOV vNIC adapters


    I have the chance to participate in the current Early Shipment Program (ESP) for Power Systems, especially the software part. One of my tasks is to test a new feature called SRIOV vNIC. For those who do not know anything about SRIOV, this technology is comparable to LHEA except that it is based on an industry standard (and has a couple of other features). By using an SRIOV adapter you can divide a physical port into what we call Virtual Functions (or Logical Ports) and map such a Virtual Function to a partition. You can also set Quality of Service on these Virtual Functions: at creation time you set up the Virtual Function allowing it to take a certain percentage of the physical port. This can be very useful if you want to be sure that your production server will always have a guaranteed bandwidth, instead of using a Shared Ethernet Adapter where every client partition competes for the bandwidth. Customers also use SRIOV adapters for performance purposes; as nothing goes through the Virtual I/O Server the latency added by this path is eliminated and CPU cycles are saved on the Virtual I/O Server side (a Shared Ethernet Adapter consumes a lot of CPU cycles). If you are not aware of what SRIOV is I encourage you to check the IBM Redbook about it (http://www.redbooks.ibm.com/abstracts/redp5065.html?Open). Unfortunately you can't move a partition using Live Partition Mobility if it has a Virtual Function assigned to it. Using vNICs allows you to use SRIOV through the Virtual I/O Servers and enables the possibility to move your partition even if you are using an SRIOV logical port. The best of both worlds: performance/QoS and virtualization. Is this the end of the Shared Ethernet Adapter?

    SRIOV vNIC, what’s this ?

    Before talking about the technical details it is important to understand what vNICs are. When I'm explaining this to newbies I often refer to NPIV. Imagine something similar to NPIV but for the network part. By using SRIOV vNIC:

    • A Virtual Function (SRIOV Logical Port) is created and assigned to the Virtual I/O Server.
    • A vNIC adapter is created in the client partition.
    • The Virtual Function and the vNIC adapter are linked (mapped) together.
    • There is a one-to-one relationship between a Virtual Function and a vNIC (just like a vfcs adapter has a one-to-one relationship with the physical fibre channel adapter).

    On the image below, the vNIC lpars are the "yellow" ones. You can see that the SRIOV adapter is divided into different Virtual Functions, and some of them are mapped to the Virtual I/O Server. The relationship between the Virtual Function and the vNIC is achieved by a vnicserver (a special Virtual I/O Server device).
    vNIC

    One of the major advantages of using vNIC is that you eliminate the need for the Virtual I/O Server in the data flow:

    • The network data flow is direct between the partition memory and the SRIOV adapter; there is no data copy passing through the Virtual I/O Server, which eliminates the CPU cost and the latency of doing so. This is achieved by LRDMA. Pretty cool!
    • The vNIC inherits the bandwidth allocation of the Virtual Function (QoS). If the VF is configured with a capacity of 2%, the vNIC will also have this capacity.
    • vNIC2

    vNIC Configuration

    Before checking all the details on how to configure an SRIOV vNIC adapter you have to check all the prerequisites. As this is a new feature you will need the latest level of …. everything. My advice is to stay up to date as much as possible.

    vNIC Prerequisites

    These outputs are taken from the Early Shipment Program; all of this may change at the GA release:

    • Hardware Management Console v840:
    # lshmc -V
    "version= Version: 8
     Release: 8.4.0
     Service Pack: 0
    HMC Build level 20150803.3
    ","base_version=V8R8.4.0
    "
    
  • Power 8 only, firmware 840 at least (both enterprise and scale out systems):
  • firmware

  • AIX 7.1TL4 or AIX 7.2:
  • # oslevel -s
    7200-00-00-0000
    # cat /proc/version
    Oct 20 2015
    06:57:03
    1543A_720
    @(#) _kdb_buildinfo unix_64 Oct 20 2015 06:57:03 1543A_720
    
  • Obviously at least one SRIOV capable adapter!

    Using the HMC GUI

    The configuration of a vNIC is done at the partition level and is only available in the enhanced version of the GUI. Select the virtual machine on which you want to add the vNIC; in the Virtual I/O tab you'll see that a new Virtual NICs section is there. Click on "Virtual NICs" and a new panel will open with a new button called "Add Virtual NIC"; just click it to add a Virtual NIC:

    vnic_n1
    vnic_conf2

    All the SRIOV capable ports will be displayed on the next screen. Choose the SRIOV port you want (a Virtual Function will be created on this one; don't do anything more, the creation of a vNIC automatically creates the Virtual Function, assigns it to the Virtual I/O Server and does the mapping to the vNIC for you). Choose the Virtual I/O Server that will be used for this vNIC (the vnicserver will be created on this Virtual I/O Server; don't worry, we will talk about vNIC redundancy later in this post) and the Virtual NIC Capacity (the percentage of the physical SRIOV port that will be dedicated to this vNIC; it has to be a multiple of 2, and be careful: it can't be changed afterwards, you'll have to delete your vNIC and redo the configuration):

    vnic_conf3

    The "Advanced Virtual NIC Settings" let you choose the Virtual NIC adapter ID, pick a MAC address, and configure the vlan restrictions and vlan tagging. In the example below I'm configuring my Virtual NIC in vlan 310:

    vnic_conf4
    vnic_conf5
    allvnic

    Using the HMC Command Line

    As always the configuration can also be achieved using the HMC command line: lshwres to list vNICs and chhwres to create a vNIC.

    List SRIOV adapters to get the adapter_id needed by the chhwres command:

    # lshwres -r sriov --rsubtype adapter -m blade-8286-41A-21AFFFF
    adapter_id=1,slot_id=21020014,adapter_max_logical_ports=48,config_state=sriov,functional_state=1,logical_ports=48,phys_loc=U78C9.001.WZS06RN-P1-C12,phys_ports=4,sriov_status=running,alternate_config=0
    # lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
    lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
    

    Create the vNIC:

    # chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72vm1 --rsubtype vnic -v -a "port_vlan_id=310,backing_devices=sriov/vios2/1/1/1/2"
    

    List the vNIC after create:

    # lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
    lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
    lpar_name=72vm1,lpar_id=9,slot_num=2,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87702,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios2/1/1/1/2700400a/2.0
    

    System and Virtual I/O Server Side:

    • On the Virtual I/O Server you can use two commands to check your vNIC configuration. First use the lsmap command to check the one-to-one relationship between the VF and the vNIC (you can see in the output below that a VF and a vnicserver device are created, and also the name of the vNIC on the client partition side):
    # lsdev | grep VF
    ent4             Available   PCIe2 100/1000 Base-TX 4-port Converged Network Adapter VF (df1028e214103c04)
    # lsdev | grep vnicserver
    vnicserver0      Available   Virtual NIC Server Device (vnicserver)
    # lsmap -vadapter vnicserver0 -vnic
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver0   U8286.41A.21FFFFF-V2-C32897             6 72nim1         AIX
    
    Backing device:ent4
    Status:Available
    Physloc:U78C9.001.WZS06RN-P1-C12-T4-S16
    Client device name:ent1
    Client device physloc:U8286.41A.21FFFFF-V6-C3
    
  • You can get more details (QoS, vlan tagging, port states) by using the vnicstat command:
  • # vnicstat -b vnicserver0
    [..]
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver0
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent4
    
    Client Partition ID: 6
    Client Partition Name: 72nim1
    Client Operating System: AIX
    Client Device Name: ent1
    Client Device Location Code: U8286.41A.21FFFFF-V6-C3
    [..]
    Device ID: df1028e214103c04
    Version: 1
    Physical Port Link Status: Up
    Logical Port Link Status: Up
    Physical Port Speed: 1Gbps Full Duplex
    [..]
    Port VLAN (Priority:ID): 0:3331
    [..]
    VF Minimum Bandwidth: 2%
    VF Maximum Bandwidth: 100%
    
  • On the client side you can list your vNICs and, as always, get details using the entstat command:
  • # lsdev -c adapter -s vdevice -t IBM,vnic
    ent0 Available  Virtual NIC Client Adapter (vnic)
    ent1 Available  Virtual NIC Client Adapter (vnic)
    ent3 Available  Virtual NIC Client Adapter (vnic)
    ent4 Available  Virtual NIC Client Adapter (vnic)
    # entstat -d ent0 | more
    [..]
    ETHERNET STATISTICS (ent0) :
    Device Type: Virtual NIC Client Adapter (vnic)
    [..]
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    
    Speed Running:  1 Gbps Full Duplex
    
    Jumbo Frames: Disabled
    [..]
    Port VLAN ID Status: Enabled
            Port VLAN ID: 3331
            Port VLAN Priority: 0
    

    Redundancy

    You will certainly agree that having such a cool new feature without something fully redundant would be a shame. Luckily we have a solution here, with the triumphant return of the Network Interface Backup (NIB). As I told you before, each time a vNIC is created a vnicserver is created on one of the Virtual I/O Servers (at vNIC creation you have to choose on which Virtual I/O Server it will be created). So to be fully redundant and to have a failover feature, the only way is to create two vNIC adapters (one using the first Virtual I/O Server and the second one using the second Virtual I/O Server); on top of this you then have to create a Network Interface Backup, like in the old times :-) . Here are a couple of things and best practices to know before doing this.

    • You can't use two VFs coming from the same physical port of the same SRIOV adapter (the NIB creation will be ok, but any configuration on top of this NIB will fail).
    • You can use two VFs coming from the same SRIOV adapter but from two different physical ports (this is the example I will show below).
    • The best practice is to use two VFs coming from two different SRIOV adapters (you can then afford to lose one of the two adapters).

    vNIC_nib

    Verify on your partition that you have two vNIC adapters and check that their statuses are ok using the 'entstat' command:

    • Both vNIC are available on the client partition:
    # lsdev -c adapter -s vdevice -t IBM,vnic
    ent0 Available  Virtual NIC Client Adapter (vnic)
    ent1 Available  Virtual NIC Client Adapter (vnic)
    # lsdev -c adapter -s vdevice -t IBM,vnic -F physloc
    U8286.41A.21FFFFF-V6-C2
    U8286.41A.21FFFFF-V6-C3
    
  • You can check, for the vNIC backed by the first Virtual I/O Server, that the "Current Link State", "Logical Port State" and "Physical Port State" are ok (all of them need to be up):
  • # entstat -d ent0 | grep -p vnic
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent0) :
    Device Type: Virtual NIC Client Adapter (vnic)
    Hardware Address: ee:3b:86:f6:45:02
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    
  • Same thing for the vNIC backed by the second Virtual I/O Server:
  • # entstat -d ent1 | grep -p vnic
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent1) :
    Device Type: Virtual NIC Client Adapter (vnic)
    Hardware Address: ee:3b:86:f6:45:03
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    

    Verify on both Virtual I/O Servers that the two vNICs come from two different SRIOV adapters (for the purpose of this test I'm using two different ports on the same SRIOV adapter, but it works the same with two different adapters). You can see in the output below that on Virtual I/O Server 1 the vNIC is backed by the port in position 3 (T3) and that on Virtual I/O Server 2 the vNIC is backed by the port in position 4 (T4):

    • Once again use the lsmap command on the first Virtual I/O Server to check this (note that you can also see the client name and the client device):
    # lsmap -vadapter vnicserver0 -vnic
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver0   U8286.41A.21AFF8V-V1-C32897             6 72nim1         AIX
    
    Backing device:ent4
    Status:Available
    Physloc:U78C9.001.WZS06RN-P1-C12-T3-S13
    Client device name:ent0
    Client device physloc:U8286.41A.21AFF8V-V6-C2
    
  • Same thing on the second Virtual I/O Server:
  • # lsmap -vadapter vnicserver0 -vnic -fmt :
    vnicserver0:U8286.41A.21AFF8V-V2-C32897:6:72nim1:AIX:ent4:Available:U78C9.001.WZS06RN-P1-C12-T4-S14:ent1:U8286.41A.21AFF8V-V6-C3
    

    Finally create the Network Interface Backup and put an IP on top of it:

    # mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1
    ent2 Available
    # mktcpip -h 72nim1 -a 10.44.33.223 -i en2 -g 10.44.33.254 -m 255.255.255.0 -s
    en2
    72nim1
    inet0 changed
    en2 changed
    inet0 changed
    [..]
    # echo "vnic" | kdb
    +-------------------------------------------------+
    |       pACS       | Device | Link |    State     |
    |------------------+--------+------+--------------|
    | F1000A0032880000 |  ent0  |  Up  |     Open     |
    |------------------+--------+------+--------------|
    | F1000A00329B0000 |  ent1  |  Up  |     Open     |
    +-------------------------------------------------+
    

    Let's now try different things to see if the redundancy works ok. First let's shut down one of the Virtual I/O Servers and ping our machine from another host:

    # ping 10.14.33.223
    PING 10.14.33.223 (10.14.33.223) 56(84) bytes of data.
    64 bytes from 10.14.33.223: icmp_seq=1 ttl=255 time=0.496 ms
    64 bytes from 10.14.33.223: icmp_seq=2 ttl=255 time=0.528 ms
    64 bytes from 10.14.33.223: icmp_seq=3 ttl=255 time=0.513 ms
    [..]
    64 bytes from 10.14.33.223: icmp_seq=40 ttl=255 time=0.542 ms
    64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.514 ms
    64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.550 ms
    64 bytes from 10.14.33.223: icmp_seq=48 ttl=255 time=0.596 ms
    [..]
    --- 10.14.33.223 ping statistics ---
    50 packets transmitted, 45 received, 10% packet loss, time 49052ms
    rtt min/avg/max/mdev = 0.457/0.525/0.596/0.043 ms
    
    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    59224136   1120200815 P H ent2           ETHERCHANNEL FAILOVER
    F655DA07   1120200815 I S ent0           VNIC Link Down
    3DEA4C5F   1120200815 T S ent0           VNIC Error CRQ
    81453EE1   1120200815 T S vscsi1         Underlying transport error
    DE3B8540   1120200815 P H hdisk0         PATH HAS FAILED
    # echo "vnic" | kdb
    (0)> vnic
    +-------------------------------------------------+
    |       pACS       | Device | Link |    State     |
    |------------------+--------+------+--------------|
    | F1000A0032880000 |  ent0  | Down |   Unknown    |
    |------------------+--------+------+--------------|
    | F1000A00329B0000 |  ent1  |  Up  |     Open     |
    +-------------------------------------------------+
    

    Same test again while continuously pinging the address, and this time I'm only losing 4 packets:

    # ping 10.14.33.223
    [..]
    64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.627 ms
    64 bytes from 10.14.33.223: icmp_seq=42 ttl=255 time=0.548 ms
    64 bytes from 10.14.33.223: icmp_seq=46 ttl=255 time=0.629 ms
    64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.492 ms
    [..]
    # errpt | more
    59224136   1120203215 P H ent2           ETHERCHANNEL FAILOVER
    F655DA07   1120203215 I S ent0           VNIC Link Down
    3DEA4C5F   1120203215 T S ent0           VNIC Error CRQ
    

    vNIC Live Partition Mobility

    You can use Live Partition Mobility with SRIOV vNICs by default; it is super simple and fully supported by IBM. As always I'll show you how to do it using both the HMC GUI and the command line:

    Using the GUI

    First validate the mobility operation; it will allow you to choose the destination SRIOV adapter/port on which to map your current vNIC. You have to choose:

    • The adapter (if you have more than one SRIOV adapter).
    • The Physical port on which the vNIC will be mapped.
    • The Virtual I/O Server on which the vnicserver will be created.

    New options are now available in the mobility validation panel:

    lpmiov1

    Modify each vNIC to match your destination SRIOV adapter and ports (choose the destination Virtual I/O Server here):

    lpmiov2
    lpmiov3

    Then migrate:

    lpmiov4

    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A5E6DB96   1120205915 I S pmig           Client Partition Migration Completed
    4FB9389C   1120205915 I S ent1           VNIC Link Up
    F655DA07   1120205915 I S ent1           VNIC Link Down
    11FDF493   1120205915 I H ent2           ETHERCHANNEL RECOVERY
    4FB9389C   1120205915 I S ent1           VNIC Link Up
    4FB9389C   1120205915 I S ent0           VNIC Link Up
    [..]
    59224136   1120205915 P H ent2           ETHERCHANNEL FAILOVER
    B50A3F81   1120205915 P H ent2           TOTAL ETHERCHANNEL FAILURE
    F655DA07   1120205915 I S ent1           VNIC Link Down
    3DEA4C5F   1120205915 T S ent1           VNIC Error CRQ
    F655DA07   1120205915 I S ent0           VNIC Link Down
    3DEA4C5F   1120205915 T S ent0           VNIC Error CRQ
    08917DC6   1120205915 I S pmig           Client Partition Migration Started
    

    The ping test during the LPM shows only 9 pings lost, due to the etherchannel failover (one of my ports was down on the destination server):

    # ping 10.14.33.223
    64 bytes from 10.14.33.223: icmp_seq=23 ttl=255 time=0.504 ms
    64 bytes from 10.14.33.223: icmp_seq=31 ttl=255 time=0.607 ms
    

    Using the command line

    I'm moving the partition back using the HMC command line interface; check the manpage for all the details. Here is the format of the vnic_mappings attribute: slot_num/ded/[vios_lpar_name]/[vios_lpar_id]/[adapter_id]/[physical_port_id]/[capacity]:

    • Validate:
    # migrlpar -o v -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
    
    Warnings:
    HSCLA291 The selected partition may have an open virtual terminal session.  The management console will force termination of the partition's open virtual terminal session when the migration has completed.
    
  • Migrate:
  • # migrlpar -o m -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
    

    Port Labelling

    One very annoying thing about using LPM with vNICs is that you have to redo the mapping of your vNICs each time you move. The default choices are never ok: the GUI will always show you the first port or the first adapter and you have to do that job by yourself. Even worse, with the command line the vnic_mappings can give you some headaches :-) . Luckily there is a feature called port labelling. You can put a label on each SRIOV physical port on all your machines. My advice is to tag the ports serving the same network and the same vlan with the same label on all your machines. During the mobility operation, if labels match between the two machines, the adapter/port combination matching the label will be automatically chosen for the mobility and you will have nothing to map on your own. Super useful. The outputs below show you how to label your SRIOV ports:

    label1
    label2

    # chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=3,phys_port_label=adapter1port3"
    # chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=2,phys_port_label=adapter1port2"
    
    # lshwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport --level eth -F adapter_id,phys_port_label
    1,adapter1port2
    1,adapter1port3
    

    At validation time the source and destination ports will automatically be matched:

    labelautochoose
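
    With matching labels the validation no longer needs an explicit vnic_mappings attribute. Here is a minimal sketch reusing the machine and partition names from the migrlpar examples above:

    # with port labels matching on both machines, the ports are picked automatically
    migrlpar -o v -m blade-8286-41A-21AFFFF -t runner-8286-41A-21AEEEE -p 72nim1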

    What about performance

    One of the main reasons I'm looking at SRIOV vNIC adapters is performance. As our whole design is based on the fact that we need to be able to move all of our virtual machines from one host to another, we need a solution allowing both mobility and performance. If you have tried to run a TSM server in a virtualized environment you'll probably understand what I mean about performance and virtualization: in the case of TSM you need a lot of network bandwidth. My current customer and my previous one tried to do that using Shared Ethernet Adapters and of course this solution did not work, because a classic Virtual Ethernet Adapter is not able to provide enough bandwidth for a single Virtual I/O client. I'm not an expert on network performance but the results you will see below are pretty obvious to understand and will show you the power of vNIC and SRIOV (I know some optimization can be done on the SEA side but this is just a very simple test).

    Methodology

    I will try here to compare a classic Virtual Ethernet Adapter with a vNIC in the same configuration; both environments are identical, using the same machines, the same switches and so on:

    • Two machines are used to do the test. In the vNIC case both use a single vNIC backed by a 10Gb adapter; in the Virtual Ethernet Adapter case both are backed by an SEA built on top of a 10Gb adapter.
    • The two machines are running on two different s814.
    • Entitlement and memory are the same for source and destination machines.
    • In the case of vNIC the capacity of the VF is set at 100% and the physical port of the SRIOV adapter is dedicated to the vNIC.
    • In the case of the Virtual Ethernet Adapter the SEA is dedicated to the test virtual machine.
    • In both cases an MTU of 1500 is used.
    • The tool used for the performance test is iperf (MTU 1500, window size 64K, and 10 TCP threads); a command sketch is shown right after this list.
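
    For reference, here is roughly how such an iperf run can be launched (a sketch only; the server address 10.14.33.223 is reused from the ping tests earlier and the 60 second duration is an arbitrary choice):

    # on the "server" partition
    iperf -s -w 64k

    # on the "client" partition: 10 parallel TCP streams, 64K TCP window
    iperf -c 10.14.33.223 -w 64k -P 10 -t 60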

    SEA test for reference only

    • iperf server:
    • seaserver1

    • iperf client:
    • seacli1

    vNIC SRIOV test

    We are here running the exact same test:

    • iperf server:
    • iperf_vnic_client2

    • iperf client:
    • iperf_vnic_client

    By using a vNIC I get 300% of the bandwidth I get with a Virtual Ethernet Adapter. Just awesome ;-) , with no tuning (out of the box configuration). Nothing more to add about it; it's pretty obvious that using vNICs for performance will be a must.

    Conclusion

    Are SRIOV vNICs the end of the SEAs? Maybe, but not yet! For some cases like performance and QoS they will be very useful and widely adopted (I'm pretty sure I will use them at my current customer to virtualize the TSM servers). But today, in my opinion, SRIOV lacks a real redundancy feature at the adapter level. What I want is a heartbeat communication between the two SRIOV adapters. Having such a feature on SRIOV adapters would finish convincing customers to move from SEA to SRIOV vNIC. I know nothing about the future but I hope something like that will be available in the next few years. To sum up, SRIOV vNICs are powerful, easy to use and simplify the configuration and management of your Power Servers. Please wait for the GA and try this new killer functionality. As always I hope it helps.


    What’s new in VIOS 2.2.4.10 and PowerVM : Part 1 Virtual I/O Server Rules


    I will post a series of mini blog posts about the new features of PowerVM and Virtual I/O Server that are released this month. By this I mean Hardware Management Console 840 + Power firmware 840 + Virtual I/O Server 2.2.4.10. As writing blog posts is not part of my job and I'm doing it in my spare time, some of the topics I will talk about have already been covered by other AIX bloggers, but I think the more material we have the better. Others, like this first one, will be new to you. So please accept my apologies if topics are not what I'm calling "0 day" (covered on the day of release). Anyway, writing things down helps me understand them better and I add little details I have not seen in other blog posts or in the official documentation. Last point: in these mini posts I will always try to give you something new, at least my point of view as an IBM customer. I hope it will be useful for you.

    The first topic I want to talk about is Virtual I/O Server rules. With the latest version, three new commands called "rules", "rulescfgset" and "rulesdeploy" are now available on the Virtual I/O Servers. These help you configure your device attributes by creating, deploying, or checking rules against the current configuration. I'm 100% sure that every time you install a Virtual I/O Server you do the same thing over and over again: you check your buffer attributes, you check attributes on the fibre channel adapters and so on. The rules are a way to be sure everything is the same on all your Virtual I/O Servers (you can create a rule file (xml format) that can be deployed on every Virtual I/O Server you install). Even better, if you are a PowerVC user like me, you want to be sure that any new device created by PowerVC is created with the attributes you want (for instance buffers for Virtual Ethernet Adapters). In the "old days" you had to use the chdef command; you can now do this using the rules. Rather than giving you a list of commands I'll show you here what I'm now doing on my Virtual I/O Servers in 2.2.4.10.

    Creating and modifying existing default rules

    Before starting, here is a (non exhaustive) list of the attributes I'm changing on all my Virtual I/O Servers at deploy time. I now want to do this using the rules (these are just examples; you can do much more with the rules):

    • On fcs Adapters I’m changing the max_xfer_size attribute to 0x200000.
    • On fcs Adapters I’m changing the num_cmd_elems attribute to 2048.
    • On fscsi Devices I’m changing the dyntrk attribute to yes.
    • On fscsi Devices I’m changing the fc_err_recov to fast_fail.
    • On Virtual Ethernet Adapters I’m changing the max_buf_tiny attribute to 4096.
    • On Virtual Ethernet Adapters I’m changing the min_buf_tiny attribute to 4096.
    • On Virtual Ethernet Adapters I’m changing the max_buf_small attribute to 4096.
    • On Virtual Ethernet Adapters I’m changing the min_buf_small attribute to 4096.
    • On Virtual Ethernet Adapters I’m changing the max_buf_medium attribute to 512.
    • On Virtual Ethernet Adapters I’m changing the min_buf_medium attribute to 512.
    • On Virtual Ethernet Adapters I’m changing the max_buf_large attribute to 128.
    • On Virtual Ethernet Adapters I’m changing the min_buf_large attribute to 128.
    • On Virtual Ethernet Adapters I’m changing the max_buf_huge attribute to 128.
    • On Virtual Ethernet Adapters I’m changing the min_buf_huge attribute to 128.

    Modify existing attributes using rules

    By default a "factory" default rule file now exists on the Virtual I/O Server. It is located in /home/padmin/rules/vios_current_rules.xml; you can check the content of the file (it's an xml file) and list the rules contained in it:

    # ls -l /home/padmin/rules
    total 40
    -r--r-----    1 root     system        17810 Dec 08 18:40 vios_current_rules.xml
    $ oem_setup_env
    # head -10 /home/padmin/rules/vios_current_rules.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <Profile origin="get" version="3.0.0" date="2015-12-08T17:40:37Z">
     <Catalog id="devParam.disk.fcp.mpioosdisk" version="3.0">
      <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
       <Target class="device" instance="disk/fcp/mpioosdisk"/>
      </Parameter>
     </Catalog>
     <Catalog id="devParam.disk.fcp.mpioapdisk" version="3.0">
      <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
       <Target class="device" instance="disk/fcp/mpioapdisk"/>
    [..]
    
    $ rules -o list -d
    

    Let's now say you have an existing Virtual I/O Server with an existing SEA configured on it. You want two things from the rules:

    • Apply the rules to modify the existing devices.
    • Be sure that new devices will be created using the rules.

    For the purpose of this example we will work on the buffer attributes of a Virtual Ethernet Adapter (the same concepts apply to other device types). So we have an SEA with Virtual Ethernet Adapters and we want to change the buffer attributes. Let's first check the current values of the virtual adapters:

    $ lsdev -type adapter | grep -i Shared
    ent13            Available   Shared Ethernet Adapter
    $ lsdev -dev ent13 -attr virt_adapters
    value
    
    ent8,ent9,ent10,ent11
    
    $ lsdev -dev ent8 -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny
    value
    
    64
    64
    256
    2048
    2048
    24
    24
    128
    512
    512
    $ lsdev -dev ent9 -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny
    value
    
    64
    64
    256
    2048
    2048
    24
    24
    128
    512
    512
    

    Let's now check the values in the current Virtual I/O Server rules:

    $ rules -o list | grep buf
    adapter/vdevice/IBM,l-lan      max_buf_tiny         2048
    adapter/vdevice/IBM,l-lan      min_buf_tiny         512
    adapter/vdevice/IBM,l-lan      max_buf_small        2048
    adapter/vdevice/IBM,l-lan      min_buf_small        512
    

    For the tiny and small buffers I can change the rules easily using the rules command (with the modify operation):

    $ rules -o modify -t adapter/vdevice/IBM,l-lan -a max_buf_tiny=4096
    $ rules -o modify -t adapter/vdevice/IBM,l-lan -a min_buf_tiny=4096
    $ rules -o modify -t adapter/vdevice/IBM,l-lan -a max_buf_small=4096
    $ rules -o modify -t adapter/vdevice/IBM,l-lan -a min_buf_small=4096
    

    I'm re-running the rules command to check that the rules are now modified:

    $ rules -o list | grep buf
    adapter/vdevice/IBM,l-lan      max_buf_tiny         4096
    adapter/vdevice/IBM,l-lan      min_buf_tiny         4096
    adapter/vdevice/IBM,l-lan      max_buf_small        4096
    adapter/vdevice/IBM,l-lan      min_buf_small        4096
    

    I can check the current values of my system against the currently defined rules by using the diff operation:

    # rules -o diff -s
    devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
    

    Creating new attributes using rules

    In the rules embedded with the current Virtual I/O Server release there are no existing rules for the medium, large and huge buffers. Unfortunately for me, I modify these attributes by default and I want a rule capable of doing that. The goal is now to create a new set of rules for the buffers not already present in the default file ... Let's try to do that using the add operation:

    # rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_medium=512
    The rule is not supported or does not exist.
    

    Annoying: I can't add a rule for the medium buffer (same for the large and huge ones). The available attributes for each device are based on the current AIX ARTEX catalog. You can check the files present in the catalog to see which attributes are available for each device type; you can see in the output below that there is nothing in the current ARTEX catalog for the medium buffers.

    $ oem_setup_env
    # cd /etc/security/artex/catalogs
    # ls -ltr | grep l-lan
    -r--r-----    1 root     security       1261 Nov 10 00:30 devParam.adapter.vdevice.IBM,l-lan.xml
    # grep medium devParam.adapter.vdevice.IBM,l-lan.xml
    # 
    

    To show that it is possible to add new rules, here is a simple example adding the 'src_lun_val' and 'dest_lun_val' attributes on the vioslpm0 device. First I check that I can add these rules by looking in the ARTEX catalog:

    $ oem_setup_env
    # cd /etc/security/artex/catalogs
    # ls -ltr | grep lpm
    -r--r-----    1 root     security       2645 Nov 10 00:30 devParam.pseudo.vios.lpm.xml
    # grep -iE "src_lun_val|dest_lun_val" devParam.pseudo.vios.lpm.xml
      <ParameterDef name="dest_lun_val" type="string" targetClass="device" cfgmethod="attr" reboot="true">
      <ParameterDef name="src_lun_val" type="string" targetClass="device" cfgmethod="attr" reboot="true">
    

    Then I’m checking the ‘range’ of authorized values for both attributes:

    # lsattr -l vioslpm0 -a src_lun_val -R
    on
    off
    # lsattr -l vioslpm0 -a dest_lun_val -R
    on
    off
    restart_off
    lpm_off
    

    I’m searching the type using the lsdev command (here pseudo/vios/lpm):

    # lsdev -P | grep lpm
    pseudo         lpm             vios           VIOS LPM Adapter
    

    I’m finally adding the rules and checking the differences:

    $ rules -o add -t pseudo/vios/lpm -a src_lun_val=on
    $ rules -o add -t pseudo/vios/lpm -a dest_lun_val=on
    $ rules -o diff -s
    devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
    devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
    devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on
    

    But what about my buffers: is there any possibility to add these attributes to the current ARTEX catalog? The answer is yes. By looking at the catalog used for Virtual Ethernet Adapters (a file named devParam.adapter.vdevice.IBM,l-lan.xml) you will see that a message catalog named 'vioent.cat' is used by this xml file. Check the content of this catalog with the dspcat command and find out if there is anything related to the medium, large and huge buffers (all the catalog files are located in /usr/lib/methods):

    $ oem_setup_env
    # cd /usr/lib/methods
    # dspcat vioent.cat |grep -iE "medium|large|huge"
    1 : 10 Minimum Huge Buffers
    1 : 11 Maximum Huge Buffers
    1 : 12 Minimum Large Buffers
    1 : 13 Maximum Large Buffers
    1 : 14 Minimum Medium Buffers
    1 : 15 Maximum Medium Buffers
    

    Modify the xml file located in the ARTEX catalog and add the necessary information for these three new buffer types:

    $ oem_setup_env
    # vi /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml
    <?xml version="1.0" encoding="UTF-8"?>
    
    <Catalog id="devParam.adapter.vdevice.IBM,l-lan" version="3.0" inherit="devCommon">
    
      <ShortDescription><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="1">Virtual I/O Ethernet Adapter (l-lan)</NLSCatalog></ShortDescription>
    
      <ParameterDef name="min_buf_huge" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="10">Minimum Huge Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="max_buf_huge" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="11">Maximum Huge Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="min_buf_large" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="12">Minimum Large Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="max_buf_large" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="13">Maximum Large Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="min_buf_medium" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="14">Minimum Medium Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="max_buf_medium" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="15">Maximum Medium Buffers</NLSCatalog></Description>
      </ParameterDef>
    
    [..]
      <ParameterDef name="max_buf_tiny" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="19">Maximum Tiny Buffers</NLSCatalog></Description>
      </ParameterDef>
    
    
    

    Then I retry adding the rules for the medium, large and huge buffers ... and it works great:

    # rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_medium=512
    # rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_medium=512
    # rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_huge=128
    # rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_huge=128
    # rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_large=128
    # rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_large=128
    

    Deploying the rules

    Now that a couple of rules are defined, let's apply them on the Virtual I/O Server. First check the differences you will get after applying the rules by using the diff operation of the rules command:

    $ rules -o diff -s
    devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_medium device=adapter/vdevice/IBM,l-lan   256 | 512
    devParam.adapter.vdevice.IBM,l-lan:min_buf_medium device=adapter/vdevice/IBM,l-lan   128 | 512
    devParam.adapter.vdevice.IBM,l-lan:max_buf_huge device=adapter/vdevice/IBM,l-lan      64 | 128
    devParam.adapter.vdevice.IBM,l-lan:min_buf_huge device=adapter/vdevice/IBM,l-lan      24 | 128
    devParam.adapter.vdevice.IBM,l-lan:max_buf_large device=adapter/vdevice/IBM,l-lan     64 | 128
    devParam.adapter.vdevice.IBM,l-lan:min_buf_large device=adapter/vdevice/IBM,l-lan     24 | 128
    devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
    devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on
    

    Let's now deploy the rules using the deploy operation of the rules command. You can notice that for some rules a reboot is mandatory to change the existing devices; this is the case for the buffers, but not for the vioslpm0 attributes (we can check again that we now have no differences ... some attributes are applied using the -P flag of the chdev command):

    $ rules -o deploy 
    A manual post-operation is required for the changes to take effect, please reboot the system.
    $ lsdev -dev ent8 -attr min_buf_small
    value
    
    4096
    $ lsdev -dev vioslpm0 -attr src_lun_val
    value
    
    on
    $ rules -o diff -s
    

    Don't forget to reboot the Virtual I/O Server and check that everything is ok after the reboot (check the kernel values by using entstat):

    $ shutdown -force -restart
    [..]
    $ for i in ent8 ent9 ent10 ent11 ; do lsdev -dev $i -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny ; done
    [..]
    128
    128
    512
    4096
    4096
    128
    128
    512
    4096
    4096
    $ entstat -all ent13 | grep -i buf
    [..]
    No mbuf Errors: 0
      Transmit Buffers
        Buffer Size             65536
        Buffers                    32
          No Buffers                0
      Receive Buffers
        Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers              4096     4096      512      128      128
        Max Buffers              4096     4096      512      128      128
    

    For the fibre channel adapters I’m using these rules:

    $ rules -o modify -t driver/iocb/efscsi -a dyntrk=yes
    $ rules -o modify -t driver/qliocb/qlfscsi -a dyntrk=yes
    $ rules -o modify -t driver/qiocb/qfscsi -a dyntrk=yes
    $ rules -o modify -t driver/iocb/efscsi -a fc_err_recov=fast_fail
    $ rules -o modify -t driver/qliocb/qlfscsi -a fc_err_recov=fast_fail
    $ rules -o modify -t driver/qiocb/qfscsi -a fc_err_recov=fast_fail
    

    What about new devices ?

    Let’s now create a new SEA by adding new Virtual Ethernet Adapters using DLPAR and check that the devices are created with the correct values (I’m not showing you here how to create the VEAs, I’m doing it through the GUI for simplicity; see the sketch after the output below for a command-line alternative). ent14, ent15, ent16 and ent17 are the new ones:

    $ lsdev | grep ent
    ent12            Available   EtherChannel / IEEE 802.3ad Link Aggregation
    ent13            Available   Shared Ethernet Adapter
    ent14            Available   Virtual I/O Ethernet Adapter (l-lan)
    ent15            Available   Virtual I/O Ethernet Adapter (l-lan)
    ent16            Available   Virtual I/O Ethernet Adapter (l-lan)
    ent17            Available   Virtual I/O Ethernet Adapter (l-lan)
    $ lsdev -dev ent14 -attr
    buf_mode        min            Receive Buffer Mode                        True
    copy_buffs      32             Transmit Copy Buffers                      True
    max_buf_control 64             Maximum Control Buffers                    True
    max_buf_huge    128            Maximum Huge Buffers                       True
    max_buf_large   128            Maximum Large Buffers                      True
    max_buf_medium  512            Maximum Medium Buffers                     True
    max_buf_small   4096           Maximum Small Buffers                      True
    max_buf_tiny    4096           Maximum Tiny Buffers                       True
    min_buf_control 24             Minimum Control Buffers                    True
    min_buf_huge    128            Minimum Huge Buffers                       True
    min_buf_large   128            Minimum Large Buffers                      True
    min_buf_medium  512            Minimum Medium Buffers                     True
    min_buf_small   4096           Minimum Small Buffers                      True
    min_buf_tiny    4096           Minimum Tiny Buffers                       True
    $  mkvdev -sea ent0 -vadapter ent14 ent15 ent16 ent17 -default ent14 -defaultid 14 -attr ha_mode=sharing largesend=1 large_receive=yes
    ent18 Available
    $ entstat -all ent18 | grep -i buf
    No mbuf Errors: 0
      Transmit Buffers
        Buffer Size             65536
        Buffers                    32
          No Buffers                0
      Receive Buffers
        Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers              4096     4096      512      128      128
        Max Buffers              4096     4096      512      128      128
      Buffer Mode: Min
    [..]
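
    If you prefer the command line over the GUI for this DLPAR step, the Virtual Ethernet Adapters can also be added from the HMC with chhwres. The example below is only a sketch: the managed system name, partition name, slot, vlan ids and vswitch are illustrative values, not the ones used above:

    # chhwres -r virtualio -m my-managed-system -o a -p my-vios1 --rsubtype eth -s 14 -a "ieee_virtual_eth=1,port_vlan_id=14,addl_vlan_ids=100,is_trunk=1,trunk_priority=1,vswitch=vdct"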
    

    Deploying these rules to another Virtual I/O Server

    The goal is now to take this rule file and deploy it on all my Virtual I/O Servers, to be sure the attributes are identical on every Virtual I/O Server.

    I’m copying my rule file (and the matching ARTEX catalog file) to another Virtual I/O Server:

    $ oem_setup_env
    # cp /home/padmin/rules
    # scp /home/padmin/rules/custom_rules.xml anothervios:/home/padmin/rules
    custom_rules.xml                   100%   19KB  18.6KB/s   00:00
    # scp /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml anothervios:/etc/security/artex/catalogs/
    devParam.adapter.vdevice.IBM,l-lan.xml
    devParam.adapter.vdevice.IBM,l-lan.xml    100% 2737     2.7KB/s   00:00
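
    If you have more than a handful of Virtual I/O Servers, the same copy can be wrapped in a small loop (the hostnames below are placeholders, and the copy is still done as root from oem_setup_env, exactly like above):

    # for vios in vios1 vios2 vios3 ; do
        scp /home/padmin/rules/custom_rules.xml ${vios}:/home/padmin/rules/
        scp /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml ${vios}:/etc/security/artex/catalogs/
      done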
    

    I’m now connecting to the new Virtual I/O Server and applying the rules:

    $ rules -o import -f /home/padmin/rules/custom_rules.xml
    $ rules -o diff -s
    devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_medium device=adapter/vdevice/IBM,l-lan   256 | 512
    devParam.adapter.vdevice.IBM,l-lan:min_buf_medium device=adapter/vdevice/IBM,l-lan   128 | 512
    devParam.adapter.vdevice.IBM,l-lan:max_buf_huge device=adapter/vdevice/IBM,l-lan      64 | 128
    devParam.adapter.vdevice.IBM,l-lan:min_buf_huge device=adapter/vdevice/IBM,l-lan      24 | 128
    devParam.adapter.vdevice.IBM,l-lan:max_buf_large device=adapter/vdevice/IBM,l-lan     64 | 128
    devParam.adapter.vdevice.IBM,l-lan:min_buf_large device=adapter/vdevice/IBM,l-lan     24 | 128
    devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
    devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on
    $ rules -o deploy
    A manual post-operation is required for the changes to take effect, please reboot the system.
    $ entstat -all ent18 | grep -i buf
    [..]
        Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers               512      512      128       24       24
        Max Buffers              2048     2048      256       64       64
    [..]
    $ shutdown -force -restart
    $ entstat -all ent18 | grep -i buf
    [..]
       Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers              4096     4096      512      128      128
        Max Buffers              4096     4096      512      128      128
    [..]
    

    rulescfgset

    If you don’t care at all about creating your own rules you can just run the rulescfgset command as padmin to apply the default Virtual I/O Server rules. My advice for newbies is to do that just after the Virtual I/O Server is installed: you will at least be sure to have the default IBM rules. It is a good practice to do that every time you deploy a new Virtual I/O Server.

    # rulescfgset
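
    A quick way to double check what is in place afterwards is to list the rules and diff them against the running system, with the same rules command used throughout this post:

    $ rules -o list
    $ rules -o diff -s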
    

    Conclusion

    Use rules! It is a good way to be sure your Virtual I/O Server device attributes are the same everywhere. I hope my examples are good enough to convince you to use them. For PowerVC users like me, rules are a must: as PowerVC creates devices for you, you want to be sure all these devices are created with the exact same attributes. My example about Virtual Ethernet Adapter buffers is simply mandatory now for PowerVC users. As always I hope it helps.

    NovaLink ‘HMC Co-Management’ and PowerVC 1.3.0.1 Dynamic Resource Optimizer


    Everybody now knows that I’m using PowerVC a lot in my current company. My environment is growing bigger and bigger and we are now managing more than 600 virtual machines with PowerVC (the goal is to reach ~ 3000 this year). Some of them were built by PowerVC itself and some of them were migrated through a homemade python script calling the PowerVC REST API and moving our old vSCSI machines to the new full NPIV/Live Partition Mobility/PowerVC environment (I’m still struggling with the “old guys” to move to SSP, but I’m alone against everybody on this one). I’m happy with that, but (there is always a but) I’m facing a lot of problems. The first one is that we are doing more and more with PowerVC (virtual machine creation, virtual machine resizing, adding additional disks, moving machines with LPM, and finally using this python script to migrate the old machines to the new environment), and I realized that the machine hosting PowerVC was getting slower and slower: the more actions we do, the more unresponsive PowerVC becomes. By this I mean that the GUI was slow and creating objects took longer and longer. By looking at the CPU graphs in lpar2rrd we noticed that the CPU consumption was growing as fast as our PowerVC activity (check the graph below).

    The second problem is my teams (unfortunately for me, we have different teams doing different sorts of things here and everybody is using the Hardware Management Consoles their own way: some people are renaming machines, making them unusable with PowerVC, some people are changing the profiles and disabling the synchronization, and even worse we have some third party tools used for capacity planning making the Hardware Management Consoles unusable by PowerVC). The solution to all these problems is to use NovaLink, and especially the NovaLink Co-Management. By doing this the Hardware Management Consoles will be restricted to a read-only view and PowerVC will stop querying the HMCs: it will directly query the NovaLink partition running on each host instead.

    cpu_powervc

    What is NovaLink ?

    If you are using PowerVC you know that it is based on OpenStack. Until now all the OpenStack services were running on the PowerVC host. If you check on the PowerVC host today you can see that there is one nova-compute process per managed host. In the example below I’m managing ten hosts so I have ten different nova-compute processes running:

    # ps -ef | grep [n]ova-compute
    nova       627     1 14 Jan16 ?        06:24:30 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10D6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10D6666.log
    nova       649     1 14 Jan16 ?        06:30:25 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_65E6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_65E6666.log
    nova       664     1 17 Jan16 ?        07:49:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1086666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1086666.log
    nova       675     1 19 Jan16 ?        08:40:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_06D6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_06D6666.log
    nova       687     1 18 Jan16 ?        08:15:57 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6576666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6576666.log
    nova       697     1 21 Jan16 ?        09:35:40 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6556666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6556666.log
    nova       712     1 13 Jan16 ?        06:02:23 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10A6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10A6666.log
    nova       728     1 17 Jan16 ?        07:49:02 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1016666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1016666.log
    nova       752     1 17 Jan16 ?        07:34:45 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1036666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9119MHE_1036666.log
    nova       779     1 13 Jan16 ?        05:54:52 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6596666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9119MHE_6596666.log
    # ps -ef | grep [n]ova-compute | wc -l
    10
    

    The goal of NovaLink is to move these processes to a dedicated partition running on each managed host (each Power System). This partition is called the NovaLink partition. It runs an Ubuntu 15.10 Linux OS (little endian, so it is only available on Power8 hosts) and is in charge of running the OpenStack nova processes. By doing that you distribute the load across all the NovaLink partitions instead of loading a single PowerVC host. Even better, my understanding is that the NovaLink partition is able to communicate directly with the FSP. By using NovaLink you will be able to stop using the Hardware Management Consoles and avoid their slowness. As the NovaLink partition is hosted on the host itself, the RMC connections can now use a direct link (IPv6) through the PowerHypervisor. No more RMC connection problems at all ;-), it’s just awesome. NovaLink allows you to choose between two modes of management:

    • Full Nova Management: you install your new host directly with NovaLink on it and you will not need a Hardware Management Console anymore (in this case the NovaLink installation is in charge of deploying the Virtual I/O Servers and the SEAs).
    • Nova Co-Management: your host is already installed and you give write access (setmaster) to the NovaLink partition. The Hardware Management Console is limited in this mode (you will not be able to create partitions or modify profiles anymore; it is not a pure “read only” mode as you will still be able to start and stop the partitions and do a few things from the HMC, but you will be very limited).
    • You can still mix NovaLink and non-NovaLink managed hosts, and still have P7/P6 hosts managed by HMCs, P8 hosts managed by HMCs, P8 Nova Co-Managed hosts and P8 full Nova Managed hosts ;-).
    • Nova1

    Prerequisites

    As always, upgrade your systems to the latest code level if you want to use NovaLink and NovaLink Co-Management:

    • Power 8 only with firmware version 840. (or later)
    • Virtual I/O Server 2.2.4.10 or later
    • For NovaLink co-management HMC V8R8.4.0
    • Obviously install NovaLink on each NovaLink managed system (install the latest patch version of NovaLink)
    • PowerVC 1.3.0.1 or later

    NovaLink installation on an existing system

    I’ll show you here how to install a NovaLink partition on an existing deployed system. Installing a new system from scratch is also possible. My advice is that you look at this address to start: , and check this youtube video showing you how a system is installed from scratch :

    The goal of this post is to show you how to set up a co-managed system on an existing machine with Virtual I/O Servers already deployed on the host. My advice is to be very careful. The first thing you’ll need to do is to create a partition (2VP, 0.5EC and 5GB of memory; I’m calling it nova in the example below) and use a Virtual Optical device to load the NovaLink system on it. In the example below the machine is “SSP” backed. Be very careful when doing that: set up the profile name and all the configuration details before moving to co-managed mode; after that it will be harder for you to change things, as the new pvmctl command will be very new to you:

    # mkvdev -fbo -vadapter vhost0
    vtopt0 Available
    # lsrep
    Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free
        3059     1579 rootvg                   102272            73216
    
    Name                                                  File Size Optical         Access
    PowerVM_NovaLink_V1.1_122015.iso                           1479 None            rw
    vopt_a19a8fbb57184aad8103e2c9ddefe7e7                         1 None            ro
    # loadopt -disk PowerVM_NovaLink_V1.1_122015.iso -vtd vtopt0
    # lsmap -vadapter vhost0 -fmt :
    vhost0:U8286.41A.21AFF8V-V2-C40:0x00000003:nova_b1:Available:0x8100000000000000:nova_b1.7f863bacb45e3b32258864e499433b52: :N/A:vtopt0:Available:0x8200000000000000:/var/vio/VMLibrary/PowerVM_NovaLink_V1.1_122015.iso: :N/A
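
    For the record, the nova partition itself (2VP, 0.5EC, 5GB of memory) was created by hand from the HMC GUI. If you prefer to script that step, an HMC mksyscfg call would look something like the line below; the profile name and all the numeric values are only an illustration to adapt to your environment:

    # mksyscfg -r lpar -m br-8286-41A-2166666 -i "name=nova,profile_name=default,lpar_env=aixlinux,min_mem=2048,desired_mem=5120,max_mem=8192,proc_mode=shared,min_proc_units=0.1,desired_proc_units=0.5,max_proc_units=2.0,min_procs=1,desired_procs=2,max_procs=2,sharing_mode=uncap,uncap_weight=128,max_virtual_slots=200"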
    
    • At the grub page select the first entry:
    • install1

    • Wait for the machine to boot:
    • install2

    • Choose to perform an installation:
    • install3

    • Accept the licenses
    • install4

    • Set up the padmin user:
      install5
    • Put in your network configuration:
    • install6

    • Accept to install the Ubuntu system:
    • install8

    • You can then modify anything you want in the configuration file (in my case the timezone):
    • install9

      By default NovaLink is (I think, I’m not 100% sure) designed to be installed on a SAS disk, so without multipathing. If, like me, you decide to install the NovaLink partition as a “boot-on-san” lpar, my advice is to launch the installation without any multipathing enabled (only one vscsi adapter or one virtual fibre channel adapter). After the installation is completed, install the Ubuntu multipathd service and configure the second vscsi or virtual fibre channel adapter. If you don’t do that you may experience problems at installation time (RAID error). Please remember that you have to do this before enabling the co-management. Last thing about the installation: it may take a lot of time to finish, so be patient (especially the preseed step).

    install10

    Updating to the latest code level

    The iso file provided on the Entitled Software Support site is not updated to the latest available NovaLink code. Make a copy of the official repository available at this address: ftp://public.dhe.ibm.com/systems/virtualization/Novalink/debian. Serve the content of this ftp server on your own http server (use the command below to copy it):

    # wget --mirror ftp://public.dhe.ibm.com/systems/virtualization/Novalink/debian
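
    The wget --mirror command above creates a local public.dhe.ibm.com/systems/virtualization/Novalink/debian directory tree. One simple way to expose it (assuming an Apache server with a /var/www/html document root, which is an assumption on my side, adapt it to your own web server) is to symlink the mirrored tree under the document root so that the URL used in sources.list below resolves:

    # ln -s /path/where/you/ran/wget/public.dhe.ibm.com/systems/virtualization /var/www/html/nova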
    

    Modify /etc/apt/sources.list (and sources.list.d) and comment out all the available deb repositories to keep only your copy:

    root@nova:~# grep -v ^# /etc/apt/sources.list
    deb http://deckard.lab.chmod666.org/nova/Novalink/debian novalink_1.0.0 non-free
    root@nova:/etc/apt/sources.list.d# apt-get upgrade
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Calculating upgrade... Done
    The following packages will be upgraded:
      pvm-cli pvm-core pvm-novalink pvm-rest-app pvm-rest-server pypowervm
    6 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    Need to get 165 MB of archives.
    After this operation, 53.2 kB of additional disk space will be used.
    Do you want to continue? [Y/n]
    Get:1 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pypowervm all 1.0.0.1-151203-1553 [363 kB]
    Get:2 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-cli all 1.0.0.1-151202-864 [63.4 kB]
    Get:3 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-core ppc64el 1.0.0.1-151202-1495 [2,080 kB]
    Get:4 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-rest-server ppc64el 1.0.0.1-151203-1563 [142 MB]
    Get:5 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-rest-app ppc64el 1.0.0.1-151203-1563 [21.1 MB]
    Get:6 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-novalink ppc64el 1.0.0.1-151203-408 [1,738 B]
    Fetched 165 MB in 7s (20.8 MB/s)
    (Reading database ... 72094 files and directories currently installed.)
    Preparing to unpack .../pypowervm_1.0.0.1-151203-1553_all.deb ...
    Unpacking pypowervm (1.0.0.1-151203-1553) over (1.0.0.0-151110-1481) ...
    Preparing to unpack .../pvm-cli_1.0.0.1-151202-864_all.deb ...
    Unpacking pvm-cli (1.0.0.1-151202-864) over (1.0.0.0-151110-761) ...
    Preparing to unpack .../pvm-core_1.0.0.1-151202-1495_ppc64el.deb ...
    Removed symlink /etc/systemd/system/multi-user.target.wants/pvm-core.service.
    Unpacking pvm-core (1.0.0.1-151202-1495) over (1.0.0.0-151111-1375) ...
    Preparing to unpack .../pvm-rest-server_1.0.0.1-151203-1563_ppc64el.deb ...
    Unpacking pvm-rest-server (1.0.0.1-151203-1563) over (1.0.0.0-151110-1480) ...
    Preparing to unpack .../pvm-rest-app_1.0.0.1-151203-1563_ppc64el.deb ...
    Unpacking pvm-rest-app (1.0.0.1-151203-1563) over (1.0.0.0-151110-1480) ...
    Preparing to unpack .../pvm-novalink_1.0.0.1-151203-408_ppc64el.deb ...
    Unpacking pvm-novalink (1.0.0.1-151203-408) over (1.0.0.0-151112-304) ...
    Processing triggers for ureadahead (0.100.0-19) ...
    ureadahead will be reprofiled on next reboot
    Setting up pypowervm (1.0.0.1-151203-1553) ...
    Setting up pvm-cli (1.0.0.1-151202-864) ...
    Installing bash completion script /etc/bash_completion.d/python-argcomplete.sh
    Setting up pvm-core (1.0.0.1-151202-1495) ...
    addgroup: The group `pvm_admin' already exists.
    Created symlink from /etc/systemd/system/multi-user.target.wants/pvm-core.service to /usr/lib/systemd/system/pvm-core.service.
    0513-071 The ctrmc Subsystem has been added.
    Adding /usr/lib/systemd/system/ctrmc.service for systemctl ...
    0513-059 The ctrmc Subsystem has been started. Subsystem PID is 3096.
    Setting up pvm-rest-server (1.0.0.1-151203-1563) ...
    The user `wlp' is already a member of `pvm_admin'.
    Setting up pvm-rest-app (1.0.0.1-151203-1563) ...
    Setting up pvm-novalink (1.0.0.1-151203-408) ...
    

    NovaLink and HMC Co-Management configuration

    Before adding the hosts to PowerVC you still need to do the most important thing: after the installation is finished, enable the co-management mode to have a system managed by NovaLink while still being connected to a Hardware Management Console:

    • Enable the powervm_mgmt_capable attribute on the nova partition:
    # chsyscfg -r lpar -m br-8286-41A-2166666 -i "name=nova,powervm_mgmt_capable=1"
    # lssyscfg -r lpar -m br-8286-41A-2166666 -F name,powervm_mgmt_capable --filter "lpar_names=nova"
    nova,1
    
  • Enable co-management (please note here that you have to setmaster first (you’ll see that the curr_master_name is the HMC) and then relmaster (you’ll see that the curr_master_name is the NovaLink partition; this is the state we want to be in)):
  • # lscomgmt -m br-8286-41A-2166666
    is_master=null
    # chcomgmt -m br-8286-41A-2166666 -o setmaster -t norm --terms agree
    # lscomgmt -m br-8286-41A-2166666
    is_master=1,curr_master_name=myhmc1,curr_master_mtms=7042-CR8*2166666,curr_master_type=norm,pend_master_mtms=none
    # chcomgmt -m br-8286-41A-2166666 -o relmaster
    # lscomgmt -m br-8286-41A-2166666
    is_master=0,curr_master_name=nova,curr_master_mtms=3*8286-41A*2166666,curr_master_type=norm,pend_master_mtms=none
    

    Going back to HMC managed system

    You can go back to a Hardware Management Console managed system whenever you want (set the master to the HMC, delete the nova partition and release the master from the HMC).

    # chcomgmt -m br-8286-41A-2166666 -o setmaster -t norm --terms agree
    # lscomgmt -m br-8286-41A-2166666
    is_master=1,curr_master_name=myhmc1,curr_master_mtms=7042-CR8*2166666,curr_master_type=norm,pend_master_mtms=none
    # chlparstate -o shutdown -m br-8286-41A-2166666 --id 9 --immed
    # rmsyscfg -r lpar -m br-8286-41A-2166666 --id 9
    # chcomgmt -o relmaster -m br-8286-41A-2166666
    # lscomgmt -m br-8286-41A-2166666
    is_master=0,curr_master_mtms=none,curr_master_type=none,pend_master_mtms=none
    

    Using NovaLink

    After the installation you are now able to log in on the NovaLink partition (you can gain root access with the “sudo su -” command). A new command called pvmctl is available on the NovaLink partition, allowing you to perform any action (stop or start a virtual machine, list the Virtual I/O Servers, and so on). Before trying to add the host, double check that the pvmctl command is working ok.

    padmin@nova:~$ pvmctl lpar list
    Logical Partitions
    +------+----+---------+-----------+---------------+------+-----+-----+
    | Name | ID |  State  |    Env    |    Ref Code   | Mem  | CPU | Ent |
    +------+----+---------+-----------+---------------+------+-----+-----+
    | nova | 3  | running | AIX/Linux | Linux ppc64le | 8192 |  2  | 0.5 |
    +------+----+---------+-----------+---------------+------+-----+-----+
    

    Adding hosts

    On the PowerVC side add the NovaLink host by choosing the NovaLink option:

    addhostnovalink

    Some deb packages (ibmpowervc-powervm and its dependencies) will be installed and configured on the NovaLink machine:

    addhostnovalink3
    addhostnovalink4

    By doing this, on each NovaLink machine you can check that a nova-compute process is running (by adding the host, the deb packages were installed and configured on the NovaLink host):

    # ps -ef | grep nova
    nova      4392     1  1 10:28 ?        00:00:07 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
    root      5218  5197  0 10:39 pts/1    00:00:00 grep --color=auto nova
    # grep host_display_name /etc/nova/nova.conf
    host_display_name = XXXX-8286-41A-XXXX
    # tail -1 /var/log/apt/history.log
    Start-Date: 2016-01-18  10:27:54
    Commandline: /usr/bin/apt-get -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold -y install --force-yes --allow-unauthenticated ibmpowervc-powervm
    Install: python-keystoneclient:ppc64el (1.6.0-2.ibm.ubuntu1, automatic), python-oslo.reports:ppc64el (0.1.0-1.ibm.ubuntu1, automatic), ibmpowervc-powervm:ppc64el (1.3.0.1), python-ceilometer:ppc64el (5.0.0-201511171217.ibm.ubuntu1.199, automatic), ibmpowervc-powervm-compute:ppc64el (1.3.0.1, automatic), nova-common:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), python-oslo.service:ppc64el (0.11.0-2.ibm.ubuntu1, automatic), python-oslo.rootwrap:ppc64el (2.0.0-1.ibm.ubuntu1, automatic), python-pycadf:ppc64el (1.1.0-1.ibm.ubuntu1, automatic), python-nova:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), python-keystonemiddleware:ppc64el (2.4.1-2.ibm.ubuntu1, automatic), python-kafka:ppc64el (0.9.3-1.ibm.ubuntu1, automatic), ibmpowervc-powervm-monitor:ppc64el (1.3.0.1, automatic), ibmpowervc-powervm-oslo:ppc64el (1.3.0.1, automatic), neutron-common:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), python-os-brick:ppc64el (0.4.0-1.ibm.ubuntu1, automatic), python-tooz:ppc64el (1.22.0-1.ibm.ubuntu1, automatic), ibmpowervc-powervm-ras:ppc64el (1.3.0.1, automatic), networking-powervm:ppc64el (1.0.0.0-151109-25, automatic), neutron-plugin-ml2:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), python-ceilometerclient:ppc64el (1.5.0-1.ibm.ubuntu1, automatic), python-neutronclient:ppc64el (2.6.0-1.ibm.ubuntu1, automatic), python-oslo.middleware:ppc64el (2.8.0-1.ibm.ubuntu1, automatic), python-cinderclient:ppc64el (1.3.1-1.ibm.ubuntu1, automatic), python-novaclient:ppc64el (2.30.1-1.ibm.ubuntu1, automatic), python-nova-ibm-ego-resource-optimization:ppc64el (2015.1-201511110358, automatic), python-neutron:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), nova-compute:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), nova-powervm:ppc64el (1.0.0.1-151203-215, automatic), openstack-utils:ppc64el (2015.2.0-201511171223.ibm.ubuntu1.18, automatic), ibmpowervc-powervm-network:ppc64el (1.3.0.1, automatic), python-oslo.policy:ppc64el (0.5.0-1.ibm.ubuntu1, automatic), python-oslo.db:ppc64el (2.4.1-1.ibm.ubuntu1, automatic), python-oslo.versionedobjects:ppc64el (0.9.0-1.ibm.ubuntu1, automatic), python-glanceclient:ppc64el (1.1.0-1.ibm.ubuntu1, automatic), ceilometer-common:ppc64el (5.0.0-201511171217.ibm.ubuntu1.199, automatic), openstack-i18n:ppc64el (2015.2-3.ibm.ubuntu1, automatic), python-oslo.messaging:ppc64el (2.1.0-2.ibm.ubuntu1, automatic), python-swiftclient:ppc64el (2.4.0-1.ibm.ubuntu1, automatic), ceilometer-powervm:ppc64el (1.0.0.0-151119-44, automatic)
    End-Date: 2016-01-18  10:28:00
    

    The command line interface

    You can do ALL the things you were doing on the HMC using the pvmctl command. The syntax is pretty simple: pvmctl |OBJECT| |ACTION|, where OBJECT can be vios, vm, vea (virtual ethernet adapter), vswitch, lu (logical unit), and so on, and ACTION can be list, delete, create or update. Here are a few examples:

    • List the Virtual I/O Servers:
    # pvmctl vios list
    Virtual I/O Servers
    +--------------+----+---------+----------+------+-----+-----+
    |     Name     | ID |  State  | Ref Code | Mem  | CPU | Ent |
    +--------------+----+---------+----------+------+-----+-----+
    | s00ia9940825 | 1  | running |          | 8192 |  2  | 0.2 |
    | s00ia9940826 | 2  | running |          | 8192 |  2  | 0.2 |
    +--------------+----+---------+----------+------+-----+-----+
    
  • List the partitions (note the -d for display-fields allowing me to print some attributes):
  • # pvmctl vm list
    Logical Partitions
    +----------+----+----------+----------+----------+-------+-----+-----+
    |   Name   | ID |  State   |   Env    | Ref Code |  Mem  | CPU | Ent |
    +----------+----+----------+----------+----------+-------+-----+-----+
    | aix72ca> | 3  | not act> | AIX/Lin> | 00000000 |  2048 |  1  | 0.1 |
    |   nova   | 4  | running  | AIX/Lin> | Linux p> |  8192 |  2  | 0.5 |
    | s00vl99> | 5  | running  | AIX/Lin> | Linux p> | 10240 |  2  | 0.2 |
    | test-59> | 6  | not act> | AIX/Lin> | 00000000 |  2048 |  1  | 0.1 |
    +----------+----+----------+----------+----------+-------+-----+-----+
    # pvmctl vm list -d name id
    [..]
    # pvmctl vm list -i id=4 --display-fields LogicalPartition.name
    name=aix72-1-d3707953-00000090
    # pvmctl vm list  --display-fields LogicalPartition.name LogicalPartition.id LogicalPartition.srr_enabled SharedProcessorConfiguration.desired_virtual SharedProcessorConfiguration.uncapped_weight
    name=aix72capture,id=3,srr_enabled=False,desired_virtual=1,uncapped_weight=64
    name=nova,id=4,srr_enabled=False,desired_virtual=2,uncapped_weight=128
    name=s00vl9940243,id=5,srr_enabled=False,desired_virtual=2,uncapped_weight=128
    name=test-5925058d-0000008d,id=6,srr_enabled=False,desired_virtual=1,uncapped_weight=128
    
  • Delete the virtual adapter with a given uuid on the partition named nova (note the --parent-id option to select the partition; the uuid was found with pvmctl vea list):
  • # pvmctl vea delete --parent-id name=nova --object-id uuid=fe7389a8-667f-38ca-b61e-84c94e5a3c97
    
  • Power off the lpar named aix72-2:
  • # pvmctl vm power-off -i name=aix72-2-536bf0f8-00000091
    Powering off partition aix72-2-536bf0f8-00000091, this may take a few minutes.
    Partition aix72-2-536bf0f8-00000091 power-off successful.
    
  • Delete the lpar named aix72-2:
  • # pvmctl vm delete -i name=aix72-2-536bf0f8-00000091
    
  • Delete the vswitch named MGMTVSWITCH:
  • # pvmctl vswitch delete -i name=MGMTVSWITCH
    
  • Open a console:
  • #  mkvterm --id 4
    vterm for partition 4 is active.  Press Control+] to exit.
    |
    Elapsed time since release of system processors: 57014 mins 10 secs
    [..]
    
  • Power on an lpar:
  • # pvmctl vm power-on -i name=aix72capture
    Powering on partition aix72capture, this may take a few minutes.
    Partition aix72capture power-on successful.
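
  • If you are unsure about the exact syntax for an object or an action, check the built-in help. pvm-cli is argparse based (as the python-argcomplete completion script installed earlier suggests), so I assume the usual help flags are available:
  • # pvmctl --help
    # pvmctl vm list --help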
    

    Is this a dream ? No more RMC connectivity problems anymore

    I’m 100% sure that you have always had problems with RMC connectivity due to firewall issues, ports not opened, or IDS blocking incoming or outgoing RMC traffic. NovaLink is THE solution that will solve all the RMC problems forever. I’m not joking, it’s a major improvement for PowerVM. As the NovaLink partition is installed on each host, it can communicate through a dedicated IPv6 link with all the partitions hosted on that host. A dedicated virtual switch called MGMTSWITCH is used to allow the RMC flow to transit between all the lpars and the NovaLink partition. Of course this virtual switch must be created and one Virtual Ethernet Adapter must also be created on the NovaLink partition. These are the first two actions to do if you want to implement this solution. Before starting, here are a few things you need to know:

    • For security reasons the MGMTSWITCH must be created in VEPA mode. If you are not aware of what the VEPA and VEB modes are, here is a reminder:
    • In VEB mode all the partitions connected to the same vlan can communicate together. We do not want that as it is a security issue.
    • The VEPA mode gives us the ability to isolate lpars that are on the same subnet: lpar to lpar traffic is forced out of the machine. This is what we want.
    • The PVID for this VEPA network is 4094.
    • The adapter in the NovaLink partition must be a trunk adapter.
    • It is mandatory to name the VEPA vswitch MGMTSWITCH.
    • At lpar creation, if the MGMTSWITCH exists, a new Virtual Ethernet Adapter will be automatically created on the deployed lpar.
    • To be correctly configured the deployed lpar needs the latest level of rsct code (3.2.1.0 for now).
    • The latest cloud-init version must be deployed on the captured lpar used to make the image.
    • You don’t need to configure any address on this adapter: on the deployed lpars the adapter is configured with a link-local address (the same thing as the 169.254.0.0/16 addresses used in IPv4, but for IPv6). Please note that any IPv6 adapter must “by design” have a link-local address.

    mgmtswitch2

    • Create the virtual switch called MGMTSWITCH in Vepa mode:
    # pvmctl vswitch create --name MGMTSWITCH --mode=Vepa
    # pvmctl vswitch list  --display-fields VirtualSwitch.name VirtualSwitch.mode 
    name=ETHERNET0,mode=Veb
    name=vdct,mode=Veb
    name=vdcb,mode=Veb
    name=vdca,mode=Veb
    name=MGMTSWITCH,mode=Vepa
    
  • Create a virtual ethernet adapter on the NovaLink partition with the PVID 4094 and a trunk priority set to 1 (it’s a trunk adapter). Note that we now have two adapters on the NovaLink partition (one with an IPv4 (routable) address and the other one with an IPv6 (non-routable) address):
  • # pvmctl vea create --pvid 4094 --vswitch MGMTSWITCH --trunk-pri 1 --parent-id name=nova
    # pvmctl vea list --parent-id name=nova
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=False
      is_trunk=False
      loc_code=U8286.41A.216666-V3-C2
      mac=EE3B84FD1402
      pvid=666
      slot=2
      uuid=05a91ab4-9784-3551-bb4b-9d22c98934e6
      vswitch_id=1
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=True
      is_trunk=True
      loc_code=U8286.41A.216666-V3-C34
      mac=B6F837192E63
      pvid=4094
      slot=34
      trunk_pri=1
      uuid=fe7389a8-667f-38ca-b61e-84c94e5a3c97
      vswitch_id=4
    

    Configure the link-local IPv6 address in the NovaLink partition:

    # more /etc/network/interfaces
    [..]
    auto eth1
    iface eth1 inet manual
     up /sbin/ifconfig eth1 0.0.0.0
    # ifup eth1
    # ifconfig eth1
    eth1      Link encap:Ethernet  HWaddr b6:f8:37:19:2e:63
              inet6 addr: fe80::b4f8:37ff:fe19:2e63/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 B)  TX bytes:1454 (1.4 KB)
              Interrupt:34
    

    Capture an AIX host with the latest version of rsct (3.2.1.0 or later) and the latest version of cloud-init installed. This version of RMC/rsct handles this new feature, so it is mandatory to have it installed on the captured host. When PowerVC deploys a Virtual Machine on a NovaLink managed host with this version of rsct installed, a new adapter with the PVID 4094 in the virtual switch MGMTSWITCH is created, and all the RMC traffic uses this adapter instead of your public IP address:

    # lslpp -L rsct*
      Fileset                      Level  State  Type  Description (Uninstaller)
      ----------------------------------------------------------------------------
      rsct.core.auditrm          3.2.1.0    C     F    RSCT Audit Log Resource
                                                       Manager
      rsct.core.errm             3.2.1.0    C     F    RSCT Event Response Resource
                                                       Manager
      rsct.core.fsrm             3.2.1.0    C     F    RSCT File System Resource
                                                       Manager
      rsct.core.gui              3.2.1.0    C     F    RSCT Graphical User Interface
      rsct.core.hostrm           3.2.1.0    C     F    RSCT Host Resource Manager
      rsct.core.lprm             3.2.1.0    C     F    RSCT Least Privilege Resource
                                                       Manager
      rsct.core.microsensor      3.2.1.0    C     F    RSCT MicroSensor Resource
                                                       Manager
      rsct.core.rmc              3.2.1.1    C     F    RSCT Resource Monitoring and
                                                       Control
      rsct.core.sec              3.2.1.0    C     F    RSCT Security
      rsct.core.sensorrm         3.2.1.0    C     F    RSCT Sensor Resource Manager
      rsct.core.sr               3.2.1.0    C     F    RSCT Registry
      rsct.core.utils            3.2.1.1    C     F    RSCT Utilities
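
    The cloud-init level can be checked on the same captured lpar; on AIX cloud-init is delivered as an rpm package, so a simple query does the job (the exact package name may differ depending on where you got cloud-init from):

    # rpm -qa | grep -i cloud-init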
    

    When this image is deployed, a new adapter is created in the MGMTSWITCH virtual switch and an IPv6 link-local address is configured on it. You can check the cloud-init activation log to see that the IPv6 address is configured at activation time:

    # pvmctl vea list --parent-id name=aix72-2-0a0de5c5-00000095
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=True
      is_trunk=False
      loc_code=U8286.41A.216666-V5-C32
      mac=FA620F66FF20
      pvid=3331
      slot=32
      uuid=7f1ec0ab-230c-38af-9325-eb16999061e2
      vswitch_id=1
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=True
      is_trunk=False
      loc_code=U8286.41A.216666-V5-C33
      mac=46A066611B09
      pvid=4094
      slot=33
      uuid=560c67cd-733b-3394-80f3-3f2a02d1cb9d
      vswitch_id=4
    # ifconfig -a
    en0: flags=1e084863,14c0
            inet 10.10.66.66 netmask 0xffffff00 broadcast 10.14.33.255
             tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en1: flags=1e084863,14c0
            inet6 fe80::c032:52ff:fe34:6e4f/64
             tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    sit0: flags=8100041
            inet6 ::10.10.66.66/96
    [..]
    

    Note that the link-local address is configured at activation time (addresses starting with fe80):

    # more /var/log/cloud-init-output.log
    [..]
    auto eth1
    
    iface eth1 inet6 static
        address fe80::c032:52ff:fe34:6e4f
        hwaddress ether c2:32:52:34:6e:4f
        netmask 64
        pre-up [ $(ifconfig eth1 | grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}') = "c2:32:52:34:6e:4f" ]
            dns-search fr.net.intra
    # entstat -d ent1 | grep -iE "switch|vlan"
    Invalid VLAN ID Packets: 0
    Port VLAN ID:  4094
    VLAN Tag IDs:  None
    Switch ID: MGMTSWITCH
    

    To be sure everything is working correctly here is a proof test. I’m taking down the en0 interface on which the IPv4 public address is configured, then I’m launching a tcpdump on en1 (the MGMTSWITCH adapter), and finally I’m resizing the Virtual Machine with PowerVC. AND EVERYTHING IS WORKING GREAT !!!! AWESOME !!! :-) (note the fe80 to fe80 communication):

    # ifconfig en0 down detach ; tcpdump -i en1 port 657
    tcpdump: WARNING: en1: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on en1, link-type 1, capture size 96 bytes
    22:00:43.224964 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: S 4049792650:4049792650(0) win 65535 
    22:00:43.225022 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: S 2055569200:2055569200(0) ack 4049792651 win 28560 
    22:00:43.225051 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: . ack 1 win 32844 
    22:00:43.225547 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 1:209(208) ack 1 win 32844 
    22:00:43.225593 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: . ack 209 win 232 
    22:00:43.225638 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 1:97(96) ack 209 win 232 
    22:00:43.225721 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 209:377(168) ack 97 win 32844 
    22:00:43.225835 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 97:193(96) ack 377 win 240 
    22:00:43.225910 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 377:457(80) ack 193 win 32844 
    22:00:43.226076 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 193:289(96) ack 457 win 240 
    22:00:43.226154 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 457:529(72) ack 289 win 32844 
    22:00:43.226210 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 289:385(96) ack 529 win 240 
    22:00:43.226276 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 529:681(152) ack 385 win 32844 
    22:00:43.226335 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 385:481(96) ack 681 win 249 
    22:00:43.424049 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: . ack 481 win 32844 
    22:00:44.725800 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 88
    22:00:44.726111 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 88
    22:00:50.137605 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 632
    22:00:50.137900 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 88
    22:00:50.183108 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 408
    22:00:51.683382 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 408
    22:00:51.683661 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 88
    

    To be sure the security requirements are met, from the lpar I’m pinging the NovaLink host (the first ping below), which answers, and then I’m pinging the second lpar (the second ping), which does not work. (And this is what we want !!!).

    # ping fe80::d09e:aff:fecf:a868
    PING fe80::d09e:aff:fecf:a868 (fe80::d09e:aff:fecf:a868): 56 data bytes
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=0 ttl=64 time=0.203 ms
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=1 ttl=64 time=0.206 ms
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=2 ttl=64 time=0.216 ms
    ^C
    --- fe80::d09e:aff:fecf:a868 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0/0/0 ms
    # ping fe80::44a0:66ff:fe61:1b09
    PING fe80::44a0:66ff:fe61:1b09 (fe80::44a0:66ff:fe61:1b09): 56 data bytes
    ^C
    --- fe80::44a0:66ff:fe61:1b09 ping statistics ---
    2 packets transmitted, 0 packets received, 100% packet loss
    

    PowerVC 1.3.0.1 Dynamic Resource Optimizer

    In addition to the NovaLink part of this blog post I also wanted to talk about the killer app of 2016: Dynamic Resource Optimizer. This feature can be used on any PowerVC 1.3.0.1 managed host (you obviously need at least two hosts). DRO is in charge of re-balancing your Virtual Machines across all the available hosts (in the host group). To sum up, if a host is experiencing a heavy load and reaching a certain amount of CPU consumption over a period of time, DRO will move your virtual machines to re-balance the load across all the available hosts (this is done at a host level). Here are a few details about DRO:

    • The DRO configuration is done at a host level.
    • You set up a threshold (in the capture below) that, once reached, triggers the Live Partition Mobility moves or the mobile core movements (Power Enterprise Pool).
    • droo6
      droo3

    • To be triggered, this threshold must be reached a certain number of times (stabilization) over a period you define (run interval); see the short worked example after this list.
    • You can choose to move virtual machines using Live Partition Mobility, or to move “cores” using Power Enterprise Pool (you can do both; moving CPUs will always be preferred over moving partitions).
    • DRO can be run in advise mode (nothing is done, a warning is thrown in the new DRO events tab) or in active mode (which is doing the job and moving things).
      droo2
      droo1
    • Your most critical virtual machines can be excluded from DRO:
    • droo5
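
    To make the threshold, stabilization and run interval trio a bit more concrete, here is one possible reading with made-up values (this is my own back-of-the-envelope interpretation, not an official formula): with a utilization threshold of 70%, a run interval of 5 minutes and a stabilization of 3, the host has to be seen above 70% on 3 consecutive checks before DRO acts, which means roughly 15 minutes of sustained load:

    $ echo "3 * 5" | bc
    15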

    How is DRO choosing which machines are moved

    I have been running DRO in production for one month now and I had time to check what is going on behind the scenes. How does DRO choose which machines are moved when a Live Partition Mobility operation must be run to handle a heavy load on a host? To check this I decided to launch 3 different cpuhog processes (16 forks each, eating CPU resources) on three different lpars with 4VP each (SMT4). On PowerVC I can check that before launching these processes the CPU consumption is ok on this host (the three lpars are running on the same host):

    droo4

    # cat cpuhog.pl
    #!/usr/bin/perl
    
    print "eating the CPUs\n";
    
    foreach $i (1..16) {
          $pid = fork();
          last if $pid == 0;
          print "created PID $pid\n";
    }
    
    while (1) {
          $x++;
    }
    # perl cpuhog.pl
    eating the CPUs
    created PID 47514604
    created PID 22675712
    created PID 3015584
    created PID 21496152
    created PID 25166098
    created PID 26018068
    created PID 11796892
    created PID 33424106
    created PID 55444462
    created PID 65077976
    created PID 13369620
    created PID 10813734
    created PID 56623850
    created PID 19333542
    created PID 58393312
    created PID 3211988
    

    After waiting a couple of minutes I realize that the virtual machines on which the cpuhog processes were launched are the ones which are migrated. So we can say that PowerVC moves the machines that are eating CPU (another strategy could have been to move the machines that are not eating CPU, to let the busy ones do their job without being disturbed by a mobility operation).

    # errpt | head -3
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A5E6DB96   0118225116 I S pmig           Client Partition Migration Completed
    08917DC6   0118225116 I S pmig           Client Partition Migration Started
    

    After the moves are finished I can see that the load is now ok on the host. DRO has done the job for me and moved the lpars to meet the configured threshold ;-)

    droo7dro_effect

    The images below show you a good example of the “power” of PowerVC and DRO. To update my Virtual I/O Servers to the latest version, the PowerVC maintenance mode was used to free up the Virtual I/O Servers. After leaving the maintenance mode, DRO did the job of re-balancing the Virtual Machines across all the hosts (the red arrows symbolize the maintenance mode actions and the purple ones the DRO actions). You can also see that some lpars were moved across 4 different hosts during this process. All these pictures are taken from real life experience on my production systems. This is not a lab environment, this is one part of my production. So yes, DRO and PowerVC 1.3.0.1 are production ready. Hell yes!

    real1
    real2
    real3
    real4
    real5

    Conclusion

    As my environment is growing bigger, the next step for me will be to move to NovaLink on my P8 hosts. Please note that the NovaLink Co-Management feature is today a “TechPreview” but should go GA very soon. Talking about DRO, I had been waiting for that for years and it finally happened. I can assure you that it is production ready; to prove this I’ll just give you one number. To upgrade my Virtual I/O Servers to the 2.2.4.10 release using PowerVC maintenance mode and DRO, more than 1000 Live Partition Mobility moves were performed without any outage, on production servers and during working hours. Nobody in my company was aware of this during the operations. It was a seamless experience for everybody.

    What’s new in VIOS 2.2.4.10 and PowerVM : Part 2 Shared Processor Pool weighting


    First of all, before beginning this blog post I owe you an explanation about these two months without new posts. These two months were very busy. On the personal side I was forced to move out of my current apartment and had to find another one suitable for me (and I can assure you that this is not something easy in Paris). As I was visiting apartments almost 3 days a week, the time kept for writing blog posts (please remember that I’m doing that in my “after hours” work) was taken for something else :-(. At work things were crazy too: we had to build twelve new E870 boxes (with the provisioning toolkit and SRIOV adapters) and make them work with our current implementation of PowerVC. Then I had to do a huge vscsi to NPIV migration: more than 500 AIX machines to migrate from vscsi to NPIV and then move to P8 boxes in less than three weeks (yes, more than 500 machines in less than 3 weeks, with 4000 zones created). Thanks to the help of an STG Lab Services consultant (Bonnie LeBarron) this was achieved using a modified version of her script (adapted to fit our needs on the zoning and mapping part, and to the latest HMC releases). I’m back in business now and I have planned a couple of blog posts this month. The first of this series is about Shared Processor Pool weighting on the latest Power8 firmware versions. You’ll see that it changes a lot of things compared to P7 boxes.

    A short history of Shared Processor Pool weighting

    This long story began a few years ago for me (I’d say at least 4 years ago). I was planning to do a blog post about it at the time but decided not to, because I thought the topic was considered “sensitive”; now that we have documentation and an official statement there is no reason to hide it anymore. I was working for a bank using two P795s with a lot of cores activated. We were using Multiple Shared Processor Pools in an unconventional way (as far as I remember, two pools per customer, one for Oracle and one for WAS, and we had more than 5 or 6 customers, so each box had at least 10 MSPPs). As you may already know I only believe what I can see, so I decided to make tests on my own. By reading the Redbook I realized that there was not enough information about pool and partition weighting. Like a lot of today’s customers we had different weights for development (32), qualification (64), pre-production (128), production (192) and finally the Virtual I/O Servers (255). As we were using Shared Processor Pools I was expecting that when a Shared Processor Pool is full (contention) the weights would kick in and prioritize the partitions with the higher weight. What was my surprise when I realized the weighting was not working inside a Shared Processor Pool but only in the DefaultPool (Pool 0). Remember this statement forever: on Power7, partition weighting only works when the default pool is full. There is no “intelligence” inside a Shared Processor Pool and you have to be very careful with the size of the pool because of that. On Power7, pools are used ONLY for licensing purposes.

    I then decided to contact my preferred IBM pre-sales in France to tell him about this incredible discovery. I had no answer for one month, then (as always) he came back with the answer of someone who already knew the truth about this. He introduced me to a performance expert (she was a performance expert at the time and is now specialized in security) and she told me that I was absolutely right with my discovery, but that only a few people were aware of it. I decided to say nothing about it, but I was sure that IBM realized there was something to clarify. Then last year at the IBM Technical Collaboration Council I saw a PowerPoint slide saying that the latest IBM Power8 firmware would add this long awaited feature: partition weighting would work inside a Shared Processor Pool. Finally, after waiting for more than four years, I have what I want. As I was working on a new project in my current job I had to create a lot of Shared Processor Pools in a mixed Power7 (P770) and Power8 (E870) environment. It was the time to check if this new feature was really working and to compare the differences between a Power8 (with the latest firmware) and a Power7 machine (with the latest firmware). The way we implement and monitor Shared Processor Pools on Power8 will now be very different from what it was on Power7 boxes. I think this is really important and that everybody now needs to understand the differences for their future implementations. But let’s first have a look at the Redbooks to check the official statements:

    The Redbook talking about this is “IBM PowerVM Virtualization Introduction and Configuration”; here is the key paragraph to understand (pages 113 and 114):

    redbook_statement

    It was super hard to find, but there is a place where IBM talks about this. I’m quoting this link below: https://www.ibm.com/support/knowledgecenter/9119-MME/p8hat/p8hat_sharedproc.htm

    When the firmware is at level 8.3.0, or earlier, uncapped weight is used only when more virtual processors consume unused resources than the available physical processors in the shared processor pool. If no contention exists for processor resources, the virtual processors are immediately distributed across the physical processors, independent of their uncapped weights. This can result in situations where the uncapped weights of the logical partitions do not exactly reflect the amount of unused capacity.

    For example, logical partition 2 has one virtual processor and an uncapped weight of 100. Logical partition 3 also has one virtual processor, but an uncapped weight of 200. If logical partitions 2 and 3 both require more processing capacity, and there is not enough physical processor capacity to run both logical partitions, logical partition 3 receives two more processing units for every additional processing unit that logical partition 2 receives. If logical partitions 2 and 3 both require more processing capacity, and there is enough physical processor capacity to run both logical partitions, logical partition 2 and 3 receive an equal amount of unused capacity. In this situation, their uncapped weights are ignored.

    When the firmware is at level 8.4.0, or later, if multiple partitions are assigned to a shared processor pool, the uncapped weight is used as an indicator of how the processor resources must be distributed among the partitions in the shared processor pool with respect to the maximum amount of capacity that can be used by the shared processor pool. For example, logical partition 2 has one virtual processor and an uncapped weight of 100. Logical partition 3 also has one virtual processor, but an uncapped weight of 200. If logical partitions 2 and 3 both require more processing capacity, logical partition 3 receives two additional processing units for every additional processing unit that logical partition 2 receives.

    The server distributes unused capacity among all of the uncapped shared processor partitions that are configured on the server, regardless of the shared processor pools to which they are assigned. For example, if you configure logical partition 1 to the default shared processor pool and you configure logical partitions 2 and 3 to a different shared processor pool, all three logical partitions compete for the same unused physical processor capacity in the server, even though they belong to different shared processor pools.
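
    A quick way to turn the quoted example into numbers: with the 840 firmware behavior, the contended capacity is shared in proportion to the uncapped weights (this is my reading of the 2:1 distribution described above; the little calculation below is just a back-of-the-envelope check):

    $ echo "scale=2; 100 / (100 + 200)" | bc
    .33
    $ echo "scale=2; 200 / (100 + 200)" | bc
    .66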

    Testing methodology

    We now need to demonstrate that the behavior of the weighting is different between a Power7 and a Power8 machine; here is how we are going to proceed:

    • On a Power8 machine (E870 SC840_056) we create a Shared Processor Pool with a “Maximum Processing unit” set to 1.
    • On a Power7 machine we create a Shared Processor Pool with a “Maximum Processing unit” set to 1.
    • We create two partitions in the P8 pool (1VP, 0.1EC) called mspp1 and mspp2.
    • We create two partitions in the P7 pool (1VP, 0.1EC) called mspp3 and mspp4.
    • Using ncpu, provided with the nstress tools (http://public.dhe.ibm.com/systems/power/community/wikifiles/PerfTools/nstress_AIX6_April_2014.tar), we create a heavy load on each partition. Obviously this load can’t be higher than 1 processing unit in total (sum of each physc), since each pool is capped at 1.
    • We then use these testing scenarios (each test has a duration of 15 minutes, we are recording cpu and pool stats with nmon and lpar2rrd)
    1. First partition with a weight of 128, the second partition with a weight of 128 (test with the same weight).
    2. First partition with a weight of 64, the second partition with a weight of 128 (weights in a 1:2 ratio).
    3. First partition with a weight of 32, the second partition with a weight of 128 (weights in a 1:4 ratio).
    4. First partition with a weight of 1, the second partition with a weight of 2 (we try here to prove that the ratio between the two values is more important than the values themselves. Values of 1 and 2 should give us the same result as 64 and 128).
    5. First partition with a weight of 1, the second partition with a weight of 255 (a ratio of 1:255) (you’ll see here that the result is pretty interesting :-) ).
  • You’ll see that It will not be necessary to do all these tests on the P7 box …. :-)
    The Power8 case

    Prerequisites

    A P8 firmware SC840* or SV840* is mandatory to enable the weighting in a Shared Processor Pool when there is no contention for processor resources (no contention in the DefaultPool). This means that all P6, P7 and P8 (with a firmware < 840) machines do not have this feature coded in the firmware. My advice is to update all your P8 machines to the latest level to enable this new behavior; a quick way to check the current level from one of your lpars is shown below.
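
    This is just a quick sanity check (nothing HMC specific, it simply reads the firmware level reported to the partition):

    # prtconf | grep -i firmware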

    Tests

    For each test, we check the weight of each partition using the lparstat command, then we capture a nmon file every 30 seconds and we launch ncpu for a duration of 15 minutes with four CPUs (we are in SMT4) on both the P8 and the P7 box. We will show you here that weights are taken into account in a Power8 MSPP, but are not taken into account in a Power7 MSPP.

    #lparstat -i | grep -iE "Variable Capacity Weight|^Partition"
    Partition Name                             : mspp1-23bad3d7-00000898
    Partition Number                           : 3
    Partition Group-ID                         : 32771
    Variable Capacity Weight                   : 255
    Desired Variable Capacity Weight           : 255
    # /usr/bin/nmon -F /admin/nmon/$(hostname)_weight255.nmon -s30 -c30 -t ; ./ncpu -p 4 -s 900
    # lparstat 1 10
    
    • Both weights at 128, you can check in the picture below that the “physc” values are strictly equal (0.5 for both lpars) (the 1:1 ratio between the two weights is respected) :
    • weight128

    • One partition at 64 and one partition at 128, you can check in the pictures below (lparstat output, and nmon analyser graph) that we now have different physc values (0.36 for the mspp2 lpar and 0.64 for the mspp1 lpar). We now have a ratio of roughly 2: the mspp1 physc is about twice the mspp2 physc (the weights are respected in the Shared Processor Pool):
    • weight64_128

    nmonx2

    This lpar2rrd graph shows you the weighting behavior on a Power8 machine (test one: both weights equal to 128, and test two: two different weights of 128 and 64).

    graph_p8_128128_12864

    • One partition at 32 and one partition at 128: you can check in the picture below that the 1:4 ratio (32:128) is roughly respected (physc values of 0.26 and 0.74).
    • weight32_128

    • One partition at 1 and one partition at 2. The results here are exactly the same as in the second test (weights of 128 and 64), which proves that what matters is the ratio between the weights and not the values themselves (using weights of 1 2 3 will give you the exact same results as 2 4 6):
    • weight1_2

    • Finally one partition at 1 and one partition at 255. Be careful here: the ratio is big enough to end up with an unresponsive lpar when loading both partitions. I do not recommend such high ratios because of this:
    • weight1_255

    graph_p8_12832_12_1255

    The Power7 case

    Let’s do one test on a Power7 machine with one lpar with a weight of 1 and the other one with a weight of 255 … you’ll see a huge difference here … and I think it is clear enough to avoid running all the test scenarios on the Power7 machine.

    Tests

    You can see here that I’m doing the exact same test, weights of 1 and 255, and now both partitions have an equal physc value (0.5 each). On a Power7 box the weights are taken into account only if the DefaultPool (pool0) is full (contention). The pictures below show you the reality of Multiple Shared Processor Pools running on a Power7 box. On Power7, MSPPs should be used for licensing purposes only and nothing else.

    weight1_255_power7
    graph_p7_1255

    Conclusion

    I hope you now have a better understanding of the Multiple Shared Processor Pools differences between Power8 and Power7. Now that you are aware of this, my advice is to have different strategies when implementing MSPP on Power7 and Power8. On Power7 double check and monitor your MSPPs to be sure the pools are never full and that you can get enough capacity to run your load. On a Power8 box set up your weights wisely on your different environments (backup, production, development). You can then be sure that production will be prioritized whatever happens, even if you reduce your MSPP sizes; by doing this you’ll optimize licensing costs. As always I hope it helps.
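
    If you want to adjust the weights afterwards, this can be done from the HMC command line; the syntax below is written from memory so double check it against your HMC level (the first command changes the weight of a running partition dynamically, the second one makes it persistent in the profile):

    chhwres -r proc -m <managed-system> -o s -p <lpar-name> -a "uncap_weight=128"
    chsyscfg -r prof -m <managed-system> -i "name=<profile>,lpar_name=<lpar-name>,uncap_weight=128"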

    Continuous integration for your Chef AIX cookbooks (using PowerVC, Jenkins, test-kitchen and gitlab)


    My journey to integrate Chef on AIX is still going on and I’m working more than ever on these topics. I know that using such tools is not something widely adopted by AIX customers. But what I also know is that, whatever happens, you will in a near -or distant- future use an automation tool. These tools are so widely used in the Linux world that you just can’t ignore them. The way you were managing your AIX ten years ago is not the same as what you are doing today, and what you do today will not be what you’ll do in the future. The AIX world needs a facelift to survive; a huge step has already been done (and is still ongoing) with PowerVC thanks to a fantastic team composed of very smart people at IBM (@amarteyp, @drewthorst, @jwcroppe, and all the other persons in this team!). The AIX world is now compatible with Openstack, and with this other things are coming … such as automation. When all of these things are ready, AIX will be able to offer something comparable to Linux. Openstack and automation are the first bricks of what we call today “devops” (to be more specific, it’s the ops part of the devops word).

    I will today focus on how to manage your AIX machines using Chef. By using the word “how” I mean what are the best practices and infrastructures to build to start using Chef on AIX. If you remember my session about Chef on AIX at the IBM Technical University in Cannes, I was saying that by using Chef your infrastructure will be testable, repeatable, and versionable. We will focus in this blog post on how to do that. To test your AIX Chef cookbooks you will need to understand what the test kitchen is (we will use the test kitchen to drive PowerVC to build virtual machines on the fly and run the chef recipes on them). To repeat this over and over, to be sure everything is working (code review, be sure that your cookbook is converging) without having to do anything, we will use Jenkins to automate these tests. Then, to version your cookbook developments, we will use gitlab.

    To better understand why I’m doing such a thing there is nothing better than a concrete example. My goal is to do all my AIX post-installation tasks using Chef (motd configuration, dns, devices attributes, fileset installation, enabling services … everything that you are doing today using korn shell scripts). Who has never experienced someone changing one of these scripts (most of the time without warning the other members of the team), resulting in a syntax error and then in an outage for all your new builds? Doing this is possible if you are in a little team creating one machine per month, but it is inconceivable in an environment driven by PowerVC where sysadmins are not doing anything “by hand”. In such an environment, if someone makes this kind of error all the new builds are failing … even worse, you’ll probably not be aware of it until someone connecting to the machine reports the error (most of the time the final customer). By using continuous integration your AIX build will be tested at every change, all these changes will be stored in a git repository and, even better, you will not be able to put a change in production without passing all these tests. While this is almost mandatory for people using PowerVC today, people who are not can still do the same thing. By doing that you’ll have a clean and proper AIX build (post-install) and no errors will be possible anymore, so I highly encourage you to do this even if you are not adopting the Openstack way or even if today you don’t see the benefits. In the future this effort will pay. Trust me.

    The test-kitchen

    What is the kitchen

    The test-kitchen is a tool that allows you to run your AIX Chef cookbooks and recipes in a quick way without having to do manual tasks. During the development of your recipes, if you don’t use the test kitchen you’ll have many tasks to do manually: build a virtual machine, install the chef client, copy the cookbook and the recipes, run them, check everything is in the state that you want. Imagine doing that on different AIX versions (6.1, 7.1, 7.2) every time you change something in your post-installation recipes (I was doing that before and I can assure you that creating and destroying machines over and over and over is just a waste of time). The test kitchen is here to do the job for you. It will build the machine for you (using the PowerVC kitchen driver), install the chef-client (using an omnibus server), copy the content of your cookbook (the files), run a bunch of recipes (described in what we call suites) and then test it (using bats, or serverspec). You can configure your kitchen to test different kinds of images (6.1, 7.1, 7.2) and different suites (cookbooks, recipes) depending on the environment you want to test. By default the test kitchen uses a Linux tool called Vagrant to build your VMs. Obviously Vagrant is not able to build an AIX machine, that’s why we will use a modified version of the kitchen-openstack driver (modified by myself) called kitchen-powervc to build the virtual machines:

    Installing the kitchen and the PowerVC driver

    If you have access to an enterprise proxy you can directly download and install the gem files from your host (in my case this is a Linux on Power … so Linux on Power is working great for this).

    • Install the test kitchen :
    # gem install --http-proxy http://bcreau:mypasswd@proxy:8080 test-kitchen
    Successfully installed test-kitchen-1.7.2
    Parsing documentation for test-kitchen-1.7.2
    1 gem installed
    
  • Install kitchen-powervc :
  • # gem install --http-proxy http://bcreau:mypasswd@proxy:8080 kitchen-powervc
    Successfully installed kitchen-powervc-0.1.0
    Parsing documentation for kitchen-powervc-0.1.0
    1 gem installed
    
  • Install kitchen-openstack :
  • # gem install --http-proxy http://bcreau:mypasswd@proxy:8080 kitchen-openstack
    Successfully installed kitchen-openstack-3.0.0
    Fetching: fog-core-1.38.0.gem (100%)
    Successfully installed fog-core-1.38.0
    Fetching: fuzzyurl-0.8.0.gem (100%)
    Successfully installed fuzzyurl-0.8.0
    Parsing documentation for kitchen-openstack-3.0.0
    Installing ri documentation for kitchen-openstack-3.0.0
    Parsing documentation for fog-core-1.38.0
    Installing ri documentation for fog-core-1.38.0
    Parsing documentation for fuzzyurl-0.8.0
    Installing ri documentation for fuzzyurl-0.8.0
    3 gems installed
    

    If you don’t have the access to an enterprise proxy you can still download the gems from home and install it on your work machine:

    # gem install test-kitchen kitchen-powervc kitchen-openstack -i repo --no-ri --no-rdoc
    # # copy the files (repo directory) on your destination machine
    # gem install *.gem
    

    Setup the kitchen (.kitchen.yml file)

    The kitchen configuration file is .kitchen.yml; when you run the kitchen command, the kitchen will look for this file. You have to put it in the chef-repo (where the cookbook directory is; the kitchen will copy the files from the cookbook to the test machine, that’s why it’s important to put this file at the root of the chef-repo). This file is separated into different sections:
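
    To fix ideas, here is roughly how the chef-repo used throughout this post is laid out (the userdata.txt location is an assumption based on the relative path used in the driver section below; the test directory is detailed later in this post):

    chef-repo/
    ├── .kitchen.yml
    ├── userdata.txt
    ├── cookbooks/
    │   └── aix/
    │       └── recipes/
    │           ├── root_authorized_keys.rb
    │           └── gem_source.rb
    └── test/
        └── integration/
            └── aixcookbook/
                └── serverspec/
                    ├── root_authorized_keys_spec.rb
                    └── spec_helper.rb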

    • The driver section. In this section you will configure how to create the virtual machines, in our case how to connect to PowerVC (credentials, region). You’ll also tell in this section which image you want to use (PowerVC images), which flavor (PowerVC template) and which network will be used at the VM creation (please note that you can put some driver_config in the platform section, to tell which image or which ip you want to use for each specific platform):
      • name: the name of the driver (here powervc).
      • openstack*: the PowerVC url, user, password, region, domain.
      • image_ref: the name of the image (we will put this in driver_config in the platform section).
      • flavor_ref: the name of the PowerVC template used at the VM creation.
      • fixed_ip: the ip_address used for the virtual machine creation.
      • server_name_prefix: each vm created by the kitchen will be prefixed by this parameter.
      • network_ref: the name of the PowerVC vlan to be used at the machine creation.
      • public_key_path: The kitchen needs to connect to the machine with ssh, you need to provide the public key used.
      • private_key_path: Same but for the private key.
      • username: The ssh username (we will use root, but you can use another user and then tell the kitchen to use sudo)
      • user_data: The activation input used by cloud-init; we put the public key in it to be sure you can access the machine without a password (it’s the PowerVC activation input).
    driver:
      name: powervc
      server_wait: 100
      openstack_username: "root"
      openstack_api_key: "root"
      openstack_auth_url: "https://mypowervc:5000/v3/auth/tokens"
      openstack_region: "RegionOne"
      openstack_project_domain: "Default"
      openstack_user_domain: "Default"
      openstack_project_name: "ibm-default"
      flavor_ref: "mytemplate"
      server_name_prefix: "chefkitchen"
      network_ref: "vlan666"
      public_key_path: "/home/chef/.ssh/id_dsa.pub"
      private_key_path: "/home/chef/.ssh/id_dsa"
      username: "root"
      user_data: userdata.txt
    
    #cloud-config
    ssh_authorized_keys:
      - ssh-dss AAAAB3NzaC1kc3MAAACBAIVZx6Pic+FyUisoNrm6Znxd48DQ/YGNRgsed+fc+yL1BVESyTU5kqnupS8GXG2I0VPMWN7ZiPnbT1Fe2D[..]
    
  • The provisioner section: This section can be used to specify whether you want to use chef-zero or chef-solo as a provisioner. You can also specify an omnibus url (used to download and install the chef-client at the machine creation time). In my case the omnibus url is a link to an http server “serving” a script (install.sh) installing the chef client fileset for AIX (more details later in the blog post). I’m also setting “sudo” to false as I’ll connect with the root user:
  • provisioner:
      name: chef_solo
      chef_omnibus_url: "http://myomnibusserver:8080/chefclient/install.sh"
      sudo: false
    
  • The platform section: The platform section will describe each platform that the test-kitchen can create (I’m putting here the image_ref and the fixed_ip for each platform (AIX 6.1, AIX 7.1, AIX 7.2)):
  • platforms:
      - name: aix72
        driver_config:
          image_ref: "kitchen-aix72"
          fixed_ip: "10.66.33.234"
      - name: aix71
        driver_config:
          image_ref: "kitchen-aix71"
          fixed_ip: "10.66.33.235"
      - name: aix61
        driver_config:
          image_ref: "kitchen-aix61"
          fixed_ip: "10.66.33.236"
    
  • The suite section: this section describes which cookbook and which recipes you want to run on the machines created by the test-kitchen. For the simplicity of this example I’m just running two recipes: the first one called root_authorized_keys (creating the /root directory, changing the home directory of root and putting a public key in the .ssh directory) and the second one called gem_source (we will see later in the post why I’m also calling this recipe):
  • suites:
      - name: aixcookbook
        run_list:
        - recipe[aix::root_authorized_keys]
        - recipe[aix::gem_source]
        attributes: { gem_source: { add_urls: [ "http://10.14.66.100:8808" ], delete_urls: [ "https://rubygems.org/" ] } }
    
  • The busser section: this section describes how to run your tests (more details later in the post ;-) ):
  • busser:
      sudo: false
    

    After configuring the kitchen you can check that the yml file is ok by listing what’s configured in the kitchen:

    # kitchen list
    Instance           Driver   Provisioner  Verifier  Transport  Last Action
    aixcookbook-aix72  Powervc  ChefSolo     Busser    Ssh        <Not Created>
    aixcookbook-aix71  Powervc  ChefSolo     Busser    Ssh        <Not Created>
    aixcookbook-aix61  Powervc  ChefSolo     Busser    Ssh        <Not Created>
    

    kitchen1
    kitchen2

    Anatomy of a kitchen run

    A kitchen run is divided into five steps. First we create a virtual machine (the create action), then we install the chef-client (using an omnibus url) and run some recipes (converge), then we install the testing tools on the virtual machine (in my case serverspec) (setup) and we run the tests (verify). Finally, if everything was ok, we delete the virtual machine (destroy). Instead of running all these steps one by one you can use the “test” option, which will do destroy, create, converge, setup, verify, destroy in one single “pass”. Let’s check each step in detail:
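
    If you just want the commands, here they are in the order described above (the last one, “kitchen test”, chains them all, starting and ending with a destroy):

    # kitchen create
    # kitchen converge
    # kitchen setup
    # kitchen verify
    # kitchen destroy
    # kitchen test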

    kitchen1

    • Create: This will create the virtual machine using PowerVC. If you choose to use the “fixed_ip” option in the .kitchen.yml file this ip will be chosen at the machine creation time. If you prefer to pick an ip from the network (in the pool) don’t set the “fixed_ip”. You’ll see the details in the picture below. You can at the end test the connectivity (transport) (ssh) to the machine using “kitchen login”. The ssh public key was automatically added using the userdata.txt file used by cloud-init at the machine creation time. After the machine is created you can use the “kitchen list” command to check the machine was successfully created:
    # kitchen create
    

    kitchencreate3
    kitchencreate1
    kitchencreate2
    kitchenlistcreate1

    • Converge: This will converge the kitchen (one more time: converge = chef-client installation plus running chef-solo with the suite configuration describing which recipes will be launched). The converge action will download the chef client, install it on the machine (using the omnibus url) and run the recipes specified in the suite stanza of the .kitchen.yml file. Here is the script I use for the omnibus installation; this script is “served” by an http server:
    # cat install.sh
    #!/usr/bin/ksh
    echo "[omnibus] [start] starting omnibus install"
    echo "[omnibus] downloading chef client http://chefomnibus:8080/chefclient/latest"
    perl -le 'use LWP::Simple;getstore("http://chefomnibus:8080/chefclient/latest", "/tmp/chef.bff")'
    echo "[omnibus] installing chef client"
    installp -aXYgd /tmp/ chef
    echo "[omnibus] [end] ending omnibus install"
    
  • The http server serves this install.sh file. Here is the httpd.conf configuration for the omnibus installation on AIX:
  • # ls -l /apps/chef/chefclient
    total 647896
    -rw-r--r--    1 apache   apache     87033856 Dec 16 17:15 chef-12.1.2-1.powerpc.bff
    -rwxr-xr-x    1 apache   apache     91922944 Nov 25 00:24 chef-12.5.1-1.powerpc.bff
    -rw-------    2 apache   apache     76375040 Jan  6 11:23 chef-12.6.0-1.powerpc.bff
    -rwxr-xr-x    1 apache   apache          364 Apr 15 10:23 install.sh
    -rw-------    2 apache   apache     76375040 Jan  6 11:23 latest
    # cat httpd.conf
    [..]
         Alias /chefclient/ "/apps/chef/chefclient/"
         <Directory "/apps/chef/chefclient/">
             Options Indexes FollowSymLinks MultiViews
             AllowOverride None
             Require all granted
         </Directory>
    
    # kitchen converge
    

    kitchenconverge1
    kitchenconverge2b
    kitchenlistconverge1

    • Setup and verify: these actions will run a bunch of tests to verify the machine is in the state you want. The tests I am writing check that the root home directory was created and that the key was successfully created in the .ssh directory. In a few words you need to write tests checking that your recipes are working well (in chef words: “check that the machine is in the correct state”). In my case I’m using serverspec to describe my tests (there are different tools used for testing, you can also use bats). To describe the test suite just create serverspec files (describing the tests) in the chef-repo directory (in ~/test/integration//serverspec, in my case ~/test/integration/aixcookbook/serverspec). All the serverspec test files are suffixed by _spec:
    # ls test/integration/aixcookbook/serverspec/
    root_authorized_keys_spec.rb  spec_helper.rb
    
  • The “_spec” files describe the tests that will be run by the kitchen. In my very simple tests here I’m just checking that my files exist and that the content of the public key is the same as my public key (the key created by cloud-init on AIX is located in ~/.ssh, and my test recipe here is changing the root home directory and putting the key in the right place). By looking at the file you can see that the serverspec language is very simple to understand:
  • # ls test/integration/aixcookbook/serverspec/
    root_authorized_keys_spec.rb  spec_helper.rb
    
    # cat spec_helper.rb
    require 'serverspec'
    set :backend, :exec
    # cat root_authorized_keys_spec.rb
    require 'spec_helper'
    
    describe file('/root/.ssh') do
      it { should exist }
      it { should be_directory }
      it { should be_owned_by 'root' }
    end
    
    describe file('/root/.ssh/authorized_keys') do
      it { should exist }
      it { should be_owned_by 'root' }
      it { should contain 'from="1[..]" ssh-rsa AAAAB3NzaC1[..]' }
    end
    
  • The kitchen will try to install the ruby gems needed for serverspec (serverspec needs to be installed on the server to run the automated tests). As my server has no connectivity to the internet I need to run my own gem server. I’m lucky: all the needed gems are installed on my chef workstation (if you have no internet access from the workstation use the tip described at the beginning of this blog post). I just need to run a local gem server by running “gem server” on the chef workstation. The server is listening on port 8808 and will serve all the needed gems:
  • # gem list | grep -E "busser|serverspec"
    busser (0.7.1)
    busser-bats (0.3.0)
    busser-serverspec (0.5.9)
    serverspec (2.31.1)
    # gem server
    Server started at http://0.0.0.0:8808
    
  • If you look at the output above you can see that the recipe gem_source was executed. This recipe changes the gem source on the virtual machine (from https://rubygems.org to my own local server). In the .kitchen.yml file the urls to add to and remove from the gem sources are specified in the suite attributes:
  • # cat gem_source.rb
    ruby_block 'Changing gem source' do
      block do
        node['gem_source']['add_urls'].each do |url|
          current_sources = Mixlib::ShellOut.new('/opt/chef/embedded/bin/gem source')
          current_sources.run_command
          next if current_sources.stdout.include?(url)
          add = Mixlib::ShellOut.new("/opt/chef/embedded/bin/gem source --add #{url}")
          add.run_command
          Chef::Application.fatal!("Adding gem source #{url} failed #{add.status}") unless add.status == 0
          Chef::Log.info("Add gem source #{url}")
        end
    
        node['gem_source']['delete_urls'].each do |url|
          current_sources = Mixlib::ShellOut.new('/opt/chef/embedded/bin/gem source')
          current_sources.run_command
          next unless current_sources.stdout.include?(url)
          del = Mixlib::ShellOut.new("/opt/chef/embedded/bin/gem source --remove #{url}")
          del.run_command
          Chef::Application.fatal!("Removing gem source #{url} failed #{del.status}") unless del.status == 0
          Chef::Log.info("Remove gem source #{url}")
        end
      end
      action :run
    end
    
    # kitchen setup
    # kitchen verify
    

    kitchensetupeverify1
    kitchenlistverfied1

    • Destroy: This will destroy the virtual machine on PowerVC.
    # kitchen destroy
    

    kitchendestroy1
    kitchendestroy2
    kitchenlistdestroy1

    Now that you understand how the kitchen works and are able to run it to create and test AIX machines, you are ready to use the kitchen to develop and create the chef cookbook that will fit your infrastructure. To run all the steps “create, converge, setup, verify, destroy”, just use the “kitchen test” command:

    # kitchen test
    

    As you are going to change a lot of things in your cookbook you’ll need to version the code you are creating; for this we will use a gitlab server.

    Gitlab: version your AIX cookbook

    Unfortunately for you and for me I didn’t have the time to run gitlab on a Linux on Power machine. I’m sure it is possible (if you find a way to do this please mail me). Anyway my version of gitlab is running on an x86 box. The goal here is to allow the chef workstation user (in my environment this user is “chef”) to push all the new developments (providers, recipes) to the git development branch. For this we will:

    • Allow the chef user to push its sources to the git server through ssh (we are creating a chefworkstation user and adding the key to authorize this user to push changes to the git repository over ssh).
    • gitlabchefworkst

    • Create a new repository called aix-cookbook.
    • createrepo

    • Push your current work to the master branch. The master branch will be the production branch.
    # git config --global user.name "chefworkstation"
    # git config --global user.email "chef@myworkstation.chmod666.org"
    # git init
    # git add -A .
    # git commit -m "first commit"
    # git remote add origin git@gitlabserver:chefworkstation/aix-cookbook.git
    # git push origin master
    

    masterbranch

  • Create a development branch (you’ll need to push all your new developments to this branch, and you’ll never have to do anything else on the master branch as Jenkins is going to do the job for us).
  • # git checkout -b dev
    # git commit -a
    # git push origin dev
    

    devbranch

    The git server is ready: we have a repository accessible by the chef user. Two branches are created: the dev one (the one we are working on, used for all our developments) and the master branch, used for production, which will never be touched by us and will only be updated (by Jenkins) if all the tests (foodcritic, rubocop and the test-kitchen) are ok.

    Automating the continuous integration with Jenkins

    What is Jenkins

    The goal of Jenkins is to automate all tests and run them over and over again every time a change is applied to the cookbook you are developing. By using Jenkins you will be sure that every change is tested and you will never push something that is not working or not passing the tests you have defined into your production environment. To be sure the cookbook is working as desired we will use three different tools. foodcritic will check your chef cookbook for common problems against a number of rules defined within the tool (these rules check that everything is ok for the chef execution, so you will be sure that there is no syntax error and that all the coding conventions are respected), rubocop will check the ruby syntax, and then we will run a kitchen test to be sure that the development branch is working with the kitchen and that all our serverspec tests are ok. Jenkins will automate the following steps (a minimal command-line sketch of what the jobs execute is shown after the list):

    1. Pull the dev branch from git server (gitlab) if anything has changed on this branch.
    2. Run foodcritic on the code.
    3. If foodcritic tests are ok this will trigger the next step.
    4. Pull the dev branch again
    5. Run rubocop on the code.
    6. If rubocop tests are ok this will trigger the next step.
    7. Run the test-kitchen
    8. This will build a new machine on PowerVC and test the cookbook against it (kitchen test).
    9. If the test kitchen is ok push the dev branch to the master branch.
    10. You are ready for production :-)
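
    Just to give you an idea of what each job executes under the hood, here is a minimal sketch (the foodcritic, rubocop and kitchen invocations are the same ones shown in the sections below; the final git push is only my command-line illustration of the dev to master promotion, in Jenkins itself this is done as a post-build action):

    # foodcritic -f correctness ./cookbooks/
    # rubocop .
    # kitchen test
    # git push origin dev:master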

    kitchen2

    First: Foodcritic

    The first test we are running is foodcritic. Better than trying to do my own explanation of this with my weird english I prefer to quote the chef website:

    Foodcritic is a static linting tool that analyzes all of the Ruby code that is authored in a cookbook against a number of rules, and then returns a list of violations. Because Foodcritic is a static linting tool, using it is fast. The code in a cookbook is read, broken down, and then compared to Foodcritic rules. The code is not run (a chef-client run does not occur). Foodcritic does not validate the intention of a recipe, rather it evaluates the structure of the code, and helps enforce specific behavior, detect portability of recipes, identify potential run-time failures, and spot common anti-patterns.

    # foodcritic -f correctness ./cookbooks/
    FC014: Consider extracting long ruby_block to library: ./cookbooks/aix/recipes/gem_source.rb:1
    

    In Jenkins here are the steps to create a foodcritic test:

    • Pull dev branch from gitlab:
    • food1

    • Check for changes (the Jenkins test will be triggered only if there was a change in the git repository):
    • food2

    • Run foodcritic
    • food3

    • After the build parse the code (to archive and record the evolution of the foodcritic errors) and run the rubocop project if the build is stable (passed without any errors):
    • food4

    • To configure the parser go in the Jenkins configuration and add the foodcritic compiler warnings:
    • food5

    Second: Rubocop

    The second test we are running is rubocop. It is a Ruby static code analyzer, based on the community Ruby style guide. Here is an example below:

    # rubocop .
    Inspecting 71 files
    ..CCCCWWCWC.WC..CC........C.....CC.........C.C.....C..................C
    
    Offenses:
    
    cookbooks/aix/providers/fixes.rb:31:1: C: Assignment Branch Condition size for load_current_resource is too high. [20.15/15]
    def load_current_resource
    ^^^
    cookbooks/aix/providers/fixes.rb:31:1: C: Method has too many lines. [19/10]
    def load_current_resource ...
    ^^^^^^^^^^^^^^^^^^^^^^^^^
    cookbooks/aix/providers/sysdump.rb:11:1: C: Assignment Branch Condition size for load_current_resource is too high. [25.16/15]
    def load_current_resource
    

    In Jenkins here are the steps to create a rubocop test:

    • Do the same thing as foodcritic except for the build and post-build action steps:
    • Run rubocop:
    • rubo1

    • After the build, parse the code and run the test-kitchen project even if the build fails (rubocop will generate tons of things to correct … once you are ok with rubocop change this to “trigger only if the build is stable”) :
    • rubo2

    Third: test-kitchen

    I don’t have to explain again what the test-kitchen is ;-) . It is the third test we are creating with Jenkins, and if this one is ok we are pushing the changes to production:

    • Do the same thing as foodcritic except for the build and post-build action steps:
    • Run the test-kitchen:
    • kitchen1

    • If the test kitchen is ok push dev branch to master branch (dev to production):
    • kitchen3

    More about Jenkins

    The three tests are now linked together. On the Jenkins home page you can check the current state of your tests. Here are a couple of screenshots:

    meteo
    timeline

    Conclusion

    I know that for most of you working this way is something totally new. As AIX sysadmins we are used to our ksh and bash scripts and we like the way it is today. But the world is changing, and as you are going to manage more and more machines with fewer and fewer admins you will understand how powerful it is to use automation and to work in a “continuous integration” way. Even if you don’t like this “concept” or this new work habit … give it a try and you’ll see that working this way is worth the effort. First for you … you’ll discover a lot of new interesting things; second for your boss, who will discover that working this way is safer and more productive. Trust me, AIX needs to face Linux today and we are not going anywhere without having a proper fight versus the Linux guys :-) (yep it’s a joke).

    Enhance your AIX package management with yum and nim over http


    As AIX is getting older and older, our old favorite OS is still trying to struggle versus the mighty Linux and the fantastic Solaris (no sarcasm in that sentence, I truly believe what I say). You may have noticed that -with time- IBM is slowly but surely moving from proprietary code to something more open (ie. PowerVC/Openstack projects, integration with Chef, Linux on Power and tons of other examples). I’m deviating a little bit from the main topic of this blog post, but speaking about open source I have many things to say. If someone from my company is reading this post please note that it is my point of view … but I’m still sure that we are going the WRONG way by not being more open, and not publishing on github. Starting from now every AIX IT shop in the world must consider using OpenSource software (git, chef, ansible, zsh and so on) instead of maintaining homemade tools, or worse, paying for tools that are 100 % of the time worse than OpenSource tools. Even better, every IT admin and every team should consider sharing their sources with the rest of the world for one single good reason: “Alone we can do so little, together we can do so much”. Every company not considering this today is doomed. Take the example of Bloomberg, Facebook (sharing all their Chef cookbooks with the world), twitter: they’re all using github to share their opensource projects. Even military, police and banks are doing the same. They’re still secure but they are open to the world, ready to work together to make and create things better and better. All of this to introduce you to new things coming on AIX. Instead of reinventing the wheel IBM had the great idea to use already well-established tools. It was the case for Openstack/PowerVC and it is also the case for the tools I’ll talk about in this post. It is the case for yum (yellowdog updater modified). Instead of installing rpm packages by hand you now have the possibility to use yum and to definitely end the rpm dependency nightmare that we have all had since AIX 5L was released. Next, instead of using the proprietary nimsh protocol to install filesets (bff packages) you can now tell the nim server and nimclient to do this over http/https (secure is only for the authentication as far as I know) (an open protocol :-) ). By doing this you will enhance the way you are managing packages on AIX. Do this now on every AIX system you install, yum everywhere, and stop using NFS … we’re now in an http world :-)

    yum: the yellow dog updater modified

    I’m not going to explain to you what yum is. If you don’t know you’re not in the right place. Just note that my advice starting from now is to use yum to install every piece of software from the AIX toolbox (ftp://ftp.software.ibm.com/aix/freeSoftware/aixtoolbox/RPMS/ppc/). IBM is providing an official repository that can be mirrored on your own site to avoid having to use a proxy or having access to the internet from your servers (you must admit that this is almost impossible and every big company will try to avoid this). Let’s start by trying to install yum:

    Installing yum

    IBM is providing an archive with all the rpms needed to install and use yum on an AIX server, you can find this archive here: ftp://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/ezinstall/ppc/yum_bundle_v1.tar. Just download it and install every rpm in it and yum will be available on your system, simple as that:

    A specific version of the rpm binary is mandatory to use yum. Before doing anything update the rpm.rte fileset. As AIX is rpm “aware” it already has an rpm database, but this one is not manageable by yum. The installation of rpm in a version greater than or equal to 4.9.1.3 is needed. This installation will migrate the existing rpm database to a new one usable by yum. The fileset in the right version can be found here: ftp://ftp.software.ibm.com/aix/freeSoftware/aixtoolbox/INSTALLP/ppc/

    • By default the rpm command is installed by an AIX fileset:
    # which rpm
    /usr/bin/rpm
    # lslpp -w /usr/bin/rpm
      File                                        Fileset               Type
      ----------------------------------------------------------------------------
      /usr/bin/rpm                                rpm.rte               File
    # rpm --version
    RPM version 3.0.5
    
  • The rpm database is located in /usr/opt/freeware/packages :
  • # pwd
    /usr/opt/freeware/packages
    # ls -ltr
    total 5096
    -rw-r--r--    1 root     system         4096 Jul 01 2011  triggerindex.rpm
    -rw-r--r--    1 root     system         4096 Jul 01 2011  conflictsindex.rpm
    -rw-r--r--    1 root     system        20480 Jul 21 00:54 nameindex.rpm
    -rw-r--r--    1 root     system        20480 Jul 21 00:54 groupindex.rpm
    -rw-r--r--    1 root     system      2009224 Jul 21 00:54 packages.rpm
    -rw-r--r--    1 root     system       647168 Jul 21 00:54 fileindex.rpm
    -rw-r--r--    1 root     system        20480 Jul 21 00:54 requiredby.rpm
    -rw-r--r--    1 root     system        81920 Jul 21 00:54 providesindex.rpm
    
  • Install the rpm.rte fileset in the right version (4.9.1.3):
  • # file rpm.rte.4.9.1.3
    rpm.rte.4.9.1.3: backup/restore format file
    # installp -aXYgd . rpm.rte
    +-----------------------------------------------------------------------------+
                        Pre-installation Verification...
    +-----------------------------------------------------------------------------+
    Verifying selections...done
    Verifying requisites...done
    Results...
    
    SUCCESSES
    ---------
      Filesets listed in this section passed pre-installation verification
      and will be installed.
    
      Selected Filesets
      -----------------
      rpm.rte 4.9.1.3                             # RPM Package Manager
    [..]
    #####################################################
            Rebuilding RPM Data Base ...
            Please wait for rpm_install background job termination
            It will take a few minutes
    [..]
    Installation Summary
    --------------------
    Name                        Level           Part        Event       Result
    -------------------------------------------------------------------------------
    rpm.rte                     4.9.1.3         USR         APPLY       SUCCESS
    rpm.rte                     4.9.1.3         ROOT        APPLY       SUCCESS
    
  • After the installation check that you have the correct version of rpm; you can also notice some changes in the rpm database files:
  • # rpm --version
    RPM version 4.9.1.3
    # ls -ltr /usr/opt/freeware/packages
    total 25976
    -rw-r--r--    1 root     system         4096 Jul 01 2011  triggerindex.rpm
    -rw-r--r--    1 root     system         4096 Jul 01 2011  conflictsindex.rpm
    -rw-r--r--    1 root     system        20480 Jul 21 00:54 nameindex.rpm
    -rw-r--r--    1 root     system        20480 Jul 21 00:54 groupindex.rpm
    -rw-r--r--    1 root     system      2009224 Jul 21 00:54 packages.rpm
    -rw-r--r--    1 root     system       647168 Jul 21 00:54 fileindex.rpm
    -rw-r--r--    1 root     system        20480 Jul 21 00:54 requiredby.rpm
    -rw-r--r--    1 root     system        81920 Jul 21 00:54 providesindex.rpm
    -rw-r--r--    1 root     system            0 Jul 21 01:08 .rpm.lock
    -rw-r--r--    1 root     system         8192 Jul 21 01:08 Triggername
    -rw-r--r--    1 root     system         8192 Jul 21 01:08 Conflictname
    -rw-r--r--    1 root     system        28672 Jul 21 01:09 Dirnames
    -rw-r--r--    1 root     system       221184 Jul 21 01:09 Basenames
    -rw-r--r--    1 root     system         8192 Jul 21 01:09 Sha1header
    -rw-r--r--    1 root     system         8192 Jul 21 01:09 Requirename
    -rw-r--r--    1 root     system         8192 Jul 21 01:09 Obsoletename
    -rw-r--r--    1 root     system         8192 Jul 21 01:09 Name
    -rw-r--r--    1 root     system         8192 Jul 21 01:09 Group
    -rw-r--r--    1 root     system       815104 Jul 21 01:09 Packages
    -rw-r--r--    1 root     system         8192 Jul 21 01:09 Sigmd5
    -rw-r--r--    1 root     system         8192 Jul 21 01:09 Installtid
    -rw-r--r--    1 root     system        86016 Jul 21 01:09 Providename
    -rw-r--r--    1 root     system       557056 Jul 21 01:09 __db.004
    -rw-r--r--    1 root     system     83894272 Jul 21 01:09 __db.003
    -rw-r--r--    1 root     system      7372800 Jul 21 01:09 __db.002
    -rw-r--r--    1 root     system        24576 Jul 21 01:09 __db.001
    

    Then install yum. Please note that I already have some rpms installed on my current system, that’s why I’m not installing db or gdbm. If your system is free of any rpm, install all the rpms found in the archive:

    # tar xvf yum_bundle_v1.tar
    x curl-7.44.0-1.aix6.1.ppc.rpm, 584323 bytes, 1142 media blocks.
    x db-4.8.24-3.aix6.1.ppc.rpm, 2897799 bytes, 5660 media blocks.
    x gdbm-1.8.3-5.aix5.2.ppc.rpm, 56991 bytes, 112 media blocks.
    x gettext-0.10.40-8.aix5.2.ppc.rpm, 1074719 bytes, 2100 media blocks.
    x glib2-2.14.6-2.aix5.2.ppc.rpm, 1686134 bytes, 3294 media blocks.
    x pysqlite-1.1.7-1.aix6.1.ppc.rpm, 51602 bytes, 101 media blocks.
    x python-2.7.10-1.aix6.1.ppc.rpm, 23333701 bytes, 45574 media blocks.
    x python-devel-2.7.10-1.aix6.1.ppc.rpm, 15366474 bytes, 30013 media blocks.
    x python-iniparse-0.4-1.aix6.1.noarch.rpm, 37912 bytes, 75 media blocks.
    x python-pycurl-7.19.3-1.aix6.1.ppc.rpm, 162093 bytes, 317 media blocks.
    x python-tools-2.7.10-1.aix6.1.ppc.rpm, 830446 bytes, 1622 media blocks.
    x python-urlgrabber-3.10.1-1.aix6.1.noarch.rpm, 158584 bytes, 310 media blocks.
    x readline-6.1-2.aix6.1.ppc.rpm, 489547 bytes, 957 media blocks.
    x sqlite-3.7.15.2-2.aix6.1.ppc.rpm, 1334918 bytes, 2608 media blocks.
    x yum-3.4.3-1.aix6.1.noarch.rpm, 1378777 bytes, 2693 media blocks.
    x yum-metadata-parser-1.1.4-1.aix6.1.ppc.rpm, 62211 bytes, 122 media blocks.
    
    # rpm -Uvh curl-7.44.0-1.aix6.1.ppc.rpm glib2-2.14.6-2.aix5.2.ppc.rpm pysqlite-1.1.7-1.aix6.1.ppc.rpm python-2.7.10-1.aix6.1.ppc.rpm python-devel-2.7.10-1.aix6.1.ppc.rpm python-iniparse-0.4-1.aix6.1.noarch.rpm python-pycurl-7.19.3-1.aix6.1.ppc.rpm python-tools-2.7.10-1.aix6.1.ppc.rpm python-urlgrabber-3.10.1-1.aix6.1.noarch.rpm yum-3.4.3-1.aix6.1.noarch.rpm yum-metadata-parser-1.1.4-1.aix6.1.ppc.rpm
    # Preparing...                ########################################### [100%]
       1:python                 ########################################### [  9%]
       2:pysqlite               ########################################### [ 18%]
       3:python-iniparse        ########################################### [ 27%]
       4:glib2                  ########################################### [ 36%]
       5:yum-metadata-parser    ########################################### [ 45%]
       6:curl                   ########################################### [ 55%]
       7:python-pycurl          ########################################### [ 64%]
       8:python-urlgrabber      ########################################### [ 73%]
       9:yum                    ########################################### [ 82%]
      10:python-devel           ########################################### [ 91%]
      11:python-tools           ########################################### [100%]
    

    Yum is now ready to be configured and used :-)

    # which yum
    /usr/bin/yum
    # yum --version
    3.4.3
      Installed: yum-3.4.3-1.noarch at 2016-07-20 23:24
      Built    : None at 2016-06-22 14:13
      Committed: Sangamesh Mallayya  at 2014-05-29
    

    Setting up yum and your private yum repository for AIX

    A private repository

    As nobody wants to use the official IBM repository available directly on the internet, the goal here is to create your own repository. Download all the content of the official repository and “serve” this directory (the one where you downloaded all the rpms) with a private http server (yum uses http/https obviously :-) ).

    • Using wget download the content of the whole official repository. You can notice here that IBM provides the needed metadata (the repodata directory); if you don’t have this repodata directory yum can’t work properly. It can be created using the createrepo command available on all good Linux distros :-) :
    # wget -r ftp://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/RPMS/ppc/
    # ls -ltr
    [..]
    drwxr-xr-x    2 root     system         4096 Jul 11 22:08 readline
    drwxr-xr-x    2 root     system          256 Jul 11 22:08 rep-gtk
    drwxr-xr-x    2 root     system         4096 Jul 11 22:08 repodata
    drwxr-xr-x    2 root     system         4096 Jul 11 22:08 rpm
    drwxr-xr-x    2 root     system         4096 Jul 11 22:08 rsync
    drwxr-xr-x    2 root     system          256 Jul 11 22:08 ruby
    drwxr-xr-x    2 root     system          256 Jul 11 22:09 rxvt
    drwxr-xr-x    2 root     system         4096 Jul 11 22:09 samba
    drwxr-xr-x    2 root     system          256 Jul 11 22:09 sawfish
    drwxr-xr-x    2 root     system          256 Jul 11 22:09 screen
    drwxr-xr-x    2 root     system          256 Jul 11 22:09 scrollkeeper
    
  • Configure your web server (here it’s just an alias because I’m using my http server for other things):
  • # more httpd.conf
    [..]
     Alias /aixtoolbox/  "/apps/aixtoolbox/"
     <Directory "/apps/aixtoolbox/">
         Options Indexes FollowSymLinks MultiViews
         AllowOverride None
         Require all granted
     </Directory>
    
  • Restart your webserver and check your repository is accessible:
  • repo

  • That’s it, the private repository is ready.

    Configuring yum

    On the client just modify the /opt/freeware/etc/yum/yum.conf or add a file in /opt/freeware/etc/yum/yum.repos.d to point to your private repository:

    # cat /opt/freeware/etc/yum/yum.conf
    [main]
    cachedir=/var/cache/yum
    keepcache=1
    debuglevel=2
    logfile=/var/log/yum.log
    exactarch=1
    obsoletes=1
    
    [AIX_Toolbox]
    name=AIX ToolBox Repository
    baseurl=http://nimserver:8080/aixtoolbox/
    enabled=1
    gpgcheck=0
    
    # PUT YOUR REPOS HERE OR IN separate files named file.repo
    # in /etc/yum/repos.d
    

    That’s it the client is ready.

    Chef recipe to install and configure yum

    My readers all know that I’m using Chef as a configuration management tool. As you are going to do this on every single system you have, I think giving you the Chef recipe installing and configuring yum can be useful (if you don’t care about it just skip it and go to the next section). If you are not using a configuration management tool maybe this simple example will help you to move on and stop doing this by hand or writing ksh scripts. I have to do that on tons of systems so for me it’s just mandatory. Here is my recipe doing all the job: configuring and installing yum, and installing some rpms:

    directory '/var/tmp/yum' do
      action :create
    end
    
    remote_file '/var/tmp/yum/rpm.rte.4.9.1.3'  do
      source "http://#{node['nimserver']}/powervc/rpm.rte.4.9.1.3"
      action :create
    end
    
    execute "Do the toc" do
      command 'inutoc /var/tmp/yum'
      not_if { File.exist?('/var/tmp/yum/.toc') }
    end
    
    bff_package 'rpm.rte' do
      source '/var/tmp/yum/rpm.rte.4.9.1.3'
      action :install
    end
    
    tar_extract "http://#{node['nimserver']}/powervc/yum_bundle_v1.tar" do
      target_dir '/var/tmp/yum'
      compress_char ''
      user 'root'
      group 'system'
    end
    
    # installing some rpm needed for yum
    for rpm in [ 'curl-7.44.0-1.aix6.1.ppc.rpm', 'python-pycurl-7.19.3-1.aix6.1.ppc.rpm', 'python-urlgrabber-3.10.1-1.aix6.1.noarch.rpm', 'glib2-2.14.6-2.aix5.2.ppc.rpm', 'yum-metadata-parser-1.1.4-1.aix6.1.ppc.rpm', 'python-iniparse-0.4-1.aix6.1.noarch.rpm', 'pysqlite-1.1.7-1.aix6.1.ppc.rpm'  ]
      execute "installing #{rpm}" do
        command "rpm -Uvh /var/tmp/yum/#{rpm}"
        not_if "rpm -qa | grep $(echo #{rpm} | sed 's/.aix6.1//' | sed 's/.aix5.2//' | sed 's/.rpm//')"
      end
    end
    
    # updating python
    execute "updating python" do
      command "rpm -Uvh /var/tmp/yum/python-devel-2.7.10-1.aix6.1.ppc.rpm /var/tmp/yum/python-2.7.10-1.aix6.1.ppc.rpm"
      not_if "rpm -qa | grep python-2.7.10-1"
    end
    
    # installing yum
    execute "installing yum" do
      command "rpm -Uvh /var/tmp/yum/yum-3.4.3-1.aix6.1.noarch.rpm"
      not_if "rpm -qa | grep yum-3.4.3.1.noarch"
    end
    
    # changing yum configuration
    template '/opt/freeware/etc/yum/yum.conf' do
      source 'yum.conf.erb'
    end
    
    # installing some software with aix yum
    for soft in [ 'bash', 'bzip2', 'curl', 'emacs', 'gzip', 'screen', 'vim-enhanced', 'wget', 'zlib', 'zsh', 'patch', 'file', 'lua', 'nspr', 'git' ] do
      execute "install #{soft}" do
        command "yum -y install #{soft}"
      end
    end
    
    # removing temporary file
    execute 'removing /var/tmp/yum' do
      command 'rm -rf /var/tmp/yum'
      only_if { File.exists?('/var/tmp/yum')}
    end
    

    chef_yum1
    chef_yum2
    chef_yum3

    After running the chef recipe yum is fully usable \o/ :

    chef_yum4

    Using yum on AIX: what you need to know

    yum is usable just like it is on a Linux system. You may however hit some issues when using it on AIX. For instance you can get this kind of error:

    # yum check
    AIX-rpm-7.2.0.1-2.ppc has missing requires of rpm
    AIX-rpm-7.2.0.1-2.ppc has missing requires of popt
    AIX-rpm-7.2.0.1-2.ppc has missing requires of file-libs
    AIX-rpm-7.2.0.1-2.ppc has missing requires of nss
    

    If you are not aware of what the purpose of AIX-rpm is, please read this. This rpm is what I call a meta package: it does not install anything. It exists because the rpm database does not know anything about things (binaries, libraries) installed by standard AIX filesets. By default rpms are not “aware” of what is installed by a fileset (bff), but most rpms depend on things installed by filesets. When you install a fileset … let’s say it installs a library like libc.a, AIX runs the updtvpkg program to rebuild this AIX-rpm and says “this rpm will resolve any rpm dependency issue for libc.a”. So first, never try to uninstall this rpm; second, it’s not a real problem if this rpm has missing dependencies … as it is providing nothing. If you really want to see what dependencies AIX-rpm resolves, run the following command:

    # rpm -q --provides AIX-rpm-7.2.0.1-2.ppc | grep libc.a
    libc.a(aio.o)
    # lslpp -w /usr/lib/libc.a
      File                                        Fileset               Type
      ----------------------------------------------------------------------------
      /usr/lib/libc.a                             bos.rte.libc          Symlink
    

    If you want to get rid of these messages just install the missing rpms … using yum:

    # yum -y install popt file-libs
    

    A few examples

    Here are a few examples of software installation using yum:

    • Installing git:
    # yum install git
    Setting up Install Process
    Resolving Dependencies
    --> Running transaction check
    ---> Package git.ppc 0:4.3.20-4 will be installed
    --> Finished Dependency Resolution
    
    Dependencies Resolved
    
    ================================================================================================================================================================================================
     Package                                    Arch                                       Version                                         Repository                                          Size
    ================================================================================================================================================================================================
    Installing:
     git                                        ppc                                        4.3.20-4                                        AIX_Toolbox                                        215 k
    
    Transaction Summary
    ================================================================================================================================================================================================
    Install       1 Package
    
    Total size: 215 k
    Installed size: 889 k
    Is this ok [y/N]: y
    Downloading Packages:
    Running Transaction Check
    Running Transaction Test
    Transaction Test Succeeded
    Running Transaction
      Installing : git-4.3.20-4.ppc                                                                                                                                                             1/1
    
    Installed:
      git.ppc 0:4.3.20-4
    
    Complete!
    
  • Removing git :
  • # yum remove git
    Setting up Remove Process
    Resolving Dependencies
    --> Running transaction check
    ---> Package git.ppc 0:4.3.20-4 will be erased
    --> Finished Dependency Resolution
    
    Dependencies Resolved
    
    ================================================================================================================================================================================================
     Package                                   Arch                                      Version                                           Repository                                          Size
    ================================================================================================================================================================================================
    Removing:
     git                                       ppc                                       4.3.20-4                                          @AIX_Toolbox                                       889 k
    
    Transaction Summary
    ================================================================================================================================================================================================
    Remove        1 Package
    
    Installed size: 889 k
    Is this ok [y/N]: y
    Downloading Packages:
    Running Transaction Check
    Running Transaction Test
    Transaction Test Succeeded
    Running Transaction
      Erasing    : git-4.3.20-4.ppc                                                                                                                                                             1/1
    
    Removed:
      git.ppc 0:4.3.20-4
    
    Complete!
    
  • List the available repos:
  • yum repolist
    repo id                                                                                repo name                                                                                          status
    AIX_Toolbox                                                                            AIX ToolBox Repository                                                                             233
    repolist: 233
    

    Getting rid of nimsh: USE HTTPS !

    A new feature available in the latest version of AIX (7.2) allows you to use nim over http. It is a long awaited feature, for different reasons (it’s just my opinion). I personally don’t like proprietary protocols such as nimsh and secure nimsh … security teams don’t either. Who has never experienced installation problems because of nimsh ports not opened, because of IDS, because of security teams? Using http or https is the solution: no company disallows http or https! This protocol is so widely used, secured and spread across a lot of products that everybody trusts it. I personally prefer opening one single port rather than struggling to open all the nimsh ports. You’ll understand that using http is far better than using nimsh. Before explaining this in detail, here are a few things you need to know: nimhttp is only available on the latest version of AIX (7.2 SP0/1/2), same for the nimclient; if there is a problem using http the nimclient will automatically fall back to NFS mode; and only certain nim operations are available over http.

    Configuring the nim server

    To use nim over http (nimhttp) your nim server must be deployed on at least an AIX 7.2 server (mine is updated to the latest service pack (SP2)). Start the nimhttp service on the nim server to allow nim to use http for its operations:

    # oslevel -s
    7200-00-02-1614
    # startsrc -s nimhttp
    0513-059 The nimhttp Subsystem has been started. Subsystem PID is 11665728.
    # lssrc -a | grep nimhttp
     nimhttp                           11665728     active
    

    The nimhttp service listens on port 4901; this port is defined in /etc/services :

    # grep nimhttp /etc/services
    nimhttp         4901/tcp
    nimhttp         4901/udp
    # netstat -an | grep 4901
    tcp4       0      0  *.4901                 *.*                    LISTEN
    # rmsock f1000e0004a483b8 tcpcb
    The socket 0xf1000e0004a48008 is being held by proccess 14811568 (nimhttpd).
    # ps -ef | grep 14811568
        root 14811568  4456760   0 04:03:22      -  0:02 /usr/sbin/nimhttpd -v
    

    If you want to enable crypto/ssl to encrypt the http authentication, just add the -a "-c" argument to your command line. This "-c" argument tells nimhttp to start in secure mode and encrypt the authentication:

    # startsrc -s nimhttp -a "-c"
    0513-059 The nimhttp Subsystem has been started. Subsystem PID is 14811570.
    # ps -ef | grep nimhttp
        root 14811570  4456760   0 22:57:51      -  0:00 /usr/sbin/nimhttpd -v -c
    

    Starting the service for the first time creates an httpd.conf file in the root user's home directory :

    # grep ^document_root ~/httpd.conf
    document_root=/export/nim/
    # grep ^service.log ~/httpd.conf
    service.log=/var/adm/ras/nimhttp.log
    

    If you choose to enable the secure authentication, nimhttp will use the pem certificate files used by nim. If you are already using secure nimsh you don't have to run the "nimconfig -c" command. If it is the first time, this command will create the two pem files (root and server in /ssl_nimsh/certs) (check my blog post about secure nimsh for more information about that):

    # nimconfig -c
    # grep ^ssl. ~/httpd.conf
    ssl.cert_authority=/ssl_nimsh/certs/root.pem
    ssl.pemfile=/ssl_nimsh/certs/server.pem
    

    The document_root of the http server defines the resources nimhttp will "serve". The default one is /export/nim (the default nim location for all nim resources (spot, mksysb, lpp_source)) and could not be changed at first (I think it is now ok on SP2, I'll update the blog post as soon as the test is done). Unfortunately for me one of my production nim servers was created by someone not very aware of AIX and ... resources are not in /export/nim (I had to recreate my own nim because of that :-( )
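
    If your resources really cannot live under /export/nim, here is a minimal sketch of what I would try on an SP2 master once that test is done (assuming a changed document_root is honored there, which I have not verified yet; /nim/resources is just a hypothetical path):

    # stopsrc -s nimhttp
    # cp ~/httpd.conf ~/httpd.conf.orig
    # sed 's|^document_root=.*|document_root=/nim/resources|' ~/httpd.conf.orig > ~/httpd.conf
    # startsrc -s nimhttp -a "-c"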

    On the client side ?

    On the client side you have nothing to do. If you're using AIX 7.2 and nimhttp is enabled on the nim server, the client will automatically use http for its communications. Just note that if you're using nimhttp in secure mode, you must enable your nimclient in secure mode too:

    # nimclient -c
    Received 2788 Bytes in 0.0 Seconds
    0513-044 The nimsh Subsystem was requested to stop.
    0513-077 Subsystem has been changed.
    0513-059 The nimsh Subsystem has been started. Subsystem PID is 13500758.
    # stopsrc -s nimsh
    # startsrc -s nimsh
    

    Changing nimhttp port

    You can easily change the port on which nimhttp is listening by modifying the /etc/services file. Here is an example with port 443 (I know using this one is not a good idea, but it's just for the example):

    #nimhttp                4901/tcp
    #nimhttp                4901/udp
    nimhttp         443/tcp
    nimhttp         443/udp
    # stopsrc -s nimhttp
    # startsrc -s nimhttp -a "-c"
    # netstat -Aan | grep 443
    f1000e00047fb3b8 tcp4       0      0  *.443                 *.*                   LISTEN
    # rmsock f1000e00047fb3b8 tcpcb
    The socket 0xf1000e00047fb008 is being held by proccess 14811574 (nimhttpd).
    

    Same on the client side: just change the /etc/services file and use your nimclient as usual.

    # grep nimhttp /etc/services
    #nimhttp                4901/tcp
    #nimhttp                4901/udp
    nimhttp         443/tcp
    nimhttp         443/udp
    # nimclient -l
    

    To be sure I'm not using nfs anymore I'm removing all the entries from my /etc/exports file. I know this will only work in some cases (for some types of resources) as nimesis keeps filling the file even if it is empty:

    # > /etc/exports
    # exportfs -uav
    exportfs: 1831-184 unexported /export/nim/bosinst_data/golden-vios-2233-08192014-bosinst_data
    exportfs: 1831-184 unexported /export/nim/spot/golden-vios-22422-05072016-spot/usr
    exportfs: 1831-184 unexported /export/nim/spot/golden-vios-22410-22012015-spot/usr
    exportfs: 1831-184 unexported /export/nim/mksysb
    exportfs: 1831-184 unexported /export/nim/hmc
    exportfs: 1831-184 unexported /export/nim/lpp_source
    [..]
    

    Let’s do this

    Let's now try this with a simple example. I'm installing powervp on a machine using a cust operation from the nimclient; on the client I'm doing it like I always have, running the exact same command as before. Super simple:

    # nimclient -o cust -a lpp_source=powervp1100-lpp_source -a filesets=powervp.rte
    
    +-----------------------------------------------------------------------------+
                        Pre-installation Verification...
    +-----------------------------------------------------------------------------+
    Verifying selections...done
    Verifying requisites...done
    Results...
    
    SUCCESSES
    ---------
      Filesets listed in this section passed pre-installation verification
      and will be installed.
    
      Selected Filesets
      -----------------
      powervp.rte 1.1.0.0                         # PowerVP for AIX
    
      << End of Success Section >>
    
    +-----------------------------------------------------------------------------+
                       BUILDDATE Verification ...
    +-----------------------------------------------------------------------------+
    Verifying build dates...done
    FILESET STATISTICS
    ------------------
        1  Selected to be installed, of which:
            1  Passed pre-installation verification
      ----
        1  Total to be installed
    
    +-----------------------------------------------------------------------------+
                             Installing Software...
    +-----------------------------------------------------------------------------+
    
    installp: APPLYING software for:
            powervp.rte 1.1.0.0
    
    0513-071 The syslet Subsystem has been added.
    Finished processing all filesets.  (Total time:  4 secs).
    
    +-----------------------------------------------------------------------------+
                                    Summaries:
    +-----------------------------------------------------------------------------+
    
    Installation Summary
    --------------------
    Name                        Level           Part        Event       Result
    -------------------------------------------------------------------------------
    powervp.rte                 1.1.0.0         USR         APPLY       SUCCESS
    powervp.rte                 1.1.0.0         ROOT        APPLY       SUCCESS
    
    

    On the server side I'm checking /var/adm/ras/nimhttp.log (the log file for nimhttp) and I can see that files are transferred from the server to the client using the http protocol. So it works great.

    # Thu Jul 21 23:44:19 2016        Request Type is GET
    Thu Jul 21 23:44:19 2016        Mime not supported
    Thu Jul 21 23:44:19 2016        Sending Response Header "200 OK"
    Thu Jul 21 23:44:19 2016        Sending file over socket 6. Expected length is 600
    Thu Jul 21 23:44:19 2016        Total length sent is 600
    Thu Jul 21 23:44:19 2016        handle_httpGET: Entering cleanup statement
    Thu Jul 21 23:44:20 2016        nim_http: queue socket create product (memory *)200739e8
    Thu Jul 21 23:44:20 2016        nim_http: 200739e8 6 200947e8 20098138
    Thu Jul 21 23:44:20 2016        nim_http: file descriptor is 6
    Thu Jul 21 23:44:20 2016        nim_buffer: (resize) buffer size is 0
    Thu Jul 21 23:44:20 2016        file descriptor is : 6
    Thu Jul 21 23:44:20 2016        family is : 2 (AF_INET)
    Thu Jul 21 23:44:20 2016        source address is : 10.14.33.253
    Thu Jul 21 23:44:20 2016        socks: Removing socksObject 2ff1ec80
    Thu Jul 21 23:44:20 2016        socks: 200739e8 132 <- 87 bytes (SSL)
    Thu Jul 21 23:44:20 2016        nim_buffer: (append) len is 87, buffer length is 87
    Thu Jul 21 23:44:20 2016        nim_http: data string passed to get_http_request: "GET /export/nim/lpp_source/powervp/powervp.1.1.0.0.bff HTTP/1.1
    

    Let's do the same thing with a fileset coming from a bigger lpp_source (in fact a simages one for the latest release of AIX 7.2):

    # nimclient -o cust -a lpp_source=7200-00-02-1614-lpp_source -a filesets=bos.loc.utf.en_KE
    [..]
    

    Looking on the nim server I notice that files are transferred from the server to the client, but NOT just my fileset and its dependencies .... the whole lpp_source is transferred (seriously ? uh ? why ?)

    # tail -f /var/adm/ras/nimhttp.log
    Thu Jul 21 23:28:39 2016        Request Type is GET
    Thu Jul 21 23:28:39 2016        Mime not supported
    Thu Jul 21 23:28:39 2016        Sending Response Header "200 OK"
    Thu Jul 21 23:28:39 2016        Sending file over socket 6. Expected length is 4482048
    Thu Jul 21 23:28:39 2016        Total length sent is 4482048
    Thu Jul 21 23:28:39 2016        handle_httpGET: Entering cleanup statement
    Thu Jul 21 23:28:39 2016        nim_http: queue socket create product (memory *)200739e8
    Thu Jul 21 23:28:39 2016        nim_http: 200739e8 6 200947e8 20098138
    Thu Jul 21 23:28:39 2016        nim_http: file descriptor is 6
    Thu Jul 21 23:28:39 2016        nim_buffer: (resize) buffer size is 0
    Thu Jul 21 23:28:39 2016        file descriptor is : 6
    Thu Jul 21 23:28:39 2016        family is : 2 (AF_INET)
    Thu Jul 21 23:28:39 2016        source address is : 10.14.33.253
    Thu Jul 21 23:28:39 2016        socks: Removing socksObject 2ff1ec80
    Thu Jul 21 23:28:39 2016        socks: 200739e8 132 <- 106 bytes (SSL)
    Thu Jul 21 23:28:39 2016        nim_buffer: (append) len is 106, buffer length is 106
    Thu Jul 21 23:28:39 2016        nim_http: data string passed to get_http_request: "GET /export/nim/lpp_source/7200-00-02-1614/installp/ppc/X11.fnt.7.2.0.0.I HTTP/1.1
    

    If you have a deeper look at what the nimclient is doing when using nimhttp .... it is just transferring the whole lpp_source from the server to the client and then installing the needed filesets from a local filesystem. Filesets are stored into /tmp so be sure you have a /tmp big enough to store your biggest lpp_source. Maybe this will be changed in the future but it is like it is for the moment :-) . The nimclient creates a temporary directory prefixed "_nim_dir_" to store the lpp_source:

    root@nim_server:/export/nim/lpp_source/7200-00-02-1614/installp/ppc# du -sm .
    7179.57 .
    root@nim_client:/tmp/_nim_dir_5964094/export/nim/lpp_source/7200-00-02-1614/installp/ppc# du -sm .
    7179.74 .
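
    Because of that, here is a quick pre-check I now do on the client before a cust against a big lpp_source, to avoid filling /tmp halfway through (the +8G below is only an example, size it against your own lpp_source):

    # df -m /tmp
    # chfs -a size=+8G /tmp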
    

    More details ?

    You can notice while running a cust operation from the nim client that nimhttp is also running in the background (on the client itself). The truth is that the nimhttp binary running on the client acts as an http client. In the output below the http client is getting the file Java8_64.samples.jnlp.8.0.0.120.U from the nim server:

    # ps -ef |grep nim
        root  3342790 16253432   6 23:29:10  pts/0  0:00 /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_installp -afilesets=bos.loc.utf.en_KE -alpp_source=s00va9932137:/export/nim/lpp_source/7200-00-02-1614
        root  6291880 13893926   0 23:29:10  pts/0  0:00 /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_script -alocation=s00va9932137:/export/nim/scripts/s00va9954403.script
        root 12190194  3342790  11 23:30:06  pts/0  0:00 /usr/sbin/nimhttp -f /export/nim/lpp_source/7200-00-02-1614/installp/ppc/Java8_64.samples.jnlp.8.0.0.120.U -odest -s
        root 13500758  4325730   0 23:23:29      -  0:00 /usr/sbin/nimsh -s -c
        root 13893926 15991202   0 23:29:10  pts/0  0:00 /bin/ksh -c /var/adm/nim/15991202/nc.1469222947
        root 15991202 16974092   0 23:29:07  pts/0  0:00 nimclient -o cust -a lpp_source=7200-00-02-1614-lpp_source -a filesets=bos.loc.utf.en_KE
        root 16253432  6291880   0 23:29:10  pts/0  0:00 /bin/ksh /tmp/_nim_dir_6291880/script
    

    You can use nimhttp as a client to download files directly from the nim server. Here I'm just listing the content of /export/nim/lpp_source from the client:

    # nimhttp -f /export/nim/lpp_source -o dest=/tmp -v
    nimhttp: (source)       /export/nim/lpp_source
    nimhttp: (dest_dir)     /tmp
    nimhttp: (verbose)      debug
    nimhttp: (master_ip)    nimserver
    nimhttp: (master_port)  4901
    
    sending to master...
    size= 59
    pull_request= "GET /export/nim/lpp_source HTTP/1.1
    Connection: close
    
    "
    Writing 1697 bytes of data to /tmp/export/nim/lpp_source/.content
    Total size of datalen is 1697. Content_length size is 1697.
    # cat /tmp/export/nim/lpp_source/.content
    DIR: 71-04-02-1614 0:0 00240755 256
    DIR: 7100-03-00-0000 0:0 00240755 256
    DIR: 7100-03-01-1341 0:0 00240755 256
    DIR: 7100-03-02-1412 0:0 00240755 256
    DIR: 7100-03-03-1415 0:0 00240755 256
    DIR: 7100-03-04-1441 0:0 00240755 256
    DIR: 7100-03-05-1524 0:0 00240755 256
    DIR: 7100-04-00-1543 0:0 00240755 256
    DIR: 7100-04-01-1543 0:0 00240755 256
    DIR: 7200-00-00-0000 0:0 00240755 256
    DIR: 7200-00-01-1543 0:0 00240755 256
    DIR: 7200-00-02-1614 0:0 00240755 256
    FILE: MH01609.iso 0:0 00100644 1520027648
    FILE: aixtools.python.2.7.11.4.I 0:0 00100644 50140160
    

    Here I'm just downloading a python fileset !

    # nimhttp -f /export/nim/lpp_source/aixtools.python.2.7.11.4.I -o dest=/tmp -v
    [..]
    Writing 65536 bytes of data to /tmp/export/nim/lpp_source/aixtools.python.2.7.11.4.I
    Writing 69344 bytes of data to /tmp/export/nim/lpp_source/aixtools.python.2.7.11.4.I
    Writing 7776 bytes of data to /tmp/export/nim/lpp_source/aixtools.python.2.7.11.4.I
    Total size of datalen is 50140160. Content_length size is 50140160.
    # ls -l /tmp/export/nim/lpp_source/aixtools.python.2.7.11.4.I
    -rw-r--r--    1 root     system     50140160 Jul 23 01:21 /tmp/export/nim/lpp_source/aixtools.python.2.7.11.4.I
    

    Allowed operation

    All cust operations on nim objects of type lpp_source, installp_bundle, fix_bundle, script, and file_res, in push or pull mode, work great with nimhttp. Here are a few examples (from the official doc, thanks to Paul F for that ;-) ) :

    • Push:
    # nim -o cust -a file_res=
    # nim -o cust -a script=
    # nim -o cust -a lpp_source= -a filesets=
    # nim -o cust -a lpp_source= -a installp_bundle=
    # nim -o cust -a lpp_source= -a fixes=update_all
    
  • Pull:
  • # nimclient -o cust -a lpp_source= -a filesets=
    # nimclient -o cust -a file_res=
    # nimclient -o cust -a script=
    # nimclient -o cust -a lpp_source= -a installp_bundle=
    # nimclient -o cust -a lpp_source= -a fixes=update_all
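
    For instance, an update_all pulled through nimhttp looks exactly the same as it always did, here reusing the lpp_source from the example earlier in this post:

    # nimclient -o cust -a lpp_source=7200-00-02-1614-lpp_source -a fixes=update_all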
    

    Proxying: use your own http server

    You can use your own webserver to host the nimhttp content; the nimhttp binary will then just act as a proxy between your client and your http server. I have tried to do it but didn't succeed; I'll let you know if I find the solution:

    # grep proxy ~/httpd.conf
    service.proxy_port=80
    enable_proxy=yes
    

    Conclusion: "about administration and post-installation"

    Just a few words about best practices for post-installation and administration on AIX. One of the major purposes of this blog post is to prove to you that you need to get rid of an old way of working. The first thing to do is to always try using http or https instead of NFS. To give you an example of that, I'm always using http to transfer my files whatever they are (configuration, product installation and so on ...). With an automation tool such as Chef it is so simple to integrate the download of a file from an http server that you must now avoid using NFS ;-) . The second good practice is to never install things "by hand", and using yum is one of the reflexes you need to have instead of using the rpm command (Linux users will laugh reading that ... I'm laughing writing that, using yum is just something I have been doing for more than 10 years ... but for AIX admins it's still not the case and not so simple to understand :-) ). As always I hope it helps.

    About blogging

    I just wanted to say a word about blogging because I get a lot of questions about it (from friends, readers, managers, haters, lovers). I'm doing this for two reasons. The first one is that writing and explaining things forces me to better understand what I'm doing and to always discover new features, new bugs, new everything. Second, I'm doing this for you, my readers, because I remember how useful blogs were to me when I began AIX (Chris and Nigel are the best examples of that). I don't care about being the best or the worst. I'm just me. I'm doing this because I love it, that's all. Even if managers, recruiters or anybody else don't care about it I'll continue to do this whatever happens. I agree with them: "It does not prove anything at all". I'm just like you, a standard admin trying to do his job at his best. Sorry for the two months "break" in blogging but it was really crazy at work and in my life. Take care all. Haters gonna hate.

    Putting NovaLink in Production & more PowerVC (1.3.1.2) tips and tricks


    I've been quite busy and writing the blog is getting more and more difficult with the amount of work I have, but I try to stick to my thing as writing these blog posts is almost the only thing I can do properly in my whole life. So why do without ? My place is one of the craziest places I have ever worked in (for the good ... and the bad (I'll not talk here about how things are organized or how your work is recognized, but be sure it is probably one of the main reasons I'll leave this place one day or another)). The PowerSystems growth here is crazy and the number of AIX partitions we are managing with PowerVC never stops increasing; I think we are one of the biggest PowerVC customers in the whole world (I don't know if that is a good thing or not). Just to give you a couple of examples: we have here the biggest Power Enterprise Pool I have ever seen (384 Power8 mobile cores), the number of partitions managed by PowerVC is around 2600, and we have a single PowerVC managing almost 30 hosts. You have understood well ... these numbers are huge. It may seem funny, but it's not; the growth is a problem, a technical problem, and we are facing issues that most of you will never hit. I'm speaking about density and scalability. Hopefully for us the "vertical" design of PowerVC can now be replaced by what I call a "horizontal" design. Instead of putting all the nova instances on one single machine, we now have the possibility to spread the load on each host by using NovaLink. As we needed to solve these density and scalability problems we decided to move all the P8 hosts to NovaLink (this process is still ongoing but most of the engineering stuff is already done). As you now know we are not deploying a host every year but generally a couple per month, and that's why we needed to find a solution to automate this. So this blog post will talk about all the things and the best practices I have learned using and implementing NovaLink in a huge production environment (automated installation, tips and tricks, post-install, migration and so on). But we will not stop here: I'll also talk about the new things I have learned about PowerVC (1.3.1.2 and 1.3.0.1) and give more tips and tricks to use the product at its best. Before going any further I first want to say a big thank you to the whole PowerVC team for their kindness and the precious time they gave us to advise and educate the OpenStack noob I am. (A special thanks to Drew Thorstensen for the long discussions we had about Openstack and PowerVC. He is probably one of the most passionate guys I have ever met at IBM).

    Novalink Automated installation

    I'll not write a big introduction; let's get to work and start with NovaLink and how to automate its installation process. Copy the content of the installation cdrom to a directory that can be served by an http server on your NIM server (I'm using my NIM server for the bootp and tftp part too). Note that I'm doing this with a tar command because there are symbolic links in the iso and a simple cp would end up with a full filesystem.

    # loopmount -i ESD_-_PowerVM_NovaLink_V1.0.0.3_062016.iso -o "-V cdrfs -o ro" -m /mnt
    # tar cvf iso.tar /mnt/*
    # tar xvf iso.tar -C /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso
    # ls -l /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso
    total 320
    dr-xr-xr-x    2 root     system          256 Jul 28 17:54 .disk
    -r--r--r--    1 root     system          243 Apr 20 21:27 README.diskdefines
    -r--r--r--    1 root     system         3053 May 25 22:25 TRANS.TBL
    dr-xr-xr-x    3 root     system          256 Apr 20 11:59 boot
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 dists
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 doc
    dr-xr-xr-x    2 root     system         4096 Aug 09 15:59 install
    -r--r--r--    1 root     system       145981 Apr 20 21:34 md5sum.txt
    dr-xr-xr-x    2 root     system         4096 Apr 20 21:27 pics
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 pool
    dr-xr-xr-x    3 root     system          256 Apr 20 11:59 ppc
    dr-xr-xr-x    2 root     system          256 Apr 20 21:27 preseed
    dr-xr-xr-x    4 root     system          256 May 25 22:25 pvm
    lrwxrwxrwx    1 root     system            1 Aug 29 14:55 ubuntu -> .
    dr-xr-xr-x    3 root     system          256 May 25 22:25 vios
    

    Prepare the PowerVM NovaLink repository. The content of the repository can be found in the NovaLink iso image in pvm/repo/pvmrepo.tgz:

    # ls -l /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm/repo/
    total 720192
    -r--r--r--    1 root     system          223 May 25 22:25 TRANS.TBL
    -rw-r--r--    1 root     system         2106 Sep 05 15:56 pvm-install.cfg
    -r--r--r--    1 root     system    368722592 May 25 22:25 pvmrepo.tgz
    

    Extract the content of this tgz file in a directory that can be served by the http server:

    # mkdir /export/nim/lpp_source/powervc/novalink/1.0.0.3/pvmrepo
    # cp /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm/repo/pvmrepo.tgz /export/nim/lpp_source/powervc/novalink/1.0.0.3/pvmrepo
    # cd /export/nim/lpp_source/powervc/novalink/1.0.0.3/pvmrepo
    # gunzip pvmrepo.tgz
    # tar xvf pvmrepo.tar
    [..]
    x ./pool/non-free/p/pvm-core/pvm-core-dbg_1.0.0.3-160525-2192_ppc64el.deb, 54686380 bytes, 106810 media blocks.
    x ./pool/non-free/p/pvm-core/pvm-core_1.0.0.3-160525-2192_ppc64el.deb, 2244784 bytes, 4385 media blocks.
    x ./pool/non-free/p/pvm-core/pvm-core-dev_1.0.0.3-160525-2192_ppc64el.deb, 618378 bytes, 1208 media blocks.
    x ./pool/non-free/p/pvm-pkg-tools/pvm-pkg-tools_1.0.0.3-160525-492_ppc64el.deb, 170700 bytes, 334 media blocks.
    x ./pool/non-free/p/pvm-rest-server/pvm-rest-server_1.0.0.3-160524-2229_ppc64el.deb, 263084432 bytes, 513837 media blocks.
    # rm pvmrepo.tar 
    # ls -l 
    total 16
    drwxr-xr-x    2 root     system          256 Sep 11 13:26 conf
    drwxr-xr-x    2 root     system          256 Sep 11 13:26 db
    -rw-r--r--    1 root     system          203 May 26 02:19 distributions
    drwxr-xr-x    3 root     system          256 Sep 11 13:26 dists
    -rw-r--r--    1 root     system         3132 May 24 20:25 novalink-gpg-pub.key
    drwxr-xr-x    4 root     system          256 Sep 11 13:26 pool
    

    Copy the NovaLink boot files in a directory that can be served by your tftp server (I’m using /var/lib/tftpboot):

    # mkdir /var/lib/tftpboot
    # cp -r /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm /var/lib/tftpboot
    # ls -l /var/lib/tftpboot
    total 1016
    -r--r--r--    1 root     system         1120 Jul 26 20:53 TRANS.TBL
    -r--r--r--    1 root     system       494072 Jul 26 20:53 core.elf
    -r--r--r--    1 root     system          856 Jul 26 21:18 grub.cfg
    -r--r--r--    1 root     system        12147 Jul 26 20:53 pvm-install-config.template
    dr-xr-xr-x    2 root     system          256 Jul 26 20:53 repo
    dr-xr-xr-x    2 root     system          256 Jul 26 20:53 rootfs
    -r--r--r--    1 root     system         2040 Jul 26 20:53 sample_grub.cfg
    

    I still don't know why this is the case on AIX, but the tftp server searches for the grub.cfg in the root directory of your AIX system. It's not the case for my RedHat Enterprise Linux installations but it is for the NovaLink/Ubuntu installation. Copy the sample_grub.cfg to /grub.cfg and modify the content of the file:

    • As the gateway, netmask and nameserver will be provided by the pvm-install.cfg file (the configuration file of the NovaLink installer, we will talk about this later), comment out those three lines.
    • The hostname will still be needed.
    • Modify the linux line and point to the vmlinux file provided in the NovaLink iso image.
    • Modify the live-installer to point to the filesystem.squashfs provided in the NovaLink iso image.
    • Modify the pvm-repo line to point to the pvm-repository directory we created before.
    • Modify the pvm-installer line to point to the NovaLink install configuration file (we will modify this one after).
    • Don't do anything with the pvm-vios line as we are installing NovaLink on a system that already has Virtual I/O Servers installed (I'm not installing Scale Out systems but high end models only).
    • I'll talk later about the pvmdisk line (this line is not present by default in the pvm-install-config.template provided in the NovaLink iso image).
    # cp /var/lib/tftpboot/sample_grub.cfg /grub.cfg
    # cat /grub.cfg
    # Sample GRUB configuration for NovaLink network installation
    set default=0
    set timeout=10
    
    menuentry 'PowerVM NovaLink Install/Repair' {
     insmod http
     insmod tftp
     regexp -s 1:mac_pos1 -s 2:mac_pos2 -s 3:mac_pos3 -s 4:mac_pos4 -s 5:mac_pos5 -s 6:mac_pos6 '(..):(..):(..):(..):(..):(..)' ${net_default_mac}
     set bootif=01-${mac_pos1}-${mac_pos2}-${mac_pos3}-${mac_pos4}-${mac_pos5}-${mac_pos6}
     regexp -s 1:prefix '(.*)\.(\.*)' ${net_default_ip}
    # Setup variables with values from Grub's default variables
     set ip=${net_default_ip}
     set serveraddress=${net_default_server}
     set domain=${net_ofnet_network_domain}
    # If tftp is desired, replace http with tftp in the line below
     set root=http,${serveraddress}
    # Remove comment after providing the values below for
    # GATEWAY_ADDRESS, NETWORK_MASK, NAME_SERVER_IP_ADDRESS
    # set gateway=10.10.10.1
    # set netmask=255.255.255.0
    # set nameserver=10.20.2.22
      set hostname=nova0696010
    # In this sample file, the directory novalink is assumed to exist on the
    # BOOTP server and has the NovaLink ISO content
     linux /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/vmlinux \
     live-installer/net-image=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/filesystem.squashfs \
     pkgsel/language-pack-patterns= \
     pkgsel/install-language-support=false \
     netcfg/disable_dhcp=true \
     netcfg/choose_interface=auto \
     netcfg/get_ipaddress=${ip} \
     netcfg/get_netmask=${netmask} \
     netcfg/get_gateway=${gateway} \
     netcfg/get_nameservers=${nameserver} \
     netcfg/get_hostname=${hostname} \
     netcfg/get_domain=${domain} \
     debian-installer/locale=en_US.UTF-8 \
     debian-installer/country=US \
    # The directory novalink-repo on the BOOTP server contains the content
    # of the pvmrepo.tgz file obtained from the pvm/repo directory on the
    # NovaLink ISO file.
    # The directory novalink-vios on the BOOTP server contains the files
    # needed to perform a NIM install of VIOS server(s)
    #  pvmdebug=1
     pvm-repo=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/novalink-repo/ \
     pvm-installer-config=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg \
     pvm-viosdir=http://${serveraddress}/novalink-vios \
     pvmdisk=/dev/mapper/mpatha \
     initrd /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/install/netboot_initrd.gz
    }
    

    Modify the pvm-install.cfg; it's the NovaLink installer configuration file. We just need to modify the [SystemConfig], [NovaLinkGeneralSettings], [NovaLinkNetworkSettings], [NovaLinkAPTRepoConfig] and [NovaLinkAdminCredentials] stanzas. My advice is to configure one NovaLink by hand (by doing an installation directly with the iso image; after the installation your configuration file is saved in /var/log/pvm-install/novalink-install.cfg. You can copy this one as your template on your installation server. This file is filled with the answers you gave during the NovaLink installation).

    # more /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg
    [SystemConfig]
    serialnumber = XXXXXXXX
    lmbsize = 256
    
    [NovaLinkGeneralSettings]
    ntpenabled = True
    ntpserver = timeserver1
    timezone = Europe/Paris
    
    [NovaLinkNetworkSettings]
    dhcpip = DISABLED
    ipaddress = YYYYYYYY
    gateway = ZZZZZZZZ
    netmask = 255.255.255.0
    dns1 = 8.8.8.8
    dns2 = 8.8.9.9
    hostname = WWWWWWWW
    domain = lab.chmod666.org
    
    [NovaLinkAPTRepoConfig]
    downloadprotocol = http
    mirrorhostname = nimserver
    mirrordirectory = /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/
    mirrorproxy =
    
    [VIOSNIMServerConfig]
    novalink_private_ip = 192.168.128.1
    vios1_private_ip = 192.168.128.2
    vios2_private_ip = 192.168.128.3
    novalink_netmask = 255.255.128.0
    viosinstallprompt = False
    
    [NovaLinkAdminCredentials]
    username = padmin
    password = $6$N1hP6cJ32p17VMpQ$sdThvaGaR8Rj12SRtJsTSRyEUEhwPaVtCTvbdocW8cRzSQDglSbpS.jgKJpmz9L5SAv8qptgzUrHDCz5ureCS.
    userdescription = NovaLink System Administrator
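
    The password field above looks like a standard $6$ (SHA-512) crypt string, so you can generate a fresh one for your template without doing a manual installation first. A minimal sketch, run from any Linux box with python3 (I have not checked whether the installer constrains the salt, so consider this an assumption, and 'MyNovaLinkPassword' is obviously just a placeholder):

    # python3 -c "import crypt; print(crypt.crypt('MyNovaLinkPassword', crypt.mksalt(crypt.METHOD_SHA512)))"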
    

    Finally modify the /etc/bootptab file and add a line matching your installation:

    # tail -1 /etc/bootptab
    nova0696010:bf=/var/lib/tftpboot/core.elf:ip=10.20.65.16:ht=ethernet:sa=10.255.228.37:gw=10.20.65.1:sm=255.255.255.0:
    

    Don't forget to set up an http server serving all the needed files. I know this configuration is super unsecured, but honestly I don't care: my NIM server is in a super secured network only accessible by the VIOS and NovaLink partitions. So I'm good :-) :

    # cd /opt/freeware/etc/httpd/ 
    # grep -Ei "^Listen|^DocumentRoot" conf/httpd.conf
    Listen 80
    DocumentRoot "/"
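
    If you still want to lock it down a little, a minimal sketch of a stanza you could append to that httpd.conf before restarting the toolbox Apache (this assumes an Apache 2.4 level for the Require syntax; the 10.20.65.0/24 subnet is only an example, adapt it to the network your VIOS and NovaLink partitions live in):

    # echo '<Directory "/export/nim">'     >> /opt/freeware/etc/httpd/conf/httpd.conf
    # echo '    Require ip 10.20.65.0/24'  >> /opt/freeware/etc/httpd/conf/httpd.conf
    # echo '</Directory>'                  >> /opt/freeware/etc/httpd/conf/httpd.conf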
    

    novaserved

    Instead of doing this over and over at every NovaLink installation, I have written a custom script preparing my NovaLink installation files. What I do in this script is:

    • Preparing the pvm-install.cfg file.
    • Modifying the grub.cfg file.
    • Adding a line to the /etc/bootptab file.
    #  ./custnovainstall.ksh nova0696010 10.20.65.16 10.20.65.1 255.255.255.0
    #!/usr/bin/ksh
    
    novalinkname=$1
    novalinkip=$2
    novalinkgw=$3
    novalinknm=$4
    cfgfile=/export/nim/lpp_source/powervc/novalink/novalink-install.cfg
    desfile=/export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg
    grubcfg=/export/nim/lpp_source/powervc/novalink/grub.cfg
    grubdes=/grub.cfg
    
    echo "+--------------------------------------+"
    echo "NovaLink name: ${novalinkname}"
    echo "NovaLink IP: ${novalinkip}"
    echo "NovaLink GW: ${novalinkgw}"
    echo "NovaLink NM: ${novalinknm}"
    echo "+--------------------------------------+"
    echo "Cfg ref: ${cfgfile}"
    echo "Cfg file: ${cfgfile}.${novalinkname}"
    echo "+--------------------------------------+"
    
    typeset -u serialnumber
    serialnumber=$(echo ${novalinkname} | sed 's/nova//g')
    
    echo "SerNum: ${serialnumber}"
    
    cat ${cfgfile} | sed "s/serialnumber = XXXXXXXX/serialnumber = ${serialnumber}/g" | sed "s/ipaddress = YYYYYYYY/ipaddress = ${novalinkip}/g" | sed "s/gateway = ZZZZZZZZ/gateway = ${novalinkgw}
    /g" | sed "s/netmask = 255.255.255.0/netmask = ${novalinknm}/g" | sed "s/hostname = WWWWWWWW/hostname = ${novalinkname}/g" > ${cfgfile}.${novalinkname}
    cp ${cfgfile}.${novalinkname} ${desfile}
    cat ${grubcfg} | sed "s/  set hostname=WWWWWWWW/  set hostname=${novalinkname}/g" > ${grubcfg}.${novalinkname}
    cp ${grubcfg}.${novalinkname} ${grubdes}
    # nova1009425:bf=/var/lib/tftpboot/core.elf:ip=10.20.65.15:ht=ethernet:sa=10.255.248.37:gw=10.20.65.1:sm=255.255.255.0:
    echo "${novalinkname}:bf=/var/lib/tftpboot/core.elf:ip=${novalinkip}:ht=ethernet:sa=10.255.248.37:gw=${novalinkgw}:sm=${novalinknm}:" >> /etc/bootptab
    

    Novalink installation: vSCSI or NPIV ?

    NovaLink is not designed to be installed on top of NPIV, it's a fact. As it is designed to be installed on a totally new system without any Virtual I/O Servers configured, the NovaLink installation by default creates the Virtual I/O Servers and, using these VIOS, the installation process creates backing devices on top of logical volumes created in the default VIOS storage pool. The NovaLink partition is then installed on top of these two logical volumes, which are mirrored at the end. This is the way NovaLink does it for Scale Out systems.

    For High End systems NovaLink assumes you're going to install the NovaLink partition on top of vSCSI (I have personally tried with hdisk backed and SSP Logical Unit backed devices and both work ok). For those like me who want to install NovaLink on top of NPIV (I know this is not a good choice, but once again I was forced to do it) there is still a possibility to do it. (In my humble opinion the NPIV design is made for high performance and the NovaLink partition is not going to be an I/O intensive partition. Even worse, our whole new design is based on NPIV for LPARs .... it's a shame as NPIV is not a solution designed for high density and high scalability. Every PowerVM system administrator should remember this. NPIV IS NOT A GOOD CHOICE FOR DENSITY AND SCALABILITY, USE IT FOR PERFORMANCE ONLY !!!. The story behind this is funny. I'm 100% sure that SSP is ten times a better choice to achieve density and scalability. I decided to open a poll on twitter asking this question: "Will you choose SSP or NPIV to design a scalable AIX cloud based on PowerVC ?". I was 100% sure SSP would win and made a bet with a friend (I owe him beers now) that I'd be right. What was my surprise when seeing the results: 90% of people voted for NPIV. I'm sorry to say that guys but there are two possibilities: 1/ You don't really know what scalability and density mean because you never faced them, and that's why you made the wrong choice. 2/ You know it and you're just wrong :-) . This little story is another proof that IBM is not responsible for the dying of AIX and PowerVM ... but unfortunately you are responsible for it, by not understanding that the only way to survive is to embrace highly scalable solutions like Linux is doing with Openstack and Ceph. It's a fact. Period.)

    This said ... if you try to install NovaLink on top of NPIV you'll get an error. A workaround to this problem is to add the following line to the grub.cfg file:

     pvmdisk=/dev/mapper/mpatha \
    

    If you do that you'll be able to install NovaLink on your NPIV disk, but you'll still get an error the first time you install it, at the "grub-install" step. Just re-run the installation a second time and the grub-install command will work ok :-) (I'll explain how to avoid this second issue later).

    One workaround to this second issue is to recreate the initrd after adding a line to the debian-installer config file (see the "Deep dive into the initrd" section below).

    Fully automated installation by example

    • Here the core.elf file is downloaded by tftp. You can see in the capture below that the grub.cfg file is searched for in / :
    • 1m
      13m

    • The installer is starting:
    • 2

    • The vmlinux is downloaded (http):
    • 3

    • The root.squashfs is downloaded (http):
    • 4m

    • The pvm-install.cfg configuration file is downloaded (http):
    • 5

    • pvm services are started. At this time if you are running in co-management mode you’ll see the Red lock in the HMC Server status:
    • 6

    • The Linux and NovaLink installation is ongoing:
    • 7
      8
      9
      10
      11
      12

    • System is ready:
    • 14

    Novalink code auto update

    When adding a NovaLink host to PowerVC, the powervc packages coming from the PowerVC management host are installed on the NovaLink partition. You can check this during the installation. Here is what's going on when adding the NovaLink host to PowerVC:

    15
    16

    # cat /opt/ibm/powervc/log/powervc_install_2016-09-11-164205.log
    ################################################################################
    Starting the IBM PowerVC Novalink Installation on:
    2016-09-11T16:42:05+02:00
    ################################################################################
    
    LOG file is /opt/ibm/powervc/log/powervc_install_2016-09-11-164205.log
    
    2016-09-11T16:42:05.18+02:00 Installation directory is /opt/ibm/powervc
    2016-09-11T16:42:05.18+02:00 Installation source location is /tmp/powervc_img_temp_1473611916_1627713/powervc-1.3.1.2
    [..]
    Setting up python-neutron (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up neutron-common (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up neutron-plugin-ml2 (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up ibmpowervc-powervm-network (1.3.1.2) ...
    Setting up ibmpowervc-powervm-oslo (1.3.1.2) ...
    Setting up ibmpowervc-powervm-ras (1.3.1.2) ...
    Setting up ibmpowervc-powervm (1.3.1.2) ...
    W: --force-yes is deprecated, use one of the options starting with --allow instead.
    
    ***************************************************************************
    IBM PowerVC Novalink installation
     successfully completed at 2016-09-11T17:02:30+02:00.
     Refer to
     /opt/ibm/powervc/log/powervc_install_2016-09-11-165617.log
     for more details.
    ***************************************************************************
    

    17

    Installing the missing deb packages if NovaLink host was added before PowerVC upgrade

    If the NovaLink host was added in PowerVC 1.3.1.1 and you updated to PowerVC 1.3.1.2, you have to update the packages by hand because there is a little bug during the update of some packages:

    • From the PowerVC management host copy the latest packages to the NovaLink host:
    # scp /opt/ibm/powervc/images/powervm/powervc-powervm-compute-1.3.1.2.tgz padmin@nova0696010:~
    padmin@nova0696010's password:
    powervc-powervm-compute-1.3.1.2.tgz
    
  • Update the packages on the NovaLink host
  • # tar xvzf powervc-powervm-compute-1.3.1.2.tgz
    # cd powervc-1.3.1.2/packages/powervm
    # dpkg -i nova-powervm_2.0.3-160816-48_all.deb
    # dpkg -i networking-powervm_2.0.1-160816-6_all.deb
    # dpkg -i ceilometer-powervm_2.0.1-160816-17_all.deb
    # /opt/ibm/powervc/bin/powervc-services restart
    

    rsct and pvm deb update

    Never forget to install the latest rsct and pvm packages after the installation. You can clone the official IBM repositories for the pvm and rsct files (check my previous post about NovaLink for more details about cloning the repository). Then create two files in /etc/apt/sources.list.d, one for pvm, the other for rsct:

    # vi /etc/apt/sources.list.d/pvm.list
    deb http://nimserver/export/nim/lpp_source/powervc/novalink/nova/debian novalink_1.0.0 non-free
    # vi /etc/apt/sources.list.d/rsct.list
    deb http://nimserver/export/nim/lpp_source/powervc/novalink/rsct/ubuntu xenial main
    # dpkg -l | grep -i rsct
    ii  rsct.basic                                3.2.1.0-15300                           ppc64el      Reliable Scalable Cluster Technology - Basic
    ii  rsct.core                                 3.2.1.3-16106-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Core
    ii  rsct.core.utils                           3.2.1.3-16106-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Utilities
    # dpkg -l | grep -i pvm
    ii  pvm-cli                                   1.0.0.3-160516-1488                     all          Power VM Command Line Interface
    ii  pvm-core                                  1.0.0.3-160525-2192                     ppc64el      PVM core runtime package
    ii  pvm-novalink                              1.0.0.3-160525-1000                     ppc64el      Meta package for all PowerVM Novalink packages
    ii  pvm-rest-app                              1.0.0.3-160524-2229                     ppc64el      The PowerVM NovaLink REST API Application
    ii  pvm-rest-server                           1.0.0.3-160524-2229                     ppc64el      Holds the basic installation of the REST WebServer (Websphere Liberty Profile) for PowerVM NovaLink 
    # apt-get install rsct.core rsct.basic
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      docutils-common libpaper-utils libpaper1 python-docutils python-roman
    Use 'apt autoremove' to remove them.
    The following additional packages will be installed:
      rsct.core.utils src
    The following packages will be upgraded:
      rsct.core rsct.core.utils src
    3 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
    Need to get 9,356 kB of archives.
    After this operation, 548 kB disk space will be freed.
    [..]
    # apt-get install pvm-novalink
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      docutils-common libpaper-utils libpaper1 python-docutils python-roman
    Use 'apt autoremove' to remove them.
    The following additional packages will be installed:
      pvm-core pvm-rest-app pvm-rest-server pypowervm
    The following packages will be upgraded:
      pvm-core pvm-novalink pvm-rest-app pvm-rest-server pypowervm
    5 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
    Need to get 287 MB of archives.
    After this operation, 203 kB of additional disk space will be used.
    Do you want to continue? [Y/n] Y
    [..]
    

    After the installation, here is what you should have if everything was updated properly:

    # dpkg -l | grep rsct
    ii  rsct.basic                                3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Basic
    ii  rsct.core                                 3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Core
    ii  rsct.core.utils                           3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Utilities
    # dpkg -l | grep pvm
    ii  pvm-cli                                   1.0.0.3-160516-1488                     all          Power VM Command Line Interface
    ii  pvm-core                                  1.0.0.3.1-160713-2441                   ppc64el      PVM core runtime package
    ii  pvm-novalink                              1.0.0.3.1-160714-1152                   ppc64el      Meta package for all PowerVM Novalink packages
    ii  pvm-rest-app                              1.0.0.3.1-160713-2417                   ppc64el      The PowerVM NovaLink REST API Application
    ii  pvm-rest-server                           1.0.0.3.1-160713-2417                   ppc64el      Holds the basic installation of the REST WebServer (Websphere Liberty Profile) for PowerVM NovaLink
    

    Novalink post-installation (my ansible way to do that)

    You all know by now that I'm not very fond of doing the same things over and over again; that's why I have created an ansible post-install playbook especially for the NovaLink post installation. You can download it here: nova_ansible. Then install ansible on a host that has ssh access to all your NovaLink partitions and run the ansible playbook:

    • Untar the ansible playbook:
    # mkdir /srv/ansible
    # cd /srv/ansible
    # tar xvf novalink_ansible.tar 
    
  • Modify the group_vars/novalink.yml to fit your environment:
  • # cat group_vars/novalink.yml
    ntpservers:
      - ntpserver1
      - ntpserver2
    dnsservers:
      - 8.8.8.8
      - 8.8.9.9
    dnssearch:
      - lab.chmod666.org
    vepa_iface: ibmveth6
    repo: nimserver
    
  • Share the root ssh key with the NovaLink hosts (be careful: by default NovaLink does not allow root login, you have to modify the sshd configuration file first, see the sketch after this list):
  • Put all your Novalink hosts into the inventory file:
  • #cat inventories/hosts.novalink
    [novalink]
    nova65a0cab
    nova65ff4cd
    nova10094ef
    nova06960ab
    
  • Run ansible-playbook and you’re done:
  • # ansible-playbook -i inventories/hosts.novalink site.yml
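
    Here is the sshd change mentioned in the list above, as I do it on the NovaLink partition (a minimal sketch assuming the stock Ubuntu sshd_config; prohibit-password keeps password logins for root disabled and only allows the shared key):

    # sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
    # systemctl restart ssh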
    

    ansible1
    ansible2
    ansible3

    More details about NovaLink

    MGMTSWITCH vswitch automatic creation

    Do not try to create the MGMTSWITCH by yourself: the NovaLink installer does it for you. As my Virtual I/O Servers are installed using the IBM Provisioning Toolkit for PowerVM ... I was creating the MGMTSWITCH at that time, but I was wrong. You can see this in the file /var/log/pvm-install/pvminstall.log on the NovaLink partition:

    # cat /var/log/pvm-install/pvminstall.log
    Fri Aug 12 17:26:07 UTC 2016: PVMDebug = 0
    Fri Aug 12 17:26:07 UTC 2016: Running initEnv
    [..]
    Fri Aug 12 17:27:08 UTC 2016: Using user provided pvm-install configuration file
    Fri Aug 12 17:27:08 UTC 2016: Auto Install set
    [..]
    Fri Aug 12 17:27:44 UTC 2016: Auto Install = 1
    Fri Aug 12 17:27:44 UTC 2016: Validating configuration file
    Fri Aug 12 17:27:44 UTC 2016: Initializing private network configuration
    Fri Aug 12 17:27:45 UTC 2016: Running /opt/ibm/pvm-install/bin/switchnetworkcfg -o c
    Fri Aug 12 17:27:46 UTC 2016: Running /opt/ibm/pvm-install/bin/switchnetworkcfg -o n -i 3 -n MGMTSWITCH -p 4094 -t 1
    Fri Aug 12 17:27:49 UTC 2016: Start setupinstalldisk operation for /dev/mapper/mpatha
    Fri Aug 12 17:27:49 UTC 2016: Running updatedebconf
    Fri Aug 12 17:56:06 UTC 2016: Pre-seeding disk recipe
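
    If, like me, you want to double check it after the installation, something like this should do it from the NovaLink partition (assuming your pvmctl build exposes the vswitch object; the log above shows the switch being created with id 4094):

    # pvmctl vswitch list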
    

    NPIV lpar creation problem !

    As you know my environment is crazy. Every lpar we create has 4 virtual fibre channel adapters, obviously two on fabric A and two on fabric B, and obviously again each fabric must be present on each Virtual I/O Server. So to sum up, an lpar must have access to fabric A and B using VIOS1 and to fabric A and B using VIOS2. Unfortunately there was a little bug in the current NovaLink (1.0.0.3) code and all the lpars were created with only two adapters. The PowerVC team gave me a patch to handle this particular issue by patching the npiv.py file. This patch needs to be installed on the NovaLink partition itself:

    # cd /usr/lib/python2.7/dist-packages/powervc_nova/virt/ibmpowervm/pvm/volume
    # sdiff npiv.py.back npiv.bck
    

    npivpb

    I'm intentionally not giving you the solution here (by just copying/pasting code) because the issue has been addressed: an APAR has been opened for it and it is resolved in the 1.3.1.2 version (IT16534).

    From NovaLink to HMC …. and the opposite

    One of the challenge for me was to be sure everything was working ok regarding LPM and NovaLink. So I decided to test different cases:

    • From NovaLink host to NovaLink host (didn't have any trouble) :-)
    • From NovaLink host to HMC host (didn't have any trouble) :-)
    • From HMC host to NovaLink host (had a trouble) :-(

    Once again this issue preventing HMC to NovaLink LPM from working correctly is related to storage. A patch is ongoing, but let me explain this issue a little bit (only if you absolutely have to move an LPAR from HMC to NovaLink and you are in the same case as I am):

    PowerVC is not doing the mapping to the destination Virtual I/O Servers correctly and is trying to map fabric A two times on VIOS1 and fabric B two times on VIOS2. Fortunately for us you can do the migration by hand:

    • Do the LPM operation from PowerVC and check on the HMC side how PowerVC is doing the mapping (log on the HMC to check this):
    #  lssvcevents -t console -d 0 | grep powervc_admin | grep migrlpar
    time=08/31/2016 18:53:27,"text=HSCE2124 User name powervc_admin: migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i ""virtual_fc_mappings=6/vios1/2//fcs2,3/vios2/1//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"",shared_proc_pool_id=0 -o m command failed."
    
  • One interesting point you can see here is that the NovaLink user used for LPM is not padmin but wlp. Have a look on the NovaLink machine if you are a little bit curious (see the quick check after this list):
  • 18

  • If you double check the mapping you'll see that PowerVC is mixing up the VIOS. Just rerun the command with the mappings in the right order and you'll see that you're able to do HMC to NovaLink LPM (by the way, PowerVC automatically detects that the host has changed for this lpar (moved outside of PowerVC)):
  • # migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i '"virtual_fc_mappings=6/vios2/1//fcs2,3/vios1/2//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"',shared_proc_pool_id=0 -o m
    # lssvcevents -t console -d 0 | grep powervc_admin | grep migrlpar
    time=08/31/2016 19:13:00,"text=HSCE2123 User name powervc_admin: migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i ""virtual_fc_mappings=6/vios2/1//fcs2,3/vios1/2//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"",shared_proc_pool_id=0 -o m command was executed successfully."
    
    hmctonova
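
    If you want to have that look without the screenshot, here is a quick check to run on the NovaLink partition (using getent so we don't have to guess where the home directory of the wlp user is):

    # getent passwd wlp
    # ls -l $(getent passwd wlp | cut -d: -f6)/.ssh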
    

    One more time, don't worry about this issue, a patch is on the way. But I thought it was interesting to talk about it just to show you how PowerVC is handling this (user, key sharing, check on the HMC).

    Deep dive into the initrd

    I am curious and there is no way to change this. As I wanted to know how the NovaLink installer works, I had to look into the netboot_initrd.gz file. There is a lot of interesting stuff to check in this initrd. Run the commands below on a Linux partition if you also want to have a look:

    # scp nimdy:/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/netboot_initrd.gz .
    # gunzip netboot_initrd
    # cpio -i < netboot_initrd
    185892 blocks
    

    The installer is located in opt/ibm/pvm-install:

    # ls opt/ibm/pvm-install/data/
    40mirror.pvm  debpkgs.txt  license.txt  nimclient.info  pvm-install-config.template  pvm-install-preseed.cfg  rsct-gpg-pub.key  vios_diagram.txt
    # ls opt/ibm/pvm-install/bin
    assignio.py        envsetup        installpvm                    monitor        postProcessing    pvmwizardmain.py  restore.py        switchnetworkcfg  vios
    cfgviosnetwork.py  functions       installPVMPartitionWizard.py  network        procmem           recovery          setupinstalldisk  updatedebconf     vioscfg
    chviospasswd       getnetworkinfo  ioadapter                     networkbridge  pvmconfigdata.py  removemem         setupviosinstall  updatenimsetup    welcome.py
    editpvmconfig      initEnv         mirror                        nimscript      pvmtime           resetsystem       summary.py        user              wizpkg
    

    You can for instance check what the installer is exactly doing. Let's take again the example of the MGMTSWITCH creation; you can see in the output below that I was right saying that:

    initrd1

    Remember that I told you before that I had a problem with the installation on NPIV. You can avoid installing NovaLink two times by modifying the debian installer directly in the initrd, adding a line in the debian installer file opt/ibm/pvm-install/data/pvm-install-preseed.cfg (you have to rebuild the initrd after doing this):

    # grep bootdev opt/ibm/pvm-install/data/pvm-install-preseed.cfg
    d-i grub-installer/bootdev string /dev/mapper/mpatha
    # find | cpio -H newc -o > ../new_initrd_file
    # gzip -9 ../new_initrd_file
    # scp ../new_initrd_file.gz nimdy:/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/netboot_initrd.gz
    

    You can also find good examples of pvmctl commands there:

    # grep -R pvmctl *
    pvmctl lv create --size $LV_SIZE --name $LV_NAME -p id=$vid
    pvmctl scsi create --type lv --vg name=rootvg --lpar id=1 -p id=$vid --stor-id name=$LV_NAME
    

    Troubleshooting

    NovaLink is not PowerVC, so here is a little reminder of what I do to troubleshoot NovaLink:

    • Installation troubleshooting:
    # cat /var/log/pvm-install/pvminstall.log
    
  • Neutron Agent log (always double check this one):
  • # cat /var/log/neutron/neutron-powervc-pvm-sea-agent.log
    
  • Nova logs for this host are not accessible on the PowerVC management host anymore, so check it on the NovaLink partition if needed:
  • # cat /var/log/nova/nova-compute.log
    
  • pvmctl logs:
  • # cat /var/log/pvm/pvmctl.log
    

    One last thing to add about NovaLink: one thing I like a lot is that NovaLink does hourly and daily backups of the system and VIOS configuration. These backups are stored in /var/backups/pvm :

    # crontab -l
    # VIOS hourly backups - at 15 past every hour except for midnight
    15 1-23 * * * /usr/sbin/pvm-backup --type vios --frequency hourly
    # Hypervisor hourly backups - at 15 past every hour except for midnight
    15 1-23 * * * /usr/sbin/pvm-backup --type system --frequency hourly
    # VIOS daily backups - at 15 past midnight
    15 0    * * * /usr/sbin/pvm-backup --type vios --frequency daily
    # Hypervisor daily backups - at 15 past midnight
    15 0    * * * /usr/sbin/pvm-backup --type system --frequency daily
    # ls -l /var/backups/pvm
    total 4
    drwxr-xr-x 2 root pvm_admin 4096 Sep  9 00:15 9119-MME*0265FF47B
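
    As these backups are the only thing left if you lose the NovaLink partition itself, I also copy them off the box; a minimal sketch (the destination directory on the nim server is just an example, adapt it to your environment):

    # ssh nimserver "mkdir -p /export/nim/novalink-backups/$(hostname)"
    # scp -r /var/backups/pvm nimserver:/export/nim/novalink-backups/$(hostname)/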
    

    More PowerVC tips and tricks

    Let's finish this blog post with more PowerVC tips and tricks. Before giving you the tricks I have to warn you: none of these tricks are supported by PowerVC, use them at your own risk OR contact your support before doing anything else. You may break and destroy everything if you are not aware of what you are doing. So please be very careful using all these tricks. YOU HAVE BEEN WARNED !!!!!!

    Accessing and querying the database

    This first trick is funny and will allow you to query and modify the PowerVC database. Once again, do this at your own risk. One of the issues I had was strange: I do not remember exactly how it happened, but some of my luns that were not attached to any host were still showing an attachment count equal to 1 and I didn't have the possibility to remove it. Even worse, someone had deleted these luns on the SVC side. So these luns were what I call "ghost luns": non-existing but non-deletable luns (I also had to remove the storage provider related to these luns). The only way to change this was to set the state to detached directly in the cinder database. Be careful, this trick only works with MariaDB.

    First get the database password. Get the encrypted password from /opt/ibm/powervc/data/powervc-db.conf file and decode it to have the clear password:

    # grep ^db_password /opt/ibm/powervc/data/powervc-db.conf
    db_password = aes-ctr:NjM2ODM5MjM0NTAzMTg4MzQzNzrQZWi+mrUC+HYj9Mxi5fQp1XyCXA==
    # python -c "from powervc_keystone.encrypthandler import EncryptHandler; print EncryptHandler().decode('aes-ctr:NjM2ODM5MjM0NTAzMTg4MzQzNzrQZWi+mrUC+HYj9Mxi5fQp1XyCXA==')"
    OhnhBBS_gvbCcqHVfx2N
    # mysql -u root -p cinder
    Enter password:
    MariaDB [cinder]> MariaDB [cinder]> show tables;
    +----------------------------+
    | Tables_in_cinder           |
    +----------------------------+
    | backups                    |
    | cgsnapshots                |
    | consistencygroups          |
    | driver_initiator_data      |
    | encryption                 |
    [..]
    

    Then get the lun uuid from the PowerVC GUI for the lun you want to change, and follow the commands below:

    dummy

    MariaDB [cinder]> select * from volume_attachment where volume_id='9cf6d85a-3edd-4ab7-b797-577ff6566f78' \G
    *************************** 1. row ***************************
       created_at: 2016-05-26 08:52:51
       updated_at: 2016-05-26 08:54:23
       deleted_at: 2016-05-26 08:54:23
          deleted: 1
               id: ce4238b5-ea39-4ce1-9ae7-6e305dd506b1
        volume_id: 9cf6d85a-3edd-4ab7-b797-577ff6566f78
    attached_host: NULL
    instance_uuid: 44c7a72c-610c-4af1-a3ed-9476746841ab
       mountpoint: /dev/sdb
      attach_time: 2016-05-26 08:52:51
      detach_time: 2016-05-26 08:54:23
      attach_mode: rw
    attach_status: attached
    1 row in set (0.01 sec)
    MariaDB [cinder]> select * from volumes where id='9cf6d85a-3edd-4ab7-b797-577ff6566f78' \G
    *************************** 1. row ***************************
                     created_at: 2016-05-26 08:51:57
                     updated_at: 2016-05-26 08:54:23
                     deleted_at: NULL
                        deleted: 0
                             id: 9cf6d85a-3edd-4ab7-b797-577ff6566f78
                         ec2_id: NULL
                        user_id: 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9
                     project_id: 1471acf124a0479c8d525aa79b2582d0
                           host: pb01_mn_svc_qual
                           size: 1
              availability_zone: nova
                         status: available
                  attach_status: attached
                   scheduled_at: 2016-05-26 08:51:57
                    launched_at: 2016-05-26 08:51:59
                  terminated_at: NULL
                   display_name: dummy
            display_description: NULL
              provider_location: NULL
                  provider_auth: NULL
                    snapshot_id: NULL
                 volume_type_id: e49e9cc3-efc3-4e7e-bcb9-0291ad28df42
                   source_volid: NULL
                       bootable: 0
              provider_geometry: NULL
                       _name_id: NULL
              encryption_key_id: NULL
               migration_status: NULL
             replication_status: disabled
    replication_extended_status: NULL
        replication_driver_data: NULL
            consistencygroup_id: NULL
                    provider_id: NULL
                    multiattach: 0
                previous_status: NULL
    1 row in set (0.00 sec)
    MariaDB [cinder]> update volume_attachment set attach_status='detached' where volume_id='9cf6d85a-3edd-4ab7-b797-577ff6566f78';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    MariaDB [cinder]> update volumes set attach_status='detached' where id='9cf6d85a-3edd-4ab7-b797-577ff6566f78';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    

    The second issue I had was about having some machines in a deleted state while the reality was that the HMC had just rebooted and, for an unknown reason, these machines were seen as 'deleted' ... but they were not. Using this trick I was able to force a re-evaluation of each machine in this case:

    #  mysql -u root -p nova
    Enter password:
    MariaDB [nova]> select * from instance_health_status where health_state='WARNING';
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    | created_at          | updated_at          | deleted_at | deleted | id                                   | health_state | reason                                                                                                                                                                                                                | unknown_reason_details |
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    | 2016-07-11 08:58:37 | NULL                | NULL       |       0 | 1af1805c-bb59-4bc9-8b6d-adeaeb4250f3 | WARNING      | [{"resource_local": "server", "display_name": "p00ww6754398", "resource_property_key": "rmc_state", "resource_property_value": "initializing", "resource_id": "1af1805c-bb59-4bc9-8b6d-adeaeb4250f3"}]                |                        |
    | 2015-07-31 16:53:50 | 2015-07-31 18:49:50 | NULL       |       0 | 2668e808-10a1-425f-a272-6b052584557d | WARNING      | [{"resource_local": "server", "display_name": "multi-vol", "resource_property_key": "vm_state", "resource_property_value": "deleted", "resource_id": "2668e808-10a1-425f-a272-6b052584557d"}]                         |                        |
    | 2015-08-03 11:22:38 | 2015-08-03 15:47:41 | NULL       |       0 | 2934fb36-5d91-48cd-96de-8c16459c50f3 | WARNING      | [{"resource_local": "server", "display_name": "clouddev-test-754df319-00000038", "resource_property_key": "rmc_state", "resource_property_value": "inactive", "resource_id": "2934fb36-5d91-48cd-96de-8c16459c50f3"}] |                        |
    | 2016-07-11 09:03:59 | NULL                | NULL       |       0 | 3fc42502-856b-46a5-9c36-3d0864d6aa4c | WARNING      | [{"resource_local": "server", "display_name": "p00ww3254401", "resource_property_key": "rmc_state", "resource_property_value": "initializing", "resource_id": "3fc42502-856b-46a5-9c36-3d0864d6aa4c"}]                |                        |
    | 2015-07-08 20:11:48 | 2015-07-08 20:14:09 | NULL       |       0 | 54d02c60-bd0e-4f34-9cb6-9c0a0b366873 | WARNING      | [{"resource_local": "server", "display_name": "p00wb3740870", "resource_property_key": "rmc_state", "resource_property_value": "inactive", "resource_id": "54d02c60-bd0e-4f34-9cb6-9c0a0b366873"}]                    |                        |
    | 2015-07-31 17:44:16 | 2015-07-31 18:49:50 | NULL       |       0 | d5ec2a9c-221b-44c0-8573-d8e3695a8dd7 | WARNING      | [{"resource_local": "server", "display_name": "multi-vol-sp5", "resource_property_key": "vm_state", "resource_property_value": "deleted", "resource_id": "d5ec2a9c-221b-44c0-8573-d8e3695a8dd7"}]                     |                        |
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    6 rows in set (0.00 sec)
    MariaDB [nova]> update instance_health_status set health_state='PENDING',reason='' where health_state='WARNING';
    Query OK, 6 rows affected (0.00 sec)
    Rows matched: 6  Changed: 6  Warnings: 0
    

    pending

    The ceilometer issue

    When updating from PowerVC 1.3.0.1 to 1.3.1.1 PowerVC changes the database backend from DB2 to MariaDB. This is a good thing, but the update works by exporting all the data to flat files and then re-inserting it into the MariaDB database record by record. I had a huge problem because of this: my ceilometer database was huge because of the number of machines I had and the number of operations we have run on PowerVC since it went into production. The DB insert took more than 3 days and never finished. If you don't need the ceilometer data my advice is to change the retention from the default of 270 days to 2 hours:

    # powervc-config metering event_ttl --set 2 --unit hr 
    # ceilometer-expirer --config-file /etc/ceilometer/ceilometer.conf
    

    If this is not enough and you are still experiencing problems with the update, the best way is to flush the entire ceilometer database before the update:

    # /opt/ibm/powervc/bin/powervc-services stop
    # /opt/ibm/powervc/bin/powervc-services db2 start
    # /bin/su - pwrvcdb -c "db2 drop database ceilodb2"
    # /bin/su - pwrvcdb -c "db2 CREATE DATABASE ceilodb2 AUTOMATIC STORAGE YES ON /home/pwrvcdb DBPATH ON /home/pwrvcdb USING CODESET UTF-8 TERRITORY US COLLATE USING SYSTEM PAGESIZE 16384 RESTRICTIVE"
    # /bin/su - pwrvcdb -c "db2 connect to ceilodb2 ; db2 grant dbadm on database to user ceilometer"
    # /opt/ibm/powervc/bin/powervc-dbsync ceilometer
    # /bin/su - pwrvcdb -c "db2 connect TO ceilodb2; db2 CALL GET_DBSIZE_INFO '(?, ?, ?, 0)' > /tmp/ceilodb2_db_size.out; db2 terminate" > /dev/null
    

    Multi tenancy ... how to deal with a huge environment

    As my environment is growing bigger and bigger I faced a couple of people trying to force me to multiply the number of PowerVC machines we have. As Openstack is a solution designed to handle both density and scalability I said that doing this is just nonsense. Seriously, people who still believe in this have not understood anything about the cloud, Openstack and PowerVC. Luckily we found a solution acceptable to everybody. As we are creating what we call "building blocks" we had to find a way to isolate one "block" from another. The solution for host isolation is called multi tenancy isolation. For the storage side we are just going to play with quotas. By doing this a user will be able to manage a couple of hosts and the associated storage (storage templates) without having the right to do anything on the others:

    multitenancyisolation

    Before doing anything create the tenant (or project) and a user associated with it:

    # cat /opt/ibm/powervc/version.properties | grep cloud_enabled
    cloud_enabled = yes
    # cat ~/powervcrc
    export OS_USERNAME=root
    export OS_PASSWORD=root
    export OS_TENANT_NAME=ibm-default
    export OS_AUTH_URL=https://powervc.lab.chmod666.org:5000/v3/
    export OS_IDENTITY_API_VERSION=3
    export OS_CACERT=/etc/pki/tls/certs/powervc.crt
    export OS_REGION_NAME=RegionOne
    export OS_USER_DOMAIN_NAME=Default
    export OS_PROJECT_DOMAIN_NAME=Default
    export OS_COMPUTE_API_VERSION=2.25
    export OS_NETWORK_API_VERSION=2.0
    export OS_IMAGE_API_VERSION=2
    export OS_VOLUME_API_VERSION=2
    # source powervcrc
    # openstack project create hb01
    +-------------+----------------------------------+
    | Field       | Value                            |
    +-------------+----------------------------------+
    | description |                                  |
    | domain_id   | default                          |
    | enabled     | True                             |
    | id          | 90d064b4abea4339acd32a8b6a8b1fdf |
    | is_domain   | False                            |
    | name        | hb01                             |
    | parent_id   | default                          |
    +-------------+----------------------------------+
    # openstack role list
    +----------------------------------+---------------------+
    | ID                               | Name                |
    +----------------------------------+---------------------+
    | 1a76014f12594214a50c36e6a8e3722c | deployer            |
    | 54616a8b136742098dd81eede8fd5aa8 | vm_manager          |
    | 7bd6de32c14d46f2bd5300530492d4a4 | storage_manager     |
    | 8260b7c3a4c24a38ba6bee8e13ced040 | deployer_restricted |
    | 9b69a55c6b9346e2b317d0806a225621 | image_manager       |
    | bc455ed006154d56ad53cca3a50fa7bd | admin               |
    | c19a43973db148608eb71eb3d86d4735 | service             |
    | cb130e4fa4dc4f41b7bb4f1fdcf79fc2 | self_service        |
    | f1a0c1f9041d4962838ec10671befe33 | vm_user             |
    | f8cf9127468045e891d5867ce8825d30 | viewer              |
    +----------------------------------+---------------------+
    # useradd hb01_admin
    # openstack role add --project hb01 --user hb01_admin admin
    

    Then associate each host group (aggregates in Openstack terms) allowed for this tenant using the filter_tenant_id metadata (you have to put your allowed hosts in a host group to enable this feature). For each allowed host group add this field to the metadata of the aggregate (first find the tenant id):

    # openstack project list
    +----------------------------------+-------------+
    | ID                               | Name        |
    +----------------------------------+-------------+
    | 1471acf124a0479c8d525aa79b2582d0 | ibm-default |
    | 90d064b4abea4339acd32a8b6a8b1fdf | hb01        |
    | b79b694c70734a80bc561e84a95b313d | powervm     |
    | c8c42d45ef9e4a97b3b55d7451d72591 | service     |
    | f371d1f29c774f2a97f4043932b94080 | project1    |
    +----------------------------------+-------------+
    # openstack aggregate list
    +----+---------------+-------------------+
    | ID | Name          | Availability Zone |
    +----+---------------+-------------------+
    |  1 | Default Group | None              |
    | 21 | aggregate2    | None              |
    | 41 | hg2           | None              |
    | 43 | hb01_mn       | None              |
    | 44 | hb01_me       | None              |
    +----+---------------+-------------------+
    # nova aggregate-set-metadata hb01_mn filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf 
    Metadata has been successfully updated for aggregate 43.
    | Id | Name    | Availability Zone | Hosts             | Metadata                                                                                                                                   
    | 43 | hb01_mn | -                 | '9119MME_1009425' | 'dro_enabled=False', 'filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf', 'hapolicy-id=1', 'hapolicy-run_interval=1', 'hapolicy-stabilization=1', 'initialpolicy-id=4', 'runtimepolicy-action=migrate_vm_advise_only', 'runtimepolicy-id=5', 'runtimepolicy-max_parallel=10', 'runtimepolicy-run_interval=5', 'runtimepolicy-stabilization=2', 'runtimepolicy-threshold=70' |
    # nova aggregate-set-metadata hb01_me filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf 
    Metadata has been successfully updated for aggregate 44.
    | Id | Name    | Availability Zone | Hosts             | Metadata                                                                                                                                   
    | 44 | hb01_me | -                 | '9119MME_0696010' | 'dro_enabled=False', 'filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf', 'hapolicy-id=1', 'hapolicy-run_interval=1', 'hapolicy-stabilization=1', 'initialpolicy-id=2', 'runtimepolicy-action=migrate_vm_advise_only', 'runtimepolicy-id=5', 'runtimepolicy-max_parallel=10', 'runtimepolicy-run_interval=5', 'runtimepolicy-stabilization=2', 'runtimepolicy-threshold=70' |
    

    To make this work add AggregateMultiTenancyIsolation to the scheduler_default_filters entry in the nova.conf file and restart the nova services:

    # grep scheduler_default_filter /etc/nova/nova.conf
    scheduler_default_filters = RamFilter,CoreFilter,ComputeFilter,RetryFilter,AvailabilityZoneFilter,ImagePropertiesFilter,ComputeCapabilitiesFilter,MaintenanceFilter,PowerVCServerGroupAffinityFilter,PowerVCServerGroupAntiAffinityFilter,PowerVCHostAggregateFilter,PowerVMNetworkFilter,PowerVMProcCompatModeFilter,PowerLMBSizeFilter,PowerMigrationLicenseFilter,PowerVMMigrationCountFilter,PowerVMStorageFilter,PowerVMIBMiMobilityFilter,PowerVMRemoteRestartFilter,PowerVMRemoteRestartSameHMCFilter,PowerVMEndianFilter,PowerVMGuestCapableFilter,PowerVMSharedProcPoolFilter,PowerVCResizeSameHostFilter,PowerVCDROFilter,PowerVMActiveMemoryExpansionFilter,PowerVMNovaLinkMobilityFilter,AggregateMultiTenancyIsolation
    # powervc-services restart
    

    We are done regarding the hosts.

    Enabling quotas

    To allow one user/tenant to create volumes on only one storage provider we first need to check that quota management is enabled in the cinder policy file:

    # grep quota /opt/ibm/powervc/policy/cinder/policy.json
        "volume_extension:quotas:show": "",
        "volume_extension:quotas:update": "rule:admin_only",
        "volume_extension:quotas:delete": "rule:admin_only",
        "volume_extension:quota_classes": "rule:admin_only",
        "volume_extension:quota_classes:validate_setup_for_nested_quota_use": "rule:admin_only",
    

    Then set the volumes quota to 0 for all the non-allowed storage templates for this tenant and leave the one you want untouched. Easy:

    # cinder --service-type volume type-list
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    |                  ID                  |                     Name                    | Description | Is_Public |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    | 53434872-a0d2-49ea-9683-15c7940b30e5 |               svc2 base template            |      -      |    True   |
    | e49e9cc3-efc3-4e7e-bcb9-0291ad28df42 |               svc1 base template            |      -      |    True   |
    | f45469d5-df66-44cf-8b60-b226425eee4f |                     svc3                    |      -      |    True   |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    # cinder --service-type volume quota-update --volumes 0 --volume-type "svc2" 90d064b4abea4339acd32a8b6a8b1fdf
    # cinder --service-type volume quota-update --volumes 0 --volume-type "svc3" 90d064b4abea4339acd32a8b6a8b1fdf
    +-------------------------------------------------------+----------+
    |                        Property                       |  Value   |
    +-------------------------------------------------------+----------+
    |                    backup_gigabytes                   |   1000   |
    |                        backups                        |    10    |
    |                       gigabytes                       | 1000000  |
    |              gigabytes_svc2 base template             | 10000000 |
    |              gigabytes_svc1 base template             | 10000000 |
    |                     gigabytes_svc3                    |    -1    |
    |                  per_volume_gigabytes                 |    -1    |
    |                       snapshots                       |  100000  |
    |             snapshots_svc2 base template              |  100000  |
    |             snapshots_svc1 base template              |  100000  |
    |                     snapshots_svc3                    |    -1    |
    |                        volumes                        |  100000  |
    |            volumes_svc2 base template                 |  100000  |
    |            volumes_svc1 base template                 |    0     |
    |                      volumes_svc3                     |    0     |
    +-------------------------------------------------------+----------+
    # powervc-services stop
    # powervc-services start
    

    By doing this you have enabled the isolation between the two tenants. Then use the appropriate user to do the appropriate task.

    PowerVC cinder above the Petabyte

    Now that quotas are enabled, use this command if you want to be able to have more than one petabyte of data managed by PowerVC:

    # cinder --service-type volume quota-class-update --gigabytes -1 default
    # powervc-services stop
    # powervc-services start
    

    PowerVC cinder above 10000 luns

    Change osapi_max_limit in cinder.conf if you want to go above the 10000 lun limit (check every cinder configuration file; the one in cinder.conf is for the global number of volumes):

    # grep ^osapi_max_limit cinder.conf
    osapi_max_limit = 15000
    # powervc-services stop
    # powervc-services start
    

    Snapshots and consistency groups

    There is a new cool feature available with the latest version of PowerVC (1.3.1.2). This feature allows you to create snapshots of volumes (only on SVC and Storwize for the moment). You now have the possibility to create consistency groups (groups of volumes) and to create snapshots of these consistency groups (allowing you, for instance, to back up a volume group directly from Openstack). I'm doing the example below using the command line because I think it is easier to understand with these commands than by showing you the same thing with the REST api:

    First create a consistency group:

    # cinder --service-type volume type-list
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    |                  ID                  |                     Name                    | Description | Is_Public |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    | 53434872-a0d2-49ea-9683-15c7940b30e5 |              svc2 base template             |      -      |    True   |
    | 862b0a8e-cab4-400c-afeb-99247838f889 |             p8_ssp base template            |      -      |    True   |
    | e49e9cc3-efc3-4e7e-bcb9-0291ad28df42 |               svc1 base template            |      -      |    True   |
    | f45469d5-df66-44cf-8b60-b226425eee4f |                     svc3                    |      -      |    True   |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    # cinder --service-type volume consisgroup-create --name foovg_cg "svc1 base template"
    +-------------------+-------------------------------------------+
    |      Property     |                   Value                   |
    +-------------------+-------------------------------------------+
    | availability_zone |                    nova                   |
    |     created_at    |         2016-09-11T21:10:58.000000        |
    |    description    |                    None                   |
    |         id        |    950a5193-827b-49ab-9511-41ba120c9ebd   |
    |        name       |                  foovg_cg                 |
    |       status      |                  creating                 |
    |    volume_types   | [u'e49e9cc3-efc3-4e7e-bcb9-0291ad28df42'] |
    +-------------------+-------------------------------------------+
    # cinder --service-type volume consisgroup-list
    +--------------------------------------+-----------+----------+
    |                  ID                  |   Status  |   Name   |
    +--------------------------------------+-----------+----------+
    | 950a5193-827b-49ab-9511-41ba120c9ebd | available | foovg_cg |
    +--------------------------------------+-----------+----------+
    

    Create volume in this consistency group:

    # cinder --service-type volume create --volume-type "svc1 base template" --name foovg_vol1 --consisgroup-id 950a5193-827b-49ab-9511-41ba120c9ebd 200
    # cinder --service-type volume create --volume-type "svc1 base template" --name foovg_vol2 --consisgroup-id 950a5193-827b-49ab-9511-41ba120c9ebd 200
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    |           Property           |                                                                          Value                                                                           |
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    |         attachments          |                                                                            []                                                                            |
    |      availability_zone       |                                                                           nova                                                                           |
    |           bootable           |                                                                          false                                                                           |
    |     consistencygroup_id      |                                                           950a5193-827b-49ab-9511-41ba120c9ebd                                                           |
    |          created_at          |                                                                2016-09-11T21:23:02.000000                                                                |
    |         description          |                                                                           None                                                                           |
    |          encrypted           |                                                                          False                                                                           |
    |        health_status         | {u'health_value': u'PENDING', u'id': u'8d078772-00b5-45fc-89c8-82c63e2c48ed', u'value_reason': u'PENDING', u'updated_at': u'2016-09-11T21:23:02.669372'} |
    |              id              |                                                           8d078772-00b5-45fc-89c8-82c63e2c48ed                                                           |
    |           metadata           |                                                                            {}                                                                            |
    |       migration_status       |                                                                           None                                                                           |
    |         multiattach          |                                                                          False                                                                           |
    |             name             |                                                                        foovg_vol2                                                                        |
    |    os-vol-host-attr:host     |                                                                           None                                                                           |
    | os-vol-tenant-attr:tenant_id |                                                             1471acf124a0479c8d525aa79b2582d0                                                             |
    |      replication_status      |                                                                         disabled                                                                         |
    |             size             |                                                                           200                                                                            |
    |         snapshot_id          |                                                                           None                                                                           |
    |         source_volid         |                                                                           None                                                                           |
    |            status            |                                                                         creating                                                                         |
    |          updated_at          |                                                                           None                                                                           |
    |           user_id            |                                             0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9                                             |
    |         volume_type          |                                                                   svc1 base template                                                                     |
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    

    You're now able to attach these two volumes to a machine from the PowerVC GUI:

    consist

    # lsmpio -q
    Device           Vendor Id  Product Id       Size    Volume Name
    ------------------------------------------------------------------------------
    hdisk0           IBM        2145                 64G volume-aix72-44c7a72c-000000e0-
    hdisk1           IBM        2145                100G volume-snap1-dab0e2d1-130a
    hdisk2           IBM        2145                100G volume-snap2-5e863fdb-ab8c
    hdisk3           IBM        2145                200G volume-foovg_vol1-3ba0ff59-acd8
    hdisk4           IBM        2145                200G volume-foovg_vol2-8d078772-00b5
    # cfgmgr
    # lspv
    hdisk0          00c8b2add70d7db0                    rootvg          active
    hdisk1          00f9c9f51afe960e                    None
    hdisk2          00f9c9f51afe9698                    None
    hdisk3          none                                None
    hdisk4          none                                None
    

    Then you can create a snapshot of these two volumes. It's that easy :-) :

    # cinder --service-type volume cgsnapshot-create 950a5193-827b-49ab-9511-41ba120c9ebd
    +---------------------+--------------------------------------+
    |       Property      |                Value                 |
    +---------------------+--------------------------------------+
    | consistencygroup_id | 950a5193-827b-49ab-9511-41ba120c9ebd |
    |      created_at     |      2016-09-11T21:31:12.000000      |
    |     description     |                 None                 |
    |          id         | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f |
    |         name        |                 None                 |
    |        status       |               creating               |
    +---------------------+--------------------------------------+
    # cinder --service-type volume cgsnapshot-list
    +--------------------------------------+-----------+------+
    |                  ID                  |   Status  | Name |
    +--------------------------------------+-----------+------+
    | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f | available |  -   |
    +--------------------------------------+-----------+------+
    # cinder --service-type volume cgsnapshot-show 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f
    +---------------------+--------------------------------------+
    |       Property      |                Value                 |
    +---------------------+--------------------------------------+
    | consistencygroup_id | 950a5193-827b-49ab-9511-41ba120c9ebd |
    |      created_at     |      2016-09-11T21:31:12.000000      |
    |     description     |                 None                 |
    |          id         | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f |
    |         name        |                 None                 |
    |        status       |              available               |
    +---------------------+--------------------------------------+
    

    cgsnap

    Conclusion

    Please keep in mind that the content of this blog post comes from real life and production examples. I hope it helps you understand that scalability, density, fast deployment, snapshots and multi tenancy are features that are absolutely needed in the AIX world. As you can see the PowerVC team is moving fast, probably faster than any customer I have ever seen, and I must admit they are right: this is the only way to face the Linux x86 offering. And I must confess it is damn fun to work on these things. I'm so happy to have the best of two worlds, AIX/Power Systems and Openstack. This is the only direction we can take if we want AIX to survive, so please stop being scared of or unconvinced by these solutions: they are damn good and production ready. Please face and embrace the future and stop looking at the past. As always I hope it helps.

    pvcctl : Using python Openstack api to code a PowerVC command line | Automating PowerVC and NovaLink (post)-installation with Ansible


    The world is changing fast, especially regarding sysadmin jobs and skills. Everybody has noticed that being a good sysadmin now implies two things. The first one is "being a good dev": not just someone who knows how to write a ksh or bash script, but someone who is able to write in one of these three languages: python, ruby or go. Most of my team members do not understand that; it is now almost mandatory, and I truly believe what I'm saying here. The second one is having strong skills in automation tools. On this part I'm almost ok, being good at Chef, Saltstack and Ansible. Unfortunately for the world the best tool is never the one that wins, and that's why Ansible is almost winning everywhere in the battle of automation. It is simple to understand why: Ansible is simple to use and to understand and it is based on ssh. My opinion is that Ansible is ok for administration stuff but not ok when scaling. Being based on ssh makes it a "push" model, and in my humble opinion push models are bad and pull models are the future. (One thing to say about that: this is just my opinion. I don't want this to end in never-ending trolls on Twitter. Please write blog posts if you want to express yourself.) (I'm saying this because Twitter is becoming a place to troll and no longer a place to share skills and knowledge, like it was before.) This is said.

    The first part of this blog post will talk about a tool I am coding called pvcctl. This is a python tool allowing you to use PowerVC from the command line. It was also the opportunity for me to get better at python and to improve my skills in this language. Keep in mind that I'm not a developer, but I'm going to give you simple tips and tricks to use python to write your own tools to query and interact with Openstack. I must admit that I've tried everything: httplib, librequest, Chef, Ansible, Saltstack. None of these solutions was ok for me. I finally ended up using the Openstack python api to write this tool. It's not that hard and it now allows me to write my own programs to interact with PowerVC. Once again keep in mind that this tool fits my needs and will probably not fit yours. It is an example of how to write a tool based on the python Openstack api, not an official tool or anything else.

    The second part of this blog post will talk about an Ansible playbook I've written to take care of the PowerVC installation and the NovaLink post-installation. The more machines I deploy the more PowerVC I need (yeah yeah, I was forced to) and the more NovaLink I need too. Instead of doing the same thing over and over again, the best solution was to use an automation tool, and as it is now the most common one used on Linux the one I chose was Ansible.

    Using python Openstack api to code a PowerVC command line

    This part of the post will show you how to use the python Openstack api to create scripts to query and interact with PowerVC. First of all, I know there are other ways to use the APIs, but (it's my opinion) I think that using the service-specific clients is the simplest way to understand and to work with the API. This part of the blog post will only talk about service-specific clients (ie. novaclient, cinderclient, and so on ...). I want to thank Matthew Edmonds from the PowerVC team. He helped me to better understand the api and gave me good advice. So a big shout out to you Matthew :-). Thank you.

    Initialize your script

    Sessions

    Almost all Openstack tools use "rc" files to load authentication credentials and endpoints. As I wanted my tool to work the same way (ie. sourcing an rc file containing my credentials) I found that the best way to do this was to use sessions. By using a session you don't have to manage or work with any tokens or worry about that at all: the session takes care of it for you and you have nothing to do. As you can see in the code below, the "OS_*" environment variables are used here. So before running the tool all you have to do is export these variables. It's as simple as that:

    • An example "rc" file filled with the OS_* values (note that the crt file must be copied from the PowerVC host to the host running the tool (/etc/pki/tls/certs/powervc.crt)):
    # cat powervcrc
    export OS_AUTH_URL=https://mypowervc:5000/v3/
    export OS_USERNAME=root
    export OS_PASSWORD=root
    export OS_TENANT_NAME=ibm-default
    export OS_REGION_NAME=RegionOne
    export OS_USER_DOMAIN_NAME=Default
    export OS_PROJECT_DOMAIN_NAME=Default
    export OS_CACERT=~/powervc_labp8.crt
    export OS_IMAGE_ENDPOINT=https://mypowervc:9292/
    export NOVACLIENT_DEBUG=0
    # source powervcrc
    
  • The python piece of code creating a session object:
  • from os import environ as env  # gives access to the sourced OS_* environment variables
    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    
    auth = v3.Password(auth_url=env['OS_AUTH_URL'],
                       username=env['OS_USERNAME'],
                       password=env['OS_PASSWORD'],
                       project_name=env['OS_TENANT_NAME'],
                       user_domain_id=env['OS_USER_DOMAIN_NAME'],
                       project_domain_id=env['OS_PROJECT_DOMAIN_NAME'])
    
    sess = session.Session(auth=auth, verify=env['OS_CACERT'])
    

    The logger

    Instead of using the print statement each time I need to debug my script, I found that most of the Openstack APIs can be used with a python logger object. By using a logger you'll be able to see all your http calls to the Openstack API (your post, put, get and delete requests with their json bodies, their responses and their urls). It is super useful to debug your scripts and it's super simple to use. The piece of code below creates a logger object writing to my log directory. You'll see later how to use a logger when creating a client object (a nova, cinder or neutron object):

    import logging
    import os
    
    # BASE_DIR points to the tool's base directory (adjust it to your own layout)
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
    
    logger = logging.getLogger('pvcctl')
    hdlr = logging.FileHandler(BASE_DIR + "/logs/pvcctl.log")
    logger.addHandler(hdlr)
    logger.setLevel(logging.DEBUG)
    

    Here is an example of the output created by the logger with a novaclient (the novaclient was created specifying a logger object):

    REQ: curl -g -i --cacert "/data/tools/ditools/pvcctl/conf/powervc.crt" -X POST https://mypowervc/powervc/openstack/compute/v2.1/51488ae7be7e4ec59759ccab496c8793/servers/a3cea5b8-33b4-432e-88ec-e11e47941846/os-volume_attachments -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-OpenStack-Nova-API-Version: 2.19" -H "X-Auth-Token: {SHA1}9b8becc425d25fdfb98d5e4f055c71498d2e744f" -d '{"volumeAttachment": {"volumeId": "a90f04ce-feb2-4163-9b36-23765777c6a0"}}' 
    RESP: [200] Date: Fri, 28 Oct 2016 13:05:24 GMT Server: Apache Content-Length: 194 Content-Type: application/json X-Openstack-Nova-Api-Version: 2.19 Vary: X-OpenStack-Nova-API-Version X-Compute-Request-Id: req-58addd19-7421-4c9f-a712-9c386f46b6cb Cache-control: max-age=0, no-cache, no-store, must-revalidate Pragma: no-cache Keep-Alive: timeout=5, max=93 Connection: Keep-Alive 
    RESP BODY: {"volumeAttachment": {"device": "/dev/sdb", "serverId": "a3cea5b8-33b4-432e-88ec-e11e47941846", "id": "a90f04ce-feb2-4163-9b36-23765777c6a0", "volumeId": "a90f04ce-feb2-4163-9b36-23765777c6a0"}} 
    

    The clients

    Each Openstack service (nova, glance, neutron, swift, cinder, ...) is provided with a python Openstack API. In my tool I'm only using novaclient, cinderclient and neutronclient, but it is the exact same thing if you want to use ceilometer or glance. Before doing anything else you have to install the clients you want to use (using your package manager (yum on my side) or using pip):

    # yum install python2-novaclient.noarch
    # pip install python-neutronclient
    

    Initializing the clients

    After the clients are installed you can use them in your python scripts: import them and create the objects. Use the previously created session object to create the client objects (session=sess in the example below):

    from novaclient import client as client_nova
    from neutronclient.v2_0 import client as client_neutron
    from cinderclient import client as client_cinder
    
    nova = client_nova.Client(2.19, session=sess)
    neutron = client_neutron.Client(session=sess)
    cinder = client_cinder.Client(2.0, service_type="volume", session=sess)
    

    If your client can be created using a logger object you can specify this at the time of the object creation. Here is an example with novaclient:

    nova = client_nova.Client(2.19, session=sess, http_log_debug=True, logger=logger)
    

    Using the clients

    After the objects are created, using them is super simple. Here are a couple of examples (a small combined sketch follows the list):

    • Searching a vm (this will return a server object):
    server = nova.servers.find(name=machine)
    
  • Renaming a vm:
  • server = nova.servers.find(name=machine)
    server.update(name=new_name)
    
  • Starting a vm:
  • server = nova.servers.find(name=machine)
    server.start()
    
  • Stopping a vm:
  • server = nova.servers.find(name=machine)
    server.stop()
    
  • Listing vms:
  • for server in nova.servers.list():
      name = getattr(server, 'OS-EXT-SRV-ATTR:hostname')
      print name
    
  • Find a vlan:
  • vlan = neutron.list_networks(name=vlan)
    
  • Creating a volume:
  • cinder.volumes.create(name=volume_name, size=size, volume_type=storage_template)
    
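
    To show how these calls chain together, here is a minimal sketch re-using the nova object created earlier (the "test-" hostname prefix is just an example, adapt it to your own naming convention) that stops every virtual machine whose hostname starts with "test-":

    # Stop every virtual machine whose hostname starts with "test-".
    for server in nova.servers.list():
        hostname = getattr(server, 'OS-EXT-SRV-ATTR:hostname', None)
        if hostname and hostname.startswith('test-'):
            print("stopping %s" % hostname)
            server.stop()
    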

    And so on. Each client type has its own methods; the best way to find which methods are available for each object is to check the official Openstack API documentation.
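
    If you just want a quick look from an interactive python session, plain python introspection on the client objects created earlier (nothing PowerVC specific here, just the standard dir() built-in) also does the job:

    # Print the public methods exposed by the servers and volumes managers.
    print([m for m in dir(nova.servers) if not m.startswith('_')])
    print([m for m in dir(cinder.volumes) if not m.startswith('_')])
    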

    What about PowerVC extensions ? (using get,put,delete …)

    If you have already read my blog posts about PowerVC you probably already know that PowerVC adds some extensions to Openstack. That means that for the PowerVC extensions the Openstack methods shipped with the API will not work. To be more specific, the methods used to query or interact with the PowerVC extensions simply do not exist at all. The good part of these APIs is that they also ship the common http methods. This means that for each Openstack api object, let's say nova, you'll be able to directly use the put, post, get, delete (and so on) methods. By doing that you'll be able to use the same object to call all the api methods (let's say create or rename a server) and to use the PowerVC extensions. For instance "host-seas" is a PowerVC added extension (link here). You can simply use a novaclient to query or post something to the extension (the example below shows you both a post and a get on PowerVC extensions):

    resp, host_seas = nova.client.get("/host-seas?network_id=" + net_id + "&vlan_id=" + vlan_id)
    resp, body = nova.client.post("/host-network-mapping", body=mapping_json)
    
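
    Both calls return a tuple with the raw response and the body already parsed into a python dict, so dumping it is an easy way to discover the structure of what the extension returns (a small sketch re-using the host_seas variable from the get above; indent and sort_keys are just for readability):

    import json

    # host_seas is the parsed body returned by the get() call above.
    print(json.dumps(host_seas, indent=2, sort_keys=True))
    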

    Here is another example for onboarding or unmanaging a volume (which is also a PowerVC extension to Openstack):

    resp, body = cinder.client.post("/os-hosts/" + oshost['host_name'] + "/onboard", body=onboard_json)
    resp, body = cinder.client.post("/os-hosts/" + oshost['host_name'] + "/unmanage", body=onboard_json)
    

    Working with json

    Last part of these tips and tricks on how to write your own python code using the Openstack api: you'll quickly see that you need to work with json. What is cool with python is that json can be handled as a dict object. It's super simple to use (a small combined sketch follows the list below):

    • Importing json:
    import json
    
  • Loading json:
  • json_load = json.loads('{ "ibm-extend": { "new_size": 0 } }')
    
  • Using the dict:
  • json_load['ibm-extend']['new_size'] = 200
    
  • Use it as a body in a post call (grow a volume):
  • resp, body = cinder.client.post("/volumes/" + volume.id + "/action", body=json_load)
    
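
    Putting the bullets above together, here is a minimal sketch that grows a volume to 200GB through the "ibm-extend" action (the volume name "myvol" and the new size are just examples; find() is the generic lookup helper of the cinder client):

    import json

    # Find the volume by name, build the "ibm-extend" body and post it to the volume action URL.
    volume = cinder.volumes.find(name="myvol")
    json_grow = json.loads('{ "ibm-extend": { "new_size": 0 } }')
    json_grow['ibm-extend']['new_size'] = 200
    resp, body = cinder.client.post("/volumes/" + volume.id + "/action", body=json_grow)
    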

    The pvcctl tool

    Now that you have understood this I can tell you that I've written a tool called pvcctl based on the Openstack python api. This tool is freely available on github. As I said before, this tool fits my needs and is an example of what can be done using the Openstack API in python. Keep in mind that I'm not a developer and the code can probably be better. But this tool is used by my whole team on PowerVC, so ... it will probably be good enough to create shell scripts on top of it or for daily PowerVC administration. The tool can be found at this address: https://github.com/chmod666org/pvcctl. Give it a try and tell me what you think about it. Here are a couple of examples of how to use the tool. You'll see it's super simple:

    • Create a network:
    # pvcctl network create name=newvlan id=666 cidr='10.10.20.20/24' dns1='8.8.8.8' dns2='8.8.9.9' gw='10.10.20.254'
    
  • Add description on a vm:
  • # pvcctl vm set_description vm=deckard desc="We call it Voight-Kampff"
    
  • Migrate a vm:
  • # pvcctl vm migrate vm=tyrell host=21AFF8V
    
  • Attach a volume to a vm:
  • # pvcctl vm attach_vol vm=tyrell vol=myvol
    
  • Create a vm
  • # pvcctl vm create ip='10.14.20.240' ec_max=1 ec_min=0.1 ec=0.1 vp=1 vp_min=1 vp_max=4 mem=4096 mem_min=1024 mem_max=8192 weight=240 name=bcubcu disks="death" scg=ssp vlan=vlan-1331 image=kitchen-aix72 aggregate=hg2 user_data=testvm srr=yes
    
  • Create a volume:
  • # pvcctl volume create provider=mystorageprovider name=volume_test size=10
    
  • Grow a volume!
  • # pvcctl volume grow vol=test_volume size=50
    

    Automating PowerVC and NovaLink installation and post-installation with ansible

    At the same time I released my pvcctl tool I also thought that releasing my PowerVC and NovaLink playbook for Ansible would be a good thing. This playbook is not huge and does not do a lot of things, but I was using it a lot when deploying all my NovaLink hosts (I now have 16 MMEs managed by NovaLink) and when creating PowerVC servers for different kinds of projects. It's a shame that everybody in my company has not yet understood why having multiple PowerVC instances is just a bad idea and a waste of time (I'm not surprised that between a good and a bad idea they prefer to choose the bad one :-) , which is obvious when you never touch production at all but still have the power to decide things). Anyway, this playbook is used for two things: the first one is preparing my NovaLink hosts (being sure I'm at the latest version of NovaLink and that everything is configured the way I want (ntp, dns, rsct)), the second one is installing PowerVC hosts (installing PowerVC is just super boring, you always have to install tons of rpms needed for dependencies, and if like me you do not have a satellite connection or access to the internet it can be a real pain). The only thing you have to do is configure the inventory files and the group_vars files located in the playbook directory. The playbook can be found at this address: https://github.com/chmod666org/ansible-powervc-novalink.

    • Put the name of your NovaLink hosts in the hosts.novalink file:
    # cat inventories/hosts.novalink
    nl1.lab.chmod666.org
    nl2.lab.chmod666.org
    [..]
    
  • Put the name of your PowerVC hosts in the hosts.powervc file:
  • # cat inventories/hosts.powervc
    pvc1.lab.chmod666.org
    pvc2.lab.chmod666.org
    [..]
    
  • Next prepare group_vars files for NovaLink …
  • ntpservers:
      - myntp1
      - myntp2
      - myntp2
    dnsservers:
      - 8.8.8.8
      - 8.8.9.9
    dnssearch:
      - lab.chmod666.org
    vepa_iface: ibmveth6
    repo: novalinkrepo
    
  • and PowerVC:
  • ntpservers:
      - myntp1
      - myntp2
      - myntpd3
    dnsservers:
      - 8.8.8.8
      - 8.8.9.9
    dnssearch:
      - lab.chmod666.org
    repo_rhel: http://myrepo.lab.chmod666.org/rhel72le/
    repo_ibmtools: http://myrepo.lab.chmod666.org/ibmpowertools71le/
    repo_powervc: http://myrepo.lab.chmod666.org/powervc
    powervc_base: PowerVC_V1.3.1_for_Power_Linux_LE_RHEL_7.1_062016.tar.gz
    powervc_upd: powervc-update-ppcle-1.3.1.2.tgz
    powervc_rpm: [ 'python-dns-1.12.0-1.20150617git465785f.el7.noarch.rpm', 'selinux-policy-3.13.1-60.el7.noarch.rpm', 'selinux-policy-targeted-3.13.1-60.el7.noarch.rpm', 'python-fpconst-0.7.3-12.el7.noarch.rpm', 'python-pyasn1-0.1.6-2.el7.noarch.rpm', 'python-pyasn1-modules-0.1.6-2.el7.noarch.rpm', 'python-twisted-web-12.1.0-5.el7_2.ppc64le.rpm', 'sysfsutils-2.1.0-16.el7.ppc64le.rpm', 'SOAPpy-0.11.6-17.el7.noarch.rpm', 'SOAPpy-0.11.6-17.el7.noarch.rpm', 'python-twisted-core-12.2.0-4.el7.ppc64le.rpm', 'python-zope-interface-4.0.5-4.el7.ppc64le.rpm', 'pyserial-2.6-5.el7.noarch.rpm' ]
    powervc_base_version: 1.3.1.0
    powervc_upd_version: 1.3.1.2
    powervc_edition: cloud_powervm
    

    You then just have to run the playbook against the NovaLink and PowerVC hosts to perform the installation and post-installation:

    • Novalink post-install:
    # ansible-playbook -i inventories/hosts.novalink site.yml
    
  • PowerVC install:
  • # ansible-playbook -i inventories/hosts.powervc site.yml
    

    powervcansible

    Just to give you an example of one of the tasks of this playbook, here is the task in charge of installing PowerVC. Pretty simple :-) :

    ## install powervc
    - name: check previous installation
      command: bash -c "rpm -qa | grep ibmpowervc-"
      register: check_base
      ignore_errors: True
    - debug: var=check_base
    
    - name: install powervc binaires
      command: chdir=/tmp/powervc-{{ powervc_base_version }} /tmp/powervc-{{ powervc_base_version }}/install -s cloud_powervm
      environment:
        HOST_INTERFACE: "{{ ansible_default_ipv4.interface }}"
        EGO_ENABLE_SUPPORT_IPV6: N
        PATH: $PATH:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.ppc64le/jre/bin/:/usr/sbin:/usr/bin
        ERL_EPMD_ADDRESS: "::ffff:127.0.1.1"
      when: check_base.rc == 1
    
    - name: check previous update
      command: rpm -q ibmpowervc-{{ powervc_upd_version }}-1.noarch
      register: check_upd
      ignore_errors: True
    - debug: var=check_upd
      
    - name: updating powervc
      command: chdir=/tmp/powervc-{{ powervc_upd_version }} /tmp/powervc-{{ powervc_upd_version }}/update -s 
      when: check_upd.rc == 1
    

    The goal here is not to explain how Ansible works but to show you a simple example of what I'm doing with Ansible on my Linux boxes (all of it related to Power). If you want to dig further have a look at the playbook itself on github :-)

    Conclusion

    This blog post is just a way to show you my work on both pvcctl and the Ansible playbook for NovaLink and PowerVC. It's not a detailed blog post about deep technical stuff. I hope you'll give the tools a try and tell me what can be improved or changed. As always ... I hope it helps.


    Unleash the true potential of SRIOV vNIC using vNIC failover !


    I'm always working on a tight schedule; I never have the time to write documentation because we're moving fast, very fast ... but not as fast as I want to ;-). A few months ago we were asked to put the TSM servers in our PowerVC environment and I thought it was a very very bad idea to put a pet among the cattle, as TSM servers are very specific and super I/O intensive in our environment (and are configured with plenty of rmt devices; this means we would have been putting lan-free stuff into Openstack, which is not designed at all for this kind of thing). In my previous job we tried to put the TSM servers behind a virtualized environment (meaning serving the network through Shared Ethernet Adapters) and it was an EPIC FAIL: a few weeks after putting the servers in production we decided to move back to physical I/O and use dedicated network adapters. As we didn't want to make the same mistake in my current place we decided not to go with Shared Ethernet Adapters. Instead we took the decision to use SRIOV vNICs. SRIOV vNICs have the advantage of being fully virtualized (which means LPM aware and super flexible), giving us the flexibility we wanted (moving TSM servers between sites if we need to put a host in maintenance mode or if we are facing any kind of outage). In my previous blog post about vNICs I was very happy with the performance but not with the reliability. I didn't want to go with NIB adapters for network redundancy because it is an anti-virtualization way of doing things (we do not want to manage anything inside the VM, we want to let the virtualization environment do the job for us). Lucky for me the project was rescheduled to the end of the year and we finally decided not to put the TSM servers into our big Openstack, dedicating some hosts to the backup stuff instead. The latest versions of PowerVM, the HMC and the firmware arrived just in time to let me use the new SRIOV vNIC failover feature for this new TSM environment (fortunately for me we had some data center issues allowing me to wait long enough not to go with NIB and to start production directly with SRIOV vNICs \o/). I have just delivered the first four servers to my backup team yesterday and I must admit that SRIOV vNIC failover is a killer feature for this kind of thing. Let's now see how to set this up!

    Prerequisites

    As always, using the latest features means you need to have everything up to date. In this case the minimal requirements for SRIOV vNIC failover are Virtual I/O Server 2.2.5.10, Hardware Management Console V8R860 with the latest patches, and an up-to-date firmware (ie. fw 860). Note that not all AIX versions are ok with SRIOV vNICs; here I'm only using AIX 7.2 TL1 SP1:

    • Check the Virtual I/O Servers are installed at level 2.2.5.10:
    # ioslevel
    2.2.5.10
    
  • Check the HMC is at the latest version (V8R860):
  • hscroot@myhmc:~> lshmc -V
    "version= Version: 8
     Release: 8.6.0
     Service Pack: 0
    HMC Build level 20161101.1
    MH01655: Required fix for HMC V8R8.6.0 (11-01-2016)
    ","base_version=V8R8.6.0
    "
    

    860

  • Check the firmware version is ok on the PowerSystem:
  • # updlic -o u -t sys -l latest -m reptilian-9119-MME-659707C -r mountpoint -d /home/hscroot/860_056/ -v
    # lslic -m reptilan-9119-MME-65BA46F -F activated_level,activated_spname
    56,FW860.10
    

    fw

    What is SRIOV vNIC failover and how does it work ?

    I'll not explain here what an SRIOV vNIC is; if you want to know more about it just check my previous blog post on this topic, A first look at SRIOV vNIC adapters. What failover adds is the ability to configure as many backing devices as you want for a vNIC adapter (the maximum is 6 backing devices). For each backing device you can choose on which Virtual I/O Server the corresponding vnicserver will be created and set a failover priority to determine which backing device is active. Keep in mind that priorities work the exact same way as they do with Shared Ethernet Adapters: priority 10 is a higher priority than priority 20.

    vnicvisio1

    In the example shown in the images above and below, the vNIC is configured with two backing devices (on two different SRIOV adapters) with priorities 10 and 20. As long as there is no outage (for instance on the Virtual I/O Server or on the adapter itself) the physical port utilized will be the one with priority 10. If the adapter has, for instance, a hardware issue, we have the possibility to manually fall back on the second backing device, or we can let the hypervisor do this for us by checking the next highest priority to choose the right backing device to use (see the little sketch below). Easy. This allows us to have redundant, LPM aware and high performance adapters, fully virtualized. A MUST :-) !
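
    If the inverted priority numbering feels confusing, here is a tiny python toy (purely an illustration, not an HMC API) of the rule the hypervisor applies: among the operational backing devices, the one with the lowest priority number becomes active:

    # Two backing devices keyed by failover priority (lower number = higher priority).
    backing_devices = {10: "VF on VIOS1 / adapter 1", 20: "VF on VIOS2 / adapter 2"}

    def active_backing_device(operational):
        # the lowest priority number among the operational devices wins
        return min(p for p in backing_devices if p in operational)

    print(active_backing_device([10, 20]))  # -> 10 (normal situation)
    print(active_backing_device([20]))      # -> 20 (the priority 10 adapter failed)
    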

    vnicvisio2

    Creating an SRIOV vNIC failover adapter using the HMC GUI and administering it

    To create or delete an SRIOV vNIC failover adapter (I'll just call it a vNIC for the rest of the blog post) the machine must be shut down or active (it is not possible to add a vNIC when a machine is booted in OpenFirmware). The only way to do this using the HMC GUI is to use the enhanced interface (no problem, as we will have no other choice in the near future). Select the machine on which you want to create the adapter and click on the "Virtual NICs" tab.

    vnic1b

    Click “Add Virtual NIC”:

    vnic1c

    Choose the “Physical Port Location Code” (the physical port of the SRIOV adapter) on which you want to create the vNIC. You can add from one to six “backup adapters” (by clicking the “Add Entry” button). Only one backing device will be active at a time. If this one fails (adapter issue, network issue) the vNIC will fail over to the next backup adapter depending on the “Failover priority”. Be careful to spread the backing devices across the hosting Virtual I/O Servers to be sure that having one Virtual I/O Server down is seamless for your partition:

    vnic1d

    On the example above:

    • I’m creating a vNIC failover with “vNIC Auto Priority Failover” enabled.
    • Four VFs will be created: two on the VIOS ending with 88, two on the VIOS ending with 89.
    • Obviously four vnicservers will be created on the VIOSes (2 on each).
    • The lowest priority number takes the lead. This means that if the first backing device with priority 10 fails, the active adapter will be the second one. Then if the second one with priority 20 fails, the third one becomes active, and so on. Keep in mind that as long as the backing device with the lowest priority number is ok, nothing happens if one of the other backup adapters fails. Be smart when choosing the priorities. As Yoda says “Wise you must be!”.
    • The physical ports are located on different CECs.

    vnic1e

    The “Advanced Virtual NIC Settings” are applied to all the backing devices that will be created (4 in the example above). For instance I’m using VLAN tagging on these ports so I just need to set the “Port VLAN ID” once.

    vnic1f

    You can choose whether or not to allow the hypervisor to perform the failover/fallback automatically depending on the priorities you have set. If you click “Enable” the hypervisor will automatically fail over to the next operational backing device based on the priorities. If it is disabled only a user can trigger a failover operation.

    vnic1g

    Be careful: the priorities are designed the same way as on Shared Ethernet Adapters. The lowest number in the failover priority is the highest failover priority. On the image below you can notice that priority 10, which is the highest failover priority, is active (it is the lowest number among 10, 20, 30 and 40).

    vnic1h

    After the creation of the vNIC you can check different things on the Virtual I/O Servers. You will notice that every entry added during the creation of the vNIC has a corresponding VF (virtual function) and a corresponding vnicserver (each vnicserver has a VF mapped on it):

    • You can see that for each entry added when creating a vNIC you’ll have the corresponding VF device present on the Virtual I/O Servers:
    vios1# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C3-C1-T2-S5                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C3-C1-T2-S6                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
    vios2# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
  • For each VF you’ll see the corresponding vnicserver devices:
  • vios1# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
    vios2# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
  • You can check the VF mapped to each vnicserver using the ‘lsmap’ command. One funny thing: as long as the backing device was never made active (for instance with the “Make the Backing Device Active” button in the GUI) the corresponding client name and client device are not shown:
  • vios1# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V2-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C3-C1-T2-S5:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V2-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C3-C1-T2-S6:N/A:U9119.MME.659707C-V6-C6
    
    vios2# lsmap -all -vnic
    [..]
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver1   U9119.MME.659707C-V1-C32898             6 N/A            N/A
    
    Backing device:ent3
    Status:Available
    Physloc:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2
    Client device name:ent0
    Client device physloc:U9119.MME.659707C-V6-C6
    
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver2   U9119.MME.659707C-V1-C32899             6 N/A            N/A
    
    Backing device:ent4
    Status:Available
    Physloc:U78CA.001.CSS08EL-P1-C4-C1-T2-S2
    Client device name:N/A
    Client device physloc:U9119.MME.659707C-V6-C6
    
  • You can activate a backing device yourself just by clicking the “Make the Backing Device Active” button in the GUI and check the vnicserver is now logged in:
  • vnic1i
    vnic1j

    vios2# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V1-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V1-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C4-C1-T2-S2:N/A:U9119.MME.659707C-V6-C6
    
  • I noticed something that seemed pretty strange to me: when you perform a manual failover of the vNIC the auto priority failover is set to disabled. Remember to re-enable it after the manual operation is performed:
  • vnic1k

    You can also check the status and the priority of the vNIC on the Virtual I/O Server using the vnicstat command. The command shows some useful information: the state of the device, whether it is active or not (I noticed 2 different states in my tests, “active” meaning this is the vf/vnicserver currently in use, and “config_2” meaning the adapter is ready and available for a failover operation (there is probably another state when the link is down, but I didn’t have the time to ask my network team to shut a port to verify this)), and finally the failover priority. The vnicstat command is a root command.

    vios1#  vnicstat vnicserver1
    
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 10
    
    Client Partition ID: 6
    Client Partition Name: lizard
    Client Operating System: AIX
    Client Device Name: ent0
    Client Device Location Code: U9119.MME.659707C-V6-C6
    [..]
    
    vios2# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: config_2
    Backing Device Name: ent3
    
    Failover State: inactive
    Failover Readiness: operational
    Failover Priority: 20
    [..]
    
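    If you want to keep an eye on the failover state and priority while you are testing, a simple loop around vnicstat does the job. This is just a convenience sketch, to be run as root on each Virtual I/O Server; the vnicserver name is the one shown by lsdev above:

    vios1# while true; do date; vnicstat vnicserver1 | grep -E "State|Priority"; sleep 5; done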

    You can also check vNIC server events in the errpt (client login on failover and so on …):

    # errpt | more
    8C577CB6   1202195216 I S vnicserver1    VNIC Transport Event
    60D73419   1202194816 I S vnicserver1    VNIC Client Login
    # errpt -aj 60D73419 | more
    ---------------------------------------------------------------------------
    LABEL:          VS_CLIENT_LOGIN
    IDENTIFIER:     60D73419
    
    Date/Time:       Fri Dec  2 19:48:06 2016
    Sequence Number: 10567
    Machine Id:      00C9707C4C00
    Node Id:         vios2
    Class:           S
    Type:            INFO
    WPAR:            Global
    Resource Name:   vnicserver1
    
    Description
    VNIC Client Login
    
    Probable Causes
    VNIC Client Login
    
    Failure Causes
    VNIC Client Login
    

    Same thing using the HMC command line.

    Now we will do the same thing on the command line. I warn you, the commands are pretty huge!

    • List the SRIOV adapters (you will need their IDs to create the vNICs):
    # lshwres -r sriov --rsubtype adapter -m reptilian-9119-MME-65BA46F
    adapter_id=3,slot_id=21010012,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
    adapter_id=4,slot_id=21010013,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
    adapter_id=1,slot_id=21010022,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
    adapter_id=2,slot_id=21010023,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
    
  • List vNIC for virtual machine “lizard”:
  • lshwres -r virtualio  -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
    lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=0,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/3/0/2700c003/2.0/2.0/50,sriov/vios2/2/1/0/27004003/2.0/2.0/60","backing_device_states=sriov/2700c003/0/Operational,sriov/27004003/1/Operational"
    
  • Create a vNIC with 2 backing devices, the first one on Virtual I/O Server 1, adapter 1, physical port 2 with a failover priority set to 10, the second one on Virtual I/O Server 2, adapter 3, physical port 2 with a failover priority set to 20 (this vNIC will take the next available slot, which will be 6) (WARNING: physical port numbering starts from 0):
  • #chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o a -p lizard --rsubtype vnic -v -a 'port_vlan_id=3455,auto_priority_failover=1,backing_devices="sriov/vios1//1/1/2.0/10,sriov/vios2//3/1/2.0/20"'
    #lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
    lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational"
    
  • Add two backing devices (one on each VIOS, on adapters 2 and 4, both on physical port 2, with failover priorities set to 30 and 40) to the vNIC in slot 6:
  • # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o s --rsubtype vnic -p lizard -s 6 -a '"backing_devices+=sriov/vios1//2/1/2.0/30,sriov/vios2//4/1/2.0/40"'
    # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
    lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
    
  • Change the failover priority of logical port 2700400b of the vNIC in slot 6 to 11:
  • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnicbkdev -p lizard -s 6 --logport 2700400b -a "failover_priority=11"
    # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
    lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
    
  • Make logical port 27008005 active on vNIC in slot 6:
  • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o act --rsubtype vnicbkdev -p lizard  -s 6 --logport 27008005 
    # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
    lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/0/Operational,sriov/2700c008/0/Operational,sriov/27008005/1/Operational,sriov/27010002/0/Operational"
    
  • Re-enable automatic failover on vNIC in slot 6:
  • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnic -p lizard  -s 6 -a "auto_priority_failover=1"
    # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
    lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
    

    Testing the failover.

    It’s now time to test if the failover is working as intended. The test will be super simple: I will just shut off one of the two Virtual I/O Servers and check whether I’m losing some packets or not. I’m first checking on which VIOS the active backing device is located:

    vnic1l

    I now need to shut down the Virtual I/O Server ending with 88 and check if the one ending with 89 is taking the lead:

    *****88# shutdown -force 
    

    Priorities 10 and 30 are on the Virtual I/O Server that was shut down; the highest remaining priority, on the surviving Virtual I/O Server, is 20. This backing device hosted on the second Virtual I/O Server is now serving the network I/Os:

    vnic1m

    You can check the same thing with the command line on the remaining Virtual I/O Server:

    *****89# errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    60D73419   1202214716 I S vnicserver0    VNIC Client Login
    60D73419   1202214716 I S vnicserver1    VNIC Client Login
    *****89# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 20
    
    

    During my tests the failover worked as I expected. You can see on the picture below that I only lost one ping (between sequences 64 and 66) during the failover/failback process.

    vnic1n
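    If you want to reproduce this kind of measurement yourself, timestamping a continuous ping toward the partition from any other host is enough to spot the gap. This is just a convenience sketch; lizard is the client partition used in this example:

    # ping lizard | while read line; do echo "$(date +%T) $line"; done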

    In the partition I saw some messages in the errpt during the failover:

    # errpt | more
    4FB9389C   1202215816 I S ent0           VNIC Link Up
    F655DA07   1202215816 I S ent0           VNIC Link Down
    # errpt -a | more
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: DOWN   logical link: DOWN
    Status
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: UP   logical link: UP
    Status
    

    What about Live Partition Mobility?

    If you want a seamless LPM experience without having to choose the destination adapter and physical port on which to map your current vNIC backing devices, just fill in the label and sublabel (the label is the most important) for each physical port of your SRIOV adapters. Then during the LPM, if the names are aligned between the two systems, the right physical port will be automatically chosen based on the label names:

    vnic1o
    vnic1p
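    For the record, the labels can also be listed (and, as far as I know, set) from the HMC command line on the sriov physport resource. The sketch below only reflects my understanding of the syntax, so double check the attribute names against the lshwres/chhwres man pages of your HMC level; PROD_10G is just an example value:

    # lshwres -r sriov --rsubtype physport --level eth -m reptilian-9119-MME-65BA46F -F adapter_id,phys_port_id,phys_port_loc,phys_port_label,phys_port_sub_label
    # chhwres -r sriov --rsubtype physport -m reptilian-9119-MME-65BA46F -o s -a "adapter_id=1,phys_port_id=1,phys_port_label=PROD_10G"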

    The LPM was working like a charm and I didn’t notice any particular problem during the move. vNIC failover and LPM work fine as long as you take care of your SRIOV labels :-). I did notice on AIX 7.2 TL1 SP1 that there were no errpt messages in the partition itself, just in the Virtual I/O Server … weird :-)

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    3EB09F5A   1202222416 I S Migration      Migration completed successfully
    

    Conclusion.

    No long story here. If you need performance AND flexibility you absolutely have to use SRIOV vNIC failover adapters. This feature offers you the best of both worlds: the performance of dedicated 10Gb adapters with a failover capability, without having to worry about LPM or NIB configuration. It’s not applicable in all cases, but it’s definitely something to have for an environment such as TSM or other network I/O intensive workloads. Use it!

    About reptilians !

    Before you start reading this, keep your sense of humor and be advised that what I say is not related to my workplace at all; it’s a general way of thinking, not especially based on my experience. Don’t be offended by this, it’s just a personal opinion based on things I may or may not have seen during my life. You’ve been warned.

    This blog was never a place to share my opinions about life and society, but I must admit that I should have done that before. Speaking about this kind of thing makes you feel alive in a world where everything needs to be ok and where you no longer have the right to feel or express something about what you are living. There are a couple of good blog posts speaking of this kind of thing in the IT world. I agree with all of what is said in these posts. Some of the authors are just telling what they love in their daily jobs, but I think it’s also a way to say what they probably won’t love in another one :-) :

    • Adam Leventhal’s “I’m not a resource”: here
    • Brendan Gregg’s “Working at Netflix in 2016”: here

    All of this to say that I work at night, I work on weekends, I’m thinking about PowerSystems/computers when I fall asleep. I always have new ideas and I always want to learn new things, discover new technologies and features. I truly, deeply love this, but being like this does not help me and will never help me in my daily job for one single reason: in this world the people who have the knowledge are not the people who take the technical decisions. It’s sad but true. I’m just good at working as much as I can for the least money possible. Nobody cares if techs are happy, unhappy, want to stay or leave. It doesn’t make any difference for anyone driving a company. What’s important is money. Everything is meaningless. We are no one, we are nothing, just numbers in an Excel spreadsheet. I’m probably saying this because I’m not good enough at anything to find an acceptable workplace. Once again, sad but true.

    Even worse, if you just want to follow what the industry is asking, you have to be everywhere and know everything. I know I’ll be forced in the very near future to move to DevOps/Linux (I love Linux, I’m an RHCE certified engineer!). That’s why for a couple of years now, at night after my daily job is finished, I’m working again: working to understand how Docker works, working to install my own Openstack on my own machines, working to understand Saltstack, Ceph, Python, Ruby, Go …. it’s a never ending process. But it’s still not enough for them! Not enough to be considered a good enough guy to fit a job. I remember being asked to know about Openstack, Cassandra, Hadoop, AWS, KVM, Linux, automation tools (Puppet this time), Docker and continuous integration for one single job application. First, I seriously doubt that anyone has all these skills and is good at each of them. Second, even if I were an expert in each one, if you look back a few years it was the exact same thing but with different products. You have to understand and be good at every new product in minutes. All of this to understand that one or two years after being considered an “expert” you are bad at everything that exists in the industry. I’m really sick of this fight against something I can’t control. Being a hard worker and clever enough to understand every new feature is not enough nowadays. On top of that you also need to be a beautiful person with a nice perfect smile wearing a perfect suit. You also have to be on LinkedIn and be connected with the right persons. And even if all of these boxes are checked, you still need to be lucky enough to be at the right place at the right moment. I’m so sick of this. Work doesn’t pay. Only luck. I don’t want to live in this kind of world but I have to. Anyway this is just a “two-cents” way of thinking. Everything is probably a big trick orchestrated by these reptilian lizard men! ^^. Be good at what you do and don’t care about what people think of you (even your horrible french accent during your sessions) … that’s the most important!

    picture-of-reptilian-alien

    Running Docker on PowerSystems using Ubuntu and Redhat ppc64le (docker, registry, compose and swarm with haproxy)


    Every blog post that I have read for a couple of months mentions Docker, that’s a fact! I’ve never been so stressed in years because our jobs are changing. That is not my choice or my will, but what we were doing a couple of years ago and what we are doing now is going to disappear sooner than I thought. The world of infrastructure as we know it is dying, same thing for sysadmin jobs. I would never have thought this was something that could happen to me during my career, but here we are. Old Unix systems are slowly dying and Linux virtual machines are becoming less and less popular. One part of my career plan was to be excellent on two different systems, Linux and AIX, but I now have to recognize I probably made a mistake thinking it would save me from unemployment or from any bullshit job. We’re all going to end up retired, that’s certain, but the reality is that I’d prefer working on something fun and innovative than being stuck on old stuff forever. We’ve had Openstack for a while and we now have Docker. As no employer will look at a candidate with no Docker experience, I had to learn this (in fact I have been using Docker for more than a year now; my twitter followers already know this). I don’t want to be one of the social rejects of a world that is changing too fast. Computer science is living its car crisis and we are the blue collars who will be left behind. There is no choice; there won’t be a place for everyone and you won’t be the only one fighting in the pit trying to be hired. You have to react now or slowly die … like all the sysadmins I see in banks getting worse and worse. Moving them to Openstack was a real challenge (still not completed); I can’t imagine trying to make them work on Docker. On the other hand I’m also surrounded by excellent people (I have to say I’ve met a true genius a couple of years ago) who are doing crazy things. Unfortunately for me they are not working with me (they are in big companies (ie. RedHat/Oracle/Big Blue) or in other places where people tend to understand something is changing and going on). I feel like I’m bad at everything I do. Unemployable. Nevertheless I still have the energy to work on new things and Docker is a part of it. One of my challenges was/is to migrate all our infrastructure services to Docker, not just for fun but to be able to easily reproduce this infrastructure over and over again. The goal here is to run every infrastructure service in a Docker container and try at least to make them highly available. We are going to see how to do that on PowerSystems, using Ubuntu or Redhat ppc64le to run our Docker engine and containers. We will next create our own Docker base images (Ubuntu and Redhat ones) and push them to our custom-made registry. Then we will create containers for our applications (I’ll just give some examples here (webserver and grafana/influxdb)). Finally we will try Swarm to make these containers highly available by creating “global/replicas” services. This blog post is also here to prove that Power is an architecture on which you can do the exact same thing as on x86. Having Ubuntu 16.04 LTS available on the ppc64le arch is a damn good thing because it provides a lot of Opensource products (graphite, grafana, influxdb, all the web servers, and so on). Let’s do everything to become a killer DevOps. I have done this for sysadmin stuff, why the hell wouldn’t I be capable of providing the same effort on DevOps things. I’m not that bad, at least I try.

    Image 1

    Installing the docker-engine

    Red Hat Enterprise Linux ppc64el

    Unfortunately for our “little” community the current Red Hat Enterprise repositories for the ppc64le arch do not provide the Docker packages. IBM is providing a repository at this address http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/. On my side I’m mirroring this repository on my local site (with wget) and creating my own repository, as my servers have no access to the internet. Keep in mind that this repository is not up to date with the latest version of Docker. At the time I’m writing this blog post Docker 1.13 is available but this repository is still serving Docker 1.12. Not exactly what we want for a technology like Docker (we absolutely want to keep the engine up to date):

    # wget --mirror http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/
    # wget --mirror http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/misc_ppc64el/
    # cat docker.repo
    [docker-ppc64le-misc]
    name=docker-ppc64le-msic
    baseurl=http://nimprod:8080/dockermisc-ppc64el/
    enabled=1
    gpgcheck=0
    [docker-ppc64le]
    name=docker-ppc64le
    baseurl=http://nimprod:8080/docker-ppc64el/
    enabled=1
    gpgcheck=0
    # yum info docker.ppc64le
    Loaded plugins: product-id, search-disabled-repos, subscription-manager
    This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
    Installed Packages
    Name        : docker
    Arch        : ppc64le
    Version     : 1.12.0
    Release     : 0.ael7b
    Size        : 77 M
    Repo        : installed
    From repo   : docker-ppc64le
    Summary     : The open-source application container engine
    URL         : https://dockerproject.org
    License     : ASL 2.0
    Description : Docker is an open source project to build, ship and run any application as a
    [..]
    # yum search swarm
    yum search swarm
    Loaded plugins: product-id, search-disabled-repos, subscription-manager
    This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
    ============================================================================================================================== N/S matched: swarm ==============================================================================================================================
    docker-swarm.ppc64le : Docker Swarm is native clustering for Docker.
    [..]
    
    # yum -y install docker
    [..]
    Downloading packages:
    (1/3): docker-selinux-1.12.0-0.ael7b.noarch.rpm                                                                                                                                                                                                          |  27 kB  00:00:00
    (2/3): libtool-ltdl-2.4.2-20.el7.ppc64le.rpm                                                                                                                                                                                                             |  50 kB  00:00:00
    (3/3): docker-1.12.0-0.ael7b.ppc64le.rpm                                                                                                                                                                                                                 |  16 MB  00:00:00
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total                                                                                                                                                                                                                                            33 MB/s |  16 MB  00:00:00
    Running transaction check
    Running transaction test
    Transaction test succeeded
    Running transaction
      Installing : libtool-ltdl-2.4.2-20.el7.ppc64le                                                                                                                                                                                                                            1/3
      Installing : docker-selinux-1.12.0-0.ael7b.noarch                                                                                                                                                                                                                         2/3
    setsebool:  SELinux is disabled.
      Installing : docker-1.12.0-0.ael7b.ppc64le                                                                                                                                                                                                                                3/3
    rhel72/productid                                                                                                                                                                                                                                         | 1.6 kB  00:00:00
      Verifying  : docker-selinux-1.12.0-0.ael7b.noarch                                                                                                                                                                                                                         1/3
      Verifying  : docker-1.12.0-0.ael7b.ppc64le                                                                                                                                                                                                                                2/3
      Verifying  : libtool-ltdl-2.4.2-20.el7.ppc64le                                                                                                                                                                                                                            3/3
    
    Installed:
      docker.ppc64le 0:1.12.0-0.ael7b
    
    Dependency Installed:
      docker-selinux.noarch 0:1.12.0-0.ael7b                                                                                                   libtool-ltdl.ppc64le 0:2.4.2-20.el7
    
    Complete!
    # systemctl start docker
    # docker ps -a
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
    # docker info
    Containers: 0
     Running: 0
     Paused: 0
     Stopped: 0
    Images: 0
    Server Version: 1.12.0
    [..]
    
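    One step not shown above: the directories mirrored with wget still have to be turned into real yum repositories before the docker.repo file can be used. A minimal sketch, assuming createrepo is installed on the mirror host (nimprod here) and the directories are already served over HTTP on port 8080 (the paths are illustrative, adapt them to your web server document root):

    nimprod# createrepo /export/www/docker-ppc64el
    nimprod# createrepo /export/www/dockermisc-ppc64el
    dockerhost# yum clean all && yum repolist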

    Enabling the device-mapper direct disk mode (instead of loop)

    By default on RHEL, after installing the docker packages and starting the engine, Docker uses an LVM loop device to create its pool (where the images and the containers will be stored). This is not recommended and not good for production usage. That’s why on every docker engine host I’m creating a dockervg volume group for this pool. Red Hat provides, with the Project Atomic, a tool called docker-storage-setup that configures the thin pool for you (on another volume group).

    # git clone https://github.com/projectatomic/docker-storage-setup.git
    # cd docker-storage-setup
    # make install
    

    Create a volume group on a physical volume, configure and run docker-storage-setup:

    # docker-storage-setup --reset
    # systemctl stop docker
    # rm -rf /var/lib/docker
    # pvcreate /dev/mapper/mpathb
      Physical volume "/dev/mapper/mpathb" successfully created
    # vgcreate dockervg /dev/mapper/mpathb
      Volume group "dockervg" successfully created
    # cat /etc/sysconfig/docker-storage-setup
    # Edit this file to override any configuration options specified in
    # /usr/lib/docker-storage-setup/docker-storage-setup.
    #
    # For more details refer to "man docker-storage-setup"
    VG=dockervg
    SETUP_LVM_THIN_POOL=yes
    DATA_SIZE=70%FREE
    # /usr/bin/docker-storage-setup
      Rounding up size to full physical extent 104.00 MiB
      Logical volume "docker-pool" created.
      Logical volume "docker-pool" changed.
    # cat /etc/sysconfig/docker-storage
    DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/dockervg-docker--pool --storage-opt dm.use_deferred_removal=true "
    

    I don’t know why, but on the version of docker I am running the DOCKER_STORAGE_OPTIONS (in /etc/sysconfig/docker-storage) were not read. I had to manually edit the systemd unit to let Docker use my thinpooldev:

    # vi /usr/lib/systemd/system/docker.service
    ExecStart=/usr/bin/dockerd --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/dockervg-docker--pool --storage-opt dm.use_deferred_removal=true
    # systemctl daemon-reload
    # systemctl start docker
    # docker info
    [..]
    Storage Driver: devicemapper
     Pool Name: dockervg-docker--pool
     Pool Blocksize: 524.3 kB
     Base Device Size: 10.74 GB
     Backing Filesystem: xfs
     Data file:
     Metadata file:
     Data Space Used: 20.45 MB
     Data Space Total: 74.94 GB
     Data Space Available: 74.92 GB
     Metadata Space Used: 77.82 kB
     Metadata Space Total: 109.1 MB
     Metadata Space Available: 109 MB
     Thin Pool Minimum Free Space: 7.494 GB
     Udev Sync Supported: true
     Deferred Removal Enabled: true
     Deferred Deletion Enabled: false
     Deferred Deleted Device Count: 0
     Library Version: 1.02.107-RHEL7 (2015-10-14)
    

    Ubuntu 16.04 LTS ppc64le

    As always on Ubuntu everything is super easy. I’m just deploying an Ubuntu 16.04 LTS and running a single apt install to install the docker engine. Neat. Just for your information, as my servers do not have any access to the internet I’m using a tool called apt-mirror to mirror the official Ubuntu repositories. The tool can easily be found on github at this address: https://github.com/apt-mirror/apt-mirror. You then just have to specify which arch and which repositories you want to clone on your local site:

    # cat /etc/apt/mirror.list
    [..]
    set defaultarch       ppc64el
    [..]
    set use_proxy         on
    set http_proxy        proxy:8080
    set proxy_user        benoit
    set proxy_password    mypasswd
    [..]
    deb http://ports.ubuntu.com/ubuntu-ports xenial main restricted universe multiverse
    deb http://ports.ubuntu.com/ubuntu-ports xenial-security main restricted universe multiverse
    deb http://ports.ubuntu.com/ubuntu-ports xenial-updates main restricted universe multiverse
    deb http://ports.ubuntu.com/ubuntu-ports xenial-backports main restricted universe multiverse
    # /usr/local/bin/apt-mirror
    Downloading 152 index files using 20 threads...
    Begin time: Fri Feb 17 14:36:03 2017
    [20]... [19]... [18]... [17]... [16]... [15]... [14]... [13]... [12]... [11]... [10]... [9]... [8]... [7]... [6]... [5].
    

    After having downloaded the packages, create a repository based on these downloaded deb files, make it accessible through HTTP (a sketch of this step follows the installation below) and install Docker:

    # cat /etc/os-release
    NAME="Ubuntu"
    VERSION="16.04 LTS (Xenial Xerus)"
    # uname -a
    Linux dockermachine1 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:30:22 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
    # apt install docker.io
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    [..]
    Setting up docker.io (1.10.3-0ubuntu6) ...
    Adding group `docker' (GID 116) ...
    # docker ps -a
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
    
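    As mentioned above, the deb files pulled by apt-mirror still have to be published over HTTP so the docker hosts can reach them. A minimal sketch, assuming apt-mirror’s default base path (/var/spool/apt-mirror) and apache2 already running on the mirror host; the mirror hostname is illustrative:

    mirror# ln -s /var/spool/apt-mirror/mirror/ports.ubuntu.com/ubuntu-ports /var/www/html/ubuntu-ports
    dockermachine1# cat /etc/apt/sources.list
    deb http://ubuntumirror/ubuntu-ports xenial main restricted universe multiverse
    deb http://ubuntumirror/ubuntu-ports xenial-updates main restricted universe multiverse
    deb http://ubuntumirror/ubuntu-ports xenial-security main restricted universe multiverse
    dockermachine1# apt update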

    On Ubuntu use aufs

    I strongly recommend keeping aufs as the default storage driver for containers and images. I’m simply creating and mounting /var/lib/docker on another disk with a lot of space available, and that’s it:

    # pvcreate /dev/mapper/mpathb
      Physical volume "/dev/mapper/mpathb" successfully created
    # vgcreate dockervg /dev/mapper/mpathb
      Volume group "dockervg" successfully created
    # lvcreate -n dockerlv -L99G dockervg
      Logical volume "dockerlv" created.
    # mkfs.ext4 /dev/dockervg/dockerlv
    [..]
    # echo "/dev/mapper/dockervg-dockerlv /var/lib/docker/ ext4 errors=remount-ro 0       1" > /etc/fstab
    # systemctl stop docker
    # mount /var/lib/docker
    # systemctl start docker
    # df -h | grep docker
    /dev/mapper/dockervg-dockerlv   98G   61M   93G   1% /var/lib/docker
    

    The docker-compose case

    If you’re installing Docker on an Ubuntu host everything is easy, as docker-compose is available on the official Ubuntu repository. Just run an apt install docker-compose and it’s ok.

    # apt install docker-compose
    [..]
    # docker-compose -v
    docker-compose version 1.5.2, build unknown
    

    On RedHat, compose is not available on the repository delivered by IBM. docker-compose is just a python program and can be downloaded and installed via pip. Download compose on a machine with internet access, then use pip to install it:

    On the machine having the access to the internet:

    # mkdir compose
    # pip install --proxy "http://benoit:mypasswd@myproxy:8080"  --download="compose" docker-compose --force --upgrade
    [..]
    Successfully downloaded docker-compose cached-property six backports.ssl-match-hostname PyYAML ipaddress enum34 colorama requests jsonschema docker texttable websocket-client docopt dockerpty functools32 docker-pycreds
    # scp -r compose dockerhost:~
    docker_compose-1.11.1-py2.py3-none-any.whl                                                                                                                                                                                                    100%   83KB  83.4KB/s   00:00
    cached_property-1.3.0-py2.py3-none-any.whl                                                                                                                                                                                                    100% 8359     8.2KB/s   00:00
    six-1.10.0-py2.py3-none-any.whl                                                                                                                                                                                                               100%   10KB  10.1KB/s   00:00
    backports.ssl_match_hostname-3.5.0.1.tar.gz                                                                                                                                                                                                   100% 5605     5.5KB/s   00:00
    PyYAML-3.12.tar.gz                                                                                                                                                                                                                            100%  247KB 247.1KB/s   00:00
    ipaddress-1.0.18-py2-none-any.whl                                                                                                                                                                                                             100%   17KB  17.1KB/s   00:00
    enum34-1.1.6-py2-none-any.whl                                                                                                                                                                                                                 100%   12KB  12.1KB/s   00:00
    colorama-0.3.7-py2.py3-none-any.whl                                                                                                                                                                                                           100%   19KB  19.5KB/s   00:00
    requests-2.11.1-py2.py3-none-any.whl                                                                                                                                                                                                          100%  503KB 502.8KB/s   00:00
    jsonschema-2.6.0-py2.py3-none-any.whl                                                                                                                                                                                                         100%   39KB  38.6KB/s   00:00
    docker-2.1.0-py2.py3-none-any.whl                                                                                                                                                                                                             100%  103KB 102.9KB/s   00:00
    texttable-0.8.7.tar.gz                                                                                                                                                                                                                        100% 9829     9.6KB/s   00:00
    websocket_client-0.40.0.tar.gz                                                                                                                                                                                                                100%  192KB 191.6KB/s   00:00
    docopt-0.6.2.tar.gz                                                                                                                                                                                                                           100%   25KB  25.3KB/s   00:00
    dockerpty-0.4.1.tar.gz                                                                                                                                                                                                                        100%   14KB  13.6KB/s   00:00
    functools32-3.2.3-2.zip                                                                                                                                                                                                                       100%   33KB  33.3KB/s   00:00
    docker_pycreds-0.2.1-py2.py3-none-any.whl                                                                                                                                                                                                     100% 4474     4.4KB/s   00:00
    

    On the machine running docker:

    # rpm -ivh python2-pip-8.1.2-5.el7.noarch.rpm
    # cd compose
    # pip install docker-compose -f ./ --no-index
    [..]
    Successfully installed colorama-0.3.7 docker-2.1.0 docker-compose-1.11.1 ipaddress-1.0.18 jsonschema-2.6.0
    # docker-compose -v
    docker-compose version 1.11.1, build 7c5d5e4
    

    Creating your docker base images and running your first application (a web server)

    Regardless of which Linux distribution you have chosen you now need a docker base image to run your first containers. You have two choices: downloading an image from the internet and modifying it to your own needs, or creating an image by yourself based on your current OS.

    Downloading an image from the internet

    From a machine having access to the internet, install the docker engine and download the Ubuntu image. Using the docker save command, create a tar-based image. This one can then be imported on any docker engine using the docker load command:

    • On the machine having access to the internet:
    # docker pull ppc64le/ubuntu
    # docker save ppc64le/ubuntu > /tmp/ppc64le_ubuntu.tar
    
  • On your docker engine host:
  • # docker load  < ppc64le_ubuntu.tar
    4fad21ac6351: Loading layer [==================================================>] 173.5 MB/173.5 MB
    625e647dc584: Loading layer [==================================================>] 15.87 kB/15.87 kB
    8505832e8bea: Loading layer [==================================================>] 9.216 kB/9.216 kB
    9bca281924ab: Loading layer [==================================================>] 4.608 kB/4.608 kB
    289bda1cbd14: Loading layer [==================================================>] 3.072 kB/3.072 kB
    Loaded image: ppc64le/ubuntu:latest
    # docker images
    REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
    ppc64le/ubuntu      latest              1967d889e07f        3 months ago        167.9 MB
    

    The problem is that this image is not customized for your/my own needs. By this I mean the repositories used by the image are “pointing” to the official Ubuntu repositories, which will obviously not work if you have no access to the internet. We now have to modify the image for our needs. Run a container and launch a shell, then modify the sources.list with your local repository. Then commit this image to validate the changes made inside it (you will generate a new image based on the current one plus your modifications):

    # docker run -it ppc64le/ubuntu /bin/bash
    # rm /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main/ xenial main" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main/ xenial-updates main" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main/ xenial-security main" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial restricted" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-updates restricted" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-security restricted" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial universe" >> /etc/apt/sources.list
    # echo "#deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-updates universe" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-security universe" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial multiverse" >> /etc/apt/sources.list
    # echo "#deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-updates multiverse" >> /etc/apt/sources.list
    # echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-security multiverse" >> /etc/apt/sources.list
    # exit
    # docker ps -a
    # docker commit
    # docker commit a9506bd5dd30 ppc64le/ubuntucust
    sha256:423c13b604dee8d24dae29566cd3a2252e4060270b71347f8d306380b8b6817d
    # docker images
    

    Test the image is working by creating an image based on the one just created before. I’m here creating a dockerfile to do this. I’m not explaining how dockerfiles work, there are plenty of tutorials on the internet for that. To sum up, you need to know the basics of Docker to read this blog post ;-) .

    # cat dockerfile
    FROM ppc64le/ubuntucust
    
    RUN apt-get -y update && apt-get -y install apache2
    
    ENV APACHE_RUN_USER www-data
    ENV APACHE_RUN_GROUP www-data
    ENV APACHE_LOG_DIR /var/log/apache2
    ENV APACHE_PID_FILE /var/run/apache2.pid
    ENV APACHE_RUN_DIR /var/run/apache2
    ENV APACHE_LOCK_DIR /var/lock/apache2
    
    RUN mkdir -p $APACHE_RUN_DIR $APACHE_LOCK_DIR $APACHE_LOG_DIR
    
    EXPOSE 80
    
    CMD [ "-D", "FOREGROUND" ]
    ENTRYPOINT ["/usr/sbin/apache2"]
    

    I’m building the image calling it ubuntu_apache2 (this image will run a single apache2 server and expose the port 80):

    # docker build -t ubuntu_apache2 . 
    Sending build context to Docker daemon 2.048 kB
    Step 1 : FROM ppc64le/ubuntucust
     ---> 423c13b604de
    Step 2 : RUN apt-get -y update && apt-get -y install apache2
     ---> Running in 5f868988bf5c
    Get:1 http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main xenial InRelease [247 kB]
    Get:2 http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main xenial-updates InRelease [102 kB]
    Get:3 http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main xenial-security InRelease [102 kB]
    debconf: unable to initialize frontend: Dialog
    debconf: (TERM is not set, so the dialog frontend is not usable.)
    debconf: falling back to frontend: Readline
    Processing triggers for libc-bin (2.23-0ubuntu4) ...
    Processing triggers for systemd (229-4ubuntu11) ...
    Processing triggers for sgml-base (1.26+nmu4ubuntu1) ...
     ---> 4256ac36c0f7
    Removing intermediate container 5f868988bf5c
    Step 3 : EXPOSE 80
     ---> Running in fc72a50d3f1d
     ---> 3c273b0e2c3f
    Removing intermediate container fc72a50d3f1d
    Step 4 : CMD -D FOREGROUND
     ---> Running in 112d87a2f1e6
     ---> e6ddda152e97
    Removing intermediate container 112d87a2f1e6
    Step 5 : ENTRYPOINT /usr/sbin/apache2
     ---> Running in 6dab9b99f945
     ---> bed93aae55b3
    Removing intermediate container 6dab9b99f945
    Successfully built bed93aae55b3
    # docker images
    REPOSITORY           TAG                 IMAGE ID            CREATED              SIZE
    ubuntu_apache2       latest              bed93aae55b3        About a minute ago   301.8 MB
    ppc64le/ubuntucust   latest              423c13b604de        7 minutes ago        167.9 MB
    ppc64le/ubuntu       latest              1967d889e07f        3 months ago         167.9 MB
    

    Run a container with this image and expose the port 80:

    # docker run -d -it -p 80:80 ubuntu_apache2
    49916e3703c1cf0a671be10984b3215478973c0fd085490a61142b8959495732
    # docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                NAMES
    49916e3703c1        ubuntu_apache2      "/usr/sbin/apache2 -D"   12 seconds ago      Up 10 seconds       0.0.0.0:80->80/tcp   high_brattain
    # ps -ef | grep -i apache
    root     11282 11267  0 11:04 pts/1    00:00:00 /usr/sbin/apache2 -D FOREGROUND
    33       11302 11282  0 11:04 pts/1    00:00:00 /usr/sbin/apache2 -D FOREGROUND
    33       11303 11282  0 11:04 pts/1    00:00:00 /usr/sbin/apache2 -D FOREGROUND
    root     11382  3895  0 11:04 pts/0    00:00:00 grep --color=auto -i apache
    

    On another host test the service is running by using curl (you can see here that you have access to the default index page of the Ubuntu apache2 server):

    # curl mydockerhost
      <body>
        <div class="main_page">
          <div class="page_header floating_element">
            <img src="/icons/ubuntu-logo.png" alt="Ubuntu Logo" class="floating_element"/>
            <span class="floating_element">
              Apache2 Ubuntu Default Page
    [..]
    
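    Since docker-compose was installed earlier, here is a minimal, hypothetical compose file wrapping the docker run command used above for the ubuntu_apache2 image (version 1 file format, so it also works with the older compose shipped by Ubuntu):

    # cat docker-compose.yml
    web:
      image: ubuntu_apache2
      ports:
        - "80:80"
      restart: always
    # docker-compose up -d
    # docker-compose ps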

    Creating your own image

    You can also create your own image from scratch. For RHEL based systems (Centos, Fedora), Redhat provides an awesome script doing the job for you. This script is called mkimage-yum.sh and can be directly downloaded from github. Have a look at it if you want the exact details (mknod, yum installroot, …). The script will create a tar file and import it. After running the script you will have a new image available to use:

    # wget https://github.com/docker/docker/blob/master/contrib/mkimage-yum.sh
    # chmod +x mkimage-yum.sh 
    # ./mkimage-yum.sh baserehel72
    [..]
    + tar --numeric-owner -c -C /tmp/base.sh.bxma2T .
    + docker import - baserhel72:7.2
    sha256:f8b80847b4c7fe03d2cfdeda0756a7aa857eb23ab68e5c954cf3f0cb01f61562
    + docker run -i -t --rm baserhel72:7.2 /bin/bash -c 'echo success'
    success
    + rm -rf /tmp/base.sh.bxma2T
    # docker images
    REPOSITORY           TAG                 IMAGE ID            CREATED              SIZE
    baserhel72           7.2                 f8b80847b4c7        About a minute ago   309.1 MB
    [..]
    

    I’m running a web server to be sure everything is ok (same thing as on Ubuntu: httpd installation and exposing port 80). Here below is the dockerfile and the image build:

    # cat dockerfile
    FROM baserhel72:7.2
    
    RUN yum -y update && yum -y upgrade && yum -y install httpd
    
    EXPOSE 80
    
    CMD [ "-D", "FOREGROUND" ]
    ENTRYPOINT ["/usr/sbin/httpd"]
    # docker build -t rhel_httpd .
    Sending build context to Docker daemon 2.048 kB
    Step 1 : FROM baserhel72:7.2
     ---> 0c22a33fc079
    Step 2 : RUN yum -y update && yum -y upgrade && yum -y install httpd
     ---> Running in 74c79763c56f
    [..]
    Dependency Installed:
      apr.ppc64le 0:1.4.8-3.el7                apr-util.ppc64le 0:1.5.2-6.el7
      httpd-tools.ppc64le 0:2.4.6-40.el7       mailcap.noarch 0:2.1.41-2.el7
    
    Complete!
     ---> 73094e173c1b
    Removing intermediate container 74c79763c56f
    Step 3 : EXPOSE 80
     ---> Running in 045b86d1a6dc
     ---> f032c1569201
    Removing intermediate container 045b86d1a6dc
    Step 4 : CMD -D FOREGROUND
     ---> Running in 9edc1cc2540d
     ---> 6d5d27171cba
    Removing intermediate container 9edc1cc2540d
    Step 5 : ENTRYPOINT /usr/sbin/httpd
     ---> Running in 8280382d61f0
     ---> f937439d4359
    Removing intermediate container 8280382d61f0
    Successfully built f937439d4359
    

    Again I’m launching a container and checking the service is available by curling the docker host. You can see that the image is based on RedHat … and the default page is the RHEL test page :

    # docker run -d -it -p 80:80 rhel_httpd
    # docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                NAMES
    30d090b2f0d1        rhel_httpd          "/usr/sbin/httpd -D F"   3 seconds ago       Up 1 seconds        0.0.0.0:80->80/tcp   agitated_boyd
    # curl localhost
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http//www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
            <head>
                    <title>Test Page for the Apache HTTP Server on Red Hat Enterprise Linux</title>
    [..]
    

    Creating your own docker registry

    We now have our base Docker images but we want to make them available on every docker host without having to recreate them over and over again. To do so we are going to create what we call a docker registry. This registry will allow us to distribute our images across different docker hosts. Neat :-) . When you are installing Docker the package docker-distribution is also installed and is shipped with a binary called “registry”. Why not run the registry … in a Docker container?

    • Verify you have the registry command on the system:
    # which registry
    /usr/bin/registry
    # registry --version
    registry github.com/docker/distribution v2.3.0+unknown
    
  • The package containing the registry is docker-distribution:
  • # yum whatprovides /usr/bin/registry
    Loaded plugins: product-id, search-disabled-repos, subscription-manager
    This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
    docker-distribution-2.3.0-2.ael7b.ppc64le : Docker toolset to pack, ship, store, and deliver content
    Repo        : @docker
    Matched from:
    Filename    : /usr/bin/registry
    

    It’s a bit of a “chicken and egg” problem, but you obviously need a base image to create your registry image. As we have created our images locally we will now use one of these images (the RedHat one) to run the docker registry in a container. Here are the steps we are going to follow.

    • Create a dockerfile based on the RedHat image we just built. This dockerfile copies in the registry binary (COPY ./registry), the registry config file (COPY ./config.yml) and a wrapper script that launches it (COPY ./entrypoint.sh). We also secure the registry with a password stored in an htpasswd file (RUN htpasswd). Finally we declare the volumes /var/lib/registry and /certs (VOLUME) and expose port 5000 (EXPOSE). Obviously the necessary directories are created (RUN mkdir) and the needed tools are installed (RUN yum). I’m also generating the htpasswd file here with the user regimguser and the password regimguser:
    # cat dockerfile
    FROM ppc64le/rhel72:7.2
    
    RUN yum update && yum upgrade && yum -y install httpd-tools
    RUN mkdir /etc/registry && mkdir /certs
    
    COPY ./registry /usr/bin/registry
    COPY ./entrypoint.sh /entrypoint.sh
    COPY ./config.yml /etc/registry/config.yml
    
    RUN htpasswd -b -B -c /etc/registry/registry_passwd regimguser regimguser
    
    VOLUME ["/var/lib/registry", "/certs"]
    EXPOSE 5000
    
    ENTRYPOINT ["./entrypoint.sh"]
    
    CMD ["/etc/registry/config.yml"]
    
  • Copy the registry binary to the directory containing the dockerfile:
  • # cp /usr/bin/registry .
    
  • Create an entrypoint.sh file in the directory containing the dockerfile. This script will launch the registry binary:
  • # cat entrypoint.sh
    #!/bin/sh
    
    set -e
    exec /usr/bin/registry "$@"
    
  • Create a configuration file for the registry in the directory containing the dockerfile and name it config.yml. This configuration file defines where the registry stores its data, the TLS certificates to use and the authentication method (we are using an htpasswd file):
  • version: 0.1
    storage:
      filesystem:
        rootdirectory: /var/lib/registry
      delete:
        enabled: true
    http:
      addr: :5000
      tls:
          certificate: /certs/domain.crt
          key: /certs/domain.key
    auth:
      htpasswd:
        realm: basic-realm
        path: /etc/registry/registry_passwd
    
  • Build the image:
  • # docker build -t registry .
    Sending build context to Docker daemon 13.57 MB
    Step 1 : FROM ppc64le/rhel72:7.2
     ---> 9005cbc9c7f6
    Step 2 : RUN yum update && yum upgrade && yum -y install httpd-tools
     ---> Using cache
     ---> de34fdf3864e
    Step 3 : RUN mkdir /etc/registry && mkdir /certs
     ---> Using cache
     ---> c801568b6944
    Step 4 : COPY ./registry /usr/bin/registry
     ---> Using cache
     ---> 49927e0a90b8
    Step 5 : COPY ./entrypoint.sh /entrypoint.sh
     ---> Using cache
    [..]
    Removing intermediate container 261f2b380556
    Successfully built ccef43825f21
    # docker images
    REPOSITORY                                          TAG                 IMAGE ID            CREATED             SIZE
    <none>                                              <none>              16d35e8c1177        About an hour ago   361 MB
    registry                                            latest              4287d4e389dc        2 hours ago         361 MB
    

    We now need to generate certificates and place them in the right directories to make the registry secure:

    • Generate an ssl certificate:
    # cd /certs
    # openssl req  -newkey rsa:4096 -nodes -sha256 -keyout /certs/domain.key  -x509 -days 365 -out /certs/domain.crt
    Generating a 4096 bit RSA private key
    .............................................................................................................................................................++
    ..........................................................++
    writing new private key to '/certs/domain.key'
    -----
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    [..]
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [XX]:
    State or Province Name (full name) []:
    Locality Name (eg, city) [Default City]:
    Organization Name (eg, company) [Default Company Ltd]:
    Organizational Unit Name (eg, section) []:
    Common Name (eg, your name or your server's hostname) []:dockerengineppc64le.chmod666.org
    Email Address []:
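
    If you do not want to answer the interactive prompts, the same self-signed certificate can presumably be generated in one shot with the -subj option (the CN below is the registry hostname used in this post):

    # openssl req -newkey rsa:4096 -nodes -sha256 -keyout /certs/domain.key -x509 -days 365 \
        -out /certs/domain.crt -subj "/CN=dockerengineppc64le.chmod666.org"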
    
  • Copy the certificates to every docker engine host that will need to access the registry:
  • # mkdir /etc/docker/certs.d/dockerengineppc64le.chmod666.org\:5000/
    # cp /certs/domain.crt /etc/docker/certs.d/dockerengineppc64le.chmod666.org\:5000/cat.crt
    # cp /certs/domain.crt /etc/pki/ca-trust/source/anchors/dockerengineppc64le.chmod666.org.crt
    # update-ca-trust
    
  • Restart docker:
  • # systemctl restart docker
    

    Now that the image and the certificates are ready, let’s run the Docker container and then upload and download an image to and from the registry:

    • Run the container, expose port 5000 (-p 5000:5000), make sure the registry is restarted when Docker starts (--restart=always), let the container access the certificates we created before (-v /certs:/certs) and store the images in /var/lib/registry (-v /var/lib/registry:/var/lib/registry):
    # docker run -d -p 5000:5000 --restart=always -v /certs:/certs -v /var/lib/registry:/var/lib/registry --name registry registry
    51ad253616be336bcf5a1508bf48b059f01ebf20a0772b35b5686b4012600c46
    # docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
    51ad253616be        registry            "./entrypoint.sh /etc"   10 seconds ago      Up 8 seconds        0.0.0.0:5000->5000/tcp   registry
    
  • Connect to the registry using docker login (the user and password created before will be asked). Then push and pull an image to be sure everything is working. The only way to list the images available in the registry is to call the registry API and check the catalog:
  • # docker login https://dockerengineppc64le.chmod666.org:5000
    Username (regimguser): regimguser
    Password:
    Login Succeeded
    # docker tag grafana dockerengineppc64le.chmod666.org:5000/ppc64le/grafana
    # docker push dockerengineppc64le.chmod666.org:5000/ppc64le/grafana
    The push refers to a repository [dockerengineppc64le.chmod666.org:5000/ppc64le/grafana]
    82bca1cb11d8: Pushed
    9c1f2163c216: Pushing [==>                                                ] 22.83 MB/508.9 MB
    1df85fc1eaaf: Mounted from ppc64le/ubuntucust
    289bda1cbd14: Mounted from ppc64le/ubuntucust
    9bca281924ab: Mounted from ppc64le/ubuntucust
    8505832e8bea: Mounted from ppc64le/ubuntucust
    625e647dc584: Mounted from ppc64le/ubuntucust
    4fad21ac6351: Mounted from ppc64le/ubuntucust
    [..]
    latest: digest: sha256:88eef1b47ec57dd255aa489c8a494c11be17eb35ea98f38a63ab9f5690c26c1f size: 1984
    # curl --cacert /certs/domain.crt -X GET https://regimguser:regimguser@dockerengineppc64le.chmod666.org:5000/v2/_catalog
    {"repositories":["ppc64le/grafana","ppc64le/ubuntucust"]}
    # docker pull dockerengineppc64le.chmod666.org:5000/ppc64le/grafana
    Using default tag: latest
    latest: Pulling from ppc64le/grafana
    Digest: sha256:88eef1b47ec57dd255aa489c8a494c11be17eb35ea98f38a63ab9f5690c26c1f
    Status: Image is up to date for dockerengineppc64le.chmod666.org:5000/ppc64le/grafana:latest
    

    Running a more complex application (grafana + influxdb)

    One of the applications I’m running is grafana, used with influxdb as a datasource. We will see here how to run grafana and influxdb in Docker containers on a ppc64le RedHat distribution:

    Build the grafana docker image

    First create the dockerfile. You have now seen a lot of dockerfiles in this blog post so I’ll not explain this one in detail. The docker engine is running on RedHat but the image used here is an Ubuntu one, as Grafana and Influxdb are available in the Ubuntu repositories.

    # cat /data/docker/grafana/dockerfile
    FROM ppc64le/ubuntucust
    
    RUN apt-get update && apt-get -y install grafana gosu
    
    VOLUME ["/var/lib/grafana", "/var/log/grafana", "/etc/grafana"]
    
    EXPOSE 3000
    
    COPY ./run.sh /run.sh
    
    ENTRYPOINT ["/run.sh"]
    

    Here is the entrypoint script that will run grafana when the container starts:

    # cat /data/docker/grafana/run.sh
    #!/bin/bash -e
    
    : "${GF_PATHS_DATA:=/var/lib/grafana}"
    : "${GF_PATHS_LOGS:=/var/log/grafana}"
    : "${GF_PATHS_PLUGINS:=/var/lib/grafana/plugins}"
    
    chown -R grafana:grafana "$GF_PATHS_DATA" "$GF_PATHS_LOGS"
    chown -R grafana:grafana /etc/grafana
    
    if [ ! -z "${GF_INSTALL_PLUGINS}" ]; then
      OLDIFS=$IFS
      IFS=','
      for plugin in ${GF_INSTALL_PLUGINS}; do
        grafana-cli plugins install ${plugin}
      done
      IFS=$OLDIFS
    fi
    
    exec gosu grafana /usr/sbin/grafana  \
      --homepath=/usr/share/grafana             \
      --config=/etc/grafana/grafana.ini         \
      cfg:default.paths.data="$GF_PATHS_DATA"   \
      cfg:default.paths.logs="$GF_PATHS_LOGS"   \
      cfg:default.paths.plugins="$GF_PATHS_PLUGINS"
    

    Then build the grafana image:

    # cd /data/docker/grafana
    # docker build -t grafana .
    Step 3 : VOLUME ['/var/lib/grafana', '/var/log/grafana", "/etc/grafana']
     ---> Running in 7baf11e2a2b6
     ---> f3449dd17ad4
    Removing intermediate container 7baf11e2a2b6
    Step 4 : EXPOSE 3000
     ---> Running in 89e10b7bfa5e
     ---> cdc65141d2f4
    Removing intermediate container 89e10b7bfa5e
    Step 5 : COPY ./run.sh /run.sh
     ---> 0a75c203bc8e
    Removing intermediate container 885719ef1fde
    Step 6 : ENTRYPOINT /run.sh
     ---> Running in 56f8b7d1274a
     ---> 4ca5c23b9aba
    Removing intermediate container 56f8b7d1274a
    Successfully built 4ca5c23b9aba
    # docker images
    REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
    grafana              latest              4ca5c23b9aba        32 seconds ago      676.8 MB
    ppc64le/ubuntucust   latest              c9274707505e        12 minutes ago      167.9 MB
    ppc64le/ubuntu       latest              1967d889e07f        3 months ago        167.9 MB
    

    Run it and verify it works ok:

    # docker run -d -it -p 443:3000 grafana
    19bdd6c82a37a7275edc12e91668530fc1d52699542dae1e17901cce59f1230a
    # docker ps
    CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                   NAMES
    19bdd6c82a37        grafana             "/run.sh"           26 seconds ago      Up 24 seconds       0.0.0.0:443->3000/tcp   kickass_mcclintock
    # docker logs 19bdd6c82a37
    2017/02/17 15:28:36 [I] Starting Grafana
    2017/02/17 15:28:36 [I] Version: master, Commit: NA, Build date: 1970-01-01 00:00:00 +0000 UTC
    2017/02/17 15:28:36 [I] Configuration Info
    Config files:
      [0]: /usr/share/grafana/conf/defaults.ini
      [1]: /etc/grafana/grafana.ini
    Command lines overrides:
      [0]: default.paths.data=/var/lib/grafana
      [1]: default.paths.logs=/var/log/grafana
    Paths:
      home: /usr/share/grafana
      data: /var/lib/grafana
    [..]
    

    grafana

    Build the influxdb docker image

    Same job for the influxdb image, which is also based on the Ubuntu image. I’m showing you the dockerfile (as always: package installation, volume, port exposure) and I’m also including a configuration file for influxdb:

    # cat /data/docker/influxdb/dockerfile
    FROM ppc64le/ubuntucust
    
    RUN apt-get update && apt-get -y install influxdb
    
    VOLUME ["/var/lib/influxdb"]
    
    EXPOSE 8086 8083
    
    COPY influxdb.conf /etc/influxdb.conf
    
    COPY entrypoint.sh /entrypoint.sh
    ENTRYPOINT ["/entrypoint.sh"]
    CMD ["/usr/bin/influxd"]
    
    # cat influxdb.conf
    [meta]
      dir = "/var/lib/influxdb/meta"
    
    [data]
      dir = "/var/lib/influxdb/data"
      engine = "tsm1"
      wal-dir = "/var/lib/influxdb/wal"
    
    [admin]
      enabled = true
    
    # cat entrypoint.sh
    #!/bin/bash
    set -e
    
    if [ "${1:0:1}" = '-' ]; then
        set -- influxd "$@"
    fi
    
    exec "$@"
    

    Then build the influxdb image:

    # docker build -t influxdb .
    [..]
    Step 3 : VOLUME ['/var/lib/influxdb']
     ---> Running in f3570a5a6c91
     ---> 014035e3134c
    Removing intermediate container f3570a5a6c91
    Step 4 : EXPOSE 8086 8083
     ---> Running in 590405701bfc
     ---> 25f557aae499
    Removing intermediate container 590405701bfc
    Step 5 : COPY influxdb.conf /etc/influxdb.conf
     ---> c58397a5ae7b
    Removing intermediate container d22132ec9925
    Step 6 : COPY entrypoint.sh /entrypoint.sh
     ---> 25e931d39bbc
    Removing intermediate container 680eacd6597e
    Step 7 : ENTRYPOINT /entrypoint.sh
     ---> Running in 0695135e81c0
     ---> 44ed7385ae61
    Removing intermediate container 0695135e81c0
    Step 8 : CMD /usr/bin/influxd
     ---> Running in f59cbcd5f199
     ---> 073eeeb78055
    Removing intermediate container f59cbcd5f199
    Successfully built 073eeeb78055
    # docker images
    REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
    influxdb             latest              073eeeb78055        28 seconds ago      202.7 MB
    grafana              latest              4ca5c23b9aba        11 minutes ago      676.8 MB
    ppc64le/ubuntucust   latest              c9274707505e        23 minutes ago      167.9 MB
    ppc64le/ubuntu       latest              1967d889e07f        3 months ago        167.9 MB
    

    Run an influxdb container to verify it works ok:

    # docker run -d -it -p 8080:8083 influxdb
    c0c042c7bc1a361d1bcff403ed243651eac88270738cfc390e35dfd434cfc457
    # docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
    c0c042c7bc1a        influxdb            "/entrypoint.sh /usr/"   4 seconds ago       Up 1 seconds        0.0.0.0:8080->8086/tcp   amazing_goldwasser
    19bdd6c82a37        grafana             "/run.sh"                10 minutes ago      Up 10 minutes       0.0.0.0:443->3000/tcp    kickass_mcclintock
    #  docker logs c0c042c7bc1a
    
     8888888           .d888 888                   8888888b.  888888b.
       888            d88P"  888                   888  "Y88b 888  "88b
       888            888    888                   888    888 888  .88P
       888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
       888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
       888   888  888 888    888 888  888   X88K   888    888 888    888
       888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
     8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"
    
    2017/02/17 15:39:08 InfluxDB starting, version 0.10.0, branch unknown, commit unknown, built unknown
    2017/02/17 15:39:08 Go version go1.6rc1, GOMAXPROCS set to 16
    

    influx

    docker-compose

    Now that we have two images, one for grafana and one for influxdb, let’s make them work together. To do so we will use docker-compose. docker-compose lets you describe the containers you want to run in a yml file and link them together. You can see below there are two different entries. The influxdb one tells which image I’m going to use, the container name, the ports that will be exposed to the docker host (equivalent of -p 8080:8083 with a docker run command) and the volumes (-v with docker run). For the grafana container everything is almost the same except the “links” part. The grafana container has to be able to “talk” to the influxdb one (to use influxdb as a datasource). The “links” stanza of the yml file means an entry containing the influxdb ip and name will be added to the /etc/hosts file of the grafana container. When you configure grafana you will then be able to use the “influxdb” name to access the database:

    # cat docker-compose.yml
    influxdb:
      image: influxdb:latest
      container_name: influxdb
      ports:
        - "8080:8083"
        - "80:8086"
      volumes:
        - "/data/docker/influxdb/var/lib/influxdb:/var/lib/influxdb"
    
    grafana:
      image: grafana:latest
      container_name: grafana
      ports:
        - "443:3000"
      links:
        - influxdb
      volumes:
        - "/data/docker/grafana/var/lib/grafana:/var/lib/grafana"
        - "/data/docker/grafana/var/log/grafana:/var/log/grafana"
    

    To create the containers just run “docker-compose up” from the directory containing the yml file; this will create all the containers described in the yml file. To destroy them, run “docker-compose down”.

    # docker-compose up -d
    Creating influxdb
    Creating grafana
    # docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                                                    NAMES
    5df7f3d58631        grafana:latest      "/run.sh"                About a minute ago   Up About a minute   0.0.0.0:443->3000/tcp                                    grafana
    727dfc6763e1        influxdb:latest     "/entrypoint.sh /usr/"   About a minute ago   Up About a minute   8083/tcp, 0.0.0.0:80->8086/tcp, 0.0.0.0:8080->8086/tcp   influxdb
    # docker-compose down
    Stopping grafana ... done
    Stopping influxdb ... done
    Removing grafana ... done
    Removing influxdb ... done
    

    Just to prove that everything is working I’m logging into the influxdb container and pushing some data to the database using the NOAA_data.txt sample file provided by the InfluxDB project (these are just test data).

    # docker exec -it 15845e92152f /bin/bash
    # apt-get install influxdb-client
    # cd /var/lib/influxdb ; influx -import -path=NOAA_data.txt -precision=s
    2017/02/17 17:00:35 Processed 1 commands
    2017/02/17 17:00:35 Processed 76290 inserts
    2017/02/17 17:00:35 Failed 0 inserts
    

    I’m finally logging into grafana (from a browser) and configuring the access to the database. Once this is done I can create graphs based on the data.

    grafanaok1
    grafanaok2
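
    As a side note, the datasource could also be added without the web ui, through the Grafana HTTP API. This is only a sketch: it assumes the default admin:admin credentials, the NOAA_water_database created by the sample import above, and Grafana answering plain http on port 443 of the docker host as published by the compose file; the “influxdb” hostname resolves thanks to the docker-compose link described above:

    # curl -s -u admin:admin -H "Content-Type: application/json" -X POST http://localhost:443/api/datasources \
        -d '{"name":"influxdb","type":"influxdb","url":"http://influxdb:8086","access":"proxy","database":"NOAA_water_database"}'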

    Creating a swarm cluster

    Be very careful when starting with swarm. There are two different types of “swarm”: the swarm before docker 1.12 (called docker-swarm) and the swarm starting from docker 1.12 (called swarm mode). As the first version of swarm is already deprecated we will use the swarm mode that comes with docker 1.12. In this case there is no need to install additional software: swarm mode is embedded in the docker binaries. Swarm mode is used through the “docker service” commands to create what we call services (multiple docker containers running across the swarm cluster with rules/constraints applied on them: create the containers on all the hosts, only on a couple of nodes, and so on). First initialize swarm mode on the machines (I’ll only use two nodes in my swarm cluster in the examples below) and, on all the worker nodes, be sure you are logged in to the registry (certificates copied, docker login done):

    We will set up the swarm cluster on two nodes just to show you a simple example of the power of this technology. The first step is to choose a leader (there is one leader among the managers; the leader is responsible for the orchestration and the management of the swarm cluster, and if it has an issue one of the other managers takes the lead) and a worker (you can have as many workers as you want in the swarm cluster). In the example below the manager/leader will be called (node1(manager)#) and the worker will be called (node2(worker)#). Use the “docker swarm init” command to create your leader. The advertise address is the public address of the machine. The command gives you the commands to launch on the other managers or workers to allow them to join the cluster. Be sure port tcp 2377 is reachable from all the nodes to the leader/managers. Last thing to add: swarm services rely on an overlay network, which you need to create before you can create your swarm services:

    node1(manager)# docker swarm init --advertise-addr 10.10.10.49
    Swarm initialized: current node (813ompnl4c7f4ilkxqy0faj59) is now a manager.
    
    To add a worker to this swarm, run the following command:
        docker swarm join \
        --token SWMTKN-1-69tw66gb9jwfl8y46ujeemj3p5v85ikrqvwmqzb2x32kqmek8e-a9dv25loilaor6jfmcdq8je6h \
        10.10.10.49:2377
    
    To add a manager to this swarm, run the following command:
        docker swarm join \
        --token SWMTKN-1-69tw66gb9jwfl8y46ujeemj3p5v85ikrqvwmqzb2x32kqmek8e-9e82z5k7qrzxsk2autu9ajt3r \
        10.10.10.49:2377
    node1(manager)# docker node ls
    ID                           HOSTNAME                   STATUS  AVAILABILITY  MANAGER STATUS
    813ompnl4c7f4ilkxqy0faj59 *  swarm1.chmod666.org  Ready   Active        Leader
    node1(manager)# docker network create -d overlay mynet
    8mv5ydu9vokx
    node1(manager)# docker network ls
    8mv5ydu9vokx        mynet               overlay             swarm
    

    On the worker node run the command to join the cluster and verify all the nodes are Ready and Active. This will mean that you are ready to use the swarm cluster:

    node2(worker)# docker swarm join --token SWMTKN-1-69tw66gb9jwfl8y46ujeemj3p5v85ikrqvwmqzb2x32kqmek8e-a9dv25loilaor6jfmcdq8je6h 10.10.10.49:2377
    This node joined a swarm as a worker.
    node1(manager)# docker node ls
    ID                           HOSTNAME                   STATUS  AVAILABILITY  MANAGER STATUS
    813ompnl4c7f4ilkxqy0faj59 *  swarm1.chmod666.org        Ready   Active        Leader
    bh7mhv3hg1x98b9j6lu00c3ef    swarm2.chmod666.org        Ready   Active
    

    The cluster is up and ready. Before working with it we need a way to share our application data among the cluster nodes. The best solution (from my point of view) is to use GlusterFS, but for the convenience of this blog post I’ll just create a small nfs server on the leader node and mount the data on the worker node (for a production setup the nfs server should be externalized, i.e. mounted from a NAS server):

    node1(manager)# exportfs
    /nfs            <world>
    node2(worker)# mount | grep nfs
    [..]
    swarm1.chmod666.org:/nfs on /nfs type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.48,local_lock=none,addr=10.10.10.49)
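
    For reference, the nfs export itself can be set up along these lines (a minimal sketch; the export path, options and service names are assumptions matching a RHEL 7 host):

    node1(manager)# yum -y install nfs-utils
    node1(manager)# mkdir -p /nfs/docker
    node1(manager)# echo "/nfs *(rw,sync,no_root_squash)" >> /etc/exports
    node1(manager)# systemctl enable nfs-server && systemctl start nfs-server
    node1(manager)# exportfs -ra
    node2(worker)# mkdir -p /nfs && mount swarm1.chmod666.org:/nfs /nfs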
    

    Running an application in a swarm cluster

    We now have the swarm cluster ready to run some services, but we first need a service. I’ll use a small web application I wrote called whoami (inspired by the emilevauge/whoami application), which just displays the hostname and the ip address of the container answering the request. I’m first creating a dockerfile allowing me to build a container image ready to run any cgi ksh script. The dockerfile copies a configuration file into /etc/httpd/conf.d and serves the files in /var/www/mysite on a /whoami/ alias:

    # cd /data/dockerfile/httpd
    # cat dockerfile
    FROM swarm1.chmod666.org:5000/ppc64le/rhel72:latest
    
    RUN yum -y install httpd
    RUN mkdir /var/www/mysite && chown apache:apache /var/www/mysite
    
    EXPOSE 80
    
    COPY ./mysite.conf /etc/httpd/conf.d
    VOLUME ["/var/www/html", "/var/www/mysite"]
    
    CMD [ "-D", "FOREGROUND" ]
    ENTRYPOINT ["/usr/sbin/httpd"]
    # cat mysite.conf
    Alias /whoami/ "/var/www/mysite/"
    <Directory "/var/www/mysite">
      AddHandler cgi-script .ksh
      DirectoryIndex whoami.ksh
      Options Indexes FollowSymLinks ExecCGI
      AllowOverride None
      Require all granted
    </Directory>
    
    

    I’m then building the image and pushing it into my private registry. The image is now available for download on any node of the swarm cluster:

    # docker build -t httpd .
    Sending build context to Docker daemon 3.072 kB
    Step 1 : FROM dockerengineppc64le.chmod666.org:5000/ppc64le/rhel72:latest
     ---> 9005cbc9c7f6
    Step 2 : RUN yum -y install httpd
     ---> Using cache
     ---> 1bc91df747cd
    [..]
     ---> Using cache
     ---> afb3cf77eb8a
    Step 8 : ENTRYPOINT /usr/sbin/httpd
     ---> Using cache
     ---> 187da163e084
    Successfully built 187da163e084
    # docker tag httpd swarm1.chmod666.org:5000/ppc64le/httpd
    # docker push swarm1.chmod666.org:5000/ppc64le/httpd
    The push refers to a repository [swarm1.chmod666.org:5000/ppc64le/httpd]
    92d958e708cc: Layer already exists
    [..]
    latest: digest: sha256:3b1521432c9704ca74707cd2f3c77fb342a957c919787efe9920f62a26b69e26 size: 1156
    

    Now that the image is ready we will create the application; it’s just a single ksh script and a css file.

    # ls /nfs/docker/whoami/
    table-responsive.css  whoami.ksh
    # cat whoami.ksh
    #!/usr/bin/bash
    
    hostname=$(hostname)
    uname=$(uname -a)
    ip=$(hostname -I)
    date=$(date)
    env=$(env)
    echo ""
    echo "<html>"
    echo "<head>"
    echo "  <title>Docker exemple</title>"
    echo "  <link href="table-responsive.css" media="screen" type="text/css" rel="stylesheet" />"
    echo "</head>"
    echo "<body>"
    echo "<h1><span class="blue"><<span>Docker<span class="blue"><span> <span class="yellow">on PowerSystems ppc64le</pan></h1>"
    echo "<h2>Created with passion by <a href="http://chmod666.org" target="_blank">chmod666.org</a></h2>"
    echo "<table class="container">"
    echo "  <thead>"
    echo "    <tr>"
    echo "      <th><h1>type</h1></th>"
    echo "      <th><h1>value</h1></th>"
    echo "    </tr>"
    echo "  </thead>"
    echo "  <tbody>"
    echo "    <tr>"
    echo "      <td>hostname</td>"
    echo "      <td>${hostname}</td>"
    echo "    </tr>"
    echo "    <tr>"
    echo "      <td>uname</td>"
    echo "      <td>${uname}</td>"
    echo "    </tr>"
    echo "    <tr>"
    echo "      <td>ip</td>"
    echo "      <td>${ip}</td>"
    echo "    </tr>"
    echo "    <tr>"
    echo "      <td>date</td>"
    echo "      <td>${date}</td>"
    echo "    </tr>"
    echo "    <tr>"
    echo "      <td>httpd env</td>"
    echo "      <td>SERVER_SOFTWARE:${SERVER_SOFTWARE},SERVER_NAME:${SERVER_NAME},SERVER_PROTOCOL:${SERVER_PROTOCOL}</td>"
    echo "    </tr>"
    echo "  </tbody>"
    echo "</table>"
    echo "  </tbody>"
    echo "</table>"
    echo "</body>"
    echo "</html>"
    

    Just to be sure the web application is working run this image on the worker node (without swarm):

    # docker run -d -p 80:80 -v /nfs/docker/whoami/:/var/www/mysite --name httpd swarm1.chmod666.org:5000/ppc64le/httpd
    a75095b23bc31715ac95d9bb57a7a161b06ef3e6a0f4eb4ed708cf60d03c0e5d
    # curl localhost/whoami/
    [..]
        
          hostname
          a75095b23bc3
        
        
          uname
          Linux a75095b23bc3 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
        
        
          ip
          172.17.0.2 
        
        
          date
          Wed Feb 22 14:33:59 UTC 2017
    #  docker rm a75095b23bc3 -f
    a75095b23bc3
    
    

    We are now ready to create a swarm service with our application. Verify the swarm cluster health and create a service in global mode. The global mode means swarm will create one docker container per node.

    node1(manager)# docker node ls
    ID                           HOSTNAME                   STATUS  AVAILABILITY  MANAGER STATUS
    813ompnl4c7f4ilkxqy0faj59 *  swarm1.chmod666.org        Ready   Active        Leader
    bh7mhv3hg1x98b9j6lu00c3ef    swarm2.chmod666.org        Ready   Active
    node1(manager)# docker service create --name whoami --mount type=bind,source=/nfs/docker/whoami/,destination=/var/www/mysite --mode global --publish 80:80 --network mynet  swarm1.chmod666.org:5000/ppc64le/httpd
    7l8c4stcl3zgiijf6oe2hvu1r
    node1(manager) # docker service ls
    ID            NAME    REPLICAS  IMAGE                                         COMMAND
    7l8c4stcl3zg  whoami  global    swarm1.chmod666.org:5000/ppc64le/httpd
    

    Verify there is one container available on each swarm node:

    node1(manager)# docker service ps 7l8c4stcl3zg
    ID                         NAME        IMAGE                                         NODE                       DESIRED STATE  CURRENT STATE          ERROR
    2sa543un5v4hpvwgouyorhndm  whoami      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 2 minutes ago
    5061eogr8wimt9al6uss1wet2   \_ whoami  swarm2.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 2 minutes ago
    

    I’m now accessing the web service through both DNS names (swarm1 and swarm2) and verifying that I reach a different container each time I make an http request:

    • When accessing swarm1.chmod666.org (I see a docker hostname and ip)
    • node1

    • When accessing swarm2.chmod666.org (I see a docker hostname and ip different from the first one)
    • node2

    You will now say: ok, that’s great, but it’s not “redundant”. In fact it is, because swarm ships with a very cool feature called routing mesh. When you create a service in the swarm cluster with the --publish option, each swarm node listens on this port, even the nodes on which no container of the service is running; if you access any node on this port you will reach one of the containers. By this I mean that by accessing swarm1.chmod666.org you may reach a container running on swarm2.chmod666.org, and the next http request may reach any other container of this service. Let’s try creating a service with 10 replicas and access the same node over and over again.
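
    Note that the global whoami service created above has to be removed first, since the new service reuses the same name and published port; presumably:

    node1(manager)# docker service rm whoami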

    node1(manager)# docker service create --name whoami --mount type=bind,source=/nfs/docker/whoami/,destination=/var/www/mysite --replicas 10 --publish 80:80 --network mynet  swarm1.chmod666.org:5000/ppc64le/httpd
    el7nyiuga1vxtfgzktpfahucw
    node1(manager)# docker service ls
    ID            NAME    REPLICAS  IMAGE                                         COMMAND
    el7nyiuga1vx  whoami  10/10     swarm1.chmod666.org:5000/ppc64le/httpd
    node2(worker)# docker service ps el7nyiuga1vx
    ID                         NAME          IMAGE                                         NODE                       DESIRED STATE  CURRENT STATE                ERROR
    bed84pmdjy6c0758g3r52mmsq  whoami.1      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 46 seconds ago
    dgdj4ygqdr476e156osk8dd95  whoami.2      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 46 seconds ago  
    ba2ni51fo96eo6c4qfir90t7q  whoami.3      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 48 seconds ago
    9qkwigxkrqje48do39ru3cv2h  whoami.4      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 40 seconds ago
    3hgwwdly23ovafv1g0jvegu16  whoami.5      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 43 seconds ago
    0f3y844yqfbll2lmb954ro3cy  whoami.6      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 51 seconds ago
    0955dz84rv4gpb4oqv8libahd  whoami.7      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 42 seconds ago
    c05hrs9h0mm6ghxxdxc1afco9  whoami.8      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 50 seconds ago
    03qcbiuxlk13p60we0ke6vqka  whoami.9      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 54 seconds ago
    0otgw4ncka81hlxgyt82z36zj  whoami.10     swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 48 seconds ago
    node1(manager)# docker ps
    CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                    NAMES
    a25404371765        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 4 minutes        80/tcp                   whoami.7.0955dz84rv4gpb4oqv8libahd
    07c38a306a68        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 4 minutes        80/tcp                   whoami.4.9qkwigxkrqje48do39ru3cv2h
    e88a8c8a3639        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 5 minutes        80/tcp                   whoami.8.c05hrs9h0mm6ghxxdxc1afco9
    f73a84cc6622        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 5 minutes        80/tcp                   whoami.1.bed84pmdjy6c0758g3r52mmsq
    757be5ec73a4        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 5 minutes        80/tcp                   whoami.3.ba2ni51fo96eo6c4qfir90t7q
    51ad253616be        registry                                              "./entrypoint.sh /etc"   45 hours ago        Up 2 hours          0.0.0.0:5000->5000/tcp   registry
    node2(worker)# docker ps
    CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS               NAMES
    f015b0da7f2e        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.5.3hgwwdly23ovafv1g0jvegu16
    4b7452245406        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.10.0otgw4ncka81hlxgyt82z36zj
    71722a2d7f38        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.6.0f3y844yqfbll2lmb954ro3cy
    01bc73d6fdf7        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.9.03qcbiuxlk13p60we0ke6vqka
    438c0d553550        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.2.dgdj4ygqdr476e156osk8dd95
    

    Let’s now try accessing the service. I’m modifying my whoami.ksh just to print the information I need (the hostname).

    # cat /nfs/docker/whoami/whoami.ksh
    #!/usr/bin/bash
    
    hostname=$(hostname)
    uname=$(uname -a)
    ip=$(hostname -I)
    date=$(date)
    env=$(env)
    echo ""
    echo "hostname: ${hostname}"
    echo "ip: ${ip}"
    echo "uname:${uname}"
    # for i in $(seq 1 10) ; do echo "[CALL $1]" ; curl -s http://swarm1.chmod666.org/whoami/ ; done
    [CALL ]
    hostname: f015b0da7f2e
    ip: 10.255.0.14 10.255.0.2 172.18.0.7 10.0.0.12 10.0.0.2
    uname:Linux f015b0da7f2e 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: 4b7452245406
    ip: 10.255.0.11 10.255.0.2 172.18.0.6 10.0.0.9 10.0.0.2
    uname:Linux 4b7452245406 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: 438c0d553550
    ip: 10.0.0.5 10.0.0.2 172.18.0.4 10.255.0.7 10.255.0.2
    uname:Linux 438c0d553550 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: 71722a2d7f38
    ip: 10.255.0.10 10.255.0.2 172.18.0.5 10.0.0.8 10.0.0.2
    uname:Linux 71722a2d7f38 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: 01bc73d6fdf7
    ip: 10.255.0.6 10.255.0.2 172.18.0.3 10.0.0.4 10.0.0.2
    uname:Linux 01bc73d6fdf7 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: a25404371765
    ip: 10.255.0.9 10.255.0.2 172.18.0.7 10.0.0.7 10.0.0.2
    uname:Linux a25404371765 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: 07c38a306a68
    ip: 10.255.0.8 10.255.0.2 172.18.0.6 10.0.0.6 10.0.0.2
    uname:Linux 07c38a306a68 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: e88a8c8a3639
    ip: 10.255.0.4 10.255.0.2 172.18.0.5 10.0.0.3 10.0.0.2
    uname:Linux e88a8c8a3639 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: f73a84cc6622
    ip: 10.255.0.12 10.255.0.2 172.18.0.4 10.0.0.10 10.0.0.2
    uname:Linux f73a84cc6622 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    [CALL ]
    hostname: 757be5ec73a4
    ip: 10.255.0.13 10.255.0.2 172.18.0.3 10.0.0.11 10.0.0.2
    uname:Linux 757be5ec73a4 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    

    I’m doing ten calls here and I can see, by checking the hostname, that I reach a different docker container on each call. It shows that the routing mesh is working correctly.

    HAproxy

    To access the service through a single ip I’m installing an haproxy server on another host (an Ubuntu ppc64le host) and pointing its configuration at my swarm nodes. The haproxy will check the availability of the web application and round-robin the requests between the two docker hosts. If one of the docker swarm nodes fails, all requests will be sent to the remaining alive node.

    # apt-get install haproxy
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following additional packages will be installed:
      liblua5.3-0
    Suggested packages:
    [..]
    Setting up haproxy (1.6.3-1ubuntu0.1) ...
    Processing triggers for libc-bin (2.23-0ubuntu5) ...
    Processing triggers for systemd (229-4ubuntu13) ...
    Processing triggers for ureadahead (0.100.0-19) ...
    # cat /etc/haproxy.conf
    frontend http_front
       bind *:80
       stats uri /haproxy?stats
       default_backend http_back
    
    backend http_back
       balance roundrobin
       server swarm1.chmod666.org 10.10.10.48:80 check
       server swarm2.chmod666.org 10.10.10.49:80 check
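
    After editing the configuration you may want to validate it and restart the service. This is just a sketch: it assumes the file shown above is the one the haproxy service actually loads (the Ubuntu default is /etc/haproxy/haproxy.cfg, so adapt the path if needed):

    # haproxy -c -f /etc/haproxy.conf
    # systemctl restart haproxy && systemctl enable haproxy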
    

    I’m again changing the whoami.ksh script just to print the hostname. Then from another host I’m running 10000 http requests against the public ip of my haproxy server and counting how many requests each container served. By doing this we can see two things: the haproxy service correctly spreads the requests across the swarm nodes (I’m reaching ten different containers), and the swarm routing mesh works fine (all the requests are almost equally spread among the running containers). You can see the session spread in the haproxy stats page and in the curl example:

    # cat /nfs/docker/whoami/whoami.ksh
    #!/usr/bin/bash
    
    hostname=$(hostname)
    uname=$(uname -a)
    ip=$(hostname -I)
    date=$(date)
    env=$(env)
    echo ""
    echo "${hostname}"
    # for i in $(seq 1 10000) ; do curl -s http://10.10.10.50/whoami/ ; done  | sort | uniq -c
        999 01bc73d6fdf7
       1003 07c38a306a68
        993 438c0d553550
        998 4b7452245406
       1006 71722a2d7f38
        996 757be5ec73a4
       1004 a25404371765
       1004 e88a8c8a3639
        995 f015b0da7f2e
       1002 f73a84cc6622
    

    haproxy1

    I’m finally shutting down one of the worker nodes. We can see two things here. The service was created with 10 replicas, and shutting down one node results in the creation of 5 more containers on the other node. By checking the haproxy stats page we also see that the node is detected as down and all the requests are sent to the remaining one. We have our highly available docker service (to be totally redundant we would also need to run haproxy on two different hosts with a “floating” ip, which I’ll not explain here):

    # docker ps
    CONTAINER ID        IMAGE                                                 COMMAND                  CREATED              STATUS              PORTS                    NAMES
    82fe21465b96        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 29 seconds       80/tcp                   whoami.5.2d0t99pjide4w7nenzrribjph
    71a4c51460ef        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 21 seconds       80/tcp                   whoami.9.5f9qkx6t47vvjt8b9k5jhj79h
    5830f0696cca        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 32 seconds       80/tcp                   whoami.6.eso8uwhx6ij2we2iabmzx3tdu
    dbc2b731c547        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 16 seconds       80/tcp                   whoami.2.8tc8zoxrpdell4f4d8zsr0rlw
    050aacdf8126        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 23 seconds       80/tcp                   whoami.10.ej8ahxzzp8bw3pybc6fib17qh
    a25404371765        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.7.0955dz84rv4gpb4oqv8libahd
    07c38a306a68        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.4.9qkwigxkrqje48do39ru3cv2h
    e88a8c8a3639        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.8.c05hrs9h0mm6ghxxdxc1afco9
    f73a84cc6622        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.1.bed84pmdjy6c0758g3r52mmsq
    757be5ec73a4        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.3.ba2ni51fo96eo6c4qfir90t7q
    51ad253616be        registry                                              "./entrypoint.sh /etc"   2 days ago           Up 4 hours          0.0.0.0:5000->5000/tcp   registry
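
    To confirm from the swarm side that the failed node was detected and the tasks rescheduled, the usual commands can be used (output omitted here; the worker should show up as Down in docker node ls and all the whoami tasks should be running on the surviving node):

    node1(manager)# docker node ls
    node1(manager)# docker service ps whoami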
    

    haproxy2

    Conclusion

    docker_ascii

    What we have reviewed in this blog post is pretty neat. The PowerSystems ecosystem is capable of doing the exact same things as the x86 one, and everything here has been demonstrated. PowerSystems are definitely ready to run Linux. The mighty RedHat and the incredible Ubuntu both provide a viable way to enter the world of DevOps on PowerSystems. We no longer need to recompile everything or to search for this or that package not available on Linux. The Ubuntu repository is huge and I was super impressed by the variety of packages available that run on Power. A few days ago RedHat finally joined the OpenPower foundation and I can assure you that this is big news. Maybe people still don’t believe in the spread of PowerSystems, but things are slowly changing, and with the first OpenPower servers running on Power9 I can assure you (at least I want to believe) that things will change. Regarding Docker I was/am a big x86 user of the solution, I’m running the blog and all my “personal” services on Docker, and I have to recognize that ppc64le Linux distributions provide the exact same value as x86. Hire me if you want to do such things (DevOps on Power). They probably won’t want to do anything about Linux on Power in my company (I still have faith, as we have purchased 120 socket-pairs of RedHat ppc64le ;-) ;-) ).

    Last words: sorry for not publishing more blog posts these days, but I’m not living the best part of my life at work (nobody cares about what I’m doing, I’m just nothing …) and personally (different health problems for me and the people I love). Please accept my apologies.

    Building a “Docker Swarm as a Service” infrastructure on ppc64le Linux (w/ Ansible & Openstack ARA, Terraform, GlusterFS, Traefik and Portainer)


    First of all, to avoid any ambiguity, I have to say that none of this work is the result of something that was asked of me at my workplace. This piece of work was done during my after-hours time (this means mostly at home, and mostly on my home server; a couple of things were done on my work lab for obvious reasons).
    After working on Docker and Swarm I realized something: just having a Swarm cluster (with mesh routing) is not enough to provide customers with a simple and viable way to use Docker. That’s why I wanted to create a “Swarm as a Service” infrastructure providing enterprise-class products to my customers. Then, after finding all those products and learning how they work, I wanted to automate the creation of the Swarm cluster and all the services associated with it.

    I warn you that this blog post is very long. Take your time to read it part by part; even if you do not understand what I’m trying to say at the beginning you will see my point of view at the end of the post. It may look a little bit unclear, but I want you to glean the information you want and to discard the parts you don’t care about. Not everything will fit your needs or match your environment, that’s certain. Last advice: while reading the post, check the Ansible playbook on my github page, you will find in it all the steps described here.

    In my previous blog post the storage part was handled with NFS. After looking on the web and searching over and over for the best solution to share data among a Docker Swarm cluster I finally came to the conclusion that GlusterFS was the best way to get reliable, replicated storage across all the nodes of my Swarm cluster. GlusterFS just combines two things I love: simplicity and reliability. Unfortunately, as you’ll see below when I get into the details, I didn’t find an up-to-date version of GlusterFS running on Power Systems (i.e. any ppc64le system). The conclusion today, if you want to do this, is to use an older version of GlusterFS. No worries, this version does the job perfectly.

    antmascot

    The second thing I realized is that this infrastructure has to be dynamic. By this I mean being able to reconfigure some services (a proxy) on the fly when you decide to stop or create a service (a bunch of containers) in the Swarm cluster. To achieve this you need an intelligent proxy capable of listening to the Swarm discovery service and modifying your proxy rules at deletion/creation time. I was already a Traefik user on x86 but I decided to check every possible solution and realized that the only other viable way to achieve this was to use Dockerflow (based on haproxy). Unfortunately (or fortunately) for me I never succeeded in compiling Dockerflow on ppc64le, so the second solution was to use Traefik. The job was easier on this one and after struggling for a few days I had a working version of Traefik running on ppc64le. The image was ready. Next I had to find a way to be autonomous and avoid depending on my network team. A solution is to ask them to create a DNS “wildcard” on a domain (for instance *.docker.chmod666.org). As Traefik is a wonderful tool it also allows you to configure the reverse proxy with what I call a context path (for instance http://www.chmod666.org/mynewservice). By doing this I’m able to be fully autonomous even if the network team does not allow me a DNS “wildcard”.

    traefik.logo

    Next I wanted to be able to recreate the infrastructure again and again and again. Not just being able to recreate it from scratch but also to make it “scalable” by adding or removing Swarm nodes on the fly (to be honest I have only coded the “scale up” part and not the “scale down” one ;-)). This part is achieved by two products: obviously Ansible to set up the software inside the vms, and HashiCorp Terraform to drive my Openstack cloud (obviously PowerVC on PowerSystems).

    Being able to run any configuration tool is a good thing; being able to track down every change and store it somewhere is even better. To have such a functionality I’m using a tool called Openstack ARA (Ansible Run Analysis), which is a good way to keep a trace of every playbook run in your infrastructure. Having ARA in an Ansible infrastructure is now, for me, mandatory. Even if you are an Ansible Tower user, having ARA is a good way to track every playbook run. The web interface provided by ARA is simple and efficient. I like the way it is: just a single tool doing a single job at its best.

    ansible
    terraform
    ara

    Finally, to let everybody use the platform (even noobs, non-super-tech guys, or just people who don’t have the time to learn everything) I wanted a web ui capable of driving my Docker infrastructure. A fantastic tool called Portainer was created by a couple of folks. I, one more time, had to recompile the entire product for ppc64le and recreate my own Docker images.

    portainer

    Before telling you all the details: if you do not want to do all these steps by hand you can use my Docker images for ppc64le available on the official dockerhub, and for the Ansible playbooks all of my work is available on Github. Last words for the introduction: I know this post will be huge, take your time to read it (for instance one part per day). I just wanted to let you know this post is the result of two months of work on the project. Let me also thank the fantastic guys of the opensource community who helped me without counting the time they had to spend on it (for ARA: @dmsimard, for Portainer: @Tony_Lapenna, for Traefik: timoreimann).

    GlusterFS for data sharing among all the cluster nodes

    In my previous blog post about Docker, NFS was used to share the data among all the nodes of the Swarm cluster. I decided to give GlusterFS a try. The first reason is that I think the solution is pretty simple to set up. The second reason is that GlusterFS is a replicated storage solution (i.e. each disk is local to each machine and you then create files with a chosen number of copies, for instance two), which avoids sharing any disks across all the cluster nodes (much simpler for the Ansible automation we will use to automate all of this). Neat. Here is what I have done to run GlusterFS on my ppc64le systems:

    The only version I found for ppc64le is available here https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.20/RHEL/epel-7.3/ppc64le/. You can retrieve all the rpm files using a “wget --mirror” command. I have personally created my own repository for GlusterFS (I assume here you are smart enough to do this by yourself):

    # cat /etc/yum.repos.d/gluster.repo
    [gluster-ppc64le]
    name=gluster-ppc64le
    baseurl=http://respository.chmod666.org:8080/gluster-ppc64le/
    enabled=1
    gpgcheck=0
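
    Mirroring the rpms and building such a repository could look roughly like this (the createrepo step and the web root path are assumptions, not the exact commands I used):

    # wget --mirror --no-parent https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.20/RHEL/epel-7.3/ppc64le/
    # mkdir -p /var/www/html/gluster-ppc64le
    # find download.gluster.org -name '*.rpm' -exec cp {} /var/www/html/gluster-ppc64le/ \;
    # createrepo /var/www/html/gluster-ppc64le/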
    

    Install the needed packages on all the nodes.

    # yum -y install glusterfs-server userspace-rcu parted
    Loaded plugins: product-id, search-disabled-repos, subscription-manager
    This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
    Package userspace-rcu-0.7.16-1.el7.ppc64le already installed and latest version
    Package parted-3.1-28.el7.ppc64le already installed and latest version
    Resolving Dependencies
    [..]
    (1/4): glusterfs-api-3.7.20-1.el7.ppc64le.rpm                                                                                                                                                                                                            |  87 kB  00:00:00
    (2/4): glusterfs-3.7.20-1.el7.ppc64le.rpm                                                                                                                                                                                                                | 460 kB  00:00:00
    (3/4): glusterfs-fuse-3.7.20-1.el7.ppc64le.rpm                                                                                                                                                                                                           | 127 kB  00:00:00
    (4/4): glusterfs-server-3.7.20-1.el7.ppc64le.rpm                                                                                                                                                                                                         | 1.3 MB  00:00:00
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total                                                                                                                                                                                                                                            14 MB/s | 2.0 MB  00:00:00
    Running transaction check
    Running transaction test
    Transaction test succeeded
    Running transaction
      Installing : glusterfs-3.7.20-1.el7.ppc64le                                                                                                                                                                                                                               1/4
      Installing : glusterfs-fuse-3.7.20-1.el7.ppc64le                                                                                                                                                                                                                          2/4
      Installing : glusterfs-api-3.7.20-1.el7.ppc64le                                                                                                                                                                                                                           3/4
      Installing : glusterfs-server-3.7.20-1.el7.ppc64le                                                                                                                                                                                                                        4/4
      Verifying  : glusterfs-server-3.7.20-1.el7.ppc64le                                                                                                                                                                                                                        1/4
      Verifying  : glusterfs-fuse-3.7.20-1.el7.ppc64le                                                                                                                                                                                                                          2/4
      Verifying  : glusterfs-api-3.7.20-1.el7.ppc64le                                                                                                                                                                                                                           3/4
      Verifying  : glusterfs-3.7.20-1.el7.ppc64le                                                                                                                                                                                                                               4/4
    
    Installed:
      glusterfs-server.ppc64le 0:3.7.20-1.el7
    
    Dependency Installed:
      glusterfs.ppc64le 0:3.7.20-1.el7                                                       glusterfs-api.ppc64le 0:3.7.20-1.el7                                                       glusterfs-fuse.ppc64le 0:3.7.20-1.el7
    
    

    I'm creating a local file-system on each cluster node's local disk. Each cluster node has a dedicated disk attached to it (600GB disks in my example), with a filesystem called /data/brick/gvol_docker created on it. I'm personally using ext4 filesystems (yes, xfs is just a good old Linux joke).
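
    Here is a minimal sketch of how each brick can be prepared (I'm assuming a single partition already exists on each 600GB multipath disk; the device names are the ones from my example and will differ on your systems):

    node1# mkfs.ext4 /dev/mapper/mpathc1
    node1# mkdir -p /data/brick/gvol_docker
    node1# echo "/dev/mapper/mpathc1 /data/brick/gvol_docker ext4 defaults 0 0" >> /etc/fstab
    node1# mount /data/brick/gvol_docker
    node2# mkfs.ext4 /dev/mapper/mpathb1
    node2# mkdir -p /data/brick/gvol_docker
    node2# echo "/dev/mapper/mpathb1 /data/brick/gvol_docker ext4 defaults 0 0" >> /etc/fstab
    node2# mount /data/brick/gvol_docker

    Once the bricks are mounted, check the filesystems and the fstab entries, then start glusterd on both nodes: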

    node1# df | grep brick
    /dev/mapper/mpathc1    591G   73M  561G   1% /data/brick/gvol_docker
    node1# grep brick /etc/fstab
    /dev/mapper/mpathc1 /data/brick/gvol_docker ext4 defaults 0 0
    node2# df | grep brick
    /dev/mapper/mpathb1    591G   73M  561G   1% /data/brick/gvol_docker
    node2# grep brick /etc/fstab
    /dev/mapper/mpathb1 /data/brick/gvol_docker ext4 defaults 0 0
    node1# systemctl start glusterd
    node2# systemctl start glusterd
    

    From my first node I'm declaring the second node of my cluster. This is what we call "peer probing". After the probing is done from one node, you can check from each node that it "sees" its peer:

    node1# gluster peer status
    Number of Peers: 0
    node1# gluster peer probe node2
    peer probe: success.
    node1# gluster peer status
    Number of Peers: 1
    
    Hostname: node2
    Uuid: 3ab38945-c736-4dd2-87dc-1ed25f997608
    State: Accepted peer request (Connected)
    node2# gluster peer status
    Number of Peers: 1
    
    Hostname: node1
    Uuid: 9c4d1c7a-de39-4239-a8e1-a35bbe26e25b
    State: Accepted peer request (Connected)
    

    From one of the nodes, create a volume for your Docker data. Here I'm asking for 2 copies (ie. 2 replicas, one on each node of my cluster). The volume then has to be "started":

    # gluster volume create gvol_docker replica 2 transport tcp node1.chmod666.org:/data/brick/gvol_docker  node2.chmod666.org:/data/brick/gvol_docker force
    volume create: gvol_docker: success: please start the volume to access data
    # gluster volume start gvol_docker
    volume start: gvol_docker: success
    # gluster volume list
    gvol_docker
    

    Modify the /etc/fstab file to be sure the GlusterFS volume mount persists after a reboot. Then manually mount the file-system and check that everything is working as intended:

    node1# grep gluster /etc/fstab
    node1:gvol_docker /data/docker/ glusterfs defaults,_netdev 0 0
    node2# grep gluster /etc/fstab
    node1:gvol_docker /data/docker/ glusterfs defaults,_netdev 0 0
    node1# mount /data/docker
    node1# df -h |grep /data/docker
    node1:gvol_docker  591G   73M  561G   1% /data/docker
    node1# echo "test" > /data/docker/test
    node2# mount /data/docker
    node2# cat /data/docker/test
    test
    

    Keep in mind that this is just an example. You can do much more with GlusterFS.
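
    A quick sanity check of the replica layout never hurts before going further (an illustrative check, run from either node):

    node1# gluster volume info gvol_docker | grep -E 'Type|Bricks|Status'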

    Traefik, an intelligent proxy designed for Docker

    As I was saying before, we will use Traefik as a reverse proxy in our Swarm cluster. I'll not detail here how to use Traefik, only how to build it on Power; a later part of this post shows how to use Traefik inside the cluster. This part simply explains how to build your own Traefik Docker image running on the ppc64le architecture.

    Building Traefik from sources

    To compile Traefik for the ppc64le architecture you first need go. Instead of recompiling everything for ppc64le once again, I decided to install go directly from rpm files built for Fedora (that's why you can see fc, for Fedora Core, in the package names):

    # rpm -qa | grep -i golang
    golang-src-1.8-2.fc26.noarch
    golang-1.8-2.fc26.ppc64le
    golang-bin-1.8-2.fc26.ppc64le
    

    Clone the Traefik git repository:

    # git clone https://github.com/containous/traefik
    Cloning into 'traefik'...
    remote: Counting objects: 13246, done.
    remote: Compressing objects: 100% (15/15), done.
    remote: Total 13246 (delta 2), reused 1 (delta 1), pack-reused 13230
    Receiving objects: 100% (13246/13246), 16.25 MiB | 2.09 MiB/s, done.
    Resolving deltas: 100% (5429/5429), done.
    # ls -ld traefik
    drwxr-xr-x 26 root root 4096 Apr 14 14:38 traefik
    

    Traefik needs node and npm; this time I recompiled both from source (I'll not explain how to do this here, but it takes a lot of time, so grab a coffee for this one):

    # node -v
    v8.0.0-pre
    # npm --version
    4.2.0
    # which npm node
    /usr/local/bin/npm
    /usr/local/bin/node
    

    Set your GOPATH and PATH to be able to run the go binaries. Most of the time the GOPATH is set to a go directory in your home directory:

    # export GOPATH=~/go
    # export PATH=$PATH:$GOPATH/bin
    

    Download glide and glide-vc (both are prerequisites to build Traefik):

    # go get github.com/Masterminds/glide
    # go get github.com/sgotti/glide-vc
    # go get github.com/jteeuwen/go-bindata/...
    # which glide glide-vc
    /root/go/bin/glide
    /root/go/bin/glide-vc
    

    Install all go prerequisites:

    # cd traefik
    # script/glide.sh install
    [..]
    

    For some unknown reason, or because there is something I do not understand with glide, the glide.sh script was not working for me and was not installing any dependencies. I had to use the command below to install all the needed go dependencies (built from the glide.yaml file, where all the dependencies needed by Traefik are listed). Then build the Traefik binary:

    # cat glide.yaml  | awk '$1 ~ /^-/ && $2 ~ /package/ { print "http_proxy=\"http://pinocchio:443\" https_proxy=\"http://pinocchio:443\" go get "$NF }' | sh
    # go generate
    # go build
    # ./traefik version
    Version:      dev
    Codename:     cheddar
    Go version:   go1.8
    Built:        I don't remember exactly
    OS/Arch:      linux/ppc64le
    # ls -l traefik
    -rwxr-xr-x 1 root root 71718807 Apr 14 15:41 traefik
    # file traefik
    traefik: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked (uses shared libs), not stripped
    

    Building the Traefik image

    With the binary built, create a Docker image based on the Dockerfile provided in the Traefik repository:

    # mkdir dist
    # cp traefik dist
    # docker build -t traefik .
    Sending build context to Docker daemon 216.7 MB
    Step 1/5 : FROM scratch
     --->
    Step 2/5 : COPY script/ca-certificates.crt /etc/ssl/certs/
     ---> Using cache
     ---> 691efa795e96
    Step 3/5 : COPY dist/traefik /
     ---> 220efc04bd2e
    Removing intermediate container 859940376bdb
    Step 4/5 : EXPOSE 80
     ---> Running in badf7d2111d6
     ---> 4b113142428d
    Removing intermediate container badf7d2111d6
    Step 5/5 : ENTRYPOINT /traefik
     ---> Running in cbaa60108171
     ---> 442a90ed7e7f
    Removing intermediate container cbaa60108171
    Successfully built 442a90ed7e7f
    

    I'm testing that the image works by running it once. An example toml configuration file can be found in the repository.
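
    If you do not want to grab the example from the repository, a minimal traefik.toml for the Docker backend can look like this (a sketch, not the full configuration; the entry point and web UI addresses match what the run command below expects):

    # cat /data/docker/traefik/traefik.toml
    defaultEntryPoints = ["http"]

    [entryPoints]
      [entryPoints.http]
      address = ":80"

    [web]
    address = ":8080"

    [docker]
    endpoint = "unix:///var/run/docker.sock"
    watch = true

    Then run the image, bind-mounting the configuration file and the Docker socket: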

    # docker run -p 8080:8080 -p 80:80 -v /data/docker/traefik/traefik.toml:/etc/traefik/traefik.toml  -v /var/run/docker.sock:/var/run/docker.sock traefik
    time="2017-04-14T15:20:55Z" level=error msg="Error opening fileopen log/access.log: no such file or directory"
    time="2017-04-14T15:20:55Z" level=error msg="Error opening fileopen log/traefik.log: no such file or directory"
    time="2017-04-14T15:20:55Z" level=info msg="Traefik version dev built on I don't remember exactly"
    time="2017-04-14T15:20:55Z" level=info msg="Using TOML configuration file /etc/traefik/traefik.toml"
    

    The Docker image is ready to be pushed to your private repository if you want to. Normally Traefik comes with a webUI allowing you to check the configuration of the reverse proxy and the services running behind it. Unfortunately I didn't succeed in compiling this webUI on ppc64le; I was stuck with a node-sass dependency not running on ppc64le.

    Portainer, a free and open webUI to drive your Swarm cluster

    Portainer is a web interface for Docker. It is capable of managing single Docker engines or Swarm clusters. I really like the way it works and all its capabilities. I think this is something you need to provide to your customers if you want them to start with Docker, especially developers who are not all aware of how Docker works and do not want to memorize every single command line argument ;-). With Portainer you can manage containers (creation, deletion), manage services across the Swarm cluster (creation, deletion, scaling), work with volumes, with networks and so on. It's easy and beautiful, and it's OpenSource :-). The only problem was that I had to recompile it from scratch to be able to run it on the ppc64le arch. I also wanted to thank the main Portainer developer (Anthony Lapenna). First of all, he is a French guy (why should country pride be reserved to American people?). Second, Anthony is kind, smart, and ready to help, which is a rare thing nowadays. Once again, thank you Anthony. So … here is how to compile Portainer for ppc64le (I'll also show a couple of usage examples … just screenshots):

    Building the golang-builder image

    The "compilation" of the Portainer binary is based on a Docker image running the go toolchain (it's a Docker image running a go build). As I'm building on a ppc64le architecture I first had to recreate this image (they have a cross-compiler image, but it runs on x86, and I do not have any x86 machines). As I didn't want to recompile go from scratch I decided to use an existing golang image already published on the official Docker hub (ppc64le/golang). I then just cloned the Portainer sources for their golang-builder image and used those sources to create my own builder image:

    # git clone https://github.com/portainer/golang-builder
    Cloning into 'golang-builder'...
    remote: Counting objects: 186, done.
    remote: Total 186 (delta 0), reused 0 (delta 0), pack-reused 186
    Receiving objects: 100% (186/186), 47.06 KiB | 0 bytes/s, done.
    

    Then I just had to rewrite the Dockerfile and base it on the ppc64le/golang image. I also obviously had to copy the files from the Portainer golang-builder repository into the directory in which I'm building the image (if you do not have internet access, pull the golang Docker image first (docker pull ppc64le/golang:1.7.1)):

    # cat ~/golang-builder-ppc64le/Dockerfile
    FROM ppc64le/golang:1.7.1
    
    VOLUME /src
    WORKDIR /src
    
    COPY build_environment.sh /
    COPY build.sh /
    
    ENTRYPOINT ["/build.sh"]
    # cp ~/golang-builder/builder-cross/*.sh ~/golang-builder-ppc64le/
    # docker pull ppc64le/golang:1.7.1
    # docker build -t chmod666/golang-builder:cross-platform .
    Sending build context to Docker daemon 6.144 kB
    Step 1/6 : FROM ppc64le/golang:1.7.1
     ---> 25dc29440507
    Step 2/6 : VOLUME /src
     ---> Running in c83ddc8536cf
     ---> 8c3124eb1bfc
    Removing intermediate container c83ddc8536cf
    Step 3/6 : WORKDIR /src
     ---> 6c9f090aa96e
    Removing intermediate container 1efc388dc274
    Step 4/6 : COPY build_environment.sh /
     ---> d433a1d71f9b
    Removing intermediate container 5233122d6c39
    Step 5/6 : COPY build.sh /
     ---> 325d5d1677f9
    Removing intermediate container 147dea39f0fe
    Step 6/6 : ENTRYPOINT /build.sh
     ---> Running in a247e55d9491
     ---> f059b5f2eb0d
    Successfully built f059b5f2eb0d
    

    Building the portainer image

    Now that my builder image is ready I can prepare everything to build the Portainer container. The web interface is based on node and the build process is driven by a grunt file. I had to change a couple of things inside this grunt file to be able to compile Portainer for (and on) the ppc64le architecture: point it to my golang-builder image and add a "release-ppc64le" task. The changes can be checked in the github pull request I'll open as soon as I find the time to do it ;-) :

    • First I had to install node for ppc64le (I'll not detail how to do that here; I just had to once again recompile it from the official sources).
    • Using npm I had to install bower, run bower to get the dependencies, and install all the needed npm modules:
    # npm install -g bower
    /usr/local/bin/bower -> /usr/local/lib/node_modules/bower/bin/bower
    /usr/local/lib
    └── bower@1.8.0
    # npm install
    npm WARN deprecated grunt-recess@0.3.5: Deprecated as RECESS is unmaintained
    npm WARN deprecated minimatch@0.2.14: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
    npm WARN deprecated minimatch@0.3.0: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
    [..]
    
    └─┬ grunt-usemin@3.1.1
      ├─┬ debug@2.6.3
      │ └── ms@0.7.2
      ├── lodash@3.10.1
      └── path-exists@1.0.0
    
    npm WARN portainer@1.12.4 No license field.
    npm WARN You are using a pre-release version of node and things may not work as expected
    # cat ~/.bowerrc
    { "strict-ssl": false }
    # bower install --allow-root
    bower filesize#~3.3.0       not-cached https://github.com/avoidwork/filesize.js.git#~3.3.0
    bower filesize#~3.3.0          resolve https://github.com/avoidwork/filesize.js.git#~3.3.0
    [..]
    jquery#1.11.1 bower_components/jquery
    
    moment#2.14.2 bower_components/moment
    
    font-awesome#4.7.0 bower_components/font-awesome
    
    bootstrap#3.3.7 bower_components/bootstrap
    └── jquery#1.11.1
    
    Chart.js#1.0.2 bower_components/Chart.js
    
  • To be able to run all the grunt tasks I also had to install grunt and grunt-cli:
    # npm install grunt
    npm WARN deprecated minimatch@0.3.0: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
    portainer@1.12.4 /root/portainer
    └── grunt@0.4.5
    
    npm WARN portainer@1.12.4 No license field.
    npm WARN You are using a pre-release version of node and things may not work as expected
    # npm install -g grunt-cli
    /usr/local/bin/grunt -> /usr/local/lib/node_modules/grunt-cli/bin/grunt
    /usr/local/lib
    └─┬ grunt-cli@1.2.0
      ├─┬ findup-sync@0.3.0
      │ └─┬ glob@5.0.15
      │   ├─┬ inflight@1.0.6
      │   │ └── wrappy@1.0.2
      │   ├── inherits@2.0.3
      │   ├─┬ minimatch@3.0.3
      │   │ └─┬ brace-expansion@1.1.7
      │   │   ├── balanced-match@0.4.2
      │   │   └── concat-map@0.0.1
      │   ├── once@1.4.0
      │   └── path-is-absolute@1.0.1
      ├── grunt-known-options@1.1.0
      ├─┬ nopt@3.0.6
      │ └── abbrev@1.1.0
      └── resolve@1.1.7
    
  • I also had to install shasum (provided by the perl-Digest-SHA-5.85-3.el7.ppc64le package):
    # yum install perl-Digest-SHA-5.85-3.el7.ppc64le
    
  • Run grunt release-ppc64le to build the Portainer binary (Portainer can then be run outside of Docker from the dist directory):
    # grunt release-ppc64le
    Running "clean:all" (clean) task
    
    Running "if:unixPpc64leBinaryNotExist" (if) task
    [if] tests evaluated false: dist/portainer not found!
    
    [if] running tasks:
             shell:buildUnixPpc64leBinary
    
    Running "shell:buildUnixPpc64leBinary" (shell) task
    [..]
    Running "clean:tmp" (clean) task
    Cleaning "dist/js/angular.4932b769.js"...OK
    Cleaning "dist/js/portainer.e7a82e29.js"...OK
    Cleaning "dist/js/vendor.2a5420ef.js"...OK
    Cleaning "dist/css/portainer.82f0a61d.css"...OK
    Cleaning "dist/css/vendor.20e93620.css"...OK
    
    Running "replace:dist" (replace) task
    >> 1 replacement in 1 file.
    
    Done, without errors.
    

    portainerbuild

  • Finally I had to build the Docker image to be able to run Portainer inside a Docker container:
    # grunt run-dev
    Running "if:unixBinaryNotExist" (if) task
    [if] tests evaluated true:
    [if] running tasks:
    
    
    Running "shell:buildImage" (shell) task
    Sending build context to Docker daemon  10.1 MB
    Step 1/6 : FROM centurylink/ca-certs
     ---> ec29b98d130f
    Step 2/6 : COPY dist /
     ---> 200245c49d3a
    Removing intermediate container 558cc1ad7bef
    Step 3/6 : VOLUME /data
     ---> Running in 5397202734af
     ---> be421eb41ebf
    Removing intermediate container 5397202734af
    Step 4/6 : WORKDIR /
     ---> 6d63a16558f6
    Removing intermediate container e557d646cd95
    Step 5/6 : EXPOSE 9000
     ---> Running in d69ec6302996
     ---> 4fea90ff91be
    Removing intermediate container d69ec6302996
    Step 6/6 : ENTRYPOINT /portainer
     ---> Running in cf0fad52eb9b
     ---> dd0e2261e1de
    Removing intermediate container cf0fad52eb9b
    Successfully built dd0e2261e1de
    
    Running "shell:run" (shell) task
    Error response from daemon: No such container: portainer
    Error response from daemon: No such container: portainer
    8f34683455a5115cb3b9c1da6df2776b5747e22169edde4c6ebe30a73c074743
    
    Running "watch:build" (watch) task
    Waiting...
    
    # grunt lint
    Running "jshint:files" (jshint) task
    >> 107 files lint free.
    
  • The last step is to tag and push this image to my Docker hub repository:
    # docker tag portainer chmod666/portainer_ppc64le:1.12.4
    # docker push chmod666/portainer_ppc64le:1.12.4
    The push refers to a repository [docker.io/chmod666/portainer_ppc64le]
    2ff68fafd090: Pushed
    0cfde93eba7d: Mounted from centurylink/ca-certs
    5f70bf18a086: Mounted from centurylink/ca-certs
    1.12.4: digest: sha256:88a8e365e9ad506b0ad580a634dc13093b6781c102b323d787f5b2d48e90c27c size: 944
    

    portaindockerhub

    Run the image

    Run the image and check you can access the Portainer gui from your browser:

    # docker run -d -p 80:9000 -v /var/run/docker.sock:/var/run/docker.sock --name portainer chmod666/portainer_ppc64le:1.12.4
    485b44fb1eabba5d14359a2f0b62c76aeafb8b4e48d7d035c2bb13ccffce0233
    docker ps
    CONTAINER ID        IMAGE                               COMMAND             CREATED             STATUS              PORTS                  NAMES
    485b44fb1eab        chmod666/portainer_ppc64le:1.12.4   "/portainer"        4 seconds ago       Up 3 seconds        0.0.0.0:80->9000/tcp   portainer
    # uname -a
    Linux mydockerhost 3.10.0-514.el7.ppc64le #1 SMP Wed Oct 19 11:27:06 EDT 2016 ppc64le ppc64le ppc64le GNU/Linux
    

    portainer1
    portainer2

    You can see in the images above that Portainer is running ok. Once again I'll not explain here how to use Portainer; the tool is good enough that it doesn't need much explaining. It's just super simple ;-) .
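
    If you prefer a quick command-line check before opening a browser, a plain HTTP request against the published port should come back from the Portainer UI (an illustrative check, run from the Docker host):

    # curl -sI http://localhost/ | head -1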

    Use Ansible and Terraform to create this infrastructure

    I'm changing my tooling every couple of months regarding how to "drive" Openstack. I've moved from the curl way, to the python one, to Ansible, to finally come to the conclusion that for this purpose (Swarm as a Service) Terraform is the best one. I'll explain here how to use both Ansible and Terraform to drive your Openstack (PowerVC). You have a couple of things to change on the PowerVC host itself. Keep in mind that this is not supported at all. :-)

    To use Ansible or Terraform you first have to "prepare" the PowerVC host. Because Ansible uses Shade (a python library) and Terraform its own code, you have to create some endpoints and availability zones to be able to use those tools. My understanding is that these tools are based on Openstack standards … but everybody knows that standards may differ depending on who you are talking to … so let's say that PowerVC Openstack standards are not the same as the ones of the people from Shade or Terraform ;-).

    Something I'm sure about is that almost nobody is using Openstack "host aggregates" to isolate groups of servers (most of the time these host aggregates are used to isolate types of architecture rather than geographic or resiliency zones). That's why most of the tools that allow you to create Openstack servers can set an availability zone but not a host aggregate. The first thing to do is to set an availability zone per host aggregate you have on your PowerVC host:

    # source /opt/ibm/powervc/powervcrc
    # openstack
    (openstack) aggregate list
    +----+---------------+-------------------+
    | Id | Name          | Availability Zone |
    +----+---------------+-------------------+
    | 21 | myaz1         | -                 |
    | 1  | Default Group | -                 |
    | 22 | myaz2         | -                 |
    +----+---------------+-------------------+
    (openstack) aggregate set --zone myaz1 21
    (openstack) aggregate list
    +----+---------------+-------------------+
    | ID | Name          | Availability Zone |
    +----+---------------+-------------------+
    | 21 | myaz1         | myaz1             |
    |  1 | Default Group | None              |
    | 22 | myaz2         | None              |
    +----+---------------+-------------------+
    (openstack) aggregate set --zone myaz2 22
    (openstack) aggregate list
    +----+---------------+-------------------+
    | ID | Name          | Availability Zone |
    +----+---------------+-------------------+
    | 21 | myaz1         | myaz1             |
    |  1 | Default Group | None              |
    | 22 | myaz2         | myaz2             |
    +----+---------------+-------------------+
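
    To double-check that the zones are now visible to the client tools, they can be listed from the same interactive session (illustrative):

    (openstack) availability zone list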
    

    Once again, if you want to create volumes using those tools you have to create a service and endpoints for volumev2. Both Ansible and Terraform are using volumev2 while PowerVC is using volumev1. To do this you first have to change the keystone policies to allow you to create, modify and delete endpoints and services:

    # vim /opt/ibm/powervc/policy/keystone/policy.json
    [..]
        "identity:create_endpoint": "role:service or role:admin",
        "identity:delete_endpoint": "role:service or role:admin",
    [..]
        "identity:create_service": "role:service or role:admin",
        "identity:delete_service": "role:service or role:admin",
    [..]
    

    Then create the endpoint and service for volumev2:

    # source /opt/ibm/powervc/powervcrc
    # openstack
    (openstack) service create --name cinderv2 --description "Openstack block storage v2" volumev2
    +-------------+----------------------------------+
    | Field       | Value                            |
    +-------------+----------------------------------+
    | description | Openstack block storage v2       |
    | enabled     | True                             |
    | id          | 62865ae6bc1149238fff31b65f34d8ea |
    | name        | cinderv2                         |
    | type        | volumev2                         |
    +-------------+----------------------------------+
    (openstack) endpoint create --region RegionOne volumev2 public https://deckard.chmod666.org:9000/v2/%(tenant_id)s
    +--------------+---------------------------------------------------------+
    | Field        | Value                                                   |
    +--------------+---------------------------------------------------------+
    | enabled      | True                                                    |
    | id           | 904ad762cf4945ad8b57bd4f8e1e8bdc                        |
    | interface    | public                                                  |
    | region       | RegionOne                                               |
    | region_id    | RegionOne                                               |
    | service_id   | 62865ae6bc1149238fff31b65f34d8ea                        |
    | service_name | cinderv2                                                |
    | service_type | volumev2                                                |
    | url          | https://deckard.chmod666.org:9000/v2/%(tenant_id)s      |
    +--------------+---------------------------------------------------------+
    (openstack) endpoint create --region RegionOne volumev2 admin https://deckard.chmod666.org:9000/v2/%(tenant_id)s
    +--------------+---------------------------------------------------------+
    | Field        | Value                                                   |
    +--------------+---------------------------------------------------------+
    | enabled      | True                                                    |
    | id           | f81509119bdb4338adf0c4a0b30c9415                        |
    | interface    | admin                                                   |
    | region       | RegionOne                                               |
    | region_id    | RegionOne                                               |
    | service_id   | 62865ae6bc1149238fff31b65f34d8ea                        |
    | service_name | cinderv2                                                |
    | service_type | volumev2                                                |
    | url          | https://deckard.chmod666.org:9000/v2/%(tenant_id)s      |
    +--------------+---------------------------------------------------------+
    (openstack) endpoint create --region RegionOne volumev2 internal https://127.0.0.1:9000/v2/%(tenant_id)s
    +--------------+-----------------------------------------+
    | Field        | Value                                   |
    +--------------+-----------------------------------------+
    | enabled      | True                                    |
    | id           | de730a950aa34a4bb1a8fd921183e169        |
    | interface    | internal                                |
    | region       | RegionOne                               |
    | region_id    | RegionOne                               |
    | service_id   | 62865ae6bc1149238fff31b65f34d8ea        |
    | service_name | cinderv2                                |
    | service_type | volumev2                                |
    | url          | https://127.0.0.1:9000/v2/%(tenant_id)s |
    +--------------+-----------------------------------------+
    (openstack) endpoint list --service volumev2
    +----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------------------+
    | ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                                                     |
    +----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------------------+
    | 904ad762cf4945ad8b57bd4f8e1e8bdc | RegionOne | cinderv2     | volumev2     | True    | public    | https://deckard.chmod666.org:9000/v2/%(tenant_id)s      |
    | de730a950aa34a4bb1a8fd921183e169 | RegionOne | cinderv2     | volumev2     | True    | internal  | https://127.0.0.1:9000/v2/%(tenant_id)s                 |
    | f81509119bdb4338adf0c4a0b30c9415 | RegionOne | cinderv2     | volumev2     | True    | admin     | https://deckard.chmod666.org:9000/v2/%(tenant_id)s      |
    +----------------------------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------------------+
    

    For Terraform to be able to create machines with fixed ip addresses you once again have to modify the policy.json file for nova:

    [..]
        "os_compute_api:os-tenant-networks": "role:service or role:admin",
        "os_compute_api:os-tenant-networks:discoverable": "role:service or role:admin",
    [..]
    

    At this point you will be able to use both Ansible and Terraform to drive your PowerVC cloud.

    Installing Terraform

    There are no Terraform packages available for ppc64le, so, one more time, we have to build the tool from source:

    # export GOPATH=/root/.go
    # export PATH=$PATH:$GOPATH/bin
    # go get -u golang.org/x/tools/cmd/stringer
    # go get -u github.com/hashicorp/terraform
    # cd $GOPATH/src/github.com/hashicorp/terraform/
    # make fmt
    # make dev
    ==> Checking that code complies with gofmt requirements...
    go generate $(go list ./... | grep -v /terraform/vendor/)
    2017/04/20 19:10:21 Generated command/internal_plugin_list.go
    ==> Removing old directory...
    ==> Installing gox...
    ==> Building...
    Number of parallel builds: 7
    
    -->     linux/ppc64: github.com/hashicorp/terraform
    ==> Results:
    total 198M
    -rwxr-xr-x 1 root root 198M Apr 20 19:11 terraform
    # file bin/terraform
    bin/terraform: ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, not stripped
    ./bin/terraform --help
    Usage: terraform [--version] [--help]  [args]
    
    The available commands for execution are listed below.
    The most common, useful commands are shown first, followed by
    less common or more advanced commands. If you're just getting
    started with Terraform, stick with the common commands. For the
    other commands, please read the help and docs before usage.
    
    Common commands:
        apply              Builds or changes infrastructure
        console            Interactive console for Terraform interpolations
        destroy            Destroy Terraform-managed infrastructure
        env                Environment management
        fmt                Rewrites config files to canonical format
        get                Download and install modules for the configuration
    

    I'm then creating a Terraform plan allowing me to create three docker machines. Each machine will have a 500GB disk for its dockervg volume group and a 600GB disk for its Gluster brick. The Terraform plan deploy.tf below shows how to do this. There are a couple of things to note. The "ssh_key_file" variable points to the public key of an Openstack key pair I've created on PowerVC. After the machines are installed I'm running an Ansible playbook that installs the rmc packages (this is the local-exec in the "openstack_compute_instance_v2" resource). After this I'm attaching both disks to each machine. Finally I'm running two local-exec provisioners: one discovering the newly attached disks, and the second running my Ansible playbook that installs and configures the Swarm cluster (the one available in my github account).
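
    One thing the plan does not show is how Terraform authenticates against PowerVC. The openstack provider reads the usual OS_* environment variables, so the simplest approach is to source powervcrc, export your password and keep the provider block minimal (a sketch; whether you rely on OS_CACERT or set insecure depends on your certificate setup):

    # source /opt/ibm/powervc/powervcrc
    # export OS_PASSWORD=mypassword
    # cat provider.tf
    provider "openstack" {
      insecure = true
    }

    With authentication in place, here is the plan itself: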

    # cat deploy.tf
    variable "volumetype" {
      default = "myvoltype"
    }
    
    variable "imagename" {
      default = "redhat72ppc64le"
    }
    
    variable "network_name" {
      default = "myvlan"
    }
    
    variable "ssh_key_file" {
      default = "~/.ssh/id_rsa.terraform"
    }
    
    variable "docker_machines_names" {
      default = [ "docker1", "docker2", "docker3" ]
    }
    
    variable "docker_machines_ips" {
      default = [ "10.19.10.101", "10.19.10.102", "10.19.10.103" ]
    }
    
    resource "openstack_compute_keypair_v2" "terraform_key" {
      name = "terraform_key"
      public_key = "${file("${var.ssh_key_file}.pub")}"
    }
    
    resource "openstack_blockstorage_volume_v2" "dockervg_volumes" {
      count = "${length(var.docker_machines_names)}"
      name = "${element(var.docker_machines_names, count.index)}_docker"
      size = "500"
      volume_type = "${var.volumetype}"
    }
    
    resource "openstack_blockstorage_volume_v2" "gluster_volumes" {
      count = "${length(var.docker_machines_names)}"
      name = "${element(var.docker_machines_names, count.index)}_gluster"
      size = "600"
      volume_type = "${var.volumetype}"
    }
    
    resource "openstack_compute_instance_v2" "docker_machines" {
      count = "${length(var.docker_machines_names)}"
      name = "${element(var.docker_machines_names, count.index)}"
      image_name = "${var.imagename}"
      key_pair = "${openstack_compute_keypair_v2.terraform_key.name}"
      flavor_name = "docker"
      network {
        name = "${var.network_name}"
        fixed_ip_v4 = "${element(var.docker_machines_ips, count.index)}"
      }
      provisioner "local-exec" {
        command = "sleep 600 ; ansible-playbook  -i /srv/ansible/inventories/ansible_inventory -l -l ${element(var.docker_machines_ips, count.index)} -e ansible_role_path=/srv/ansible /srv/ansible/rmc_linux_on_p.yml\" ; sleep 120"
      }
    }
    
    resource "openstack_compute_volume_attach_v2" "dockervg_volumes_attach" {
      count = "${length(var.docker_machines_names)}"
      volume_id = "${element(openstack_blockstorage_volume_v2.dockervg_volumes.*.id, count.index)}"
      instance_id = "${element(openstack_compute_instance_v2.docker_machines.*.id, count.index)}"
    }
    
    resource "openstack_compute_volume_attach_v2" "gluster_volumes_attach" {
      count = "${length(var.docker_machines_names)}"
      volume_id = "${element(openstack_blockstorage_volume_v2.gluster_volumes.*.id, count.index)}"
      instance_id = "${element(openstack_compute_instance_v2.docker_machines.*.id, count.index)}"
    }
    
    resource "null_resource" "discover_disks" {
      count = "${length(var.docker_machines_names)}"
      provisioner "local-exec" {
        command = "ansible-playbook -i /srv/ansible/inventories/ansible_inventory -l ${element(var.docker_machines_ips,count.index)} -e ansible_role_path=/srv/ansible /srv/ansible/rescan_scsi_bus.yml\""
      }
      depends_on = ["openstack_compute_volume_attach_v2.gluster_volumes_attach"]
    }
    
    resource "null_resource" "ansible_run" {
      provisionner "local-exec" {
        command = "ansible-playbook  -i /srv/ansible/inventories/ansible_inventory -e ansible_role_path=/srv/ansible /srv/ansible/docker.yml\""
      }
      depends_on = ["null_resource.discover_disks"]
    }
    

    I'm then just running Terraform to deploy the whole stack (machine creation, and Swarm cluster creation):

    # ls -l deploy.tf
    -rw-rw-r-- 1 root root 3335 Apr 20 19:23 /root/terraform_docker/deploy.tf
    # terraform apply
    

    tfblog1
    tf4

    Use Ansible to drive your PowerVC Openstack infrastructure

    The exact same thing can be done using Ansible. I'll not go into deep detail here, but below are sample playbook extracts for machine creation using Ansible.
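
    The os_server and os_server_volume modules authenticate through Shade, so the credentials can either come from the OS_* variables exported by powervcrc or from a clouds.yaml file (selected with the cloud parameter or the OS_CLOUD variable). Here is a minimal clouds.yaml sketch (the cloud name, file path, auth URL and project name are placeholders to adapt to your PowerVC setup):

    # cat /etc/openstack/clouds.yaml
    clouds:
      powervc:
        auth:
          auth_url: https://deckard.chmod666.org:5000/v3/
          username: root
          password: mypassword
          project_name: ibm-default
          user_domain_name: Default
          project_domain_name: Default
        verify: false

    The machine definition and the creation play look like this: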

    # cat /tmp/myserver/myserver.json
    { "vm_name": "myserver", "fixed_ip": "10.10.10.115", "image_name": "aix72", "ec": "0.20", "vp": "2", "mem": "20480", "vlan_name": "vlan12", "availz": "myaz1" }
    # cat create_server.yml
    - name: PowerVC | Set Fact
      run_once: True
      set_fact:
        vm_file: "{{ lookup('file', '/tmp/{{ item }}/{{ item }}.json') | from_json }}"
      with_items:
        - "{{ servers }}"
      register: res
      delegate_to: localhost
    - name: PowerVC | Create instance
      run_once: True
      os_server:
        state: present
        timeout: 600
        name: "{{ item.ansible_facts.vm_file.vm_name }}"
        #boot_volume: "{{ vm_name }}_boot1"
        image: "{{ item.ansible_facts.vm_file.image_name }}"
        flavor: "cloud"
        availability_zone: "{{ item.ansible_facts.vm_file.availz }}"
        nics:
          - port-name: "{{ item.ansible_facts.vm_file.vm_name }}-port"
      with_items:
        - "{{ res.results }}"
      delegate_to: localhost
    

    Volume attachments can be done too, here is an extract of the playbook:

    [..]
    - name: PowerVC | Attaching volumes
      run_once: True
      os_server_volume:
        state: present
        server: "{{ item[0] }}"
        volume: "{{ item[1] }}"
      when: (( item[0]+"_" in item[1]) and ( "-boot" not in item[1]))
      with_nested:
        - "{{ servers }}"
        - "{{ volumes.stdout_lines }}"
      delegate_to: localhost
      tags: move
    

    Log every Ansible run using Openstack ARA

    The goal of Openstack ARA is to provide the final user a way to trace and debug their Ansible runs. It's a simple web interface storing and tracing every action you have done with Ansible. I like it a lot; it respects the KISS principle (Keep It Simple, Stupid). In my opinion ARA is ten times better than Ansible Tower for the reporting part. Here is my simple how-to to run ARA in Docker containers:

    Building the Docker Container for the ARA webapp

    I'm going to build the ARA web application on top of the "official" ppc64le CentOS image. I'm not using my own custom images because I want to publish this image on the hub, and you obviously don't care about any of my customizations. So let's start:

    # docker search ppc64le
    NAME                                   DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
    [..]
    ppc64le/centos                                                                  0
    hitomitak/centos72-ppc64le                                                      0
    # docker pull ppc64le/centos
    Using default tag: latest
    latest: Pulling from ppc64le/centos
    d2664d978557: Pull complete
    Digest: sha256:997bd3301d0032ba1f04be925e006b9fdabc254be7dfbf8b0fc7bbab21108efc
    Status: Downloaded newer image for ppc64le/centos:latest
    

    I'm then writing a Dockerfile in charge of installing ARA and the needed dependencies. A script called run.sh is copied into the container to run ara-manage. I'll later need an environment variable called ARA_DATABASE to tell the ara web application where the database is. I'm not putting this into the Dockerfile, to allow the user to use whatever database he wants.

    • The image is built from the docker "ppc64le/centos" image.
    • I'm installing the needed dependencies for ARA.
    • I'm installing pip and ARA.
    • I'm exposing port 8080, copying run.sh into the container and making it the entry point.
    • I'm building the image with the --build-arg arguments because I'm using a proxy to access the internet.
    # cat Dockerfile
    FROM ppc64le/centos
    
    RUN yum -y update && yum -y install gcc python-devel libffi-devel openssl-devel python libxml2-devel libxslt-devel MySQL-python curl mailcap
    RUN curl -k https://bootstrap.pypa.io/get-pip.py -o /get-pip.py && python /get-pip.py && pip install ara
    
    EXPOSE 8080
    COPY run.sh /
    RUN chmod +x run.sh
    
    ENTRYPOINT ["/run.sh"]
    # docker build --build-arg http_proxy="http://myproxy:443" --build-arg https_proxy="http://myproxy:443" -t chmod666/ara_webapp .
    Sending build context to Docker daemon 3.072 kB
    Step 1 : FROM ppc64le/centos
     ---> 3d0ded8c6f42
    Step 2 : RUN yum -y update && yum -y install gcc python-devel libffi-devel openssl-devel python libxml2-devel libxslt-devel MySQL-python
     ---> Using cache
     ---> 3d3e45af237b
    Step 3 : RUN curl -k https://bootstrap.pypa.io/get-pip.py -o /get-pip.py && python /get-pip.py && pip install ara
     ---> Running in d84924b00812
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 1558k  100 1558k    0     0  1451k      0  0:00:01  0:00:01 --:--:-- 1450k
    [..]
    

    The run.sh script just runs the ara-manage command; we will see at execution time that we pass ARA_DATABASE as an environment variable to allow the web application to query the database:

    # cat run.sh
    #!/bin/bash -e
    
    ara-manage runserver -h 0.0.0.0 -p 8080
    

    Building the Docker Container for the ARA database

    I now need a database to store the ARA records. I'm using mariadb. In this image I'm creating the ara database, and an ara user with all the privileges needed to write in it. Here are the build execution and the corresponding Dockerfile:

    #  docker build --build-arg http_proxy="http://myproxy:443" --build-arg https_proxy="http://myproxy:443" -t chmod666/ara_db .
    Sending build context to Docker daemon  2.56 kB
    Step 1 : FROM ppc64le/centos
     ---> 3d0ded8c6f42
    Step 2 : RUN yum -y  update &&   yum -y install mariadb-server &&   echo "mysql_install_db --user=mysql" > /tmp/config &&   echo "mysqld_safe &" >> /tmp/config &&   echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config &&   echo "mysql -e 'CREATE DATABASE ara;'" >> /tmp/config &&   echo "mysql -e \"CREATE USER ara@'%'IDENTIFIED BY 'password';\"" >> /tmp/config &&   echo "mysql -e 'GRANT ALL PRIVILEGES ON ara.* TO ara@\"%\";'" >> /tmp/config &&   echo "mysql -e 'FLUSH PRIVILEGES;'" >> /tmp/config &&   bash /tmp/config &&   rm -f /tmp/config
     ---> Running in 083ab7adc7fc
    Loaded plugins: fastestmirror, ovl
    Determining fastest mirrors
    [..]
    Step 3 : VOLUME /etc/mysql /var/lib/mysql
     ---> Running in 50230896793d
     ---> d77628675cf2
    Removing intermediate container 50230896793d
    Step 4 : CMD mysqld_safe
     ---> Running in 5a997c29b5c1
     ---> 9bbb54c36d41
    Removing intermediate container 5a997c29b5c1
    Step 5 : EXPOSE 3306
     ---> Running in 6340f6f9b975
     ---> 85fcb3f6436e
    Removing intermediate container 6340f6f9b975
    Successfully built 85fcb3f6436e
    
    FROM ppc64le/centos
    
    RUN \
      yum -y  update && \
      yum -y install mariadb-server && \
      echo "mysql_install_db --user=mysql" > /tmp/config && \
      echo "mysqld_safe &" >> /tmp/config && \
      echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config && \
      echo "mysql -e 'CREATE DATABASE ara;'" >> /tmp/config && \
      echo "mysql -e \"CREATE USER ara@'%'IDENTIFIED BY 'password';\"" >> /tmp/config && \
      echo "mysql -e 'GRANT ALL PRIVILEGES ON ara.* TO ara@\"%\";'" >> /tmp/config && \
      echo "mysql -e 'FLUSH PRIVILEGES;'" >> /tmp/config && \
      bash /tmp/config && \
      rm -f /tmp/config
    
    VOLUME ["/etc/mysql", "/var/lib/mysql"]
    
    CMD ["mysqld_safe"]
    
    EXPOSE 3306
    

    I'm tagging and uploading my images to my private registry. I'll not show here how to run ARA, we will see this in the next part:

    # docker tag chmod666/ara_webapp registry.chmod666.org:5000/chmod666/ara_webapp
    # docker tag chmod666/ara_db registry.chmod666.org:5000/chmod666/ara_db
    # docker push registry.chmod666.org:5000/chmod666/ara_webapp
    The push refers to a repository [registry.chmod666.org:5000/chmod666/ara_webapp]
    e6daa00bf7b0: Pushed
    66ff8c22376f: Pushed
    6327608d1321: Pushed
    4cdf21a0f079: Pushed
    8feae09cb245: Pushed
    latest: digest: sha256:2d097fee31170eb2b3fbbe099477d345d1beabbbd83788ca9d35b35cec353498 size: 1367
    # docker push registry.chmod666.org:5000/chmod666/ara_db
    [..]
    

    Installing the callback on the Ansible master

    After the service is started and created (look at the next part for this) you have to install the ARA callback on the Ansible master host using pip. Then you just have to export a few variables to tell Ansible where the callback is and where the ARA database is; once this is done every Ansible run is recorded into ARA.
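
    The install itself is a one-liner (a sketch; the MySQL bindings are needed because the ARA_DATABASE URL below uses the mysql+mysqldb driver):

    # yum -y install MySQL-python
    # pip install ara

    I have personally set the following variables in /etc/bashrc: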

    # ARA env var
    export ara_location=$(python -c "import os,ara; print(os.path.dirname(ara.__file__))")
    export ANSIBLE_CALLBACK_PLUGINS=$ara_location/plugins/callbacks
    export ANSIBLE_ACTION_PLUGINS=$ara_location/plugins/actions
    export ANSIBLE_LIBRARY=$ara_location/plugins/modules
    export ARA_DATABASE="mysql+mysqldb://ara:password@node1.chmod666.org/ara"
    

    Verify everything is working as intended (access the ARA webapp with a web browser)

    Run a few Ansible playbooks and verify that you now have some data in the ARA database. Here are some examples after a few days running Ansible in production:

    arab1
    arab2
    arab3
    aradb4
    arab5
    arab6
    arab7

    Working with Traefik

    Now that we have all our Docker images ready to build our Docker As a Service platform, let's see how to run those images behind the Traefik reverse proxy. To do so we will create "services" on our Swarm cluster. With the latest version of Swarm (running in Docker 17.03) we now have the possibility to write compose files for those services; these services can then be run using the "docker stack" command. Let's see how this works. The first thing to do is to create a service to run Traefik itself. The yaml file below shows how to create a service running Traefik, publishing ports 80, 443 and 8080.

    # cd stack_files
    # cat traefik.yml
    version: "3"
    services:
      traefik:
        image: chmod666/traefik
        command: --docker --docker.swarmmode --docker.watch --web
        networks:
          - swarm_overlay
        ports:
          - "80:80"
          - "443:443"
          - "8080:8080"
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
          - /data/docker/traefik/traefik.toml:/etc/traefik/traefik.toml
    networks:
      swarm_overlay:
        external:
          name: mynet
    # docker stack deploy --compose-file=traefik.yml traefik
    Creating service traefik_traefik
    # docker service ls
    ID            NAME             MODE        REPLICAS  IMAGE
    ycrymxbsif77  traefik_traefik  replicated  1/1       chmod666/traefik
    

    You can then check that Traefik is running on one of the nodes of our cluster:

    # docker stack ls
    NAME     SERVICES
    traefik  1
    # docker stack ps traefik
    ID            NAME               IMAGE                                            NODE                       DESIRED STATE  CURRENT STATE         ERROR  PORTS
    dldc45i1j2w8  traefik_traefik.1  chmod666/traefik  node1  Running        Running 1 second ago
    # docker ps
    CONTAINER ID        IMAGE                                                    COMMAND                  CREATED             STATUS              PORTS                    NAMES
    c64d7869bbfa        chmod666/traefik:latest   "/traefik"               13 seconds ago      Up 8 seconds        80/tcp                   traefik_traefik.1.dldc45i1j2w8wfvipm80u6nco
    693fc9cd91db        registry                                                 "./entrypoint.sh /..."   2 weeks ago         Up 5 days           0.0.0.0:5000->5000/tcp   registry
    

    Now that Traefik is running we can start our Portainer service. Note that I'm not exposing any port here: acting as a reverse proxy is Traefik's job, so no ports need to be published. So how does Traefik know how to reach Portainer? This is achieved with the labels you see in the stack file (in the deploy section). The "traefik.port" label tells Traefik that this service is listening on port 9000: Traefik will redirect all the traffic coming from a url with the context path /portainer to the Portainer container on port 9000 (on the overlay network). The "traefik.frontend.rule=PathPrefixStrip:/portainer" label tells Traefik this service will be accessible using the context path "/portainer" (ie. for instance docker.chmod666.org/portainer). Most of the time this part is achieved using the "Host" rule (ie. "traefik.frontend.rule=Host:portainer.docker.chmod666.org"), which is much simpler and works every time: some web applications need complex rewrite rules to work under a context path, and sometimes the Traefik rules (Path, PathPrefix, PathStrip, PathPrefixStrip and AddPrefix) are not enough (I had some difficulties with WordPress for instance). The problem is that you then have to create a new CNAME entry every time a new service is created … not that scalable. The solution is a DNS wildcard entry (ie. *.docker.chmod666.org, so that a.docker.chmod666.org or randomname.docker.chmod666.org resolve to the same address). Unfortunately some DNS providers do not allow this, and if, like me, you can't talk with your network team … you're stuck and have to play with the other (Path) rules. The example below shows the use of "PathPrefixStrip". Please also note that I'm running this service in global mode, meaning I want it to run on every node of the cluster:

    version: "3"
    services:
      portainer:
        image: chmod666/portainer
        command: -H unix:///var/run/docker.sock
        deploy:
          labels:
            - traefik.port=9000
            - traefik.frontend.rule=PathPrefixStrip:/portainer
          mode: global
        networks:
          - swarm_overlay
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
    networks:
      swarm_overlay:
        external:
          name: mynet
    # docker stack deploy --compose-file=portainer.yml portainer
    Creating service portainer_portainer
    # docker stack ls
    NAME       SERVICES
    portainer  1
    traefik    1
    # docker stack ps portainer
    ID            NAME                                           IMAGE                                              NODE                       DESIRED STATE  CURRENT STATE          ERROR  PORTS
    ndmx4bvolrfn  portainer_portainer.636fmsoqyspcu9bbzs12r8fs8  chmod666/portainer  node1.chmod666.org  Running        Running 5 seconds ago
    m2jtuz4hzcwf  portainer_portainer.0piqhrpzkjfynakyvek7r53mw  chmod666/portainer  node2.chmod666.org  Running        Running 2 seconds ago
    # docker service ls
    ID            NAME                 MODE        REPLICAS  IMAGE
    gsmn18is9whg  portainer_portainer  global      2/2       chmod666/portainer
    i7uucqflzewt  traefik_traefik      replicated  1/1       chmod666/traefik
    

    If we have a look at the Traefik logs we can see that Traefik detects the new Portainer backends and wires them to the context path "/portainer". You can check the picture below proving this. And since the service runs on every node of the cluster, if we stop one node of the swarm cluster Portainer will still be accessible:

    time="2017-04-15T21:08:29Z" level=debug msg="Last docker config received more than 2s, OK"
    time="2017-04-15T21:08:29Z" level=debug msg="Creating frontend frontend-PathPrefixStrip-portainer"
    time="2017-04-15T21:08:29Z" level=debug msg="Wiring frontend frontend-PathPrefixStrip-portainer to entryPoint http"
    time="2017-04-15T21:08:29Z" level=debug msg="Creating route route-frontend-PathPrefixStrip-portainer PathPrefixStrip:/portainer"
    time="2017-04-15T21:08:29Z" level=debug msg="Creating backend backend-portainer-portainer"
    time="2017-04-15T21:08:29Z" level=debug msg="Creating load-balancer wrr"
    time="2017-04-15T21:08:29Z" level=debug msg="Creating server server-portainer_portainer-7y0ujh71y4nkvt5ynng7il6bv at http://10.0.0.6:9000 with weight 0"
    time="2017-04-15T21:08:29Z" level=debug msg="Creating server server-portainer_portainer-oz7b9ev1js17wgvktd6uuqdww at http://10.0.0.5:9000 with weight 0"
    time="2017-04-15T21:08:29Z" level=info msg="Server configuration reloaded on :80"
    

    portainerbehindtraefik

    For ARA I tried different Traefik configurations using the Path* rules but none of them was working. I had to create a new CNAME alias for my service and use the "Host" rule instead. This is also a good way to show you the Host rule:

    # cat ara.yml
    version: "3"
    services:
      ara:
        image: chmod666/ara_webapp
        environment:
          - ARA_DATABASE="mysql+mysqldb://ara:password@aradb/ara"
        deploy:
          labels:
            - traefik.port=80
            - traefik.frontend.rule=Host:unixara.chmod666.org
          mode: global
        networks:
          - aranet
      aradb:
        image: chmod666/ara_db
        ports:
          - "3306:3306"
        volumes:
          - /var/lib/mysql/:/data/docker/ara_db
        networks:
          - aranet
    networks:
      aranet:
        external:
          name: mynet
    # docker stack deploy --compose-file=ara.yml ara
    Creating service ara_ara
    Creating service ara_aradb
    
    # docker logs 7484945344c4
    time="2017-04-15T22:27:59Z" level=debug msg="Last docker config received more than 2s, OK"
    time="2017-04-15T22:27:59Z" level=debug msg="Creating frontend frontend-Host-unixara-fr-net-intra"
    time="2017-04-15T22:27:59Z" level=debug msg="Wiring frontend frontend-Host-unixara-fr-net-intra to entryPoint http"
    time="2017-04-15T22:27:59Z" level=debug msg="Creating route route-frontend-Host-unixara-chmod666-org Host:unixara.chmod666.org"
    time="2017-04-15T22:27:59Z" level=debug msg="Creating backend backend-ara-ara"
    

    aratraefik

    Mixing everything together

    We now have every piece of our puzzle ready to be assembled. The automation is realized by an Ansible playbook that I named "swarm_as_a_service_ppc64le". This playbook is available on my github account at this address: https://github.com/chmod666org/swarm_as_a_service_ppc64le. The playbook is split into different roles:

    • Docker engine: This role installs the Docker engine on the machines. It creates a dockervg volume group (on a 500GB disk) and sets up the device mapper using "docker-storage-setup" from the RedHat Atomic host project (see the configuration sketch after the playbook extract below). It also installs docker-compose. Both tools are cloned from the official docker repositories (cloned on the Ansible master, then copied to the nodes). After this step you have a ready-to-use "docker machine".
    • Registry server: This role deploys a private registry on the first of your manager nodes. This registry is secured, which means that self-signed certificates are created during the process. All our images are pushed into this registry (ara, traefik, portainer). The registry is also protected by a user and a password.
    • Registry client: This role tells every node of the Swarm cluster to log on the private registry. The needed certificates are copied from the host hosting the registry to every node of the cluster. Once this is done the images are pulled locally (we will need this later).
    • Swarm: This role creates the docker swarm. In the inventory every node should be "tagged" as a manager ([docker_swarm_managers]) or a worker ([docker_swarm_workers]). You can have as many nodes as you want.
    • Gluster: This role is in charge of creating a gluster filesystem in /data/docker/[hostname_of_the_first_manager] available on every node of the cluster. Each node of the gluster cluster must have a 600GB disk. The default number of replicas is two.
    • Docker service: This role runs our services (ie. traefik, ara, and portainer) in the cluster using the new "docker stack" command. The services are described in a "docker-compose.yml" file for each service. After the services are created a random password is generated for the Portainer admin user (extract of the playbook below) and a mail is sent to the user telling him the Swarm cluster is ready to be used.
    - name: docker_services | Setting Portainer root password
      run_once: True
      uri:
        url: 'http://{{ groups.docker_swarm_managers[0] }}/portainer/api/users/admin/init'
        method: POST
        validate_certs: no
        HEADER_Content-Type: "application/json"
        HEADER_Accept: "application/json"
        return_content: yes
        body_format: json
        body: "{{ lookup('template', 'portainer_admin.json.j2') }}"
        status_code: 200
        timeout: 240
      delegate_to: localhost
      when: '"portainer" not in docker_stacks.stdout'
    

    Check the inventory is ok, and run the playbook. (Normally this is run by Terraform after the machines are created, but for the convenience of this blog post I'm showing here how to launch the playbook by hand).

    # cat inventories/hosts.docker
    [docker]
    node1.chmod666.org
    node2.chmod666.org
    
    [docker_registry_servers]
    node1.chmod666.org
    
    [docker_registry_clients]
    node2.chmod666.org
    
    [docker_swarm_managers]
    node1.chmod666.org
    
    [docker_swarm_workers]
    node2.chmod666.org
    
    [gluster_main]
    node1.chmod666.org
    
    [gluster_nodes]
    node1.chmod666.org
    node2.chmod666.org
    # ansible-playbook -i inventories/hosts.docker docker.yml
    

    aansible11
    ansible3

    When everything is finished the final customer gets an email telling him his Swarm cluster is ready to be used:

    aansible3

    Conclusion

    You now have no excuse anymore: go tell your boss that PowerSystems are ready for DevOps. Spread the word and tell everybody that we can do the exact same thing on Power as they do on x86. It's a fact! Finally, think about the advantages of doing this on Power: you beat Moore's law, you get a simple Openstack infrastructure ready to be used on Power, and you can mix teams and OSes (Linux and AIX) on the same hardware. Finally … you'll be different by design. Isn't it good to be different?
