OpenStack Hardware Node Maintenance


Introduction

Software in the OpenStack ecosystem evolves over time through new features, bug fixes, and security patches. Part of operating an OpenStack cloud involves keeping its software up to date. How do you ensure the operating system on each hardware node is updated? And how are the OpenStack services themselves updated? This guide explains how to update both the operating system and each OpenStack service.

Prerequisites

  • Root access to hardware nodes
  • Ansible experience
  • Linux command line experience
  • Familiarity with Kolla Ansible

Considerations for OpenStack Maintenance

If downtime cannot be afforded for the instances hosted on a node requiring maintenance, live migrate them to another node before performing the required tasks. When performing updates, work on one node at a time.

Instances can be live migrated using Horizon or from the command line using OpenStackClient.
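For example, letting the scheduler pick the destination host from the command line might look like the following. This is a minimal sketch: the --live-migration flag is available in newer OpenStackClient releases (older releases use --live <host> instead), and <server-id> is a placeholder.

# Live migrate an instance, letting the scheduler choose the destination host
openstack server migrate --live-migration <server-id>

# Check the instance's status and current host afterward (admin only;
# the exact field name can vary by OpenStackClient release)
openstack server show <server-id> -c status -c OS-EXT-SRV-ATTR:host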

How to Live Migrate an Instance Using Horizon

This section explains how to live migrate instances using Horizon.

Prerequisites

  • A user with administrative privileges
  • Horizon URL

Procedure

To live migrate an instance, first log in to Horizon as a user with the administrator role.

Next, navigate on the left to Admin -> Compute -> Instances. This view shows you all instances for the currently selected project and allows you to perform administrative functions, such as live migrating instances.

The following image shows an example of how this section appears:

(image: admin instance list)

The Host column indicates the hardware node on which an instance is hosted.

Choose the instance you want to migrate and, from its drop-down menu on the right, select “Live Migrate Instance”.

The following form appears:

(image: live migrate instance)

You can either have the system determine a destination host or choose one from the drop-down menu. Submit the form to live migrate the instance.

Private Cloud Core clouds run Ceph as a shared storage backend, which is where instance data lives. The Block Migration and Disk Over Commit options in this form do not apply to clouds using shared storage; they are used when instance data is stored locally on the hardware nodes.
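If your cloud does store instance data locally, the equivalent option exists on the command line. A hedged sketch, using flag names from recent OpenStackClient releases:

# Live migrate an instance whose disk lives on local storage
openstack server migrate --live-migration --block-migration <server-id>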

Migration Status

Back in the instance listing, you will see an indicator that the instance is being migrated.

If the migration succeeded, you will see a different host under the Host column for the instance.

Live migrate instances off of this host until none remain, then move on to performing any required maintenance tasks.
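To confirm from the command line that a node is empty, a listing like the following can help (a sketch; node3 is a placeholder host name):

# List all instances, across all projects, still hosted on the node
openstack server list --all-projects --host node3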


How to Perform Operating System Updates

This section explains how to perform operating system updates for Private Cloud Core clouds running CentOS 8.


Getting Started

To get started, log in as root over SSH to a node requiring maintenance. Next, run the following commands to update the operating system:

systemctl disable docker.socket
systemctl stop docker.socket
systemctl disable docker.service
systemctl stop docker.service
dnf -y update
reboot

Run each command in order. Once updates are complete through DNF, reboot the hardware node. When the node restarts successfully and has rejoined the OpenStack and Ceph clusters, move on to the next node and perform the same steps until no nodes remain.
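Since these clouds are managed with Ansible, the same sequence can also be driven remotely with ad-hoc commands. This is only a sketch: node3 is a placeholder host name, and the inventory path is the one used later in this guide.

# Disable and stop Docker, then apply updates, on a single node
ansible node3 -i /etc/fm-deploy/kolla-ansible-inventory -b -m shell \
  -a 'systemctl disable --now docker.socket docker.service && dnf -y update'

# Reboot the node and wait for it to come back
ansible node3 -i /etc/fm-deploy/kolla-ansible-inventory -b -m reboot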


The following demonstrates running the above commands in detail.

1. Disable Docker socket

Use systemctl disable docker.socket to disable the Docker socket.

Example:

[root@node3 ~]# systemctl disable docker.socket
[root@node3 ~]#

This command returns no output when it succeeds.

2. Stop Docker socket

Use systemctl stop docker.socket to stop the Docker socket.

Example:

[root@node3 ~]# systemctl stop docker.socket
[root@node3 ~]#

This command returns no output when it succeeds.

3. Disable Docker service

Use systemctl disable docker.service to disable the Docker service.

Example:

[root@node3 ~]# systemctl disable docker.service
Removed /etc/systemd/system/multi-user.target.wants/docker.service.

4. Stop Docker service

Use systemctl stop docker.service to stop the Docker service.

Example:

[root@node3 ~]# systemctl stop docker.service
[root@node3 ~]#

This command returns no output when it succeeds.

5. Update packages with DNF

Use dnf update to update all packages on the system (add -y, as in the summary above, to skip the confirmation prompt).

Example:

[root@node3 ~]# dnf update
Last metadata expiration check: 0:11:51 ago on Tue 17 Aug 2021 02:30:48 PM UTC.
Dependencies resolved.
===============================================================================================
 Package                           Architecture   Version                 Repository      Size
===============================================================================================
Upgrading:
 containerd.io                     x86_64         1.4.9-3.1.el8           docker          30 M
 docker-ce                         x86_64         3:20.10.8-3.el8         docker          22 M
 docker-ce-cli                     x86_64         1:20.10.8-3.el8         docker          29 M
 docker-ce-rootless-extras         x86_64         20.10.8-3.el8           docker         4.6 M
 epel-release                      noarch         8-11.el8                epel            23 k

Transaction Summary
===============================================================================================
Upgrade  5 Packages

Total download size: 86 M
Is this ok [y/N]:

[...output truncated...]

Upgraded:
  containerd.io-1.4.9-3.1.el8.x86_64        docker-ce-3:20.10.8-3.el8.x86_64
  docker-ce-cli-1:20.10.8-3.el8.x86_64      docker-ce-rootless-extras-20.10.8-3.el8.x86_64
  epel-release-8-11.el8.noarch

Complete!

6. Reboot the node

Use reboot to restart the node.

Example:

[root@node3 ~]# reboot
Connection to node3 closed by remote host.
Connection to node3 closed.

Use ping to watch for when the node comes back online.

Example:

$ ping node3
PING node3 (173.231.217.231) 56(84) bytes of data.
64 bytes from node3 (173.231.217.231): icmp_seq=1 ttl=60 time=12.1 ms
64 bytes from node3 (173.231.217.231): icmp_seq=2 ttl=60 time=13.9 ms
64 bytes from node3 (173.231.217.231): icmp_seq=3 ttl=60 time=15.1 ms

Note: If a node fails to come back online, please contact support for assistance.

7. Verify success

When the node comes back online, SSH into it to verify the OpenStack Docker containers have started and to check Ceph’s cluster status.

To verify the Docker containers have started, use docker ps. You should see a number of Docker containers running. Under the STATUS column, each container should reflect the status Up.

Example:

[root@node3 ~]# docker ps
CONTAINER ID   IMAGE                                                                        COMMAND                  CREATED        STATUS                          PORTS     NAMES
6f7590bc2191   harbor.imhadmin.net/kolla/centos-binary-telegraf:victoria                    "dumb-init --single-…"   20 hours ago   Restarting (1) 14 seconds ago             telegraf
67a4d47e8c78   harbor.imhadmin.net/kolla/centos-binary-watcher-api:victoria                 "dumb-init --single-…"   3 days ago     Up 6 minutes                              watcher_api
af815b1dcb5d   harbor.imhadmin.net/kolla/centos-binary-watcher-engine:victoria              "dumb-init --single-…"   3 days ago     Up 6 minutes                              watcher_engine
a52ab61933ac   harbor.imhadmin.net/kolla/centos-binary-watcher-applier:victoria             "dumb-init --single-…"   3 days ago     Up 6 minutes                              watcher_applier
[...output truncated...]
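Note the telegraf container above reports Restarting. Containers can cycle briefly after a reboot; to quickly spot any that have not settled into an Up state, a filter such as the following can help:

# Show only containers that are currently restarting or have exited
docker ps -a --filter status=restarting --filter status=exited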

Next, if this node is part of a Ceph cluster, check Ceph’s status using ceph status.

Example:

[root@node3 ~]# ceph status
  cluster:
    id:     06bf4555-7c0c-4b96-a3b7-502bf8f6f213
    health: HEALTH_OK
[...output truncated...]

The above output shows the status as HEALTH_OK, indicating the Ceph cluster is healthy. Ceph is naturally resilient and should recover from a node being rebooted.
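If Ceph is still recovering, it is worth waiting for HEALTH_OK before maintaining the next node. A minimal sketch:

# Poll every 30 seconds until the cluster reports HEALTH_OK
until ceph health | grep -q HEALTH_OK; do
    echo "Waiting for Ceph to recover..."
    sleep 30
done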

How to Obtain Latest OpenStack Images using Kolla Ansible

OpenStack was deployed using Kolla Ansible, which runs each OpenStack service as a Docker container built from a Kolla image. These images may need periodic updates. This section explains how to pull the latest images and deploy them.

Getting Started

To get started, first log in as root over SSH to a node requiring maintenance. Next, ensure you have prepared a Kolla Ansible environment.
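For example, if Kolla Ansible was installed into a Python virtual environment (the playbook output below suggests /opt/kolla-ansible/.venv on these clouds), preparing the environment might look like this:

# Activate the virtual environment containing kolla-ansible
source /opt/kolla-ansible/.venv/bin/activate

# Confirm the CLI is available
kolla-ansible --help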


With Kolla Ansible prepared, the following steps explain how to pull the latest Kolla images.

1. Pull latest Kolla Ansible images

Pull down the latest images with kolla-ansible -i <path-to-inventory> pull.

The inventory file used is /etc/fm-deploy/kolla-ansible-inventory.

Example:

$ kolla-ansible -i /etc/fm-deploy/kolla-ansible-inventory pull
Pulling Docker images : ansible-playbook -i /etc/fm-deploy/kolla-ansible-inventory -e @/etc/kolla/globals.yml  -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla  -e kolla_action=pull /opt/kolla-ansible/.venv/share/kolla-ansible/ansible/site.yml
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
[WARNING]: Could not match supplied host pattern, ignoring: enable_nova_True

PLAY [Gather facts for all hosts] *************************************************************************************************************

TASK [Gather facts] ***************************************************************************************************************************
ok: [localhost]
ok: [smiling-pelican]
ok: [intelligent-squirrel]
ok: [charming-stoat]

[...output truncated...]

PLAY RECAP ************************************************************************************************************************************
charming-stoat             : ok=48   changed=2    unreachable=0    failed=0    skipped=56   rescued=0    ignored=0
intelligent-squirrel       : ok=48   changed=2    unreachable=0    failed=0    skipped=57   rescued=0    ignored=0
localhost                  : ok=4    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
smiling-pelican            : ok=48   changed=3    unreachable=0    failed=0    skipped=56   rescued=0    ignored=0

The PLAY RECAP above, showing failed=0 for each host, indicates the latest images were pulled without issue. You can move on to the next step to deploy the images.

2. Deploy Kolla Ansible images

Next, use Kolla Ansible to deploy the images with the command kolla-ansible -i <path-to-inventory> deploy.

Example:

$ kolla-ansible -i /etc/fm-deploy/kolla-ansible-inventory deploy
Deploying Playbooks : ansible-playbook -i /etc/fm-deploy/kolla-ansible-inventory -e @/etc/kolla/globals.yml  -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla  -e kolla_action=deploy /opt/kolla-ansible/.venv/share/kolla-ansible/ansible/site.yml
[WARNING]: Invalid characters were found in group names but not replaced, use
-vvvv to see details
[WARNING]: Could not match supplied host pattern, ignoring: enable_nova_True

PLAY [Gather facts for all hosts] ***********************************************

TASK [Gather facts] *************************************************************
ok: [localhost]
ok: [smiling-pelican]
ok: [intelligent-squirrel]
ok: [charming-stoat]

[...output truncated...]

PLAY RECAP **********************************************************************
charming-stoat             : ok=300  changed=77   unreachable=0    failed=0    skipped=167  rescued=0    ignored=0
intelligent-squirrel       : ok=458  changed=94   unreachable=0    failed=0    skipped=188  rescued=0    ignored=0
localhost                  : ok=4    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
smiling-pelican            : ok=300  changed=80   unreachable=0    failed=0    skipped=167  rescued=0    ignored=0

The results of this run indicate a successful deployment, and this cloud is now using the latest Kolla Ansible images.
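As a final check, you can confirm the containers were recreated from the refreshed images, for example by reviewing how recently each one was started:

# List each container with its image and status; newly deployed
# containers show a recent start time under STATUS
docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'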
