RDO Community News

See also blogs.rdoproject.org

Some lessons an IT department can learn from OpenStack

I have spent a lot of my professional career working as an IT Consultant/Architect. In those positions, you talk to many customers with different backgrounds, and see companies that run their IT in many different ways. Back in 2014, I joined the OpenStack Engineering team at Red Hat, and started being involved with the OpenStack community. And guess what, I found yet another way of managing IT.

These last 3 years have taught me a lot about how to efficiently run an IT infrastructure at scale, and what's better, they proved that many of the concepts I had been previously preaching to customers (automate, automate, automate!) are not only viable, but also required to handle ever-growing requirements with a limited team and budget.

So, would you like to know what I have learnt so far in this 3-year journey?


The OpenStack community relies on several processes to develop a cloud operating system. Most of these processes have evolved over time, and together they allow a very large contributor base to collaborate effectively. We also need to manage a complex infrastructure to support these processes.

  • Infrastructure as code: there are several important servers in the OpenStack infrastructure, providing service to thousands of users every day: the Git repositories, the Gerrit code review infrastructure, the CI bits, etc. The deployment and configuration of all those pieces is automated, as you would expect, and the Puppet modules and Ansible playbooks used to do so are available at their Git repository. There can be no snowflakes, no "this server requires a very specific configuration, so I have to log on and do it manually" excuses. If it cannot be automated, it is not efficient enough. Also, storing our infrastructure definitions as code allows us to take changes through peer-review and CI before applying in production. More about that later.

  • Development practices: each OpenStack project follows the same structure:

    • There is a Project Team Leader (PTL), elected from the project contributors every six months. A PTL acts as a project coordinator, rather than a manager in the traditional sense, and is usually expected to rotate every few cycles.
    • There are several core reviewers, people with enough knowledge of the project to judge whether a change is correct.
    • And then we have multiple project contributors, who can create patches and peer-review other people's patches.

    Whenever a patch is created, it is sent to review using a code review system, and then:

    • It is checked by multiple CI jobs, which ensure the patch does not break any existing functionality.
    • It is reviewed by other contributors.

    Peer review is done by core reviewers and other project contributors. Each of them has the right to cast different votes:

    • A +2 vote can only be set by a core reviewer, and means that the code looks ok for that core reviewer, and he/she thinks it can be merged as-is.
    • Any project contributor can set a +1 or -1 vote. +1 means "code looks ok to me" while -1 means "this code needs some adjustments". A vote by itself does not provide a lot of feedback, so it is expanded by some comments on what should be changed, if needed.
    • A -2 vote can only be set by a core reviewer, and means that the code cannot be merged until that vote is lifted. -2 votes can be caused by code that goes against some of the project design goals, or just because the project is currently in feature freeze and the patch has to wait for a while.

    When the patch passes all CI jobs and has received enough +2 votes from the core reviewers (usually two), it goes through another round of CI jobs and is finally merged into the repository.
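The merge rule described above can be sketched in a few lines of Python. This is only an illustration of the voting logic, not Gerrit's actual implementation: the `Review` class, the `can_merge` function and the default threshold of two +2 votes are assumptions based on the description in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    """Hypothetical model of a patch under review (not a Gerrit API)."""
    ci_passed: bool = False
    votes: list = field(default_factory=list)  # e.g. [2, 1, -1]


def can_merge(review, required_plus_two=2):
    """Return True when the patch may enter the final round of CI jobs."""
    if not review.ci_passed:
        return False  # the CI jobs must pass first
    if -2 in review.votes:
        return False  # a -2 blocks the merge until it is lifted
    return review.votes.count(2) >= required_plus_two


print(can_merge(Review(ci_passed=True, votes=[2, 1, 2])))
print(can_merge(Review(ci_passed=True, votes=[2, 2, -2])))
```

Note that +1 and -1 votes do not change the outcome in this sketch; they carry feedback for the author rather than merge rights.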

    This may seem like a complex process, but it has several advantages:

    • It ensures a certain level of quality on the master branch, since we have to ensure that CI jobs are passing.
    • It encourages peer reviews, so code should always be checked by more than one person before merging.
    • It engages core reviewers, because they need to have enough knowledge of the project codebase to decide if a patch deserves a +2 vote.
  • Use the cloud: it would not make much sense to develop a cloud operating system if we could not use the cloud ourselves, would it? As expected, all the OpenStack infrastructure is hosted in OpenStack-based clouds donated by different companies. Since the infrastructure deployment and configuration is automated, it is quite easy to manage in a cloud environment. And as we will see later, it is also a perfect match for our continuous integration processes.

  • Automated continuous integration: this is a key part of the development process in the OpenStack community. Each month, 5000 to 8000 commits are reviewed in all the OpenStack projects. This requires a large degree of automation in testing, otherwise it would not be possible to review all those patches manually.

    • Each project defines a number of CI jobs, covering unit and integration tests. These jobs are defined as code using Jenkins Job Builder, and reviewed just like any other code contribution.
    • For each commit:
      • Our CI automation tooling will spawn short-lived VMs in one of the OpenStack-based clouds, and add them to the test pool
      • The CI jobs will be executed on those short-lived VMs, and the test results will be fed back as part of the code review
      • The VM will be deleted at the end of the CI job execution

    This process, together with the requirement for CI jobs to pass before merging any code, minimizes the number of regressions in our codebase.
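The per-commit lifecycle above (spawn a short-lived VM, run the job, delete the VM) maps naturally onto a context manager. The sketch below is a toy model under stated assumptions: `FakeCloud`, `ephemeral_node` and `run_ci_job` are hypothetical names, and the real work is done by Nodepool and Zuul rather than by code like this.

```python
from contextlib import contextmanager


class FakeCloud:
    """Stand-in for a cloud API; a real setup would rely on Nodepool."""

    def __init__(self):
        self.servers = set()
        self._next_id = 0

    def create_server(self):
        self._next_id += 1
        name = "test-node-%d" % self._next_id
        self.servers.add(name)
        return name

    def delete_server(self, name):
        self.servers.discard(name)


@contextmanager
def ephemeral_node(cloud):
    """Boot a short-lived node and always delete it afterwards."""
    node = cloud.create_server()
    try:
        yield node
    finally:
        # Runs whether the job succeeded or raised an exception.
        cloud.delete_server(node)


def run_ci_job(cloud, job):
    """Run `job` (a callable taking a node name) on a fresh node."""
    with ephemeral_node(cloud) as node:
        return job(node)


cloud = FakeCloud()
result = run_ci_job(cloud, lambda node: "tests passed on " + node)
print(result)
print(len(cloud.servers))
```

The `finally` clause is the important design choice: the node is released even when the job fails, so failed runs do not leak cloud resources.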

  • Use (and contribute to) Open Source: one of the "Four Opens" that drive the OpenStack community is Open Source. As such, all of the development and infrastructure processes happen using Open Source software. And not just that, the OpenStack community has created several libraries and applications with great potential for reuse outside the OpenStack use case. Applications like Zuul and nodepool, general-purpose libraries like pbr, or the contributions to the SQLAlchemy library are good examples of this.


So, which tools do we use to make all of this happen? As stated above, the OpenStack community relies on several open source tools to do its work:

  • Infrastructure as code
    • Git to store the infrastructure definitions
    • Puppet and Ansible as configuration management and orchestration tools
  • Development
    • Git as a code repository
    • Gerrit as a code review and repository management tool
    • Etherpad as a collaborative editing tool
  • Continuous integration
    • Zuul as an orchestrator of the gate checks
    • Nodepool to automate the creation and deletion of short-lived VMs for CI jobs across multiple clouds
    • Jenkins to execute CI jobs (actually, it has now been replaced by Zuul itself)
    • Jenkins Job Builder as a tool to define CI jobs as code

Replicating this outside OpenStack

It is perfectly possible to replicate this model outside the OpenStack community. We use it in RDO, too! Although we are very closely related to OpenStack, we have our own infrastructure and tools, following a very similar process for development and infrastructure maintenance.

We use an integrated solution, SoftwareFactory, which includes most of the common tools described earlier (and then some other interesting ones). This allows us to simplify our toolset and have:

  • Infrastructure as code
  • Development and continuous integration
    • https://review.rdoproject.org, our SoftwareFactory instance, to integrate our development and CI workflow
    • Our own RDO Cloud as an infrastructure provider

You can do it, too

Implementing this way of working in an established organization is probably not a straightforward task. It requires your IT department and application owners to become as cloud-conscious as possible, reduce the number of micro-managed systems to a minimum, and establish a whole new way of managing your development… But the results speak for themselves, and the OpenStack community (also RDO!) is proof that it works.

View article »

Running (and recording) fully automated GUI tests in the cloud

The problem

Software Factory is a full-stack software development platform: it hosts repositories, a bug tracker and CI/CD pipelines. It is the engine behind RDO's CI pipeline, but it is also very versatile and suited for all kinds of software projects. Also, I happen to be one of Software Factory's main contributors. :)

Software Factory has many cool features that I won't list here, but among these is a unified web interface that helps users navigate through its components. Obviously we want this interface thoroughly tested, ideally within Software Factory's own CI system, which runs on test nodes provisioned on demand on an OpenStack cloud (if you have read Tristan's previous article, you might already know that Software Factory's nodes are managed and built by Nodepool).

When it comes to testing web GUIs, Selenium is quite ubiquitous because of its many features, among which:

  • it works with most major browsers, on every operating system
  • it has bindings for every major language, making it easy to write GUI tests in your language of choice.¹

¹ Our language of choice, today, will be Python.

Due to the very nature of GUI tests, however, it is not easy to fully automate Selenium tests into a CI pipeline:

  • usually these tests are run on dedicated physical machines for each operating system to test, making them choke points and sacrificing resources that could be used somewhere else.
  • a failing test usually means that there is a problem of a graphical nature; if the developer or the QA engineer does not see what happens it is difficult to qualify and solve the problem. Therefore human eyes and validation are still needed to an extent.

Legal issues preventing running Mac OS-based virtual machines on non-Apple hardware aside, it is possible to run Selenium tests on virtual machines without need for a physical display (aka "headless") and also capture what is going on during these tests for later human analysis.

This article will explain how to achieve this on Linux-based distributions, more specifically on CentOS.

Running headless (or "Look Ma! No screen!")

The secret here is to install Xvfb (X virtual framebuffer) to emulate a display in memory on our headless machine …

My fellow Software Factory dev team and I have configured Nodepool to provide us with customized images based on CentOS on which to run any kind of jobs. This makes sure that our test nodes are always "fresh", in other words that our test environments are well defined, reproducible at will and not tainted by repeated tests.

The customization occurs through post-install scripts: if you look at our configuration repository, you will find the image we use for our CI tests is sfstack-centos-7 and its customization script is sfstack_centos_setup.sh.

We added the following commands to this script in order to install the dependencies we need:

sudo yum install -y firefox Xvfb libXfont Xorg jre
sudo mkdir /usr/lib/selenium /var/log/selenium /var/log/Xvfb
sudo wget -O /usr/lib/selenium/selenium-server.jar http://selenium-release.storage.googleapis.com/3.4/selenium-server-standalone-3.4.0.jar
sudo pip install selenium

The dependencies are:

  • Firefox, the browser on which we will run the GUI tests
  • libXfont and Xorg to manage displays
  • Xvfb to emulate a display in memory
  • JRE to run the selenium server
  • the python selenium bindings

Then, when the test environment is set up, we start the selenium server and Xvfb in the background:

/usr/bin/java -jar /usr/lib/selenium/selenium-server.jar -host >/var/log/selenium/selenium.log 2>/var/log/selenium/error.log &
Xvfb :99 -ac -screen 0 1920x1080x24 >/var/log/Xvfb/Xvfb.log 2>/var/log/Xvfb/error.log &

Finally, set the display environment variable to :99 (the Xvfb display) and run your tests:

export DISPLAY=:99

The tests will run as if the VM were plugged into a display.
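If you prefer to drive this from Python rather than from the shell, the same steps can be sketched as below. The helper names (`xvfb_command`, `headless_env`, `run_headless`) are hypothetical, and the sketch assumes Xvfb is installed and on the PATH; it only rebuilds the command line and the DISPLAY value shown above.

```python
import os
import subprocess


def xvfb_command(display=99, width=1920, height=1080, depth=24):
    """Build the Xvfb command line used earlier in this article."""
    return [
        "Xvfb", ":%d" % display, "-ac",
        "-screen", "0", "%dx%dx%d" % (width, height, depth),
    ]


def headless_env(display=99, base_env=None):
    """Return a copy of the environment with DISPLAY pointing at Xvfb."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = ":%d" % display
    return env


def run_headless(test_command, display=99):
    """Start Xvfb, run the tests, then stop Xvfb (needs Xvfb installed)."""
    xvfb = subprocess.Popen(xvfb_command(display))
    try:
        # A real script would wait here until the display is ready.
        return subprocess.call(test_command, env=headless_env(display))
    finally:
        xvfb.terminate()
        xvfb.wait()


print(" ".join(xvfb_command()))
```

Calling something like `run_headless(["python", "-m", "unittest", "discover"])` would then run a test suite against the virtual display; the test command itself is only an example.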

Taking screenshots

With this headless setup, we can now run GUI tests on virtual machines within our automated CI; but we need a way to visualize what happens in the GUI if a test fails.

It turns out that the selenium bindings have a screenshot feature that we can use for that. Here is how to define a decorator in Python that will save a screenshot if a test fails.

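import functools
import os
import unittest
from selenium import webdriver


def snapshot_if_failure(func):
    """Save a screenshot of the browser when the decorated test fails."""
    @functools.wraps(func)
    def f(self, *args, **kwargs):
        try:
            func(self, *args, **kwargs)
        except Exception:
            path = '/tmp/gui/'
            if not os.path.isdir(path):
                os.makedirs(path)
            screenshot = os.path.join(path, '%s.png' % func.__name__)
            self.driver.save_screenshot(screenshot)
            raise  # re-raise so the test is still reported as failed
    return f


class MyGUITests(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()

    @snapshot_if_failure
    def test_login_page(self):
        ...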
If test_login_page fails, a screenshot of the browser at the time of the exception will be saved under /tmp/gui/test_login_page.png.

Video recording

We can go even further and record a video of the whole testing session, as it turns out that ffmpeg can capture X sessions with the "x11grab" option. This is interesting beyond simple test debugging, as the video can be used to illustrate the use cases that you are testing, for demos or fancy video documentation.

In order to have ffmpeg on your test node, you can either add compilation steps to the node's post-install script or go the easy way and use an external repository:

# install ffmpeg
sudo rpm --import http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro
sudo rpm -Uvh http://li.nux.ro/download/nux/dextop/el7/x86_64/nux-dextop-release-0-1.el7.nux.noarch.rpm
sudo yum update
sudo yum install -y ffmpeg

To record the Xvfb buffer, you'd simply run:

export FFREPORT=file=/tmp/gui/ffmpeg-$(date +%Y%m%s).log && ffmpeg -f x11grab -video_size 1920x1080 -i $DISPLAY -codec:v mpeg4 -r 16 -vtag xvid -q:v 8 /tmp/gui/tests.avi

The catch is that ffmpeg expects the user to press q to stop the recording and save the video (killing the process will corrupt the video). We can use tmux (https://tmux.github.io/) to save the day; run your GUI tests like so:

export DISPLAY=:99
tmux new-session -d -s guiTestRecording 'export FFREPORT=file=/tmp/gui/ffmpeg-$(date +%Y%m%s).log && ffmpeg -f x11grab -video_size 1920x1080 -i '$DISPLAY' -codec:v mpeg4 -r 16 -vtag xvid -q:v 8 /tmp/gui/tests.avi && sleep 5'
# ... run your GUI tests here ...
tmux send-keys -t guiTestRecording q
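If the tests are launched from Python, an alternative to tmux is to keep a pipe open on ffmpeg's standard input and write the "q" key to it once the tests are done. The sketch below is an assumption-laden illustration, not the method used in Software Factory: `ffmpeg_command` and `Recording` are hypothetical names, and the command line simply mirrors the one above.

```python
import subprocess


def ffmpeg_command(display=":99", size="1920x1080",
                   output="/tmp/gui/tests.avi"):
    """Build the x11grab command line shown above."""
    return [
        "ffmpeg", "-f", "x11grab", "-video_size", size, "-i", display,
        "-codec:v", "mpeg4", "-r", "16", "-vtag", "xvid", "-q:v", "8",
        output,
    ]


class Recording:
    """Context manager that records the X display while tests run."""

    def __init__(self, command=None):
        self.command = command or ffmpeg_command()
        self.process = None

    def __enter__(self):
        # ffmpeg reads its interactive keys from stdin, so keep a pipe open.
        self.process = subprocess.Popen(self.command, stdin=subprocess.PIPE)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Equivalent of pressing "q": ask ffmpeg to finish the file cleanly.
        self.process.communicate(input=b"q")
        return False  # never swallow a test failure
```

A test run would then be wrapped as `with Recording(): unittest.main(exit=False)`, so the recording stops cleanly whether the tests pass or fail.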

Accessing the artifacts

Nodepool destroys VMs when their job is done in order to free resources (that is, after all, the spirit of the cloud). That means that our pictures and videos will be lost unless they're uploaded to an external storage.

Fortunately, Software Factory handles this: predefined publishers can be appended to our job definitions, one of which pushes any artifact to a Swift object store. We can then retrieve our videos and screenshots easily.


With little effort, you can now run your selenium tests on virtual hardware as well to further automate your CI pipeline, while still ensuring human supervision.


View article »

Recent blog posts

Here's what other RDO users have been blogging about in the past few weeks:

Dear RDO Enthusiast by rainsdance

Remember that post from over a month ago?

Read more at http://groningenrain.nl/dear-rdo-enthusiast/

Red Hat Virtualization 4.1 is LIVE! by CaptainKVM

Today marks another milestone in the evolution of our flagship virtualization platform, Red Hat Virtualization (RHV), as we announce the release of version 4.1. There are well over 165 new features, and while I don’t have the space to cover all of the new features, I would like to highlight some of them, especially in the area of integration. But first I’d like to put that integration into perspective.

Read more at http://rhelblog.redhat.com/2017/04/19/red-hat-virtualization-4-1-is-live/

More than 60 Red Hat-led sessions confirmed for OpenStack Summit Boston by Peter Pawelski, Product Marketing Manager, Red Hat OpenStack Platform

This Spring’s 2017 OpenStack Summit in Boston should be another great and educational event. The OpenStack Foundation has posted the final session agenda detailing the entire week’s schedule of events. And once again Red Hat will be very busy during the four-day event, including delivering more than 60 sessions, from technology overviews to deep dives around the OpenStack services for containers, storage, networking, compute, network functions virtualization (NFV), and much, much more.

Read more at http://redhatstackblog.redhat.com/2017/04/18/60-red-hat-sessions-openstack-summit-boston/

Using the OPTIONS Verb for RBAC by Adam Young

Let's say you have a RESTful Web Service. For any given URL, you might support one or more of the HTTP verbs: GET, PUT, POST, DELETE and so on. A user might wonder what they mean, and which you actually support. One way of reporting that is by using the OPTIONS verb. While this is a relatively unusual verb, using it to describe a resource is a fairly well known mechanism. I want to take it one step further.

Read more at http://adam.younglogic.com/2017/04/options-verb-rbac/

OpenStack Days Poland 2017 by rainsdance

I am super excited to introduce guest blogger, Ana Krivokapić, who ran over to OpenStack Days Poland to represent Red Hat as the TripleO RDO OpenStack guru in residence. Thanks so much to Ana for attending, answering any technical TripleO RDO OpenStack questions sent her way, and reporting back on her experience.

Read more at http://groningenrain.nl/openstack-days-poland-2017/

3 ways you find the right type of contributor and where to find them by rbowen

Another one of Stormy’s questions caught my eye:

Read more at http://drbacchus.com/3-ways-you-find-the-right-type-of-contributor-and-where-to-find-them/

View article »

Introducing Rain Leander

Dear RDO Enthusiast,

We have some important news this week about what’s shifting in the RDO community.

As you may know, Rich Bowen has been serving in the role of Community Liaison for the last 4 years. In that capacity, he’s done a variety of things for the community, including event coordination, social media, podcasts and videos, managing the website, and so on.

Starting next week, this role is going to include Rain Leander as TripleO Community Liaison. Rain has been working with the RDO community, and, more generally, with Red Hat’s upstream OpenStack development efforts, for the past 18 months. She’s helped out at a number of events, including two OpenStack Summits, numerous OpenStack Days and Meetups. And she’s been a passionate advocate of TripleO in the community at large.

You may have seen her at some of these events and you’ve probably seen her on IRC as leanderthal. Please give her all the support that you’ve given Rich as she moves into this role.

If you have any questions about how this is going to work, what Rain’s priorities will be in the coming months, or concerns about getting stuff done in the next few months, please don’t hesitate to contact either one of us via email (rain@redhat.com and rbowen@redhat.com), on IRC (leanderthal and rbowen) or via Twitter (@rainleander and @rbowen).


Rich Bowen and Rain Leander

View article »

Recent blog posts

I haven't done an update in a few weeks. Here are some of the blog posts from our community in the last few weeks.

Red Hat joins the DPDK Project by Marcos Garcia - Principal Technical Marketing Manager

Today, the DPDK community announced during the Open Networking Summit that they are moving the project to the Linux Foundation, and creating a new governance structure to enable companies to engage with the project, and pool resources to promote the DPDK community. As a long-time contributor to DPDK, Red Hat is proud to be a founding Gold member of the new DPDK Project initiative under the Linux Foundation.

Read more at http://redhatstackblog.redhat.com/2017/04/06/red-hat-joins-the-dpdk-project/

What’s new in OpenStack Ocata by rbowen

OpenStack Ocata has now been out for a little over a month – https://releases.openstack.org/ – and we’re about to see the first milestone of the Pike release. Past cycles show that now’s about the time when people start looking at the new release to see if they should consider moving to it. So here’s a quick overview of what’s new in this release.

Read more at http://drbacchus.com/whats-new-in-openstack-ocata/

Steve Hardy: OpenStack TripleO in Ocata, from the OpenStack PTG in Atlanta by Rich Bowen

Steve Hardy talks about TripleO in the Ocata release, at the OpenStack PTG in Atlanta.

Read more at http://rdoproject.org/blog/2017/04/steve-hardy-openstack-tripleo-in-ocata-from-the-openstack-ptg-in-atlanta/

Using a standalone Nodepool service to manage cloud instances by tristanC

Nodepool is a service used by the OpenStack CI team to deploy and manage a pool of devstack images on a cloud server for use in OpenStack project testing.

Read more at http://rdoproject.org/blog/2017/03/standalone-nodepool/

Red Hat Summit 2017 – Planning your OpenStack labs by Eric D. Schabell

This year in Boston, MA you can attend the Red Hat Summit 2017, the event to get your updates on open source technologies and meet with all the experts you follow throughout the year.

Read more at http://redhatstackblog.redhat.com/2017/04/04/red-hat-summit-2017-planning-your-openstack-labs/

Stephen Finucane - OpenStack Nova - What's new in Ocata by Rich Bowen

At the OpenStack PTG in February, Stephen Finucane speaks about what's new in Nova in the Ocata release of OpenStack.

Read more at http://rdoproject.org/blog/2017/03/stephen-finucane-openstack-nova-whats-new-in-ocata/

Zane Bitter - OpenStack Heat, OpenStack PTG, Atlanta by Rich Bowen

At the OpenStack PTG last month, Zane Bitter speaks about his work on OpenStack Heat in the Ocata cycle, and what comes next.

Read more at http://rdoproject.org/blog/2017/03/zane-bitter-openstack-heat-openstack-ptg-atlanta/

The journey of a new OpenStack service in RDO by amoralej

When new contributors join RDO, they ask for recommendations about how to add new services and help RDO users to adopt them. This post is not an official policy document nor a detailed description about how to carry out some activities, but provides some high level recommendations to newcomers based on what I have learned and observed in the last year working in RDO.

Read more at http://rdoproject.org/blog/2017/03/the-journey-of-a-service-in-rdo/

InfraRed: Deploying and Testing Openstack just made easier! by bregman

Deploying and testing OpenStack is very easy. If you read the headline and your eyebrows raised, you are at the right place. I believe that most of us, who experienced at least one deployment of OpenStack, will agree that deploying OpenStack can be a quite frustrating experience. It doesn’t matter if you are using it for […]

Read more at http://abregman.com/2017/03/20/infrared-deploying-and-testing-openstack-just-made-easier/

View article »

Steve Hardy: OpenStack TripleO in Ocata, from the OpenStack PTG in Atlanta

Steve Hardy talks about TripleO in the Ocata release, at the OpenStack PTG in Atlanta.

Steve: My name is Steve Hardy. I work primarily on the TripleO project, which is an OpenStack deployment project. What makes TripleO interesting is that it uses OpenStack components primarily in order to deploy a production OpenStack cloud. It uses OpenStack Ironic to do bare metal provisioning. It uses Heat orchestration in order to drive the configuration workflow. And we also recently started using Mistral, which is an OpenStack workflow component.

So it's kind of different from some of the other deployment initiatives. And it's a nice feedback loop where we're making use of the OpenStack services in the deployment story, as well as in the deployed cloud.

This last couple of cycles we've been working towards more composability. That basically means allowing operators more flexibility with service placement, and also allowing them to define groups of nodes in a more flexible way so that you could either specify different configurations - perhaps you have multiple types of hardware for different compute configurations for Nova, or perhaps you want to scale services into particular groups of clusters for particular services.

It's basically about giving more choice and flexibility into how they deploy their architecture.

Rich: Upgrades have long been a pain point. I understand there's some improvement in this cycle there as well?

Steve: Yes. Having delivered composable services and composable roles for the Newton OpenStack release, the next big challenge was upgrades: having given operators the flexibility to deploy services on arbitrary nodes in your OpenStack environment, you need some way to upgrade, and you can't necessarily make assumptions about which service is running on which group of nodes. So we've implemented a new feature called composable upgrades. That uses some Heat functionality combined with Ansible tasks, in order to allow very flexible dynamic definition of what upgrade actions need to take place when you're upgrading some specific group of nodes within your environment. That's part of the new Ocata release. It's hopefully going to provide a better upgrade experience, for end-to-end upgrades of all the OpenStack services that TripleO supports.

Rich: It was a very short cycle. Did you get done what you wanted to get done, or are things pushed off to Pike now?

Steve: I think there's a few remaining improvements around operator-driven upgrades, which we'll be looking at during the Pike cycle. It certainly has been a bit of a challenge with the short development timeframe during Ocata. But the architecture has landed, and we've got composable upgrade support for all the services in Heat upstream, so I feel like we've done what we set out to do in this cycle, and there will be further improvements around the operator-driven upgrade workflow and also containerization during the Pike timeframe.

Rich: This week we're at the PTG. Have you already had your team meetings, or are they still to come?

Steve: The TripleO team meetings start tomorrow, which is Wednesday. The previous two days have mostly been cross-project discussion. Some of which related to collaborations which may impact TripleO features, some of which was very interesting. But the TripleO schedule starts tomorrow - Wednesday and Thursday. We've got a fairly packed agenda, which is going to focus around - primarily the next steps for upgrades, containerization, and ways that we can potentially collaborate more closely with some of the other deployment projects within the OpenStack community.

Rich: Is Kolla something that TripleO uses to deploy, or is that completely unrelated?

Steve: The two projects are collaborating. Kolla provides a number of components, one of which is container definitions for the OpenStack services themselves, and the containerized TripleO architecture actually consumes those. There are some other pieces which are different between the two projects. We use Heat to orchestrate container deployment, and there's an emphasis on Ansible and Kubernetes on the Kolla side, where we're having discussions around future collaboration.

There's a session planned on our agenda for a meeting between the Kolla Kubernetes folks and TripleO folks to figure out if there's long-term collaboration there. But at the moment there's good collaboration around the container definitions and we just orchestrate deploying those containers.

We'll see what happens in the next couple of days of sessions, and getting on with the work we have planned for Pike.

Rich: Thank you very much.

View article »

Using a standalone Nodepool service to manage cloud instances

Nodepool is a service used by the OpenStack CI team to deploy and manage a pool of devstack images on a cloud server for use in OpenStack project testing.

This article presents how to use Nodepool to manage cloud instances.


For the purpose of this demonstration, we'll use a CentOS system and the Software Factory distribution to get all the requirements:

sudo yum install -y --nogpgcheck https://softwarefactory-project.io/repos/sf-release-2.5.rpm
sudo yum install -y nodepoold nodepool-builder gearmand
sudo -u nodepool ssh-keygen -N '' -f /var/lib/nodepool/.ssh/id_rsa

Note that this installs nodepool version 0.4.0, which relies on Gearman and still supports snapshot-based images. More recent versions of Nodepool require a Zookeeper service and only support diskimage-builder images. Even so, the usage is similar and easy to adapt.


Configure a cloud provider

Nodepool uses os-client-config to define cloud providers and it needs a clouds.yaml file like this:

cat > /var/lib/nodepool/.config/openstack/clouds.yaml <<EOF
clouds:
  le-cloud:
    auth:
      username: "${OS_USERNAME}"
      password: "${OS_PASSWORD}"
      auth_url: "${OS_AUTH_URL}"
    project_name: "${OS_PROJECT_NAME}"
    regions:
      - "${OS_REGION_NAME}"
EOF

Using the OpenStack client, we can verify that the configuration is correct and get the available network names:

sudo -u nodepool env OS_CLOUD=le-cloud openstack network list
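The same check can be made from Python. The sketch below assumes the openstacksdk package, which is not mentioned in this article, and must be run as a user who can read the clouds.yaml file created above; `network_names` and `main` are hypothetical helper names.

```python
def network_names(conn):
    """Return the sorted names of the networks visible to `conn`."""
    return sorted(net.name for net in conn.network.networks())


def main(cloud_name="le-cloud"):
    # Requires the openstacksdk package and a valid clouds.yaml entry.
    import openstack

    conn = openstack.connect(cloud=cloud_name)
    for name in network_names(conn):
        print(name)
```

Calling `main()` prints one network name per line; one of them is what the provider's network setting in nodepool.yaml has to match.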

Diskimage builder elements

Nodepool uses diskimage-builder to create images locally so that the exact same image can be used across multiple clouds. For this demonstration we'll use a minimal element to set up basic ssh access:

mkdir -p /etc/nodepool/elements/nodepool-minimal/{extra-data.d,install.d}

In extra-data.d, scripts are executed outside of the image and the one below is used to authorize ssh access:

cat > /etc/nodepool/elements/nodepool-minimal/extra-data.d/01-user-key <<'EOF'
#!/bin/sh
set -ex
cat /var/lib/nodepool/.ssh/id_rsa.pub > $TMP_HOOKS_PATH/id_rsa.pub
EOF
chmod +x /etc/nodepool/elements/nodepool-minimal/extra-data.d/01-user-key

In install.d, scripts are executed inside the image and the following is used to create a user and install the authorized_key file:

cat > /etc/nodepool/elements/nodepool-minimal/install.d/50-jenkins <<'EOF'
#!/bin/sh
set -ex
useradd -m -d /home/jenkins jenkins
mkdir /home/jenkins/.ssh
mv /tmp/in_target.d/id_rsa.pub /home/jenkins/.ssh/authorized_keys
chown -R jenkins:jenkins /home/jenkins

# Nodepool expects this dir to exist when it boots slaves.
mkdir /etc/nodepool
chmod 0777 /etc/nodepool
EOF
chmod +x /etc/nodepool/elements/nodepool-minimal/install.d/50-jenkins

Note: all the examples in this article are available in this repository: sf-elements. More information on creating elements is available here.

Nodepool configuration

Nodepool's main configuration file is /etc/nodepool/nodepool.yaml:

elements-dir: /etc/nodepool/elements
images-dir: /var/lib/nodepool/dib

cron:
  cleanup: '*/30 * * * *'
  check: '*/15 * * * *'

targets:
  - name: default

gearman-servers:
  - host: localhost

diskimages:
  - name: dib-centos-7
    elements:
      - centos-minimal
      - vm
      - dhcp-all-interfaces
      - growroot
      - openssh-server
      - nodepool-minimal

providers:
  - name: default
    cloud: le-cloud
    images:
      - name: centos-7
        diskimage: dib-centos-7
        username: jenkins
        private-key: /var/lib/nodepool/.ssh/id_rsa
        min-ram: 2048
    networks:
      - name: defaultnet
    max-servers: 10
    boot-timeout: 120
    clean-floating-ips: true
    image-type: raw
    pool: nova
    rate: 10.0

labels:
  - name: centos-7
    image: centos-7
    min-ready: 1
    providers:
      - name: default

Nodepool uses a gearman server to get node requests and to dispatch image rebuild jobs. We'll use a local gearmand server on localhost. Since nothing will send node requests to it, Nodepool will only respect the min-ready value and it won't dynamically start nodes.

Diskimages define images' names and dib elements. All the elements provided by dib, such as centos-minimal, are available; here is the full list.

Providers define specific cloud provider settings such as the network name or boot timeout. Lastly, labels define generic names for cloud images to be used by job definitions.

To sum up, labels reference images in providers that are constructed with disk-image-builder.

Create the first node

Start the services:

sudo systemctl start gearmand nodepool nodepool-builder

Nodepool will automatically initiate the image build, as shown in /var/log/nodepool/nodepool.log: WARNING nodepool.NodePool: Missing disk image centos-7. Image building logs are available in /var/log/nodepool/builder-image.log.

Check the building process:

# nodepool dib-image-list
| ID | Image        | Filename                                      | Version    | State    | Age         |
| 1  | dib-centos-7 | /var/lib/nodepool/dib/dib-centos-7-1490688700 | 1490702806 | building | 00:00:00:05 |

Once the dib image is ready, nodepool will upload the image: nodepool.NodePool: Missing image centos-7 on default. When the image fails to build, nodepool will try again indefinitely; look for "after-error" in builder-image.log.

Check the upload process:

# nodepool image-list
| ID | Provider | Image    | Hostname | Version    | Image ID | Server ID | State    | Age         |
| 1  | default  | centos-7 | centos-7 | 1490703207 | None     | None      | building | 00:00:00:43 |

Once the image is ready, nodepool will create an instance (nodepool.NodePool: Need to launch 1 centos-7 nodes for default on default):

# nodepool list
| ID | Provider | AZ   | Label    | Target  | Manager | Hostname           | NodeName           | Server ID | IP   | State    | Age         |
| 1  | default  | None | centos-7 | default | None    | centos-7-default-1 | centos-7-default-1 | XXX       | None | building | 00:00:01:37 |

Once the node is ready, you have completed the first part of the process described in this article and the Nodepool service should be working properly. If the node goes directly from the building to the delete state, Nodepool will try to recreate the node indefinitely. Look for errors in nodepool.log. One common mistake is an incorrect provider network configuration: you need to set a valid network name in nodepool.yaml.
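Instead of re-running nodepool list by hand, the check can be scripted. The following is a minimal sketch, not something shipped with Nodepool: node_states and wait_for_ready are hypothetical helper names, the column positions are taken from the sample output above, and the literal state name "ready" is an assumption based on the states mentioned in this article.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nodepool.

# Read `nodepool list` output on stdin and print "<label> <state>" per node.
# Assumes the table layout shown above: Label is the 4th column, State the 11th.
node_states() {
    awk -F'|' '
        $2 ~ /^ *[0-9]+ *$/ {          # only rows whose ID column is numeric
            label = $5; state = $12
            gsub(/ /, "", label); gsub(/ /, "", state)
            print label, state
        }'
}

# Poll until a node with the given label reports the (assumed) "ready" state.
# $1: label name, $2: maximum number of polls (default 30).
# POLL_INTERVAL sets the delay between polls in seconds (default 10).
wait_for_ready() {
    label=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if nodepool list | node_states | grep -qxF "$label ready"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-10}"
    done
    return 1
}
```

For example, wait_for_ready centos-7 60 would poll for up to ten minutes and return a non-zero status if no centos-7 node became ready, which is a hint to go and read nodepool.log.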

Nodepool operations

Here is a summary of the most common operations:

  • Force the rebuild of an image: nodepool image-build image-name
  • Force the upload of an image: nodepool image-upload provider-name image-name
  • Delete a node: nodepool delete node-id
  • Delete a local dib image: nodepool dib-image-delete image-id
  • Delete a glance image: nodepool image-delete image-id

Nodepool's "check" cron periodically verifies that nodes are available. When a node is shut down, Nodepool will automatically recreate it.

Ready to use application deployment with Nodepool

As a cloud developer, it is convenient to always have access to a fresh OpenStack deployment for testing purposes. It's easy to break things and it takes time to recreate a test environment, so let's use Nodepool.

First we'll add a new element to pre-install the typical RDO requirements:

diskimages:
  - name: dib-rdo-ocata
    elements:
      - centos-minimal
      - nodepool-minimal
      - rdo-requirements
    env-vars:
      RDO_RELEASE: "ocata"

providers:
  - name: default
    images:
      - name: rdo-ocata
        diskimage: dib-rdo-ocata
        username: jenkins
        min-ram: 8192
        private-key: /var/lib/nodepool/.ssh/id_rsa
        ready-script: run_packstack.sh

Then using a ready-script, we can execute packstack to deploy services after the node has been created:

labels:
  - name: rdo-ocata
    image: rdo-ocata
    min-ready: 1
    ready-script: run_packstack.sh
    providers:
      - name: default

Once the node is ready, use nodepool list to get the IP address:

# ssh -i /var/lib/nodepool/.ssh/id_rsa jenkins@node
jenkins$ . keystonerc_admin
jenkins (keystone_admin)$ openstack catalog list
| Name      | Type      | Endpoints                     |
| keystone  | identity  | RegionOne                     |
|           |           |   public: http://node:5000/v3 |

To get a new instance, either terminate the current one, or manually delete it using nodepool delete node-id. A few minutes later you will have a fresh and pristine environment!
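Looking up the node ID before each delete gets tedious, so the two steps can be combined. This is only a sketch under stated assumptions: node_ids_for_label and recycle_label are hypothetical helper names, and the parsing relies on the nodepool list table layout shown earlier (ID in the first column, Label in the fourth).

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nodepool.

# Read `nodepool list` output on stdin and print the ID of every node whose
# Label column equals $1.
node_ids_for_label() {
    awk -F'|' -v want="$1" '
        $2 ~ /^ *[0-9]+ *$/ {          # only rows whose ID column is numeric
            id = $2; label = $5
            gsub(/ /, "", id); gsub(/ /, "", label)
            if (label == want) print id
        }'
}

# Delete every node carrying the given label. Nodepool should then boot
# replacements until the label's min-ready value is satisfied again.
recycle_label() {
    for id in $(nodepool list | node_ids_for_label "$1"); do
        nodepool delete "$id"
    done
}
```

Running recycle_label rdo-ocata would then discard the current test deployment and leave Nodepool to prepare the next one.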

Stephen Finucane - OpenStack Nova - What's new in Ocata

At the OpenStack PTG in February, Stephen Finucane spoke about what's new in Nova in the Ocata release of OpenStack.

Stephen: I'm Stephen Finucane, and I work on Nova for Red Hat.

I've previously worked at Intel. During most of my time working on Nova I've been focused on the same kind of feature set, which is what Intel liked to call EPA - Enhanced Platform Awareness - or NFV applications. Making Nova smarter from the perspective of Telco applications. You have all this amazing hardware, how do you expose that up and take full advantage of that when you're running virtualized applications?

The Ocata cycle was a bit of an odd one for me, and probably for the project itself, because it was really short. The normal cycle runs for about six months. This one ran for about four.

During the Ocata cycle I actually got core status. That was probably as a result of doing a lot of reviews. Lot of reviews, pretty much every waking hour, I had to do reviews. And that was made possible by the fact that I didn't actually get any specs in for that cycle.

So my work on Nova during that cycle was mostly around reviewing Python 3 fixes. It's still very much a community goal to get support in Python 3. 3.5 in this case. Also a lot of work around improving how we do configuration - making it so that administrators can actually understand what different knobs and dials Nova exposes, what they actually mean, and what the implications of changing or enabling them actually are.

Both of these have been going on since before the Ocata cycle, and we made really good progress during the Ocata cycle to continue to get ourselves 70 or 80% of the way there, and in the case of config options, the work is essentially done there at this point.

Outside of that, the community as a whole, most of what went on this cycle was again a continuation of work that has been going on the last couple cycles. A lot of focus on the maturity of Nova. Not so much new features, but improving how we did existing features. A lot of work on resource providers, which are a way that we can keep track of the various resources that Nova's aware of, be they storage, or CPU, or things like that.

Coming forward, as far as Pike goes, it's still very much up in the air. That's what we're here for this week discussing. There would be, from my perspective, a lot of the features that I want to see, doubling down on the NFV functionality that Nova supports. Making things like SR-IOV easier to use, and more performant, where possible. There's also going to be some work around resource providers again for SR-IOV and NFV features and resources that we have.

The other stuff that the community is looking at, pretty much up in the air. The idea of exposing capabilities, something that we've had a lot of discussion about already this week, and I expect we'll have a lot more. And then, again, evolution of the Nova code base - what more features the community wants, and various customers want - going and providing those.

This promises to be a very exciting cycle, on account of the fact that we're back into the full six month mode. There's a couple of new cores on board, and Nova itself is full steam ahead.

Zane Bitter - OpenStack Heat, OpenStack PTG, Atlanta

At the OpenStack PTG last month, Zane Bitter spoke about his work on OpenStack Heat in the Ocata cycle, and what comes next.

Rich: Tell us who you are and what you work on.

Zane: My name is Zane Bitter, and I work at Red Hat on Heat … mostly on Heat. I'm one of the original Heat developers. I've been working on the project since 2012 when it started.

Heat is the orchestration service for OpenStack. It's about managing how you create and maintain your resources that you're using in your OpenStack cloud over time. It manages dependencies between various things you have to spin up, like servers, volumes, networks, ports, all those kinds of things. It allows you to define in a declarative way what resources you want and it does the job of figuring out how to create them in the right order and do it reasonably efficiently. Not waiting too long between creating stuff, but also making sure you have all the dependencies, in the right order.

And then it can manage those deployments over time as well. If you want to change your thing, it can figure out what you need to do to change, if you need to replace a resource, what it needs to do to replace a resource, and get everything pointed to the right things again.

Rich: What is new in Ocata? What have you been working on in this cycle?

Zane: What I've been working on in Ocata is having a way of auto-healing services. If your service dies for some reason, you'd like that to recover by itself, rather than having to page someone and say, hey, my service is down, and then go in there and manually fix things up. So I've been working on integration between a bunch of different services, some of which started during the previous cycle.

I was working with Fei Long Wang from Catalyst IT who is PTL of Zaqar, getting some integration work between Zaqar and Mistral, so you can now trigger a Mistral workflow from a message on the Zaqar queue. So if you set that up as a subscription in Zaqar, it can fire off a thing when it gets a message on that queue, saying, hey, Mistral, run this workflow.

That in turn is integrated with Aodh - ("A.O.D.H.", as some people call it. I'm told the correct pronunciation is Aodh.) - which is the alarming service for OpenStack. It can …

Rich: For some reason, I thought it was an acronym.

Zane: No, it's an Irish name.

Rich: That's good to know.

Zane: Eoghan Glynn was responsible for that one.

You can set up the alarm action for an alarm in Aodh to be to post a message to this queue. When you combine these together, that means that when an alarm goes off, it posts a message to a queue, and that can trigger a workflow.

What I've been working on in Ocata is getting that all packaged up into Heat templates so we have all the resources to create the alarm in Aodh, hook it up with the subscription … hook up the Zaqar queue to a Mistral subscription, and have that all configured in a template along with the workflow action, which is going to call Heat, and say, this server is unhealthy now. We know from external to Heat, we know that this server is bad, and then kick off the action which is to mark the server unhealthy. We then create a replacement, and then when that service is back up, we remove the old one.
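For reference, the last step of that chain, telling Heat that a server is bad and letting it build a replacement, can also be done by hand. The sketch below assumes the Heat plugin for the openstack client is installed and provides the "stack resource mark unhealthy" and "stack update --existing" commands. The function name heal_resource and the stack and resource names in the usage note are hypothetical, and this is not the workflow shipped in the Heat templates repository.

```shell
#!/bin/sh
# Hypothetical helper: mark one resource of a stack unhealthy, then ask Heat
# to update the stack with its existing template so the resource is replaced.
# $1: stack name, $2: resource name, $3: optional reason text.
heal_resource() {
    stack=$1
    resource=$2
    reason=${3:-"marked unhealthy by operator"}
    openstack stack resource mark unhealthy "$stack" "$resource" "$reason" &&
        openstack stack update --existing "$stack"
}
```

A call such as heal_resource my-stack server "alarm fired" would mirror what the workflow action does once the alarm message arrives on the queue.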

Rich: Is that done, or do you still have stuff to do in Pike?

Zane: It's done. It's all working. It's in the Heat templates repository, there's an example in there, so you can try that out. There's a couple caveats. There's a misfeature in Aodh - there's a delay between when you create the alarm and when … there's a short period where, when an event comes in, it may not trigger an alarm. That's one caveat. But other than that, once it's up and working it works pretty reliably.

The other thing I should mention is that you have to turn on event alarms in Aodh, which is basically triggering alarms off of events in the … on the Oslo messaging notification bus, which is not on by default, but it's a one line configuration change.

Rich: What can we look forward to in Pike, or is it too early in the week to say yet?

Zane: We have a few ideas for Pike. I'm planning to work on a template where … so, Zaqar has pre-signed URLs, so you can drop a pre-signed URL into an instance, and allow that instance … node server, in other words … to post to that Zaqar queue without having any Keystone credentials, and basically all it can do with that URL is post to that one queue. Similar to signed URLs in ____. What that should enable us to do is create a template where we're putting signed URLs, with an expiry, into a server, and then we can, before that expires, we can re-create it, so we can have updating credentials, and hook that up to a Mistral subscription, and that allows the service to kick off a Mistral workflow to do something the application needs to do, without having credentials for anything else in OpenStack. So you can let both Mistral and Heat use Keystone trusts, to say, I will offer it on behalf of the user who created this workflow. So if we can allow them to trigger that through Zaqar, there's a pretty secure way of giving applications access to modify stuff in the OpenStack cloud, but locking it down to only the stuff you want modified, and not risking that if someone breaks into your VM, they've got your Keystone credentials and can do whatever they want with your account.

That's one of the things I'm hoping to work on.

As well, we're continuing with Heat development. We've switched over to the new convergence architecture. In Newton, I think, was the first release to have that on by default. We're looking at improving performance with that now. We've got the right architecture for scaling out to a lot of Heat engines. Right now, it's a little heavy on database, a little heavy on memory, which is the tradeoff you make when you go from a monolithic architecture, which can be quite efficient, but doesn't scale out well, to, you scale out but there's potentially performance problems. I think there's some low-hanging fruit there, we should be able to crank up performance. Memory use, and database accesses. Look for better performance out of the convergence architecture in Heat, coming up in Pike.

The journey of a new OpenStack service in RDO

When new contributors join RDO, they ask for recommendations about how to add new services and help RDO users adopt them. This post is neither an official policy document nor a detailed description of how to carry out each activity, but it provides some high-level recommendations to newcomers based on what I have learned and observed in the last year working in RDO.

Note that you are not required to follow all these steps, and you may well have your own ideas about how to proceed. If you want to discuss them, let us know your thoughts; we are always open to improvements.

1. Adding the package to RDO

The first step is to add the package(s) to the RDO repositories, as shown in the RDO documentation. This typically includes the main service package, the client library, and maybe a package with a plugin for Horizon.

In some cases new packages require some general purpose libraries. If they are not in CentOS base channels, RDO imports them from Fedora packages into a dependencies repository. If you need a new dependency which already exists in Fedora, just let us know and we'll import it into the repo. If it doesn't exist, you'll have to add the new package into Fedora following the existing process.

2. Create a Puppet module

Although there are multiple deployment tools for OpenStack based on several frameworks, Puppet is widely used by different tools, or even directly by operators, so we recommend creating a Puppet module to deploy your new service following the Puppet OpenStack Guide. Once the Puppet module is ready, remember to follow the RDO new package process to get it packaged in the repos.

3. Make sure the new service is tested in RDO-CI

As explained in a previous post, we run several jobs in RDO CI to validate the content of our repos. Most of the time, the first way to get a new service tested is to add it to one of the puppet-openstack-integration scenarios, which is also recommended to get the Puppet module tested in upstream gates. An example of how to add a new service into p-o-i is in this review.

4. Adding deployment support in Packstack

If you want to make it easier for RDO users to evaluate a new service, adding it to Packstack is a good idea. Packstack is a Puppet-based deployment tool used by RDO users to deploy small proof of concept (PoC) environments to evaluate new services or configurations before deploying them in their production clouds. If you are interested, you can take a look at these two reviews, which added support for Panko and Magnum in the Ocata cycle.

5. Add it to TripleO

TripleO is a powerful OpenStack management tool able to provision and manage cloud environments with production-ready features such as high availability, extended security, etc. Adding support for new services in TripleO will help users adopt them for their cloud deployments. The TripleO composable roles tutorial can guide you on how to do it.

6. Build containers for new services

Kolla is the upstream project providing container images and deployment tools to operate OpenStack clouds using container technologies. Kolla supports building images for the CentOS distro using the binary method, which uses packages from RDO. Operators using containers will have an easier time if you add containers for new services.

Other recommendations

Follow OpenStack governance policies

RDO methodology and tooling are conceived according to the OpenStack upstream release model, so following policies about release management and requirements is a big help in maintaining packages in RDO. It's especially important to create branches and version tags as defined by the releases team.

Making potential users aware of the availability of new services or other improvements is a good practice. RDO provides several ways to do this, such as sending emails to our mailing lists, writing a post on the blog, adding references in our documentation, creating screencast demos, etc. You can also join the RDO weekly meeting to let us know about your work.

Join RDO Test Days

RDO organizes test days at several milestones during each OpenStack release cycle. Although we do Continuous Integration testing in RDO, it's good to test that services can be deployed following the instructions in the documentation. You can propose new services or configurations in the test matrix and add a link to the documented instructions on how to deploy them.

Upstream documentation

RDO relies on the upstream OpenStack Installation Guide for deployment instructions. Keeping it up to date is recommended.
