Bursting to the Cloud with VMware Event Broker Appliance, AWS Lambda and VMware Cloud on AWS

I was doing some research on who came up with the term “Cloud Bursting” – essentially the ability to leverage resources in the Cloud when demand for an on-prem application exceeds current capacity – and unsurprisingly, I came across Jeff Barr’s blog post from 2008 (Jeff is the Chief Evangelist for AWS).

The way Jeff defined “cloudbursting” in the post was the following:

Cloudbursting is an application hosting model which combines existing corporate infrastructure with new, cloud-based infrastructure to create a powerful, highly scalable application hosting environment.

There are many ways you might want to interpret “cloud bursting” but there are a couple of obvious requirements:

  1. Ability to monitor the “on-prem” capacity and trigger some event/alarm when the capacity level passes a threshold.
  2. If/when this happens, we need the ability to automatically create resources in the cloud.
  3. We need the ability for the application to be re-balanced between on-prem and the infrastructure in the cloud. It might be done, for example, through some sort of load-balancing, but the main aspect is to relieve pressure on the on-prem environment.

Having met with hundreds of customers since I joined VMware 4.5 years ago, I have met many who, at one point or another, ran into some capacity issues and would have benefited from cloud bursting.

And while VMware Cloud on AWS is a strong option to expand an existing VMware environment, we didn’t have the ability to automatically create VMware Cloud resources when we crossed a capacity threshold.

Wouldn’t it be nice if you could automatically deploy a VMC on AWS SDDC if/when you run out of capacity on-prem? Wouldn’t it be nice if you could keep fewer resources on-site and rely on the Cloud if/when you have that seasonal or project demand?

Introducing VMware Cloud Bursting (not an actual name – I just made this one up 😁 ).

Let’s start with the demo first and then I’ll walk through how it works under the hood.

What is the overall concept behind it?

The concept is:

  1. Monitor the capacity of your on-prem vCenter (in my example, I will monitor the vSAN capacity but you could look at memory or CPU thresholds too).
  2. When the capacity exceeds a threshold, it will trigger an alarm and/or event.
  3. Based on the event, we will dynamically deploy a VMware Cloud on AWS environment (SDDC).

How does it work under the hood?

We need an engine to trigger an action based on an event. For VMware events, the best way to do that would be to leverage the VMware Event Broker Appliance (VEBA).

VEBA, in my simple terms, provides a serverless engine for VMware infrastructure (think of VEBA as Lambda for VMware). It picks up events coming from vCenter and can automatically trigger actions from them. There are some good examples here.

VEBA works with two Function-as-a-Service engines: OpenFaaS and Amazon EventBridge. Most VEBA users might leverage OpenFaaS, but I will leverage EventBridge as it integrates easily with AWS Lambda (and you can see from previous posts that I am a fan of Lambda).

EventBridge is essentially a rebranded version of CloudWatch Events, but with wider support of triggers. I used CloudWatch Events for a previous blog post.

VEBA will forward the event to EventBridge, which will in turn trigger a Lambda function (written in Python) that asks VMware Cloud to deploy a VMC SDDC.

Do I need Python or Lambda skills?

Nope, you will see it’s pretty straightforward to set up. The only things you need are a vCenter and a VMware Cloud on AWS account to burst into.

Note that the instructions below leverage Amazon EventBridge, so you will need an AWS account (you would need one anyway if you’re going to burst into VMware Cloud on AWS).

The main steps are:

  1. Set up Amazon EventBridge
  2. Set up AWS Lambda
  3. Deploy VMware Event Broker Appliance

Setting up Amazon EventBridge

William Lam and Patrick Kremer have published some excellent posts on this topic so I won’t go into a huge amount of detail here.

In summary, the Amazon EventBridge bus and rules associated with the Event Bus will watch for an event (forwarded by VEBA) and will trigger the execution of an AWS Lambda function.
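The console steps above can also be scripted. Below is a minimal sketch using boto3 — the bus and rule names are my own placeholders, not from this setup — with the client passed in so the functions are easy to test:

```python
import json


# Event pattern that accepts only vCenter AlarmStatusChangedEvent events
def build_event_pattern():
    return {"detail": {"subject": ["AlarmStatusChangedEvent"]}}


# The client is injected so a stub can be used in tests;
# in practice you would pass boto3.client("events").
def create_bus_and_rule(events_client, bus_name="veba-bus", rule_name="veba-alarm-rule"):
    events_client.create_event_bus(Name=bus_name)
    response = events_client.put_rule(
        Name=rule_name,
        EventBusName=bus_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
    )
    return response["RuleArn"]  # note this ARN down - VEBA asks for it later
```

With real credentials you would call `create_bus_and_rule(boto3.client("events"))`.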

Go to the AWS Console. Create a new Event Bus. This is mine:

Once you’ve done this, go to the “Rules” and create a new Rule. Make sure you select the new Event bus you have created in the previous step.

Create the rule. Select “Event pattern” and “Custom Pattern”.

Here, what we’re doing is specifying which event will trigger the Lambda function by defining an Event pattern. We don’t want to trigger the function for any event but for a very specific one.

When we exceed capacity in the cluster on-prem, we get an alarm and the vSAN health goes from “Green” to “Yellow”.

Initially, I wasn’t sure which format the alert would come in. Michael Gasch gave me some tips on how to create the correct syntax:

With EventBridge, you can filter which events will trigger an action based on the content of the event payload. The guide here explains some of the concepts.

{
  "detail": {
    "subject": [
      "AlarmStatusChangedEvent"
    ]
  }
}

If you filter based on the Event Pattern above, all events with “AlarmStatusChangedEvent” will be accepted. All events are actually logged into AWS CloudWatch Logs. For example, one of the alarms I got when we exceeded the storage threshold was in the following format:

{
    "version": "0",
    "id": "64b65d37-99f0-38f3-ef12-659547cf5711",
    "detail-type": "AlarmStatusChangedEvent",
    "source": "https://vcenter.sddc-A-B-C-D.vmwarevmc.com/sdk",
    "account": "614055364343",
    "time": "2020-05-14T15:35:04Z",
    "region": "eu-central-1",
    "resources": [],
    "detail": {
        "id": "50ad464b-3171-4151-9e0b-ce7eef983ddd",
        "source": "https://vcenter.sddc-A-B-C-D.vmwarevmc.com/sdk",
        "specversion": "1.0",
        "type": "com.vmware.event.router/event",
        "subject": "AlarmStatusChangedEvent",
        "time": "2020-05-14T15:35:04.299909036Z",
        "data": {
            "Key": 23555,
            "ChainId": 23555,
            "CreatedTime": "2020-05-14T15:35:03.288149Z",
            "UserName": "",
            "Datacenter": {
                "Name": "SDDC-Datacenter",
                "Datacenter": {
                    "Type": "Datacenter",
                    "Value": "datacenter-3"
                }
            },
            "ComputeResource": {
                "Name": "Cluster-1",
                "ComputeResource": {
                    "Type": "ClusterComputeResource",
                    "Value": "domain-c8"
                }
            },
            "Host": null,
            "Vm": null,
            "Ds": null,
            "Net": null,
            "Dvs": null,
            "FullFormattedMessage": "Alarm 'vSAN health alarm 'Cluster disk space utilization'' on Cluster-1 changed from Green to Yellow",
            "ChangeTag": "",
            "Alarm": {
                "Name": "vSAN health alarm 'Cluster disk space utilization'",
                "Alarm": {
                    "Type": "Alarm",
                    "Value": "alarm-183"
                }
            },
            "Source": {
                "Name": "Datacenters",
                "Entity": {
                    "Type": "Folder",
                    "Value": "group-d1"
                }
            },
            "Entity": {
                "Name": "Cluster-1",
                "Entity": {
                    "Type": "ClusterComputeResource",
                    "Value": "domain-c8"
                }
            },
            "From": "green",
            "To": "yellow"
        },
        "datacontenttype": "application/json"
    }
}

As you can see in the event above, when the event pattern below matches the actual event format, the event is accepted by the Event bus and invokes a Target. This is the config I used for my demo (you might want to use another event):

{
  "detail": {
    "subject": [
      "AlarmStatusChangedEvent"
    ],
    "data": {
      "FullFormattedMessage": [
        {
          "prefix": "Alarm 'vSAN health alarm 'Cluster disk space utilization'' on Cluster-1 changed from Green to Yellow"
        }
      ]
    }
  }
}
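If you want to sanity-check a pattern like this without touching AWS, here is a small local mimic of the two filter behaviours used above — exact matching on “subject” and “prefix” matching on “FullFormattedMessage”. It is a simplified sketch, not EventBridge’s real matching engine:

```python
# Minimal local mimic of the two EventBridge pattern features used here:
# a list of exact values, and a {"prefix": ...} matcher.
def value_matches(rule_values, actual):
    for rule in rule_values:
        if isinstance(rule, dict) and "prefix" in rule:
            if isinstance(actual, str) and actual.startswith(rule["prefix"]):
                return True
        elif rule == actual:
            return True
    return False


def pattern_matches(pattern, event):
    # Every key in the pattern must be present in the event and match
    for key, rule in pattern.items():
        if key not in event:
            return False
        if isinstance(rule, dict):
            if not isinstance(event[key], dict) or not pattern_matches(rule, event[key]):
                return False
        elif not value_matches(rule, event[key]):
            return False
    return True
```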

In our case, we target CloudWatch to store the events and we target AWS Lambda to trigger a function called “spinUpSDDC”.

Once the rule has been created, note the rule ARN. You will need it later:

Setting up AWS Lambda

Once the event is triggered and matched by the Amazon EventBridge event pattern, it will trigger the Lambda function:

To create the spinUpSDDC Lambda function, just create one from scratch in Python 3.8:

AWS Lambda Function

Then you will need to copy the code from my GitHub.

I initially used the code from Gilles Chekroun on how to stand up an SDDC with Python. I made some minor changes and adapted it for Lambda, but a lot of credit goes to Gilles and to Matt Dreyer from the VMC Product Team.
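For orientation, here is a heavily simplified sketch of what such a handler can look like. The CSP and VMC endpoints are the public APIs, but the payload values and placeholder credentials are illustrative, and this sketch uses the standard library’s urllib instead of the requests package the actual code installs:

```python
import json
import urllib.request

# Public endpoints: CSP exchanges a refresh token for an access token,
# VMC creates the SDDC. Placeholders below are illustrative, not real values.
CSP_AUTH_URL = ("https://console.cloud.vmware.com/csp/gateway/am/api"
                "/auth/api-tokens/authorize?refresh_token={token}")
VMC_SDDC_URL = "https://vmc.vmware.com/vmc/api/orgs/{org}/sddcs"


def get_access_token(refresh_token):
    # Exchange the long-lived CSP API token for a short-lived access token
    req = urllib.request.Request(CSP_AUTH_URL.format(token=refresh_token), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]


def build_sddc_payload(name, region="EU_CENTRAL_1", num_hosts=1):
    # Minimal single-host SDDC request body (illustrative values)
    return {"name": name, "provider": "AWS", "num_hosts": num_hosts, "region": region}


def lambda_handler(event, context):
    # event is the EventBridge-delivered vCenter alarm; we only react to it,
    # the alarm payload itself is not needed to create the SDDC
    token = get_access_token("<your-csp-refresh-token>")
    body = json.dumps(build_sddc_payload("nvibert-API-2")).encode()
    req = urllib.request.Request(
        VMC_SDDC_URL.format(org="<your-org-id>"),
        data=body,
        headers={"csp-auth-token": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return {"statusCode": resp.status, "task": json.load(resp)}
```

In the real function, the refresh token and org ID come from config.ini rather than being hard-coded like this.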

Save both files in a directory. Go to the directory.

You need to update the config.ini with your own credentials. Again, I would recommend you follow the instructions in Gilles’ blog post.
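For reference, this is roughly how configparser consumes such a file — the section and key names here are illustrative, so check Gilles’ post for the actual ones:

```python
import configparser

# Illustrative config.ini contents - the real section/key names come from Gilles' repo
SAMPLE_CONFIG = """
[vmcConfig]
refresh_token = <your-csp-api-token>
org_id = <your-org-id>
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)  # the Lambda code would use config.read("config.ini")
refresh_token = config["vmcConfig"]["refresh_token"]
org_id = config["vmcConfig"]["org_id"]
```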

Install the Python packages using the following commands:

pip install requests -t . --upgrade
pip install configparser -t . --upgrade

Finally, you need to zip them up, for example with the following command.

zip -r9 ${OLDPWD}/function.zip .

You have just packaged up the Python file and all the packages required for the script to run.

You now have a Zip file (called function.zip if you followed the command above) to upload straight to Lambda.

Before you save, I would recommend you update the default Lambda timeout:

The default timeout of 3 seconds will not be enough, so you should probably increase it to 60 seconds or more. In my demo, it took about 20 seconds for the entire function to execute.

Setting up VMware Event Broker Appliance

Patrick Kremer has done a fantastic job with his blog posts and I would recommend you check his posts. All the documentation built by the VEBA team and early adopters has made it very easy to deploy.

First, head over to the VEBA Fling page and download the OVA. Go through the standard deployment of an OVA.

The only notable aspect is the AWS EventBridge configuration.

For the Access Key and Access Secret, you need to create a user in AWS IAM with the AmazonEventBridgeFullAccess policy attached.

The Event Bus name is the name of the Event Bus you created earlier. The Rule ARN is the one you noted after creating the rule.

Once deployed, log on to the web console (or SSH to the VM) and you can see all the components of VEBA running as containers. Again, Patrick did a great job explaining them in one of his posts. The key one is the VMware Event Router.

root@veba-cloud-bursting [ ~ ]# kubectl get pods -A
NAMESPACE        NAME                                          READY   STATUS      RESTARTS   AGE
kube-system      coredns-584795fc57-hbr57                      1/1     Running     0          5d23h
kube-system      coredns-584795fc57-nzwdq                      1/1     Running     0          5d23h
kube-system      etcd-veba-cloud-bursting                      1/1     Running     0          5d23h
kube-system      kube-apiserver-veba-cloud-bursting            1/1     Running     0          5d23h
kube-system      kube-controller-manager-veba-cloud-bursting   1/1     Running     0          5d23h
kube-system      kube-proxy-fnjpl                              1/1     Running     0          5d23h
kube-system      kube-scheduler-veba-cloud-bursting            1/1     Running     0          5d23h
kube-system      weave-net-k52n8                               2/2     Running     0          5d23h
projectcontour   contour-5cddfc8f6-hp7kg                       1/1     Running     0          5d23h
projectcontour   contour-5cddfc8f6-t8x2g                       1/1     Running     0          5d23h
projectcontour   contour-certgen-l75pm                         0/1     Completed   0          5d23h
projectcontour   envoy-rfgpr                                   1/1     Running     0          5d23h
vmware           tinywww-6777fcc5dd-cc898                      1/1     Running     0          5d23h
vmware           vmware-event-router-5dd9c8f858-lqszq          1/1     Running     0          5d23h

If VEBA is working as expected, you will see a lot of events being pulled from vCenter by the VMware Event Router. To check them, run the following command:

kubectl logs vmware-event-router-5dd9c8f858-lqszq -n vmware --follow

That will display all the events pulled from the vCenter.

Back to the demo

In the demo, I have a vCenter with a single host with roughly 10TB of storage. When the demo starts, roughly 6.6TB is already consumed. When I clone a VM with a 600GB hard disk (thick-provisioned) and boot it up, storage utilization exceeds 70%.

The vCenter storage capacity threshold is exceeded and an event is sent to the VMware Event Router.

[AWS EventBridge] 2020/05/14 15:35:04 processing event [0] of type *types.AlarmStatusChangedEvent from source https://vcenter.sddc-18-132-133-144.vmwarevmc.com/sdk: &{AlarmEvent:{Event:{DynamicData:{} Key:23555 ChainId:23555 CreatedTime:2020-05-14 15:35:03.288149 +0000 UTC UserName: Datacenter:0xacd40a0 ComputeResource:0xacd4200 Host:<nil> Vm:<nil> Ds:<nil> Net:<nil> Dvs:<nil> FullFormattedMessage:Alarm 'vSAN health alarm 'Cluster disk space utilization'' on Cluster-1 changed from Green to Yellow ChangeTag:} Alarm:{EntityEventArgument:{EventArgument:{DynamicData:{}} Name:vSAN health alarm 'Cluster disk space utilization'} Alarm:Alarm:alarm-183}} Source:{EntityEventArgument:{EventArgument:{DynamicData:{}} Name:Datacenters} Entity:Folder:group-d1} Entity:{EntityEventArgument:{EventArgument:{DynamicData:{}} Name:Cluster-1} Entity:ClusterComputeResource:domain-c8} From:green To:yellow}

[AWS EventBridge] 2020/05/14 15:35:04 successfully sent event(s) from source https://vcenter.sddc-18-132-133-144.vmwarevmc.com/sdk: {
  Entries: [{
      EventId: "64b65d37-99f0-38f3-ef12-659547cf5711"
    }],
  FailedEntryCount: 0
} batch: 0

The event is forwarded to Amazon EventBridge and, because it matches the event pattern we defined, it triggers the VMC SDDC deployment.

And that’s how the magic happens 😁

The cool thing is that when the Lambda function is deployed, it’s logged in CloudWatch logs and we also get details about what was deployed:

The first event triggered the deployment of an SDDC. The second event (the on-prem vCenter is obviously still alerting, as its disks are still over-used) does not trigger another SDDC, because the Python code checks whether an SDDC named “nvibert-API-2” is already deployed. This protects you from accidentally ending up with multiple SDDCs.
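The guard itself is conceptually simple: list the org’s SDDCs and bail out if the name already exists. A minimal sketch (function names are mine; `sddcs` stands for the decoded JSON list returned when listing the org’s SDDCs):

```python
def sddc_exists(sddcs, name):
    # sddcs is the decoded JSON list of the org's existing SDDCs
    return any(sddc.get("name") == name for sddc in sddcs)


def maybe_deploy(sddcs, name, deploy_fn):
    # Only kick off a deployment if no SDDC with this name exists yet,
    # so repeated alarms from the same full cluster stay idempotent
    if sddc_exists(sddcs, name):
        return "already deployed, skipping"
    deploy_fn(name)
    return "deployment started"
```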

If you think that’s something you would like to use or if you think the process should be simplified, let me know via a Tweet.

If you want to learn more about the VMware Event Broker Appliance, join the VMware {code} Slack and find the #vcenter-event-broker-appliance channel.

Hang on… was it really Cloud Bursting?

If we look at the Cloud Bursting requirements I listed earlier:

  1. Ability to monitor the “on-prem” capacity and trigger some event/alarm when the capacity level passes a threshold.
  2. If/when this happens, we need the ability to automatically create resources in the cloud.
  3. We need the ability for the application to be re-balanced between on-prem and the infrastructure in the cloud. It might be done, for example, through some sort of load-balancing, but the main aspect is to relieve pressure on the on-prem environment.

You could argue that the demo doesn’t quite deliver on the third requirement. By the end of the video, we’ve got a brand-new vCenter running in the Cloud, but the one on-prem is still full. If you use an automation engine like vRealize Automation or Terraform, you would then need to point it at the new vCenter.

What would be nice is to auto-deploy HCX straight after the Cloud vCenter is deployed, connect up both vCenters, and migrate VMs from on-prem to the Cloud without disruption to balance resources. In other words, the next logical step is to build Hybrid Elastic DRS… Watch this space!

Thanks to Patrick Kremer, William Lam, Michael Gasch, and the extended VEBA team for their help with this project.

Thanks for reading.
