Python Tips – Parsing, Dictionaries and APIs

This post will go through various Python tips on topics such as:

  • Parsing Python strings
  • Extracting entries from Python dictionaries and lists
  • Making API requests from Python

In a previous post, I wrote an AWS Lambda function in Python that synchronized vSphere tags with NSX Security tags.

The main coding challenge was extracting information from Python strings, objects and dictionaries. It took a few hours and a lot of trial and error to get right, so I thought I’d record it in a blog post (to jog my own memory and for anyone with similar challenges).

Throughout the post, I use a concrete example to help the reader understand the various Python methods.

Python String Parsing

I’ll use the same example from my previous post and walk you through what we have to do.

If you recall from the post, we are sending an alert in the JSON format to AWS Lambda and we want to extract the name of the tag (‘nico-tag’ or ‘nico-test’ in the examples below) and the name of the object (‘nico-vm’ or ‘centos’ in the examples below). Sometimes the tag is ‘attached’ and sometimes the tag is ‘detached’.

First example:

{
  "source": "<99>1 2019-09-19T13:21:34.725374+00:00 vcenter vpxd 60557 - -  Event [2138825] [1-1] [2019-09-19T13:21:34.724991Z] [vim.event.EventEx] [info] [VMC.LOCAL\\cloudadmin] [] [2138825] [User VMC.LOCAL\\cloudadmin detached tag nico-tag from object nico-vm]"
}

Second example:

{
  "source": "<99>1 2019-09-19T13:21:34.725374+00:00 vcenter vpxd 60557 - -  Event [2138825] [1-1] [2019-09-19T13:21:34.724991Z] [vim.event.EventEx] [info] [VMC.LOCAL\\cloudadmin] [] [2138825] [User VMC.LOCAL\\cloudadmin attached tag nico-test to object centos]"
}

In AWS API Gateway, we had selected “Use Lambda Proxy integration” to proxy the content of the API request to the ‘event’ of the AWS Lambda handler function. So the JSON above is passed on to Lambda as the ‘event’ object, and my first task is to convert it into a string.

def lambda_handler(event, context):
    string_event = str(event)
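
As a minimal sketch of what this conversion produces, the snippet below calls str() on a hand-built dictionary shaped like the samples above. The shortened log text and the name sample_event are assumptions for illustration; the real event comes from API Gateway.

```python
# Hypothetical, shortened event shaped like the samples above.
sample_event = {
    "source": "[User VMC.LOCAL\\cloudadmin attached tag nico-test to object centos]"
}

# str() renders the whole dictionary (braces, key and quotes) as one string.
string_event = str(sample_event)

# If the event is a dictionary with a 'source' key, the log line can also be
# read directly, which leaves the braces and the key name out of the search.
log_line = sample_event["source"]

print(string_event.startswith("{"))   # True
print(log_line.startswith("[User"))   # True
```

Either string works with the find() calls used in the rest of this post, because the phrases being searched for appear in both.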

Now that we have a string, let’s use some of the native Python methods to inspect it.

String Find() method

In my scenario, I needed to take different actions depending on whether the tag was attached or detached.

I used the string find() method to search for the ‘attached tag’ string within the logs (read more on find() here).

If find() returns -1, it means ‘attached tag’ is not in the string. Otherwise, it returns the index (zero or greater) of the first occurrence of the search string. Assume ‘attached tag’ is not there:

if string_event.find('attached tag') == -1:

If that’s the case, it means we have a ‘detached tag’ string such as the one below.

{u'source': u'<99>1 2019-09-19T13:21:34.725374+00:00 vcenter vpxd 60557 - -  Event [2138825] [1-1] [2019-09-19T13:21:34.724991Z] [vim.event.EventEx] [info] [VMC.LOCAL\\cloudadmin] [] [2138825] [User VMC.LOCAL\\cloudadmin detached tag nico-tag from object centos]'}
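
The check above can be sketched as a small helper that covers both cases. The function name tag_action and the shortened log lines are hypothetical, used only to keep the example self-contained.

```python
def tag_action(log_text):
    """Return 'attached' or 'detached' depending on the phrase in the log."""
    # find() returns the index of the first match, or -1 when there is none.
    if log_text.find("attached tag") != -1:
        return "attached"
    if log_text.find("detached tag") != -1:
        return "detached"
    return None  # neither phrase is present


# Shortened, hypothetical versions of the two sample log lines.
attached_log = "[User VMC.LOCAL\\cloudadmin attached tag nico-test to object centos]"
detached_log = "[User VMC.LOCAL\\cloudadmin detached tag nico-tag from object nico-vm]"

print(tag_action(attached_log))  # attached
print(tag_action(detached_log))  # detached
```

Note that ‘detached tag’ does not contain ‘attached tag’ (the words differ at the second letter), so the first test cannot match a detach event by accident.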

All the stuff at the beginning is rubbish I don’t care about – I only want the name of the object (‘centos’). First, we use the find() method again and look for “from object”. The logs always have the same format and we know that the object name I am looking for is located between the characters “from object” and the “]” at the end of the message.

indexToObject below is the index in the string where ‘from object’ starts.

indexToObject = string_event.find("from object")

Then we use simple Python slicing (using square brackets []) to define our tagged_VM ‘centos’. It starts 12 characters after the ‘f’ of ‘from object’ and finishes just before the next square bracket ‘]’.

tagged_VM = string_event[indexToObject + 12 : string_event.find("]", indexToObject)]

print(tagged_VM)
centos
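
A minimal sketch of the same slicing, wrapped in a function so that it also handles the case where the marker is missing. The name extract_after and the shortened log lines are assumptions for illustration; using len(marker) instead of a hard-coded 12 is a small variation on the code above.

```python
def extract_after(log_text, marker, closing="]"):
    """Return the text between marker and the next closing character."""
    start = log_text.find(marker)
    if start == -1:
        return None  # marker not present
    start += len(marker)  # len() replaces the hard-coded offset of 12
    end = log_text.find(closing, start)
    if end == -1:
        return None  # no closing bracket after the marker
    return log_text[start:end]


# Shortened, hypothetical versions of the two sample log lines.
detached_log = "[User VMC.LOCAL\\cloudadmin detached tag nico-tag from object nico-vm]"
attached_log = "[User VMC.LOCAL\\cloudadmin attached tag nico-test to object centos]"

print(extract_after(detached_log, "from object "))  # nico-vm
print(extract_after(attached_log, "to object "))    # centos
```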

Another example to hammer the point home. If I want to extract the tag name (nico-test) from the following event, it’s a similar Python string slicing job.

{
  "source": "<99>1 2019-09-19T13:21:34.725374+00:00 vcenter vpxd 60557 - -  Event [2138825] [1-1] [2019-09-19T13:21:34.724991Z] [vim.event.EventEx] [info] [VMC.LOCAL\\cloudadmin] [] [2138825] [User VMC.LOCAL\\cloudadmin attached tag nico-test to object centos]"
}

For tag_name below, we look for the string ‘attached tag ’ and skip 13 characters ahead to where the name of the tag starts. The tag name ends one character before ‘to object’.

tag_name = string_event[string_event.find('attached tag') + 13 : (string_event.find('to object') - 1)]
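
As an alternative to counting characters, a regular expression can pull out the action, the tag and the object in one pass. This is not what my Lambda function does; it is a sketch that assumes tag names contain no spaces and that the log keeps the format shown above. The names LOG_PATTERN and parse_tag_event are hypothetical.

```python
import re

# action, then "tag <name>", then "to object" or "from object", then the
# object name up to the closing square bracket.
LOG_PATTERN = re.compile(
    r"(attached|detached) tag (\S+) (?:to|from) object ([^\]]+)\]"
)


def parse_tag_event(log_text):
    """Return a dict with action, tag and object, or None if nothing matches."""
    match = LOG_PATTERN.search(log_text)
    if match is None:
        return None
    action, tag_name, object_name = match.groups()
    return {"action": action, "tag": tag_name, "object": object_name}


sample = "[User VMC.LOCAL\\cloudadmin attached tag nico-test to object centos]"
print(parse_tag_event(sample))
```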

Python API Requests

Now that we’ve extracted all the relevant data, we are going to make an API call in the JSON format, using some of the extracted data. That’s why I had the following modules at the beginning of my function:

import json
import requests

The first one gives the ability to manipulate JSON data and the second one lets us make API requests with Python.

In my Lambda script, I had to make some GET and POST API calls. With the ‘requests’ module, it is straightforward: requests.get will make GET calls and requests.post will make POST calls.

Inside requests.get or requests.post, you specify the information you would expect in an HTTP request: the headers (including the token if necessary) and, for a POST, the JSON payload.

For example, for a POST that passes a refresh token as a query parameter:

params = {'refresh_token': Refresh_Token}
headers = {'Content-Type': 'application/json'}
response = requests.post('https://console.cloud.vmware.com/csp/gateway/am/api/auth/api-tokens/authorize', params=params, headers=headers)
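
The call above can be wrapped so that a stalled connection or an error response does not go unnoticed. The sketch below is one way to do it, not the code from my function: the helper names and the 10-second timeout are assumptions, and ‘requests’ is a third-party package that has to be installed.

```python
AUTH_URL = "https://console.cloud.vmware.com/csp/gateway/am/api/auth/api-tokens/authorize"


def build_authorize_kwargs(refresh_token, timeout=10):
    """Collect the keyword arguments for the token request in one place."""
    return {
        "params": {"refresh_token": refresh_token},
        "headers": {"Content-Type": "application/json"},
        "timeout": timeout,  # assumed value; without one a stalled call can hang
    }


def authorize(refresh_token):
    """Send the POST and raise an exception on a 4xx or 5xx answer."""
    import requests  # third-party package, imported only when a call is made

    response = requests.post(AUTH_URL, **build_authorize_kwargs(refresh_token))
    response.raise_for_status()
    return response


# Inspect the arguments without touching the network.
print(build_authorize_kwargs("my-refresh-token")["params"])
```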

One thing you might want to capture is the HTTP status code. 204 (No Content) means the request succeeded:

return response.status_code
204
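
To make status codes easier to read in logs, the standard library can translate them into their reason phrase. A minimal sketch for Python 3, where describe_status is a hypothetical helper and "success" is taken to mean any 2xx code:

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return (is_success, reason phrase) for an HTTP status code."""
    is_success = 200 <= status_code < 300
    # HTTPStatus() raises ValueError for codes it does not know about.
    return is_success, HTTPStatus(status_code).phrase


print(describe_status(204))  # (True, 'No Content')
print(describe_status(404))  # (False, 'Not Found')
```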

If you want to POST something, you are probably going to need to create some content in JSON. This is what I did with the following Python dictionary, which requests serializes to JSON:

json_data = {
    "virtual_machine_id": extracted_VM_external_id,
    "tags": [
        {
            "scope": "",
            "tag": tag_name
        }
    ]
}

Finally, I can post something using the following Python request:

response = requests.post(URL, json = json_data, params={'action': 'update_tags'}, headers=headers)
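
When a dictionary is passed through the json= argument, requests serializes it to JSON text for the request body. The sketch below uses the standard json module to show what that text looks like; build_tag_payload is a hypothetical helper, and the ID and tag are the sample values used in this post.

```python
import json


def build_tag_payload(vm_external_id, tag_name, scope=""):
    """Build the dictionary that is sent as the JSON body."""
    return {
        "virtual_machine_id": vm_external_id,
        "tags": [{"scope": scope, "tag": tag_name}],
    }


payload = build_tag_payload("50110458-2523-da59-3f06-a0da7c3ca808", "nico-test")

# json.dumps() turns the dictionary into JSON text.
body = json.dumps(payload, indent=2)
print(body)
```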

Python Dictionaries

In my script, I needed to extract a value out of a JSON response.

First, we needed to convert the JSON response into a Python dictionary.

response_dictionary = response.json()
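
Roughly speaking, response.json() parses the text of the response body the way json.loads() parses a string. A minimal sketch with a hand-written, heavily shortened body (raw_text is an assumption, not real API output):

```python
import json

# Hypothetical, shortened response body.
raw_text = '{"results": [{"display_name": "centos", "source": {"is_valid": true}}]}'

response_dictionary = json.loads(raw_text)

print(type(response_dictionary).__name__)                       # dict
print(response_dictionary["results"][0]["source"]["is_valid"])  # True
```

JSON objects become dictionaries, JSON arrays become lists, and JSON true becomes the Python value True.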

The response was pretty lengthy and I only wanted to extract this value: ‘50110458-2523-da59-3f06-a0da7c3ca808’ from the response below.

{
     "results": [
         {
             "resource_type": "VirtualMachine",
             "display_name": "telegraf",
             "compute_ids": [
                 "moIdOnHost:10",
                 "hostLocalId:10",
                 "locationId:564db6f5-aa90-7f9f-2d89-c16395e9ccb9",
                 "instanceUuid:5011ce05-8b10-8d29-3305-e3ef1b089c7b",
                 "externalId:5011ce05-8b10-8d29-3305-e3ef1b089c7b",
                 "biosUuid:42110c2c-56ad-123b-c473-8712fe8c9910"
             ],
             "external_id": "5011ce05-8b10-8d29-3305-e3ef1b089c7b",
             "source": {
                 "target_display_name": "esx-10.2.32.4",
                 "is_valid": true,
                 "target_type": "HostNode",
                 "target_id": "75841618-af75-4bef-8711-1472fa0c943e"
             },
             "type": "REGULAR",
             "power_state": "VM_RUNNING",
             "host_id": "75841618-af75-4bef-8711-1472fa0c943e",
             "local_id_on_host": "10",
             "_last_sync_time": 1568711597653
         },
         {
             "resource_type": "VirtualMachine",
             "display_name": "centos",
             "tags": [
                 {
                     "scope": "",
                     "tag": "tag03"
                 }
             ],
             "compute_ids": [
                 "moIdOnHost:11",
                 "hostLocalId:11",
                 "locationId:564d0801-98d0-3ea8-fe76-cc9b8c70536e",
                 "instanceUuid:50110458-2523-da59-3f06-a0da7c3ca808",
                 "externalId:50110458-2523-da59-3f06-a0da7c3ca808",
                 "biosUuid:4211ebbc-f6f0-5486-c530-9ffb6271d7e9"
             ],
             "external_id": "50110458-2523-da59-3f06-a0da7c3ca808",
             "source": {
                 "target_display_name": "esx-10.2.32.4",
                 "is_valid": true,
                 "target_type": "HostNode",
                 "target_id": "75841618-af75-4bef-8711-1472fa0c943e"
             },
             "type": "REGULAR",
             "power_state": "VM_RUNNING",
             "host_id": "75841618-af75-4bef-8711-1472fa0c943e",
             "local_id_on_host": "11",
             "_last_sync_time": 1568711642639
         }
     ]
 }

The response above is actually a dictionary containing a list of dictionaries, so we’re going to extract each value a step at a time.

The following command extracts the first layer. It relies on the fact that a dictionary is essentially a collection of key-value pairs. In the response above, we have a dictionary with one key (‘results’) and one value (everything after ‘results’). To extract an item from a dictionary, you only need to refer to its key, inside square brackets:

extracted_dictionary = response_dictionary['results']

The output is the following:

     [
         {
             "resource_type": "VirtualMachine",
             "display_name": "telegraf",
             "compute_ids": [
                 "moIdOnHost:10",
                 "hostLocalId:10",
                 "locationId:564db6f5-aa90-7f9f-2d89-c16395e9ccb9",
                 "instanceUuid:5011ce05-8b10-8d29-3305-e3ef1b089c7b",
                 "externalId:5011ce05-8b10-8d29-3305-e3ef1b089c7b",
                 "biosUuid:42110c2c-56ad-123b-c473-8712fe8c9910"
             ],
             "external_id": "5011ce05-8b10-8d29-3305-e3ef1b089c7b",
             "source": {
                 "target_display_name": "esx-10.2.32.4",
                 "is_valid": true,
                 "target_type": "HostNode",
                 "target_id": "75841618-af75-4bef-8711-1472fa0c943e"
             },
             "type": "REGULAR",
             "power_state": "VM_RUNNING",
             "host_id": "75841618-af75-4bef-8711-1472fa0c943e",
             "local_id_on_host": "10",
             "_last_sync_time": 1568711597653
         },
         {
             "resource_type": "VirtualMachine",
             "display_name": "centos",
             "tags": [
                 {
                     "scope": "",
                     "tag": "tag03"
                 }
             ],
             "compute_ids": [
                 "moIdOnHost:11",
                 "hostLocalId:11",
                 "locationId:564d0801-98d0-3ea8-fe76-cc9b8c70536e",
                 "instanceUuid:50110458-2523-da59-3f06-a0da7c3ca808",
                 "externalId:50110458-2523-da59-3f06-a0da7c3ca808",
                 "biosUuid:4211ebbc-f6f0-5486-c530-9ffb6271d7e9"
             ],
             "external_id": "50110458-2523-da59-3f06-a0da7c3ca808",
             "source": {
                 "target_display_name": "esx-10.2.32.4",
                 "is_valid": true,
                 "target_type": "HostNode",
                 "target_id": "75841618-af75-4bef-8711-1472fa0c943e"
             },
             "type": "REGULAR",
             "power_state": "VM_RUNNING",
             "host_id": "75841618-af75-4bef-8711-1472fa0c943e",
             "local_id_on_host": "11",
             "_last_sync_time": 1568711642639
         }
     ]
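
The step above can be tried on a cut-down copy of the response. In this sketch only the display_name and external_id fields are kept, and the key ‘no_such_key’ is made up to show the difference between square brackets and get().

```python
# Shortened copy of the response, keeping two fields per virtual machine.
response_dictionary = {
    "results": [
        {"display_name": "telegraf", "external_id": "5011ce05-8b10-8d29-3305-e3ef1b089c7b"},
        {"display_name": "centos", "external_id": "50110458-2523-da59-3f06-a0da7c3ca808"},
    ]
}

# Square brackets return the value, or raise KeyError if the key is missing.
vm_list = response_dictionary["results"]

# get() returns a fallback value instead of raising.
fallback = response_dictionary.get("no_such_key", [])

print(type(vm_list).__name__)                   # list
print([vm["display_name"] for vm in vm_list])   # ['telegraf', 'centos']
print(fallback)                                 # []
```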

As you can see, I now have a Python list called extracted_dictionary with two dictionary entries: one virtual machine named telegraf and one named centos. Using the built-in Python next() function below, I can extract only the entry whose ‘display_name’ matches the ‘tagged_VM’ value I had defined previously.

extracted_VM = next(item for item in extracted_dictionary if item["display_name"] == tagged_VM)

The output is the single dictionary entry:

{
             "resource_type": "VirtualMachine",
             "display_name": "centos",
             "tags": [
                 {
                     "scope": "",
                     "tag": "tag03"
                 }
             ],
             "compute_ids": [
                 "moIdOnHost:11",
                 "hostLocalId:11",
                 "locationId:564d0801-98d0-3ea8-fe76-cc9b8c70536e",
                 "instanceUuid:50110458-2523-da59-3f06-a0da7c3ca808",
                 "externalId:50110458-2523-da59-3f06-a0da7c3ca808",
                 "biosUuid:4211ebbc-f6f0-5486-c530-9ffb6271d7e9"
             ],
             "external_id": "50110458-2523-da59-3f06-a0da7c3ca808",
             "source": {
                 "target_display_name": "esx-10.2.32.4",
                 "is_valid": true,
                 "target_type": "HostNode",
                 "target_id": "75841618-af75-4bef-8711-1472fa0c943e"
             },
             "type": "REGULAR",
             "power_state": "VM_RUNNING",
             "host_id": "75841618-af75-4bef-8711-1472fa0c943e",
             "local_id_on_host": "11",
             "_last_sync_time": 1568711642639
         }
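
One caveat: called with a single argument, next() raises StopIteration when no entry matches. A minimal sketch of a more forgiving variant, which passes None as the default; find_vm and the shortened list are assumptions for illustration.

```python
def find_vm(vm_list, name):
    """Return the first VM dictionary whose display_name matches, else None."""
    # The second argument to next() is returned when the generator is empty.
    return next((vm for vm in vm_list if vm.get("display_name") == name), None)


# Shortened copy of the list, keeping two fields per virtual machine.
vms = [
    {"display_name": "telegraf", "external_id": "5011ce05-8b10-8d29-3305-e3ef1b089c7b"},
    {"display_name": "centos", "external_id": "50110458-2523-da59-3f06-a0da7c3ca808"},
]

print(find_vm(vms, "centos")["external_id"])  # 50110458-2523-da59-3f06-a0da7c3ca808
print(find_vm(vms, "no-such-vm"))             # None
```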

extracted_VM is a single dictionary, so I can simply access its external_id by key, as dictionaries are based on key-value pairs.

extracted_VM_external_id = extracted_VM['external_id']

print(extracted_VM_external_id)
50110458-2523-da59-3f06-a0da7c3ca808

That’s it – hopefully this concrete example of leveraging Python lists, dictionaries, methods and modules helped!

Thanks for reading.
