Policy-As-Code for Networking-As-Code

As much as I promote Terraform and other tools to automate and deploy infrastructure, I have also come to appreciate that the easier we make it for folks to deploy resources, the quicker things get out of control. If you give a developer or an infrastructure operator the keys to the AWS account, you might quickly regret it if the user starts deploying costly cloud services or, worse, creates something that proves to be a security weakness.

It also applies to your Spotify account – if you give somebody access to it, you might end up with terrible music on your playlists. Unless you policy it.

The same applies to cloud networking.


If you’re going to use Terraform to deploy security rules – whether it’s, let’s say NSX or Fortinet – you want to make sure it’s done properly.

If you’re going to use Terraform to deploy a VPN, you might want to make sure it’s not using a vulnerable cipher.

If you’re going to use Terraform to create an AWS VPC, you might want to ensure it’s set up properly with monitoring (with VPC Flow Logs for example).

If you’re going to use Terraform to create AWS security groups, you might want to restrict what their rules may reference, or make sure that you don’t accidentally create a group with an inbound rule allowing any incoming traffic.

If you’re going to use AWS broadly, you might want your networking entities to be tagged appropriately.

If you’re going to use Terraform to create a VPN, you might want to make sure nobody is going to use a vulnerable IKE parameter, for example Diffie-Hellman Group 2, and that the VPN is built resiliently.

There are multiple ways to enforce policies. You could use some kind of self-service portal with drop-down menus to only let users do very specific things. That can be hard to maintain and regulate.

You could build Terraform modules – blueprints if you like – and limit users’ consumption of Terraform to these specific modules.

You can also write Policy-As-Code policies to make sure the users don’t accidentally violate a business or security policy.

In the remainder of this post, I will walk through a few Sentinel policies covering some of the cases above, to make sure our network-as-code doesn’t go awry.

Sentinel Recap

I’ve already blogged a fair bit about Sentinel (including a 2-part intro (part 1 and part 2) and even how to use it with Spotify) but if I had to summarize briefly:

  1. Sentinel is a framework to apply some guard rails to cloud infrastructure deployment.
  2. Those guard rails are referred to as policies and are written as code, in the Sentinel syntax.
  3. The Sentinel syntax is not as easy to read as the HCL syntax you use with most of the HashiCorp products and it can be a bit challenging at first.
  4. The reason we couldn’t use HCL for this is that applying policy is more complex and requires functions and operators that are not available in HCL.
  5. Sentinel is a premium feature of the HashiCorp products, i.e. it is not available in the open source tools, but you can test it with the Sentinel CLI, the Sentinel playground or by requesting a free 30-day trial of the Team & Governance tier on Terraform Cloud.
  6. Sentinel is not the only policy as code framework – OPA is becoming very popular and is regularly used with Kubernetes.
  7. Sentinel policies are applied between Terraform Plan and Terraform Apply.
  8. Sentinel policies can represent any kind of policy: security and governance policies are the most common, but you could have any kind of policies applied.
  9. A common Sentinel example is probably one to ensure that an AWS S3 bucket is not exposed publicly. It’s a security & governance policy that can easily be written into code and applied to a Terraform configuration.
  10. One of the benefits of Policy-As-Code is that the code can be shared. You will see lots of policies on this GitHub repo. What I usually do when I write policies is consult existing ones to save me some time.

Let’s walk through three examples. Some of them were already in the repo above but some are a slight variation on them.

Ensuring a user creates a VPN with the right encryption parameters

Network engineers routinely have to create VPNs to remote sites, and it is something you could provision through Terraform as the AWS provider supports VPN resources.

By default, when you build a site-to-site VPN on AWS, you can use the default VPN parameters and AWS will negotiate with the remote end the parameters such as encryption, DH groups and authentication.

There’s an option to specify the algorithms, IKE version and DH groups.

AWS VPN Parameters

What you might accidentally do is pick a vulnerable cipher. This is what it would look like in your Terraform config:

resource "aws_vpn_connection" "main" {
  vpn_gateway_id                  = aws_vpn_gateway.vpn_gateway.id
  customer_gateway_id             = aws_customer_gateway.customer_gateway.id
  type                            = "ipsec.1"
  static_routes_only              = true
  tunnel1_phase1_dh_group_numbers = [2]
}

Imagine for example that you have a security policy that only allows Diffie-Hellman (DH) Groups 19, 20 and 21. You want to prevent a user from creating the VPN with any other DH groups. This is the sort of Sentinel Policy you would apply. This file will be saved as a .sentinel file:

# This policy uses the Sentinel tfplan/v2 import to require that
# AWS VPNs only use allowed DH groups

# Import common-functions/tfplan-functions/tfplan-functions.sentinel
# with alias "plan"
import "tfplan-functions" as plan

allowed_dh_groups = [19,20,21]

# Get all VPN Connections
allVPNConnections = plan.find_resources("aws_vpn_connection")

# Filter to VPN with violations
# Warnings will be printed for all violations since the last parameter is true
violatingVPNs = plan.filter_attribute_contains_items_not_in_list(allVPNConnections,
                        "tunnel1_phase1_dh_group_numbers", allowed_dh_groups, true)

# Count violations
violations = length(violatingVPNs["messages"])

# Main rule
main = rule {
  violations is 0
}

The essence of the code is to:

  • go through the output of Terraform Plan
  • list all the VPN resources
  • count the VPN resources that are not using the allowed DH group values
  • report a policy fail if the count is not 0

Let’s go through it in more detail:

# Import common-functions/tfplan-functions/tfplan-functions.sentinel
# with alias "plan"
import "tfplan-functions" as plan

First thing to know is that we are going to cheat. When you write anything in any language, you cheat by using libraries, packages, SDKs – whatever bit of code you can reuse to save you some time. Here, we import “tfplan-functions” as plan and we will be using the functions defined in tfplan-functions throughout our code.

Where does tfplan-functions come from? Alongside the .sentinel policies, you also need a sentinel.hcl file: the main configuration file that declares the modules to import, the policies to be used and the enforcement level of each policy. Again, I talked about it in previous posts.

module "tfplan-functions" {
    source = "https://raw.githubusercontent.com/hashicorp/terraform-guides/master/governance/third-generation/common-functions/tfplan-functions/tfplan-functions.sentinel"
}

policy "soft-mandatory-vpn" {
  source            = "./only-allow-selective-dh-groups-aws-vpn.sentinel"
  enforcement_level = "soft-mandatory"
}

Back to our policy. What we do next is write a list of the values acceptable for the DH attributes. In this instance, our policy only accepts DH19, DH20 and DH21.

# Allowed DH Group Types
allowed_dh_groups = [19,20,21]

Next, we’re going to crawl through the output of the Terraform Plan and find all the resources of type “aws_vpn_connection”, using the find_resources function from the module we imported earlier. allVPNConnections is a collection of those resources, keyed by resource address.

# Get all VPN Connections
allVPNConnections = plan.find_resources("aws_vpn_connection")

Next, we use another function from the module we imported beforehand. The function does all the logic for us, so why not use it? It filters a collection of resources, data sources, or blocks to those with an attribute that contains any items that are not in a given list.

In our case, we look at the “tunnel1_phase1_dh_group_numbers” attribute of every VPN connection. If any of its values is not DH19, DH20 or DH21, the connection is added to a result referred to as violatingVPNs, along with a message describing the violation.

# Filter to VPN with violations
violatingVPNs = plan.filter_attribute_contains_items_not_in_list(allVPNConnections,
                        "tunnel1_phase1_dh_group_numbers", allowed_dh_groups, true)

Finally, if that list is empty, then the policy passes. If it’s not empty, the policy fails.

# Count violations
violations = length(violatingVPNs["messages"])

# Main rule
main = rule {
  violations is 0
}

Without the comments, it’s 9 lines of code. Not that hard, is it?

So how do I apply this in practice?

Well, you need to push your policy as code to your version control system (GitHub for me) and associate it with your Terraform workspaces.

Associating Policies

That’s assuming you have already created the workspace associated with the repo where you have the code representing your infra.

When you trigger a Terraform Run:

Terraform Plan

Sentinel will be executed between “terraform plan” and “terraform apply”.

Soft Failed Policy – Can be Overridden

Sentinel is telling me exactly what I am doing wrong. Note that the enforcement level for this policy is “soft-mandatory”: the Terraform apply is paused until a user approves and overrides the failed policy.

You may be building a VPN to a remote firewall that only supports DH2 for example, so there might be a reason you picked this setting.
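
For comparison, here is a sketch of a VPN connection that should pass the policy. It reuses the resource names from the example above, and the DH groups simply mirror the allowed list in the policy; whether your remote device actually supports them is an assumption you would need to check.

```hcl
resource "aws_vpn_connection" "main" {
  vpn_gateway_id      = aws_vpn_gateway.vpn_gateway.id
  customer_gateway_id = aws_customer_gateway.customer_gateway.id
  type                = "ipsec.1"
  static_routes_only  = true

  # Only DH groups from the allowed list in the Sentinel policy
  tunnel1_phase1_dh_group_numbers = [19, 20, 21]
}
```

Note that the policy as written only inspects tunnel1_phase1_dh_group_numbers. The resource exposes equivalent attributes for phase 2 and for the second tunnel (for example tunnel2_phase1_dh_group_numbers), which a fuller policy would probably want to check as well.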

There might be some more severe policies you might want to enforce, with no option to override. For example:

Preventing a user from creating a wide-open security group

You might want to block anyone who creates a security group with a rule to allow 0.0.0.0/0 as an ingress CIDR block. In your sentinel.hcl file, you will need:

policy "hard-mandatory-policy" {
  source            = "./restrict-ingress-sg-rule-cidr-blocks.sentinel"
  enforcement_level = "hard-mandatory"
}

I am re-using a policy already written (there are some great examples of policies here). The structure will be similar to the one written previously: we are scanning the Terraform plan output and looking for any resources with the forbidden values.

It will be slightly more complex but mainly because there are two ways to create AWS security group rules with Terraform: with the aws_security_group resource (where rules are embedded as blocks within the main resource) and with the aws_security_group_rule resource (where each rule is a distinct resource and each refers to the security group ID the rule belongs to). So the policy has to parse both of them.
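
To make the second form concrete, here is a sketch of a wide-open ingress rule written as a standalone aws_security_group_rule. The rule name allow_all_ingress is made up for illustration, and the rule is assumed to attach to the allow_all security group that appears later in this post.

```hcl
# Standalone rule: a separate resource that points at its security group
resource "aws_security_group_rule" "allow_all_ingress" {
  type              = "ingress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.allow_all.id
}
```

In the plan, this shows up as its own resource, with cidr_blocks as a top-level attribute rather than nested inside an ingress block, which is why a complete policy needs a second check.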

To keep it simple for this blog, I will just look at the aws_security_group.

Here is the policy:

# Forbidden CIDRs
forbidden_cidrs = ["0.0.0.0/0"]

# Get all Security Groups
allSGs = plan.find_resources("aws_security_group")

# Validate Security Groups
violatingSGsCount = 0
for allSGs as address, sg {

  # Find the ingress rules of the current SG
  ingressRules = plan.find_blocks(sg, "ingress")

  # Filter to violating CIDR blocks
  # Warnings will not be printed for violations since the last parameter is false
  violatingIRs = plan.filter_attribute_contains_items_from_list(ingressRules,
                 "cidr_blocks", forbidden_cidrs, false)

  # Print violation messages
  if length(violatingIRs["messages"]) > 0 {
    violatingSGsCount += 1
    print("SG Ingress Violation:", address, "has at least one ingress rule",
          "with forbidden cidr blocks")
    plan.print_violations(violatingIRs["messages"], "Ingress Rule")
  }  // end if

} // end for SGs

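# Main rule
# Only aws_security_group resources are checked in this simplified version,
# so the rule depends solely on the count of violating security groups
validated = violatingSGsCount is 0
main = rule {
  validated is true
}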

Similarly to the first policy, we import existing functions to make our lives easier:

# Import the tfplan/v2 import, but use the alias "tfplan"
import "tfplan/v2" as tfplan

# Import common-functions/tfplan-functions/tfplan-functions.sentinel
# with alias "plan"
import "tfplan-functions" as plan

This time though, we specify the value we don’t want the user to specify:

# Forbidden CIDRs
forbidden_cidrs = ["0.0.0.0/0"]

We then get all the aws_security_group resources from the Terraform Plan output and we are going to iterate through the list of aws_security_groups and all the blocks within these groups.

# Get all Security Groups
allSGs = plan.find_resources("aws_security_group")

# Validate Security Groups
violatingSGsCount = 0
for allSGs as address, sg {

  # Find the ingress rules of the current SG
  ingressRules = plan.find_blocks(sg, "ingress")

  # Filter to violating CIDR blocks
  # Warnings will not be printed for violations since the last parameter is false
  violatingIRs = plan.filter_attribute_contains_items_from_list(ingressRules,
                 "cidr_blocks", forbidden_cidrs, false)

We use “for” to iterate over the collection of security groups. In the for statement, address is the key (the resource address) and sg is the value, i.e. the security group itself.

We use the function “find_blocks”, which finds all the instances of the “ingress” blocks within the current security group.

So when Sentinel inspects my Terraform plan:

resource "aws_security_group" "allow_all" {
  name        = "allow_all"
  description = "Allow All inbound traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "Allow Ingress All"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "nico-vibert-sg-allow-all"
  }
}

We will call the function:

plan.filter_attribute_contains_items_from_list(ingressRules,
                 "cidr_blocks", forbidden_cidrs, false)

And it will find the ingress block with the cidr_blocks containing the forbidden value “0.0.0.0/0”.

Sentinel can give you plenty of logs as well, which is handy when you have hundreds of objects to inspect.

  # Print violation messages
  if length(violatingIRs["messages"]) > 0 {
    violatingSGsCount += 1
    print("SG Ingress Violation:", address, "has at least one ingress rule",
          "with forbidden cidr blocks")
    plan.print_violations(violatingIRs["messages"], "Ingress Rule")
  }  // end if

}

The outcome when I run Terraform is that Sentinel will give me the middle finger:

Hard stop

Harsh!

You might want Terraform to be more gentle and just notify you when you do something minor, like forgetting to set up your AWS tags.

Ensuring networking resources all have a tag

Here, we are only going to set the policy to “advisory” – Sentinel just lets you know you’re being a bit naughty when you omit to tag resources, but it will not interrupt the Terraform process.
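
In sentinel.hcl, that could look like the sketch below. The policy name and the policy file name are hypothetical, and the module path is my assumption based on the layout of the same repo that hosts tfplan-functions, so check it against the repo before using it.

```hcl
# Helper module used by the tagging policy (path assumed, see note above)
module "aws-functions" {
  source = "https://raw.githubusercontent.com/hashicorp/terraform-guides/master/governance/third-generation/aws/aws-functions/aws-functions.sentinel"
}

policy "advisory-mandatory-tags" {
  # Hypothetical file name for the tagging policy
  source            = "./enforce-mandatory-tags.sentinel"
  enforcement_level = "advisory"
}
```

The tfplan-functions module block shown earlier is still needed, since the tagging policy imports both modules.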

If we look back at my Terraform config, some of my resources have tags but some do not:

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "nico-vibert"
  }

}

resource "aws_security_group" "allow_tls" {
  name        = "allow_tls"
  description = "Allow TLS inbound traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "TLS from VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  tags = {
    Name = "nico-vibert-sg-allow-tls"
  }
}

resource "aws_security_group" "allow_all" {
  name        = "allow_all"
  description = "Allow All inbound traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "Allow Ingress All"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "nico-vibert-sg-allow-all"
  }
}

resource "aws_vpn_gateway" "vpn_gateway" {
  vpc_id = aws_vpc.main.id
  tags = {
    Name = "nico-vibert-vpn-gateway"
  }
}

resource "aws_customer_gateway" "customer_gateway" {
  bgp_asn    = 65000
  ip_address = "172.0.0.1"
  type       = "ipsec.1"
}

resource "aws_vpn_connection" "main" {
  vpn_gateway_id                  = aws_vpn_gateway.vpn_gateway.id
  customer_gateway_id             = aws_customer_gateway.customer_gateway.id
  type                            = "ipsec.1"
  static_routes_only              = true
  tunnel1_phase1_dh_group_numbers = [2]
}

What we will be doing here is just checking specific AWS networking resources and making sure they have “Name” tags.

We use Sentinel parameters, which can either be specified directly in the code as below or as input parameters within the Terraform Cloud console.

# This policy uses the Sentinel tfplan/v2 import to require that
# specified AWS resources have all mandatory tags

# Import common-functions/tfplan-functions/tfplan-functions.sentinel
# with alias "plan"
import "tfplan-functions" as plan

# Import aws-functions/aws-functions.sentinel
# with alias "aws"
import "aws-functions" as aws

# List of resources that are required to have name/value tags
param resource_types default [
  "aws_vpc",
  "aws_security_group",
  "aws_customer_gateway",
  "aws_vpn_connection",
  "aws_vpn_gateway",
]

# List of mandatory tags

param mandatory_tags default ["Name"]

# Get all AWS Resources with standard tags
allAWSResourcesWithStandardTags =
                          aws.find_resources_with_standard_tags(resource_types)

# Filter to AWS resources with violations
# Warnings will be printed for all violations since the last parameter is true
violatingAWSResources =
        plan.filter_attribute_not_contains_list(allAWSResourcesWithStandardTags,
                        "tags", mandatory_tags, true)

# Main rule
main = rule {
  length(violatingAWSResources["messages"]) is 0
}

As before, we find all the AWS resources of the listed types that can be tagged, then look at their tags – if any – and check that the mandatory “Name” tag is present.

Sentinel advises me that a couple of resources are missing Name tags but it doesn’t interrupt the Terraform run.

Advisory Tag
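
Judging from the configuration above, the two resources without tags are the customer gateway and the VPN connection. A sketch of the fix follows; the tag values are placeholders of my own choosing, and the DH group is left as it was since this policy only looks at tags.

```hcl
resource "aws_customer_gateway" "customer_gateway" {
  bgp_asn    = 65000
  ip_address = "172.0.0.1"
  type       = "ipsec.1"

  tags = {
    Name = "nico-vibert-customer-gateway" # placeholder value
  }
}

resource "aws_vpn_connection" "main" {
  vpn_gateway_id                  = aws_vpn_gateway.vpn_gateway.id
  customer_gateway_id             = aws_customer_gateway.customer_gateway.id
  type                            = "ipsec.1"
  static_routes_only              = true
  tunnel1_phase1_dh_group_numbers = [2]

  tags = {
    Name = "nico-vibert-vpn-connection" # placeholder value
  }
}
```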

Hopefully that gives you an idea of what can be done to control your networking-as-code.

Thanks for reading.
