Terraform and Splunk Part 1: Building a Splunk instance in AWS

As I work with global financial customers in my day job at HashiCorp, auditing and logging are topics that come up regularly. The vast majority of my customers use Splunk as their log engine (or, in Gartner-speak, their SIEM: Security Information and Event Management platform) for a variety of use cases, and HashiCorp partnered with Splunk to integrate it directly with Terraform Cloud. Kyle wrote a great article that covers how to set it up, and the docs are also pretty clear. I am going to expand a bit further on Kyle’s post, as I spent a couple of weeks having a bit of fun with Splunk.

Why?

I sometimes skip over the why when I blog, in my rush to document my experiments, so I will try to lay out exactly the problems we’re trying to solve:

  • Terraform Open Source, as good as it is, doesn’t include any built-in logging capabilities. That didn’t matter much when I was running Terraform locally for my labs, but it does matter for large enterprise customers. Sure, many of them integrate Terraform into a CI/CD pipeline and work around this by adding webhooks to the pipeline and forwarding logs to collectors…
  • Alternatively, you might use Terraform Cloud or Terraform Enterprise and ship everything you do to an external log platform: S3, CloudWatch, Splunk, etc…
  • What I am particularly focusing on are the audit logs, especially for Policy-as-Code: as my customers look at building secure self-service platforms using Policy-as-Code and Sentinel, they need the ability to build searches that help them understand who submitted the changes, which runs failed the policy checks, who decided to override a run that failed a soft policy, and so on.
  • As Splunk is my customers’ main logging platform, and given that there is a Terraform Cloud for Splunk app, we’re going to focus on Splunk.
  • This is going to be a multi-part series:
    1. Part 1: Deploy a Splunk instance in AWS with Terraform
    2. Part 2: Set up the Terraform Cloud for Splunk app and build searches
    3. Part 3: Use the Terraform provider for Splunk to configure Splunk

If you already have a Splunk collector available, you can probably skip the rest of the post and move on to Parts 2 and 3.


My first thought was: which Splunk should I use? Should I deploy a Splunk Enterprise instance on EC2 (AWS remains my go-to platform, especially because of its Marketplace) or should I use the SaaS offering, Splunk Cloud?

Splunk Cloud

I had already tried Splunk via the AWS Marketplace in my previous gig so I thought I’d check out Splunk Cloud. I was impressed by how quickly I got access to a free Splunk Cloud instance. Integrating with Terraform Cloud took about 5 minutes and I got my dashboard up and running.

I built a number of search queries and dashboards and that worked a treat. And when I got stuck, the Splunk community was quick to come to the rescue (thanks again).

However… When I tried to build my searches and dashboards using Terraform, I quickly hit a roadblock.

The Free Trial doesn’t let you access the REST API.

Never mind. Time to build my own Splunk instance and of course, I’ll use Terraform for it.

Deploying Splunk Enterprise on EC2 with Terraform

There are some very popular modules for the AWS provider on the Terraform Registry but frankly they can sometimes be overkill for my use case, which is:

  • Deploy a VPC
  • Deploy a subnet
  • Deploy an IGW
  • Deploy a default route table and set default route to IGW
  • Deploy a network interface
  • Create a security group allowing access from your public IP address to the UI and the API, as well as egress from the instance to Terraform Cloud (it’s a PULL model: Splunk pulls the data from Terraform Cloud, towards the API IP ranges specified here, over port 443).
  • Deploy the Splunk instance, with the aforementioned network interface, in the newly created VPC:

Here is the Terraform configuration:

provider "aws" {
  region = var.region
}

# Get Availability zones in the Region
data "aws_availability_zones" "AZ" {}

# Get My Public IP
data "http" "my_public_ip" {
  url = "https://ipinfo.io/json"
  request_headers = {
    Accept = "application/json"
  }
}

locals {
  public_ip = jsondecode(data.http.my_public_ip.body).ip
}

resource "aws_vpc" "my_vpc" {
  cidr_block = var.vpc_cidr
  tags = {
    Name = "tf-example-2"
  }
}

resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.my_vpc.id
}

resource "aws_subnet" "my_subnet" {
  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = var.subnet
  availability_zone       = data.aws_availability_zones.AZ.names[0]
  map_public_ip_on_launch = true
  tags = {
    Name = "nico-vibert-subnet"
  }
}

resource "aws_network_interface" "foo" {
  subnet_id       = aws_subnet.my_subnet.id
  private_ips     = [var.private_ip]
  security_groups = [aws_security_group.allow_ssh_and_tls.id]
  tags = {
    Name = "primary_network_interface"
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.my_vpc.id
}

resource "aws_route" "public_internet_gateway" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.gw.id
}

resource "aws_main_route_table_association" "a" {
  vpc_id         = aws_vpc.my_vpc.id
  route_table_id = aws_route_table.public.id
}

data "aws_ami" "splunk" {
  most_recent = true
  owners      = ["679593333241"] ## Splunk Account 

  filter {
    name   = "name"
    values = ["splunk_AMI*"]
  }

  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}


resource "aws_instance" "splunk" {
  ami           = data.aws_ami.splunk.id
  instance_type = var.instance_type
  tags = {
    Name = "nico-terraform"
  }
  network_interface {
    network_interface_id = aws_network_interface.foo.id
    device_index         = 0
  }
  availability_zone = data.aws_availability_zones.AZ.names[0]
}

output "splunk_public_ip" {
  value = aws_instance.splunk.public_ip
}

output "splunk_default_username" {
  value = "admin"
}

output "splunk_default_password" {
  value = "SPLUNK-${aws_instance.splunk.id}"
}

resource "aws_security_group" "allow_ssh_and_tls" {
  name        = "allow_tls"
  description = "Allow TLS inbound traffic"
  vpc_id      = aws_vpc.my_vpc.id

  ingress {
    description = "API Access"
    from_port   = 8089
    to_port     = 8089
    protocol    = "tcp"
    cidr_blocks = ["${local.public_ip}/32"]
  }
  ingress {
    description = "UI Access"
    from_port   = 8000
    to_port     = 8000
    protocol    = "tcp"
    cidr_blocks = ["${local.public_ip}/32"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    ### To be more accurate, we could restrict egress to the Terraform Cloud public IP ranges used for API communications: replace the cidr_blocks line above with the one below and uncomment the ip_ranges data source and public_ip_range local at the bottom of this file.
    //cidr_blocks = local.public_ip_range.api

    ipv6_cidr_blocks = ["::/0"]
  }

  tags = {
    Name = "allow_tls"
  }
}


# Get Terraform Cloud IP ranges
/*data "http" "ip_ranges" {
  url = "https://app.terraform.io/api/meta/ip-ranges"
  # Optional request headers
  request_headers = {
    Accept = "application/json"
  }
}

locals {
  public_ip_range = jsondecode(data.http.ip_ranges.body)
}
*/

The fun thing with Terraform is that you can pretty much read the configuration above and work out what I am doing.

The only funky thing you may not have come across before is the use of the http provider, which enables me to pull dynamic information such as my own public IP and, optionally, the IP ranges used by Terraform Cloud.
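If you’re curious what that data source actually receives, here is a rough shell equivalent of the http data source plus the jsondecode() local. The payload below is shaped like ipinfo.io’s response but the IP address is made up, and I’m parsing the JSON with sed rather than a proper JSON tool to keep it dependency-free:

```shell
# Sample payload shaped like ipinfo.io's response; the IP below is made up.
response='{"ip":"203.0.113.10","city":"London","country":"GB"}'

# Extract the "ip" field (jsondecode(...).ip in the Terraform config)
my_ip=$(printf '%s' "$response" | sed -n 's/.*"ip":"\([^"]*\)".*/\1/p')

# Turn it into the /32 CIDR used in the security group ingress rules
cidr="${my_ip}/32"
echo "$cidr"   # → 203.0.113.10/32
```

Terraform does exactly this at plan time, which is why the security group always ends up scoped to wherever you happen to be running from.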

I published the Terraform configuration as a module here. You can use it by specifying, in your main.tf file, the following module block:

module "splunk" {
  source  = "nvibert/splunk/aws"
  version = "1.2.0"
}

You just need your AWS access key and secret access key stored as environment variables and you’re good to go. That’s 4 lines of code to build your log collection engine!
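For reference, the shell setup looks like this — the credential values are placeholders, and the TF_VAR_region line is only needed if you want to override the module’s default region:

```shell
# Placeholder credentials — substitute your own AWS keys
export AWS_ACCESS_KEY_ID="AKIA-EXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret-key"

# Optional: override the module's region variable via Terraform's
# TF_VAR_ environment variable convention
export TF_VAR_region="eu-west-2"
```

From there, terraform init and terraform apply in the directory containing the module block is all it takes.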

You can now access your Splunk Enterprise instance over its public IP (over port 8000). The default username is admin and password SPLUNK-$instance_id. Conveniently, that’s one of the outputs of the Terraform execution.

splunk_default_password = "SPLUNK-i-0XXXXXXXXXXXXXX"
splunk_default_username = "admin"
splunk_public_ip = "A.B.C.D"
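The password convention is simple enough that you can reconstruct it yourself from the instance ID — a quick sketch, using a made-up instance ID:

```shell
# Splunk's AMI derives the default admin password from the EC2 instance ID,
# exactly as the splunk_default_password output does. Instance ID is made up.
instance_id="i-0abc123def4567890"
splunk_password="SPLUNK-${instance_id}"
echo "$splunk_password"   # → SPLUNK-i-0abc123def4567890
```

In practice you’d just read the live value with terraform output splunk_default_password rather than compute it by hand.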

TF Cloud

If you want to use this module with Terraform Cloud, just create a repo with these few lines of code (update the version accordingly, and add the variables below if you need to change the default options).

module "splunk" {
  source  = "nvibert/splunk/aws"
  version = "1.2.0"
}

variable "region" {
}

variable "availability_zone" {
}

output "splunk_public_ip" {
  value = module.splunk.splunk_public_ip
}
output "splunk_default_username" {
  value = module.splunk.splunk_default_username
}
output "splunk_default_password" {
  value = module.splunk.splunk_default_password
}

Create a workspace, attach it to the repo, add your variables and you’re good to go!

TF Cloud Variables

A couple of minutes after queueing the run, you will be up and running:

Successful Run

And we’ve got our Splunk public IP address:


Now that we’ve used Terraform to set up Splunk, let’s set up Splunk to monitor Terraform, before using Terraform to configure Splunk to monitor Terraform 🙃
