Latest Posts

Packer builder

 packer automation

Overview

Using shatterdome to build images with InSpec.

Read More →

Creating dynamic subnets and modular design.

 cloudformation

Overview

In this episode we'll tackle enterprise network layouts. This work is a reflection of the layouts that I've seen at various enterprise companies over the years. The high level idea here is to use subnets to isolate network traffic by segmenting networks into network contexts. Each context is thought of as a layer which can only talk to other specific layers.

This gives us some advantages:

  • Granular, specific control of network flows from one subnet ( or group of subnets ) to another subnet ( or group of subnets ) through the use of NACLs and security groups ( SGs ).
  • With this granular control we can add network flow logs with something like Splunk to further analyze the flow of data.
  • Isolation at the network level prevents malicious actors from accessing more of our network.
  • Happy security teams!

Contexts

Here are the contexts, or layers, that are in play. The intent here is that each context will be represented as at least one subnet within the VPC. Most of the contexts will need to expand across all zones because we want to use everything available to us; however, something like the bastion context doesn't need cross-zone functionality, at least not in this example.

  • PublicLoadBalancer: ALBs and ELBs live here.
  • Application: EC2 instances that run application workloads.
  • Cache: EC2 instances or hosted services for memcache/redis and the like.
  • Data: Same as cache, but specifically called out as reserved for data services like MySQL or PostgreSQL.
  • Security: Any type of security-related software as required by our security teams.
  • Bastion: Entry point into the network; this is also where we put the NAT.
  • Monitoring: Splunk, ELK, or whatever you use for monitoring things.

In some cases people are tempted to build this stuff by hand, stuffing all of the subnet IDs into a YAML file somewhere. The math on this is difficult given the complexity involved, which is why I've done so much work around programming and automation in this space.

us-east-1 has 6 zones, so if we say that we want 6 contexts in 6 zones, that's 36 subnet IDs for one environment alone, not to mention the CIDR block management and subnet associations involved. This would be an absolute nightmare to try to pull off by hand; in fact, any manual work here would end up being painful.
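
To put a number on that, here's a quick sketch of how the combinations multiply ( the context and zone lists here are illustrative, not pulled from my actual profile ):

  # Hypothetical illustration: 6 contexts across 6 zones balloons quickly.
  contexts = %w[PublicLoadBalancer Application Cache Data Security Monitoring]
  zones    = %w[us-east-1a us-east-1b us-east-1c us-east-1d us-east-1e us-east-1f]

  subnets = contexts.product(zones)
  puts subnets.size  # => 36 subnets to name, CIDR out, and route -- per environment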

And that's why we automate!

Modular design

During this body of work I realized that I needed to expand the modular design of how I'm compiling things.

Network

The network YAML for the dev environment shows how I intend to build out the dev network. The really interesting part here is that each subnet has a size value which determines how many IPs are in the subnet's CIDR block.

  • small: one /24 block
  • medium: two /24 blocks ( /23 )
  • large: four /24 blocks ( /22 )

This gives me t-shirt sizes for my subnets, which is easier to conceptualize than trying to do subnet CIDR math in my head.
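
As a rough sketch ( the keys match what the modules consume -- size, public, cross_zone -- but the values and layout here are illustrative, not the real file ), the profile is shaped something like this:

  # Illustrative sketch of a dev network profile -- not the actual repo file.
  vpc:
    name: dev
    cidr: 10.1.0.0/16          # example block; the bastion ssh config below assumes 10.1.*.*
  subnets:
    - name: PublicLoadBalancer
      size: small              # t-shirt size controls the CIDR block size
      public: true             # public contexts get the IGW route table
      cross_zone: true         # expand into every availability zone
    - name: Application
      size: large
      public: false            # private contexts route out through the NAT
      cross_zone: true
    - name: Bastion
      size: small
      public: true
      cross_zone: false        # a single subnet is enough for the bastion/NAT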

Stacks

I wanted to have a real, working application running inside the network I've created that could mirror an actual application. To this end, I ended up breaking this into three stacks:

  • network
  • bastion
  • workout tracker

Each stack is described in more detail below. In each section is a link to the YAML file that was used as the profile, the stack template JSON, and the params JSON. These are the working stacks that do exactly what I'm describing.

aws cloudformation get-template --stack-name network-dev-0-3-0|jq '.TemplateBody' > template.json
aws cloudformation describe-stacks --stack-name network-dev-0-3-0|jq '.Stacks[0].Parameters' > params.json

Network

A context marked as public will cause the subnet to be associated with the default public route table. The public route table has a default route pointing to the IGW. If a context is not public, then the assumption is that it's private and therefore requires routing out of the VPC via a NAT.

In this design the default route for a private network will always route out through the NAT gateway that is hosted in the bastion subnet. The cross_zone variable allows me to have only one subnet for a given context. If we changed cross_zone for bastion, then we'd end up with 6 subnets, but only one NAT and a bit of confusion; I'd have to put more work into creating dynamic NATs with associated EIP allocations. I punted on the dynamic NAT generation here because I don't currently need this functionality, and a single NAT is fine for a dev environment. Also, not everything needs to be automated all at once; we can do this part in a later iteration.
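
As a rough sketch of that compile-time decision ( the names here are illustrative, not the actual module code ), the public flag picks the route table and everything else falls through to the NAT route:

  # Illustrative sketch of route table selection -- not the actual module code.
  def route_table_for(subnet_def)
    if subnet_def['public']
      # Public contexts share the default public route table ( 0.0.0.0/0 -> IGW ).
      'PublicRouteTable'
    else
      # Private contexts share the private route table ( 0.0.0.0/0 -> NAT in the bastion subnet ).
      'PrivateRouteTable'
    end
  end

  route_table_for('name' => 'Application', 'public' => false)  # => "PrivateRouteTable"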

Bastion

The bastion host is fairly simple in that it's just an ASG where all EC2 instances will have a public IP. I decided to punt on the work which binds the external IP to a DNS entry dynamically because I'm going to cover that in a different post and probably use lambda.

I use the ssh proxy command trick to make jumping into an instance easier.

StrictHostKeyChecking no

Host bastion-dev.krogebry.com
  User ec2-user
  IdentityFile ~/.ssh/keys/devops-1.pem

Host 10.1.*.*
  User ec2-user
  IdentityFile ~/.ssh/keys/devops-1.pem
  ProxyCommand ssh bastion-dev.krogebry.com nc %h %p
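
With that in place, ssh to any private 10.1.x.x address hops through the bastion transparently ( the IP below is made up ):

$ ssh 10.1.12.34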

Workout tracker

Finally we have the actual application itself, which is hosted in ECS and fronted by an ALB. The best part about this is that the SG is locked down to allow only the ECS ephemeral port range from the load balancer subnets. This happens because of the security_group module.

  - name: "EC2ClusterInstances"
    type: "security_group"
    description: "EC2 cluster instances."
    params:
      allow:
        - subnet: Bastion
          to: 22
          from: 22
          protocol: tcp
        - subnet: PublicLoadBalancer
          to: 65535
          from: 32768
          protocol: tcp

In this case a single security group will be created which will have 7 total rules: 6 rules are created from expanding the PublicLoadBalancer subnets, and 1 rule is expanded from the Bastion subnet. This is done by using a helper function which gets our list of subnets:

  # Look up all subnets tagged with the given Name inside a VPC.
  # Results are cached on disk so repeated compiles don't hammer the EC2 API.
  def get_subnets( vpc_id, subnet_name )
    cache_key = format('subnets_%s_%s_%s', subnet_name, vpc_id, ENV['AWS_DEFAULT_REGION'])
    subnets = @cache.cached_json(cache_key) do
      # Filter on the Name tag and the VPC so we only get the subnets we care about.
      filters = [{
        name: 'tag:Name',
        values: [subnet_name]
      },{
        name: 'vpc-id',
        values: [vpc_id]
      }]
      Log.debug('Subnet filters: %s' % filters.inspect)
      creds = Aws::SharedCredentials.new()
      ec2_client = Aws::EC2::Client.new(credentials: creds)
      ec2_client.describe_subnets(filters: filters).data.to_h.to_json
    end
    subnets
  end

This is how we can use tags to find things in the AWS ecosystem and generally make our lives a little easier. This is also using a file-based caching mechanism which makes things easier for development: I'm not hitting the APIs every single time for information that isn't changing. This is a handy trick when dealing with clients who are constantly on the edge of their API rate limits ( which happens often when Splunk and other SaaS tools are in play ).
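
The cache itself is nothing fancy. A minimal sketch of a file-backed helper along these lines ( this is an illustration of the idea, not the actual @cache implementation ) would be:

  require 'json'

  # Illustrative file-backed JSON cache -- not the actual implementation.
  class FileCache
    def initialize(dir = '/tmp/cfn_cache')
      @dir = dir
      Dir.mkdir(@dir) unless Dir.exist?(@dir)
    end

    # Return the cached, parsed JSON if present; otherwise run the block,
    # store its JSON string on disk, and return the parsed result.
    def cached_json(key)
      path = File.join(@dir, "#{key}.json")
      File.write(path, yield) unless File.exist?(path)
      JSON.parse(File.read(path))
    end
  end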

Example

The isolation here keeps things from bleeding over into different networks, but more importantly it allows me to express complicated network rules for the thing I'm protecting while keeping the protection bundled with the objects being protected. It also lets me express those rules without burning up large numbers of security group allocations. In some cases people are tempted to create a unique SG for each rule; however, AWS allows us to express many rules in a single group.

The alternative model here is to split the SGs out into their own unique entities separate from the environment or stacks. I think this is wrong because of the blast radius argument: if someone changes a security group in that model, you run the risk of exposing anything else that's also connected to that SG. In contrast, this model specifically isolates the potential danger down to the stack of resources, thus limiting the blast radius of any damage that could occur. In this way I'm following the intent of the subnet layering by providing an additional construct of abstraction and isolation.

Read More →

Implementing security with Workout Tracker using S3 SSE and KMS

 cloudformation

Overview

Security is probably my least favorite topic all around. In this post I'm going to explore how we can use AWS S3 with KMS to lock down our secrets. I'm choosing to go with this method over something like Vault because I think this method is probably the easiest and covers the most use cases for what we want here. My goal with this work is simply to secure the secrets for my application.

Goals

  • Secure secrets that can be exposed as environment variables and used at run time.
  • Prevent any exposure of these secrets through things like log files.
  • Give the operations teams a way to manage the secrets.

Implementation

We start this by creating a script that we'll use as the entry point.

aws_sse_s3.sh
#!/bin/bash
# Evaluate the export statements emitted by get_s3_env, then hand off to the API process.
eval "$(/opt/wt/bin/get_s3_env)"
exec /opt/wt/bin/wt-api "$@"

The script is wired in as the Docker entry point:

ENTRYPOINT ["/opt/wt/bin/aws_sse_s3.sh"]
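
For context, the relevant part of the Dockerfile is roughly this shape ( the base image and COPY layout are assumptions; only the ENTRYPOINT is from the real setup ):

# Illustrative Dockerfile fragment.
FROM ruby:2.4
COPY bin/ /opt/wt/bin/
ENTRYPOINT ["/opt/wt/bin/aws_sse_s3.sh"]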

Next, the get_s3_env script does the work of emitting export statements for the environment variables in the file we pull down from S3.

get_s3_env
#!/usr/bin/env ruby
# https://gist.github.com/themoxman/1d137b9a1729ba8722e4
require 'aws-sdk'
s3_client = Aws::S3::Client.new(region: 'us-east-1')
kms_client = Aws::KMS::Client.new(region: 'us-east-1')

# retrieve the cmk key id for this environment
aliases = kms_client.list_aliases.aliases
key = aliases.find { |alias_struct| alias_struct.alias_name == format("alias/workout-tracker-%s", ENV['ENV_NAME']) }
key_id = key.target_key_id

# encryption client wrapping the plain S3 client with the KMS key
s3_encryption_client = Aws::S3::Encryption::Client.new(
  client: s3_client,
  kms_key_id: key_id,
  kms_client: kms_client
)

# decrypt the env file and emit export statements for the wrapper script to eval
exports = []
response = s3_encryption_client.get_object(bucket: 'chime-secrets', key: '.env')
response.body.read.each_line { |line| exports << "export #{line.chomp};" }
puts exports.join(' ')
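
The env file itself is just plain KEY=value lines; something like this ( entirely made-up values ) is what ends up exported into the container:

DB_HOST=db.dev.internal
DB_PASSWORD=not-a-real-password
SECRET_KEY_BASE=changeme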

This process allows us to create entry points for operations to modify the thing that gets our secrets and exposes them to the application. This way, all the application has to do is read environment variables for its secrets. Obviously there are different ways of doing this, for example baking Vault integration into the application.

What I've found is that sometimes we need easy, simple ways to integrate our secrets. This is a super simple way of giving us a channel into secrets management that does the job, and allows us a way to replace the system later. This is a great solution for a rev1 release where we're just trying to get things up and rocking.

Ops

Now let's talk about how we allow our operations engineers to manage the secrets. I usually wrap these things into Rake tasks rather than Make tasks because APIs are fun. I'm using a very simple, generic function to encapsulate the logic, which gives me an encrypted S3 client using a specific KMS key.

def get_enc_client
  creds = Aws::SharedCredentials.new()
  s3_client = Aws::S3::Client.new(region: ENV['AWS_DEFAULT_REGION'], credentials: creds)
  kms_client = Aws::KMS::Client.new(region: ENV['AWS_DEFAULT_REGION'], credentials: creds)

  # Look up the CMK for this environment by its alias.
  aliases = kms_client.list_aliases.aliases
  key = aliases.find { |alias_struct| alias_struct.alias_name == format("alias/workout-tracker-%s", ENV['ENV_NAME']) }
  key_id = key.target_key_id

  # Wrap the plain S3 client in an encryption client bound to that key.
  Aws::S3::Encryption::Client.new(
    client: s3_client,
    kms_key_id: key_id,
    kms_client: kms_client
  )
end

As you can see, the KMS key alias is a calculated string built from the ENV_NAME variable, which is passed in from the Docker environment; ENV_NAME=dev, for example, resolves to alias/workout-tracker-dev.

Now we implement this with our two tasks: secrets:push and secrets:pull.

namespace :secrets do

  desc "Push secrets"
  task :push do |t,args|
    mk_secrets_dir
    s3_enc_client = get_enc_client()
    s3_enc_client.put_object(
      key: '%s/env' % ENV['ENV_NAME'],
      body: File.read('/tmp/secrets/workout-tracker/%s/env' % ENV['ENV_NAME']),
      bucket: 'workout-tracker'
    )
  end

  desc "Pull secrets"
  task :pull do |t,args|
    mk_secrets_dir
    s3_enc_client = get_enc_client()
    File.open('/tmp/secrets/workout-tracker/%s/env' % ENV['ENV_NAME'], 'w') do |f|
      s3_enc_client.get_object(
        key: '%s/env' % ENV['ENV_NAME'],
        bucket: 'workout-tracker'
      ) do |chunk|
        f.write(chunk)
      end
    end
  end

end
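
Assuming the usual AWS credential setup, usage from an operator's shell looks something like this ( the environment variables are the ones the tasks read; the values are examples ):

$ export ENV_NAME=dev AWS_DEFAULT_REGION=us-east-1
$ rake secrets:pull    # decrypt the current env file into /tmp/secrets/workout-tracker/dev/env
$ vi /tmp/secrets/workout-tracker/dev/env
$ rake secrets:push    # re-encrypt and upload the edited file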

How it works

When we push a secret up to S3 we look up the KMS key for this environment and push the local file to S3 using SSE and KMS. We can test the encryption by pulling the file down from S3 without using the encryption client.

$ aws s3 cp s3://workout-tracker/dev/env ./test
download: s3://workout-tracker/dev/env to ./test                 
$ cat test 
mIpRr??
??^?-

This shows that if someone was able to somehow access the s3 object, they wouldn't be able to see the contents without also being able to access the KMS key.

Read More →

Implementing weighted DNS into the workout tracker

 cloudformation

Overview

Weighted DNS is probably one of the coolest things one can do in AWS. This is how we can do b/g deployments at the infrastructure level. I'm going to implement this with the workout tracker by creating a simple Route 53 record with a default weight of 0, then changing the weight of the new stack when we're ready. We can change the weight to whatever we want.

Implementation

The way this works is to set up a weighted DNS entry for our FQDN. In this case I'm using wt-dev.krogebry.com to point to my stack.

    "DNSEntry": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "Comment": "DNS entry point for ALB or ELB.",
        "HostedZoneId": "Z1WUMG9UYDKDTR",
        "RecordSets": [
          {
            "TTL": "900",
            "Name": "wt-dev.krogebry.com",
            "Type": "CNAME",
            "Weight": 0,
            "SetIdentifier": {
              "Ref": "AWS::StackName"
            },
            "ResourceRecords": [
              {
                "Fn::GetAtt": [
                  "EcsALB",
                  "DNSName"
                ]
              }
            ]
          }
        ]
      }
    }

Each new stack creates a DNS pointer to wt-dev.krogebry.com with a weight of 0. When only the first stack exists there's a single entry in the member pool, so all traffic is sent to that stack by default. When we create a second stack, both DNS entries have a weight of 0.

$ aws route53 list-resource-record-sets --hosted-zone-id Z1WUMG9UYDKDTR|jq '.'
    {
      "TTL": 900,
      "Type": "CNAME",
      "ResourceRecords": [
        {
          "Value": "Worko-EcsAL-QDLTVZFR4TIA-4632763.us-east-1.elb.amazonaws.com"
        }
      ],
      "Name": "wt-dev.krogebry.com.",
      "SetIdentifier": "WorkoutTracker-0-1-4",
      "Weight": 0
    },
    {
      "TTL": 900,
      "Type": "CNAME",
      "ResourceRecords": [
        {
          "Value": "Worko-EcsAL-HLL9F1B1G68O-1976995246.us-east-1.elb.amazonaws.com"
        }
      ],
      "Name": "wt-dev.krogebry.com.",
      "SetIdentifier": "WorkoutTracker-0-1-5",
      "Weight": 0
    }

In this case traffic will be split across both ALBs. We can change that behavior by setting one or the other to a value of 100, which will send 100% of the traffic to that stack.
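
One way to flip the weight by hand is an UPSERT change batch through the CLI ( note that editing the record out-of-band drifts from what the stack template declares ); something along these lines would shift all traffic to the 0-1-5 stack:

$ aws route53 change-resource-record-sets --hosted-zone-id Z1WUMG9UYDKDTR \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "wt-dev.krogebry.com.",
          "Type": "CNAME",
          "SetIdentifier": "WorkoutTracker-0-1-5",
          "Weight": 100,
          "TTL": 900,
          "ResourceRecords": [{"Value": "Worko-EcsAL-HLL9F1B1G68O-1976995246.us-east-1.elb.amazonaws.com"}]
        }
      }]
    }'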

Keep in mind that this has nothing to do with the ECS deployments. This is strictly for doing b/g deployments at the infrastructure level. This is important because it allows us to completely isolate all of our resources for a given stack, almost like a silo. Whatever happens in that silo will *only* happen to that silo of things. So, for example, if someone changes the properties of a security group in the first silo, it won't impact anything in the second silo.

We can switch 100% of the load over to the new stack and keep the old stack running as needed until we're ready to completely take it down. Another advantage to this methodology is that when we remove the old stack, we know that everything attached to that stack is removed.

This is a slight advantage over Terraform in that we're using an AWS service to manage AWS resources, whereas Terraform relies on its own manifest file to manage resources and state.

Read More →