Iskandar Setiadi

Flag #9 - Scalable Deployment with Terraform + Docker + ECS



Update (March 2, 2018): This blog post was originally written in April 2016 with Terraform 0.6. In March 2018, the ELB was replaced with an ALB, and the code is now tested with Terraform 0.11.

What has changed (compared to the 2016 version):
- The Terraform version is locked to 0.11.X to prevent breaking changes. Feel free to update it :)
- The ELB component is replaced with an ALB
- Before running terraform plan for the first time, you need to run terraform init to initialize the directory
- vpc_id is now added in asg/configuration.tfvars
- The AMI image is updated to the latest (ami-2015.09.g --> ami-2017.09.i)


Hello everyone, freedomofkeima here, it's been a while. I've been living in Japan for six months now, time flies so fast :') Anyway, let's have some technical discussions today :D

So, deploying and maintaining software applications are two continuous tasks which we often face after writing our lovely pieces of code. Amazon Web Services (AWS) provides a lot of services to help developers deploy their systems in the cloud. Initially, we only need one running server, so we start with EC2. A week later, we decide to serve static files and write our system logs to S3. As the company grows, we need an Auto Scaling Group (ASG) to manage our EC2 instances. At this point, we write a very long script in "User Data" (Launch Configuration) to configure our instances. It gets messy when our web application crashes and we need to configure a recovery mechanism (cronjob, monit, AWS Lambda, etc). Several months later, we need to send notifications to our users, so we create a new worker cluster which utilizes SQS, SNS, etc. We may also need to connect different subnets via VPC, and so on and so forth. Then the real problem comes: at a certain point, you need more than 1 hour to deploy a newer version of your system to AWS. Human error is also a factor, and it may crash your entire system during deployment.

Terraform lets you write a common configuration once for your entire infrastructure. It helps us store the current state of our environment and change the infrastructure as needed. In this blog post, I'll limit our discussion to AWS-related infrastructure only. I've already prepared a small example at https://github.com/freedomofkeima/terraform-docker-ecs which will be explained thoroughly in this post. Feel free to ask here if you need to clarify something :)

Goals

Let's keep it short. First, we don't want to read the bloated syntax of CloudFormation. Also, Terraform has good visualization tools and a planning phase. Well, I'm not saying you should choose Terraform over CloudFormation, but I've tried both and I prefer Terraform.

Second, you want to create a number of different resources across services. Terraform is not limited to AWS, so we can even maintain our infrastructure across different providers.

Third, you want to update your infrastructure in one or two clicks and let the script run the automation for you. With Terraform, you can even release a newer version of your application via Docker tags and an ECS task definition in one click.

Also, if you're wondering, Terraform is quite different from Chef and Puppet: https://www.terraform.io/intro/vs/chef-puppet.html. So, shall we start the technical side? :D

Prerequisite

Go to https://www.terraform.io/downloads.html and download the latest version of Terraform. You also need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables on your local machine. A word of caution: don't let these values leave your local machine!
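As a minimal sketch of that setup, the snippet below exports the two variables and checks that both are present before you run Terraform. The values are placeholders, and check_aws_env is a hypothetical helper written for this example, not part of Terraform or the AWS tooling.

```shell
# Placeholder values: replace them with your own credentials in your
# local shell only, and never commit them to a repository.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"

# Hypothetical helper: fail early if either variable is missing or empty.
check_aws_env() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Look up the variable whose name is stored in $name (POSIX-compatible)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      return 1
    fi
  done
  echo "AWS credentials are set"
}

check_aws_env
```

The helper only checks that the variables are non-empty; it does not verify that the credentials are valid.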

IAM

First, we need to create some IAM-related configuration for our web application. In this scenario, we will create an instance profile for our instances and an IAM role for the ECS service.

$ git clone https://github.com/freedomofkeima/terraform-docker-ecs  # clone the repository
$ cd static/iam/   # enter the iam directory (relative to the repository root)
$ vim configuration.tfvars  # Fill in the variables: name_prefix for your environment and the region you want to deploy to
$ terraform init  # First-time initialization (required since Terraform 0.10+)
$ terraform plan -var-file=configuration.tfvars  # Check the changes which will be executed by Terraform
$ terraform apply -var-file=configuration.tfvars  # Apply the changes

At this point, you will see the following outputs if there's no problem with the execution.

VPC and Security Groups

Now, it's time to create the VPC resources and security groups for our system. In order to increase the availability of our website, we will deploy our web application in two different availability zones: ap-northeast-1a and ap-northeast-1b. In production, we may also need to deploy it in different regions (ap-southeast-1, us-east-1, us-west-1, etc).

$ cd common/  # enter the common directory
$ vim configuration.tfvars  # Fill in the variables: same as above, with the addition of subnet_azs='a,b' for ap-northeast-1a and ap-northeast-1b
$ terraform init  # First-time initialization for this directory
$ terraform plan -var-file=configuration.tfvars  # Check the changes which will be executed by Terraform

As you can see in the plan output, Terraform reports 14 new resources to add. This means you don't need to create these 14 resources manually in each region; instead, you can create them with a single terraform apply command.
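To make the subnet_azs value a bit more concrete: each letter in the list is a zone suffix that is appended to the region name. The sketch below only illustrates that naming rule. expand_azs is a hypothetical helper written for this example; how the repository's Terraform code expands the list internally is not shown here.

```shell
# Hypothetical helper: expand a comma-separated list of zone suffixes
# into full availability zone names for the given region.
expand_azs() {
  region=$1
  suffixes=$2
  result=""
  # Temporarily split on commas instead of whitespace
  old_ifs=$IFS
  IFS=,
  for suffix in $suffixes; do
    # Append "<region><suffix>", separated by a space after the first entry
    result="${result:+$result }${region}${suffix}"
  done
  IFS=$old_ifs
  echo "$result"
}

expand_azs "ap-northeast-1" "a,b"
# prints: ap-northeast-1a ap-northeast-1b
```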

Finally, let's create these environments with $ terraform apply -var-file=configuration.tfvars. Now, you will most likely see the following outputs:

ASG + ELB, and ECS

Finally, at this point, we will create the core part of our system: ASG + ELB, and the ECS cluster. If you open configuration.tfvars in the asg directory, you can see the following configuration:

name_prefix = "tutorial-test"
aws_region = "ap-northeast-1"
ecs_image_id.ap-northeast-1 = "ami-b3afa2dd"
count_webapp = 2
desired_capacity_on_demand = 2
ec2_key_name = "key-name"
instance_type = "t2.micro"
minimum_healthy_percent_webapp = 50

webapp_docker_image_name = "training/webapp"
webapp_docker_image_tag = "latest"

sg_webapp_elbs_id = "sg-12345678"
sg_webapp_instances_id = "sg-23456789"
subnet_ids = "subnet-34567890,subnet-4567890a"

ecs_instance_profile = "arn:aws:iam::123456789012:instance-profile/tutorial-test_ecs_instance_profile"
ecs_service_role = "tutorial-test_ecs_service_role"

The last 5 variables are outputs from the two terraform apply runs that we did previously.
count_webapp is the desired number of tasks, as shown in the ECS console. For example, if we have a worker cluster, we may have a lot of different task definitions that we want to run on one instance.
desired_capacity_on_demand is the number of EC2 instances that we want to launch. If you're rich enough, you may want to launch 10, 20, 30 instances simultaneously :)
ec2_key_name is your instance's SSH key name.

Same as the previous two runs, we will now execute the following:

$ terraform init  # First-time initialization for this directory
$ terraform plan -var-file=configuration.tfvars  # Check the changes which will be executed by Terraform
$ terraform apply -var-file=configuration.tfvars  # Apply the changes

If everything is OK, you can open the URL given in the elb_dns_name output in your browser. Normally, you need to wait around 5 to 10 minutes before your first ECS task starts properly (instance preparation, ELB health check, etc).
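Instead of refreshing the browser by hand during those minutes, you could poll the URL from a script. The following is a rough sketch: wait_until is a hypothetical helper written for this example, and the DNS name in the comment is a placeholder for your own elb_dns_name value.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command to run
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    # Sleep only if another attempt will follow
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempt(s)" >&2
  return 1
}

# Local demonstration: `true` succeeds on the first attempt.
wait_until 3 0 true

# Against the real load balancer, this would poll every 30 seconds,
# up to 20 times (roughly 10 minutes):
#   wait_until 20 30 curl -fsS -o /dev/null "http://<elb_dns_name>/"
```

Here curl -f makes HTTP error responses count as failures, so the loop keeps waiting while the load balancer has no healthy targets yet.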

The following screenshot is taken from ECS console after all tasks are started properly:

And, TA-DA~ "Hello World!" is shown.

If we want to do versioning, we can simply change the value of webapp_docker_image_tag to the newer tag and run terraform apply again. For example, with the configuration above, ECS will stop 1 of the 2 running tasks (count_webapp = 2, minimum_healthy_percent_webapp = 50) and replace it with the newer version. Once the newer version is up, ECS will stop the remaining old task and launch another task with the newer version of your application.

account_name/docker_image:tag_version1 --> running = 2
account_name/docker_image:tag_version2 --> running = 0

to

account_name/docker_image:tag_version1 --> running = 1
account_name/docker_image:tag_version2 --> running = 1

to

account_name/docker_image:tag_version1 --> running = 0
account_name/docker_image:tag_version2 --> running = 2
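The arithmetic behind this sequence can be sketched as follows. To my understanding of the ECS documentation, the minimum healthy percent is applied to the desired task count and rounded up to the nearest integer. min_healthy_tasks is a hypothetical helper written for this example, not an ECS or Terraform command.

```shell
# Hypothetical helper: lower bound on running tasks during a deployment,
# i.e. ceil(count * percent / 100) using integer arithmetic.
min_healthy_tasks() {
  count=$1
  percent=$2
  echo $(( (count * percent + 99) / 100 ))
}

count_webapp=2
minimum_healthy_percent_webapp=50
min_running=$(min_healthy_tasks "$count_webapp" "$minimum_healthy_percent_webapp")

echo "at least $min_running of $count_webapp tasks stay running"
echo "up to $((count_webapp - min_running)) may be stopped before replacements are up"
```

With count_webapp = 2 and 50 percent, at least 1 task stays running, which is why the old version goes from 2 to 1 to 0 rather than dropping to 0 at once.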

Closing Remarks

If you want to use a private Docker Hub repository instead of a public one (of course a lot of you want this one, lol), it's time to write some Terraform configuration as a small practice :p The idea is to utilize an S3 bucket: store your Docker credentials in the bucket, add an IAM role policy that allows your instances to read it, and load the credentials during startup (via asg/autoscaling_user_data.tpl).
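As a starting point for that practice, here is one possible sketch of the startup part, not necessarily how you would end up doing it in this repository. It assumes the credentials are stored as a dockercfg-style JSON file and that the ECS agent reads ECS_ENGINE_AUTH_TYPE and ECS_ENGINE_AUTH_DATA from its config file. write_registry_auth, the bucket name in the comment, and the credential values are all hypothetical placeholders.

```shell
# Hypothetical helper: append private-registry settings to an ECS agent
# config file.
# $1 = path to a dockercfg-style JSON file
# $2 = path to the ECS agent config file
#      (/etc/ecs/ecs.config on the ECS-optimized AMI)
write_registry_auth() {
  auth_file=$1
  config_file=$2
  {
    echo "ECS_ENGINE_AUTH_TYPE=dockercfg"
    # tr removes newlines so the JSON fits on a single config line
    echo "ECS_ENGINE_AUTH_DATA=$(tr -d '\n' < "$auth_file")"
  } >> "$config_file"
}

# Local demonstration with temporary files and placeholder credentials.
workdir=$(mktemp -d)
printf '%s\n' '{"https://index.docker.io/v1/":{"auth":"PLACEHOLDER","email":"user@example.com"}}' > "$workdir/dockercfg.json"
write_registry_auth "$workdir/dockercfg.json" "$workdir/ecs.config"
cat "$workdir/ecs.config"

# In the user data template, the JSON file would first be fetched from
# the bucket (placeholder name), for example:
#   aws s3 cp s3://<your-bucket>/dockercfg.json /tmp/dockercfg.json
#   write_registry_auth /tmp/dockercfg.json /etc/ecs/ecs.config
```

The instance profile needs read access to that bucket for the download to work, which is where the extra IAM role policy comes in.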

So, that's all for today. See you next time!

--
Iskandar Setiadi
Software Engineer at HDE, Inc.
Freedomofkeima's Github

Author: Iskandar Setiadi - Type: Work - Date: 19th Mar, 2016



  • Mark

    Is it possible you can elaborate on the "private Docker hub" configuration perhaps?
  • Neill Turner

    Updated your example to use an ALB instead of an ELB. Hope that's helpful. Nice simple example of using ECS. A lot of other examples folks have written are very complex, with too many features for an intro example.