Software Engineering

Building a Kubernetes Cluster on AWS EKS using Terraform – Part IV

Part IV – creating a resilient cluster

In the last article of the series, we defined some Security Groups and configured rules for them as an introduction to how they work. There will be additional Security Groups for the resources we create in this and the following articles, but they work the same way.

In this article, we will finally set up the actual resources for the EKS cluster, using all of the infrastructure we have prepared so far. Be aware though: these are the first resources that can cause noticeable costs on your AWS account – but they will be pretty low, especially if you only run your instances to try out the Terraform scripts and destroy them right afterwards.

The Kubernetes part

If you have worked with Kubernetes before, you know that a cluster needs at least one master node, which coordinates information between all nodes in the cluster, and a variable number of application nodes to run your actual deployments.

Before we create the master, we need to set up one more batch of resources in preparation – EKS requires a set of permissions in your AWS account to set up everything it needs. For this, we use IAM, the permission management service of AWS, to create a role for our EKS master and attach the relevant permissions in the form of policies:
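
A minimal sketch of what this can look like – the role and resource names here are illustrative, the exact code is in the repository linked at the end:

```hcl
# Illustrative sketch – names like "eks_master" are placeholders.
resource "aws_iam_role" "eks_master" {
  name = "eks-master"

  # Allow the EKS service to assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "eks.amazonaws.com" }
    }]
  })
}

# Attach the AWS-managed policies that grant EKS the permissions it needs.
resource "aws_iam_role_policy_attachment" "eks_master_cluster_policy" {
  role       = aws_iam_role.eks_master.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

resource "aws_iam_role_policy_attachment" "eks_master_service_policy" {
  role       = aws_iam_role.eks_master.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
}
```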

We can then finally set up the master node using EKS. We attach the Security Group we created for the master in the last article and connect the cluster to all of our application subnets:
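
A sketch of the cluster resource, assuming the master Security Group and the application subnets from the previous articles (all names are illustrative):

```hcl
resource "aws_eks_cluster" "cluster" {
  name     = "eks-example-cluster"
  role_arn = aws_iam_role.eks_master.arn

  vpc_config {
    # Security Group and subnets as prepared in the previous articles.
    security_group_ids = [aws_security_group.eks_master.id]
    subnet_ids         = aws_subnet.application[*].id
  }

  # Make sure the permissions exist before the cluster is created.
  depends_on = [
    aws_iam_role_policy_attachment.eks_master_cluster_policy,
    aws_iam_role_policy_attachment.eks_master_service_policy,
  ]
}
```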

And that’s already it – we don’t need anything else to create our master!

Highly available worker nodes

The worker nodes are a little bit more tricky. We want to make sure that we always have a certain number of nodes running to deploy our services later. We will use a combination of AWS resources to make sure that this is the case. First, we need to set up some more IAM roles:
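
A sketch of the node role, its policy attachments and the instance profile (names are illustrative):

```hcl
# Illustrative sketch – this role is assumed by the EC2 instances themselves.
resource "aws_iam_role" "eks_nodes" {
  name = "eks-nodes"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# AWS-managed policies required by EKS worker nodes.
resource "aws_iam_role_policy_attachment" "eks_nodes_worker_policy" {
  role       = aws_iam_role.eks_nodes.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "eks_nodes_cni_policy" {
  role       = aws_iam_role.eks_nodes.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "eks_nodes_ecr_policy" {
  role       = aws_iam_role.eks_nodes.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

# The instance profile is what the Launch Configuration references later.
resource "aws_iam_instance_profile" "eks_nodes" {
  name = "eks-nodes"
  role = aws_iam_role.eks_nodes.name
}
```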

These permissions allow the resources that follow to start up and control EC2 instances – the service that will run our actual application nodes.

Next, we need to decide on a base image (AMI) that can be used for every node we set up. Amazon provides and regularly updates a set of images for EKS application nodes:
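
A sketch of the AMI lookup – the Kubernetes version in the name filter is only an example and has to match the version of your cluster:

```hcl
# Looks up Amazon's official EKS-optimized worker node AMI.
data "aws_ami" "eks_node" {
  most_recent = true
  owners      = ["602401143452"] # Amazon's account for EKS AMIs

  filter {
    name   = "name"
    values = ["amazon-eks-node-1.29-v*"] # adjust to your cluster version
  }
}
```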

We then use this AMI as the base for a Launch Configuration, which serves as a blueprint our next resource uses to set up the EC2 instances we need:
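
A sketch of the user data and the Launch Configuration – the node role, Security Group and variables referenced here are placeholders for the resources from the previous steps:

```hcl
# User data script executed on every node at boot. It hands the cluster
# endpoint and certificate to the bootstrap script shipped with the AMI;
# "set -o xtrace" only makes the script log each command it runs.
locals {
  eks_node_userdata = <<-USERDATA
    #!/bin/bash
    set -o xtrace
    /etc/eks/bootstrap.sh --apiserver-endpoint '${aws_eks_cluster.cluster.endpoint}' --b64-cluster-ca '${aws_eks_cluster.cluster.certificate_authority[0].data}' '${aws_eks_cluster.cluster.name}'
  USERDATA
}

resource "aws_launch_configuration" "eks_nodes" {
  name_prefix          = "eks-node-"
  image_id             = data.aws_ami.eks_node.id
  instance_type        = var.node_instance_type # e.g. a small type to keep costs low
  iam_instance_profile = aws_iam_instance_profile.eks_nodes.name
  security_groups      = [aws_security_group.eks_nodes.id]
  key_name             = var.node_key_pair # optional, enables SSH access
  user_data            = local.eks_node_userdata

  lifecycle {
    create_before_destroy = true
  }
}
```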

We first assemble a small user data script that passes some information from our EKS master to the nodes in the right format (the set -o xtrace option simply makes the script log every command it executes). Then we declare the Launch Configuration for our worker nodes, using the AMI image, a name prefix, the Security Group we created for the nodes previously and the user data we just assembled. You can also associate a key pair with your nodes, allowing you to connect to them via SSH at a later point. We also define the instance type we want our nodes to use; if you want to minimize costs, pick a smaller instance type. For real deployments later on, I recommend turning values like the instance_type into variables so that you can decide on them when you run Terraform.

With the Launch Configuration created, we now build the resource that actually creates and monitors our instances – an Autoscaling Group.
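
A sketch of the Autoscaling Group with the limits described below (names are illustrative):

```hcl
resource "aws_autoscaling_group" "eks_nodes" {
  name                 = "eks-nodes"
  launch_configuration = aws_launch_configuration.eks_nodes.id
  desired_capacity     = 2
  min_size             = 1
  max_size             = 3
  vpc_zone_identifier  = aws_subnet.application[*].id # our application subnets

  # Worker nodes have to carry this tag so the EKS cluster recognizes them.
  tag {
    key                 = "kubernetes.io/cluster/${aws_eks_cluster.cluster.name}"
    value               = "owned"
    propagate_at_launch = true
  }
}
```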

The Autoscaling Group tries to adhere to the set of rules defined in this resource – in this example, it will try to keep two worker node instances running at all times (desired_capacity). It never stops the last instance, so at least one node is always running (min_size), and it never starts a fourth instance, even if all existing instances are running out of computing resources (max_size). It creates those instances in the subnets we pass into vpc_zone_identifier, where we use our application Subnets again.

Accepting the nodes on the Kubernetes level

There is one more tricky thing to do: as it stands, our worker nodes try to register with our EKS master, but they are not accepted into the cluster. We need to create a config map in our running Kubernetes cluster to accept them. This can be done directly in Kubernetes using the CLI tool kubectl, but you can also do it with Terraform. For that, you need to set up a new provider that addresses your Kubernetes cluster.

EKS integrates with Amazon’s account and permission services, which means that you need an AWS IAM token to connect to the master. To obtain the token, we use the AWS CLI tool and define a command as a data source. For this to work, make sure that the AWS CLI is set up on your computer, including the credentials required to connect to the AWS account containing the EKS master. We then use the token and some information from our EKS master to configure the provider that connects to Kubernetes, and finally use that provider to create the mentioned config map, which allows the IAM role created for our nodes into the cluster:
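
As a sketch, the following uses the Kubernetes provider’s exec authentication to shell out to the AWS CLI instead of a separate data source – the effect is the same, a fresh IAM token is fetched at apply time (all names are illustrative):

```hcl
provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority[0].data)

  # Obtain an IAM token through the AWS CLI whenever Terraform talks to the cluster.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.cluster.name]
  }
}

# The aws-auth config map maps the node IAM role to Kubernetes groups,
# which is what lets the worker nodes join the cluster.
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = <<-YAML
      - rolearn: ${aws_iam_role.eks_nodes.arn}
        username: system:node:{{EC2PrivateDNSName}}
        groups:
          - system:bootstrappers
          - system:nodes
    YAML
  }
}
```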

Now the cluster finally comes together – the nodes can connect to the master and the master can manage them. But wait – there’s still one more thing that would be pretty handy: connecting to our cluster manually to actually administer it. For that, we use the CLI tool kubectl. You can easily set it up for your cluster by defining an output in Terraform that generates the configuration file you need:
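
A sketch of such an output, rendering a kubeconfig that calls the AWS CLI for the token (the cluster and user names inside the file are illustrative):

```hcl
# Renders a kubeconfig file that uses the AWS CLI to obtain a token.
locals {
  kubeconfig = <<-KUBECONFIG
    apiVersion: v1
    kind: Config
    clusters:
    - name: eks-example
      cluster:
        server: ${aws_eks_cluster.cluster.endpoint}
        certificate-authority-data: ${aws_eks_cluster.cluster.certificate_authority[0].data}
    contexts:
    - name: eks-example
      context:
        cluster: eks-example
        user: eks-example
    current-context: eks-example
    users:
    - name: eks-example
      user:
        exec:
          apiVersion: client.authentication.k8s.io/v1beta1
          command: aws
          args:
            - eks
            - get-token
            - --cluster-name
            - ${aws_eks_cluster.cluster.name}
  KUBECONFIG
}

output "kubeconfig" {
  value = local.kubeconfig
}
```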

This configuration file again uses the AWS CLI tool to obtain a token to log into your cluster. By defining it as an output, you can use Terraform to build it after the infrastructure is set up:

It will then be emitted at the end of a successful „terraform apply“ or when you directly generate it using „terraform output kubeconfig“.

Up and running, but not reachable?

With all of this finished, our cluster is now running in the cloud. Using kubectl, you can check on your nodes once they have joined the cluster via „kubectl get nodes“. To actually use the cluster to deploy services, we still need a few more infrastructure elements: in the next article, we will create and configure an AWS Application Load Balancer to serve as the endpoint for our cluster.

You can check out the code in my GitHub repository for the article series, which includes the Security Group and variable configuration we did not cover in depth in this article. I also made a structural change, replacing the „security_groups“ module with an „eks“ module so that Security Groups are created closer to the resources they are made for. Don’t forget to enter your values for the access keys and region in the .tfvars file and in the state bucket configuration before running it.