The Gist: In this post I use EC2 Auto Scaling Groups (ASGs) to monitor my NAT Instance, filling a gap that opens up when you choose a NAT Instance over a NAT Gateway. Along the way I point out some trouble spots.
In my post New Way To Run WordPress on AWS Fargate I show how it's possible to use container technology to host a blog site (this site, to be exact). As I explain there, the key driver was to move away from always-on (or always paying) infrastructure when your situation does not require it. In my post Saving Money (Pretty Easily) With NAT Instances, dramatic cost savings are achieved by using a NAT Instance instead of a NAT Gateway, as depicted in the image below.
Here I want to reinforce the choice of using NAT Instances and attempt to fill a gap exposed by using them over NAT Gateways. Specifically, I want to enhance availability by using EC2 Auto Scaling Groups (ASGs) to monitor my NAT Instance.
So What Gap?
The gap with NAT Instances is that you need to manage the resource yourself, and arguably the biggest gap is availability: if the instance unexpectedly terminates, your resources lose internet access. NAT Gateways also let you protect resources in a private subnet and connect to services outside of your VPC while preventing external services from initiating connections with your resources, but they are managed services, meaning you do not have to worry about their availability, scalability, or failures. These qualities are awesome given the right circumstances. The price for this in the US East (Ohio) region amounts to a minimum charge of $32.40 a month, or about $389 a year, and that does not include data transferred or processed. For an occasionally used environment, or an application that is not paying for itself, that adds up fast. This role can easily be filled with a NAT Instance, as my earlier post demonstrates. So, how can you enhance the availability of your NAT Instance?
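The $32.40 figure lines up with an hourly NAT Gateway charge of $0.045 over a 720-hour (30-day) month. The short sketch below reproduces that arithmetic. The hourly rate is an assumption based on US East (Ohio) pricing at the time of writing, so check the current AWS price list before relying on it; data processing and transfer charges are left out.

```shell
#!/bin/sh
# Back-of-the-envelope NAT Gateway cost. ASSUMPTIONS: $0.045 per hour
# (US East (Ohio) at the time of writing) and a 720-hour, 30-day month.
# Data processing and transfer charges are not included.
HOURLY_RATE=0.045
HOURS_PER_MONTH=720

monthly=$(awk -v r="$HOURLY_RATE" -v h="$HOURS_PER_MONTH" 'BEGIN { printf "%.2f", r * h }')
yearly=$(awk -v m="$monthly" 'BEGIN { printf "%.2f", m * 12 }')

echo "Monthly: \$$monthly"
echo "Yearly:  \$$yearly"
```

Running it prints $32.40 per month and $388.80 per year, which is the roughly $389 a year quoted above.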
Enter EC2 Auto Scaling Groups
In a nutshell, ASGs monitor EC2 instances via health checks and automatically replace them when they fail or terminate. In this case, the specs for the NAT Instance are defined in a Launch Template, which the ASG uses as a specification whenever a replacement is required. ASGs can also monitor various metrics and add or remove instances as needed to meet target values. My needs are simple: I only need a maximum of one running instance at any time, in one Availability Zone (AZ), and my network performance needs are very low.
Use the Console — Not So Fast
Usually when I try something new I like to use the console to get a feel for what information is required and, in theory, to get user-friendly error messages as I learn how to use the service. So that's what I did after reading through the guided tutorial on EC2 Auto Scaling. Using the console, I ran into this error message:
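Incompatible Launch Template: network interface id cannot be specified as console support using an existing network interface with Auto scaling is not available.
In other words, the console cannot create an ASG from a launch template that references an existing network interface. So… on to the CLI! Overall, at least two things have to happen: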
- Create the specification for the instance: place the characteristics of the NAT Instance in a Launch Template.
- Create an ASG based on the Launch Template: to maintain the constant number of instances I want (one), EC2 Auto Scaling responds to EC2 health checks automatically. If the instance fails its health check or terminates, the ASG launches a replacement soon after detecting it.
Using the CLI
Let's review the spec: my instance plays the role of NAT for my private subnets. This means that the route table these subnets depend on needs a route from the private subnet to the NAT, which is done by adding a route for 0.0.0.0/0 that targets the NAT Instance's elastic network interface (ENI). For any replacement instance to slide into this role, there are a few options: share the ENI between instances; assemble the commands needed to swap the old ENI for the new one created via the launch template; or perhaps tap into the ASG event and write a Lambda function to run those commands (yet another serverless use case!). I chose the first option, as it involves fewer moving parts for my needs.
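Why does a single 0.0.0.0/0 route send internet traffic to the NAT without disturbing traffic inside the VPC? A route table uses the most specific (longest-prefix) route that matches the destination. The toy bash model below illustrates that rule only; the VPC CIDR of 10.0.0.0/16 and the `route_for` and `ip_to_int` helpers are hypothetical, and real route tables are evaluated by AWS, not by anything like this script.

```shell
#!/usr/bin/env bash
# Toy model of route selection: the most specific (longest-prefix) matching
# route wins. The VPC CIDR 10.0.0.0/16 is a hypothetical example value.

# Converts a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# "destination target" pairs, like a private subnet's route table
ROUTES=(
  "10.0.0.0/16 local"
  "0.0.0.0/0 eni-01aaccbb012345abc"
)

# Prints the target of the longest-prefix route matching the given IP
route_for() {
  local ip best_len=-1 best_target=none entry
  ip=$(ip_to_int "$1")
  for entry in "${ROUTES[@]}"; do
    local cidr=${entry%% *} target=${entry##* }
    local net=${cidr%/*} len=${cidr#*/}
    local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    if (( (ip & mask) == ($(ip_to_int "$net") & mask) && len > best_len )); then
      best_len=$len
      best_target=$target
    fi
  done
  echo "$best_target"
}

route_for 10.0.1.25      # → local (stays inside the VPC)
route_for 93.184.216.34  # → eni-01aaccbb012345abc (internet-bound, goes via the NAT)
```

Both routes match 10.0.1.25, but the /16 is more specific, so VPC-internal traffic never touches the NAT; only destinations outside the VPC fall through to the 0.0.0.0/0 route.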
With reusing the ENI in mind, if you try to create the launch template you'll quickly realize that you first need to create the network interface. Ordinarily, when you create a launch template you specify the subnet in which to create the new ENI. However, if you specify an existing ENI, the instance is launched in the subnet where that network interface is located. So we need to specify the subnet when we create the ENI:
aws ec2 create-network-interface \
    --description "my-ENI-public-us-east 1b" \
    --groups sg-064555599aacccccb \
    --subnet-id subnet-099cc23f2e \
    --tag-specifications "[{\"ResourceType\": \"network-interface\",\"Tags\": [{\"Key\": \"role\",\"Value\": \"nat instance\"}]}]"
The output includes the new ENI's ID (eni-01aaccbb012345abc in the commands below). Then we can create the route and disable source/destination checks. Recall from the earlier post that source/destination checks must be disabled for an instance to perform as a NAT: by default, EC2 requires an instance to be the source or destination of any traffic it sends or receives, but a NAT forwards traffic on behalf of other instances.
aws ec2 create-route \
    --route-table-id rtb-0000d0da00a00000 \
    --destination-cidr-block 0.0.0.0/0 \
    --network-interface-id eni-01aaccbb012345abc
aws ec2 modify-network-interface-attribute \
    --network-interface-id eni-01aaccbb012345abc \
    --no-source-dest-check
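Before moving on to the launch template, it can help to confirm that both changes took effect. The following is a minimal sketch, assuming the placeholder IDs used in this post (overridable through the hypothetical `ENI_ID` and `ROUTE_TABLE_ID` variables); the helper function names are illustrative, and it only calls AWS when run with a `check` argument.

```shell
#!/usr/bin/env bash
# Sketch: confirm the ENI is ready to act as the NAT route target.
# ENI_ID and ROUTE_TABLE_ID default to this post's placeholder IDs.
set -euo pipefail

ENI_ID=${ENI_ID:-eni-01aaccbb012345abc}
ROUTE_TABLE_ID=${ROUTE_TABLE_ID:-rtb-0000d0da00a00000}

# Prints "False" once source/destination checking is disabled on the ENI
source_dest_check() {
  aws ec2 describe-network-interface-attribute \
    --network-interface-id "$1" \
    --attribute sourceDestCheck \
    --query 'SourceDestCheck.Value' \
    --output text
}

# Prints the ENI that the 0.0.0.0/0 route in the given route table targets
default_route_target() {
  aws ec2 describe-route-tables \
    --route-table-ids "$1" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'].NetworkInterfaceId" \
    --output text
}

# Succeeds only if both settings from the steps above are in place
nat_eni_ready() {
  [[ "$(source_dest_check "$ENI_ID")" == "False" ]] &&
    [[ "$(default_route_target "$ROUTE_TABLE_ID")" == "$ENI_ID" ]]
}

# Only talk to AWS when explicitly asked, e.g. ./check-nat-eni.sh check
if [[ "${1:-}" == "check" ]]; then
  if nat_eni_ready; then
    echo "ENI $ENI_ID is ready to act as the NAT route target"
  else
    echo "ENI $ENI_ID is not ready yet" >&2
    exit 1
  fi
fi
```

Saved as, say, `check-nat-eni.sh`, it would be run as `./check-nat-eni.sh check`; the file name is arbitrary.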
Now we can create the Launch Template. Recall that two commands are needed to turn the instance into a NAT:
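- sysctl -w net.ipv4.ip_forward=1 (enables IPv4 forwarding in the kernel)
- /sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE (rewrites the source address of traffic leaving eth0 to the instance's own address)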
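Once an instance is up, you can confirm on the instance itself (as root) that both settings are active. This is a small sketch; the function names are hypothetical, and it assumes the `sysctl` and `iptables` tools that ship with the AMI.

```shell
#!/usr/bin/env bash
# Sketch: run as root on the NAT instance to confirm both NAT settings are live.
set -euo pipefail

# True when the kernel is forwarding IPv4 packets
ip_forwarding_enabled() {
  [[ "$(sysctl -n net.ipv4.ip_forward)" == "1" ]]
}

# True when the nat table's POSTROUTING chain masquerades traffic out of eth0.
# "iptables -S" lists rules in the same form they were added.
masquerade_rule_present() {
  iptables -t nat -S POSTROUTING | grep -- '-o eth0 -j MASQUERADE' > /dev/null
}

if [[ "${1:-}" == "check" ]]; then
  if ip_forwarding_enabled; then echo "IP forwarding: on"; else echo "IP forwarding: OFF"; fi
  if masquerade_rule_present; then echo "MASQUERADE rule: present"; else echo "MASQUERADE rule: missing"; fi
fi
```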
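User Data is an easy way to run commands on your Linux instance when it launches. There are some limitations, as specified here. By default, user data scripts run only on an instance's first boot, so the launch template below includes a cloud-config part ([scripts-user, always]) that runs the script on every boot; neither of the settings above survives a reboot on its own. Note that launch template data expects the UserData value to be base64-encoded.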
cat > user-data.txt <<'EOF'
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
sysctl -w net.ipv4.ip_forward=1
/sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
currentDate=$(date)
/bin/echo "NAT Instance launched ${currentDate}" >> /tmp/userDataLaunchRun.txt
--//--
EOF

# UserData in launch template data must be base64-encoded (on one line)
USER_DATA=$(base64 < user-data.txt | tr -d '\n')

cat > launch-template-data.json <<EOF
{
  "NetworkInterfaces": [{
    "DeviceIndex": 0,
    "NetworkInterfaceId": "eni-01aaccbb012345abc",
    "AssociatePublicIpAddress": true,
    "Groups": ["sg-064555599aacccccb"],
    "DeleteOnTermination": false
  }],
  "ImageId": "ami-0a281754c3786ceb2",
  "InstanceType": "t2.nano",
  "KeyName": "my-NAT-instance-key-pair",
  "InstanceInitiatedShutdownBehavior": "terminate",
  "UserData": "${USER_DATA}"
}
EOF

aws ec2 create-launch-template \
    --launch-template-name my-nat-instance-for-auto-scaling \
    --version-description version1 \
    --launch-template-data file://launch-template-data.json
This is what has been specified:
- an existing network interface, with a public IP address and a security group
- the Amazon Machine Image, instance type, and key pair
- the instance-initiated shutdown behavior (terminate)
- the Linux commands to run at launch
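Because the UserData value travels base64-encoded, a mistake in the encoding step is easy to miss. This sketch encodes a minimal user data file (just the two NAT commands, not the full multipart document) and decodes it again to check that nothing was lost; the temporary file and variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: base64-encode a (minimal, illustrative) user data script the way
# launch template data expects, then decode it to confirm the round trip.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Only the two NAT commands, not the full multipart document
cat > "$workdir/user-data.txt" <<'EOF'
#!/bin/bash
sysctl -w net.ipv4.ip_forward=1
/sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
EOF

# GNU and BSD base64 wrap long output differently, so strip all newlines
encoded=$(base64 < "$workdir/user-data.txt" | tr -d '\n')

# Decode again and make sure both NAT commands survived
decoded=$(printf '%s' "$encoded" | base64 --decode)
for needed in 'net.ipv4.ip_forward=1' 'MASQUERADE'; do
  grep -F -- "$needed" <<< "$decoded" > /dev/null || { echo "missing: $needed" >&2; exit 1; }
done

echo "UserData value: $encoded"
```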
And finally, create the ASG:
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name my-single-instance-NAT-ASG \
    --launch-template LaunchTemplateName=my-nat-instance-for-auto-scaling,Version='1' \
    --min-size 1 \
    --max-size 1 \
    --availability-zones us-east-1b
Setting both --min-size and --max-size to 1 asks for exactly one instance to be running at any time.
Note: When you use a launch template that specifies an existing ENI for eth0 (see example #4 here), you must specify an Availability Zone for the Auto Scaling group that matches the network interface, without also specifying a subnet ID in the request.
*Done*
Now you can play around: if you terminate the running instance, you'll soon see another one launch in its place. So there you have it: a pretty straightforward way to reinforce your NAT Instance on the cheap.
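To make that experiment repeatable, here is a sketch that terminates the group's current InService instance and polls the ASG until a different instance is InService. The ASG name matches the one created above; the polling interval, attempt limit, and the `failover-test` argument are arbitrary choices of this sketch.

```shell
#!/usr/bin/env bash
# Sketch: terminate the NAT instance on purpose and wait for the ASG to
# bring up a replacement. Interval and attempt limit are arbitrary choices.
set -euo pipefail

ASG_NAME=${ASG_NAME:-my-single-instance-NAT-ASG}
POLL_SECONDS=${POLL_SECONDS:-15}
MAX_ATTEMPTS=${MAX_ATTEMPTS:-40}

# Prints the ID of the group's InService instance, if any
in_service_instance() {
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$ASG_NAME" \
    --query "AutoScalingGroups[0].Instances[?LifecycleState=='InService'].InstanceId" \
    --output text
}

# Polls until an InService instance other than $1 appears, then prints its ID
wait_for_replacement() {
  local old_id=$1 current attempt
  for (( attempt = 1; attempt <= MAX_ATTEMPTS; attempt++ )); do
    current=$(in_service_instance)
    if [[ -n "$current" && "$current" != "None" && "$current" != "$old_id" ]]; then
      echo "$current"
      return 0
    fi
    sleep "$POLL_SECONDS"
  done
  return 1
}

# Only touch real resources when explicitly asked
if [[ "${1:-}" == "failover-test" ]]; then
  old=$(in_service_instance)
  echo "Terminating $old"
  aws ec2 terminate-instances --instance-ids "$old" > /dev/null
  if new=$(wait_for_replacement "$old"); then
    echo "Replacement instance: $new"
  else
    echo "No replacement after $MAX_ATTEMPTS checks" >&2
    exit 1
  fi
fi
```

Filtering on the InService lifecycle state means an old instance that is still shutting down is not mistaken for its replacement.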
Onward!