The Gist: I discuss some of the expectations, surprises, and nuances I experienced migrating from Amazon RDS MySQL 5.7 to Aurora Serverless v1 MySQL
If you’ve read even a couple of my posts such as Saving Money (Pretty Easily) With NAT Instances you’ll understand that I like to get maximum use for a little money as possible. I like leveraging the pay-as-you go model rather than choosing the ‘always-on’ (or always paying) model especially when your use case makes it feasible to do so.
Looking back I could not wait to try out the Amazon RDS service given my background with relational databases. Seemed hassle free: easy to setup, use, and was compatible with the version of MySQL which I was using as a back-end to an app running in an AWS Fargate container. And yes, RDS is a great service removing the need to manage servers. However, even with the smallest instance type I was not enamored with paying about $15 a month when the service was not in use. Even if you stop an RDS database, it will auto-start after 7 days.
You can pay more up front to lower costs with a reserved instance commitment – but since I was kicking the tires on an idea, ‘pay as you go’ suited me just fine. So, for me, I want a way to pay (close) zero dollars when the database is not being used but also take advantage of what RDS as a service offers: easy setup, operations, and scaling.
Aurora Serverless seems to present just the thing I’m looking for: comes with a MySQL compatible option and challenges the notion of a database by offering an on-demand relational database that can scale up or down when necessary bypassing the need to provision compute and storage upfront. Furthermore, you are only charged for storage when its idle. This is ideal for non mission critical apps and non-production environments. I took on this personal challenge to migrate to Aurora Serverless as way to reduce my database costs.
Accessing the Database Cluster
You can use the CLI or AWS Console to create the serverless cluster, the console makes it easy. However, the database is only accessible via your VPC and the database cannot have a public IP address, which means there is no direct way to access the database over the internet. Its common to interact with a MySQL database via MySQL Workbench or similar application. But it you cannot gain access directly, how can you do your work? This is how I made that work:
- provisioned a small EC2 instance in a public subnet – accessible from the internet only by me
- allow the EC2’s security group to access the Aurora Serverless cluster, which is located in a private subnet
- setup a VNC server on the EC2 instance that enables remote desktop access to the EC2 where MySQL workbench is installed
- copied the original MySQL dump file to the EC2 instance for import
Features and Caveats of Aurora Serverless
Advertising for Aurora states there are two available versions one for MySQL and another for PostgreSQL, compatible with specific versions for each. Meaning, your existing code and DB driver libraries should not require modification or changes. I found this largely the case. I was pleased I did not have to make major changes to the code for Aurora Serverless. There are of course limitations and requirements of the service.
In consideration of my goals for this project, the service’s two biggest features are of primary interest:
- Scaling down to zero capacity (meaning no charge for compute resources when not in use) and
- Start up from zero capacity, when I want the resources to be active.
Clearly, these are the upsides. What follow are some nuances of these features.
Scaling to Zero Takes Time
While testing I noticed that after about 40 minutes of app inactivity the cluster was still up. Huh? I confirmed the cluster’s configuration for pausing the cluster after 5 minutes of inactivity. So why was the cluster still up? Soon after however, capacity went down to zero after about an hour. My expectations were that database inactivity would be detected sooner than an hour. So lets re-read the Pause and Resume Statement:
“You can choose to pause your Aurora Serverless v1 DB cluster after a given amount of time with no activity. You specify the amount of time with no activity before the DB cluster is paused.”
So the key is no activity. And while I saw that the app had no activity, you need to read carefully and understand the context: by no activity they are talking about database connections.
Handling database connections is a very important aspect of any meaningful application that needs to store state. Usually, it relatively costly (in terms of time and user experience) to create them when needed. Hence, a very common approach is to employ database connection pools, which are collections of active database connections on standby – ready to use, thereby cutting the time in creating them. These connections are managed by the pool and STAY connected to the database even as the application has no active users. Now in the context of using Aurora Serverless, this can be problematic: your application is not being used, yet you are being charged for an active database. So the key is getting to NO ACTIVITY as quickly as possible to prevent spending more money than necessary.
Revised Goal #1:
Scaling Down to Zero Capacity “faster than after an hour of inactivity”
Timeout Configuration – Not Quite There
MySQL has a configuration parameter that allows you to tell the server to kill inactive connections after a period of time. Specifically, these are the ‘wait_timeout’ and ‘interactive_timeout’ parameters. By default, MySQL sets the wait_timeout to 28800 seconds or 8 days. However, I had these parameters set for 480 seconds or 8 minutes. These configuration options are available to Aurora Serverless as well. So why was the cluster still active after 8 minutes of inactivity? I could see by running processList that there were sleeping connections that remained well after the 8 minute window. Basically, it seemed to me that the server configuration for wait_timeout was of no effect and ‘NO ACTIVITY’ was only detected after an hour, and 5 minutes later, database capacity scaled to zero. Still, should it take an hour to detect no activity? And while the DB engine needs to make this determination for its self and I’m sure its a complex undertaking, I was not satisfied waiting longer than I needed to when I know the app is inactive.
FYI : see this AWS Forums thread, which concludes with “that [i.e., wait_timeout] parameter (as well as many others) is not modifiable for Aurora Serverless yet.”
Monitoring Database Connection Activity
MySQL gives you the ability to kill sleeping connections yourself and AWS has implemented their own version of this procedure for Aurora called RDS_KILL. So my idea is that if the DB engine did not kill sleeping connections as per wait_timeout configuration, that I could somehow kill them myself from within the app. The implementation boils down to the following:
- Added functionality to track the acquisition and release times of database connections from the connection pool
- Use this information to track active connection ids; and after a determined period of application inactivity (15 minutes) call a DB function to terminate the connection IDs via the RDS_Kill procedure.
This worked exactly as planned! And as configured and advertised, 5 minutes later, database capacity scaled down to zero. Goal achieved!
(Note: with this new class of database server, it will make sense for connection pools to begin to add functionality to kill inactive connections when configured to do so – like Jeremy Daly’s serverless-mysql library, which I was not using but may in the future)
Log entry Terminating Inactive Database Connections
AWS Console Metric Snapshot of Active DB Connections
Turns out I’m not out of the woods yet. As advertised and expected, there is a 20 to 30 second delay as the cluster must startup and be ready to accept database connections. This does not sound like much, but it certainly translates into you user waiting for a webpage to respond. Granted, this happens to only the first user that comes uses your service after scaling down. Still this is not entirely desirable and the service does come with some caveats. Adding to this problem is that most database connection requests are configured to timeout after 10 seconds or so. Still I wondered of there was a way to weave a solution to this issue and still meet my goal of not paying for DB services when not is use.
Revised Goal #2:
“Eliminate impact to user experience” Starting up from zero capacity when I want the resources to be active.
Solving This Problem – Rethinking The Login Process
First, I tried to extend the connection timeout on the connection pool in an attempt to wait for the cluster while the request was being processed. This proved pointless as the result seemed to always be a timeout giving the user the impression the site was down or was experiencing issues. Which led me to rethink the login process: after a period of inactivity, the next request that comes in will be to login. Typically, just bringing up the login page does not require a database connection but processing a login request does.
Instead, when the login page is requested (prior to the actual login) I tap into the connection monitor data to first determine if the application is in an inactive state, which now means that the database cluster is not ready and the connection pool will be without an active connection available. If this is the case, initiate a database connection to startup the cluster and obtain a connection. This starts the cluster startup process BEFORE the user actually logins in. This seemed to work well and I got the results I was after: the app was available when the user wanted it. Goal achieved!
Final thoughts: Assuming the service offerings and limitations are not show stoppers for your use case, Aurora Serverless is a viable database option. If you require an always available database service, RDS should suffice just fine and may come in a little cheaper depending on your high availability needs. Certainly for infrequently used apps and testing environments I think its possible to utilize the serverless option to save money. Also, worth checking out is the serverless “clone” feature that can spin up databases very quickly and can be removed easily too, which is ideal for testing.
I must say this was a fun exercise figuring a way to tweak things to suit my needs. I can’t help but think that the serverless paradigm will become more prevalent and encourage new ways to think about solutions. Why pay for computing capacity when you’re not using it?