We recently had occasion to reconfigure some of our existing servers to use Amazon Web Services Elastic Load Balancers in front of them. Setting this up isn't hard, exactly, but there are a lot of moving parts that have to mesh correctly before things start to work, so I thought I'd write down what we did.
All of these tools have lots of options and ways to use them. I'm not trying to cover all the possibilities here. I'm just showing what we ended up doing.
We had some specific goals we wanted to achieve in this reconfiguration.
- There should be no outside requests sneaking in -- the only requests that should reach the backend servers are those that come through the load balancer. We'll achieve this by setting the backend servers' security group(s) to only allow incoming traffic on those ports from the load balancer's security group.
- The site should only handle requests that have the right Host header. We achieve this already in our Nginx configuration (`server_name`) and won't have to change anything.
- Redirect any non-SSL requests to SSL. The load balancer can't do this for us (as far as I could see), so we just forward incoming port 80 requests to the server’s port 80, and let our existing port 80 Nginx configuration continue to redirect all requests to our https: URL.
- All SSL connections are terminated at the load balancer. Our site certificate and key are only needed on the load balancer. The backend servers don't need to process encryption, nor do we need to maintain SSL configuration on them. We'll have the load balancers forward the connections, unencrypted, to a new listening port in nginx, 8088, because we're redirecting everything from port 80 to https. (We could have configured the port 80 server to figure out from the headers whether the connection came into the load balancer over SSL, but we didn't, figuring that using a separate port would be fool-proof.) If we were concerned about security of the data between the load balancer and the backend, for example if financial or personal information was included, we could re-encrypt the forwarded connections, maybe using self-signed certificates on the backend servers to simplify managing their configurations.
- `Strict-Transport-Security` header: we add this already in our Nginx configuration and will include it in our port 8088 configuration.
- We need to access backend servers directly for deploys (via ssh). We achieve this by keeping our elastic IP addresses on our backend servers so they have stable IP addresses, even though the load balancers don't need them.
- Some of our servers use basic auth (to keep unreleased sites private). This is in our Nginx configuration, but we'll need to open up the health check URL to bypass basic auth, since the load balancers can't provide basic auth on health checks.
- Sites stay up through the change. We achieve this by making the changes incrementally, and making sure at all times there's a path for incoming requests to be handled.
All the pieces
Here are all the pieces that we had to get in place:
- The site's hostname is a CNAME for the elastic load balancer’s hostname, so that requests for the site go to the load balancer instead of the backend servers. Don’t use the load balancer IP addresses directly, since they’ll change over time.
- The backend servers' security group allows incoming requests on ports 80 and 8088, but only from the load balancer's security group. That allows the load balancer to forward requests, but requests cannot be sent directly to the backend servers even if someone knows their addresses.
- There's a health check URL on the backend server that the load balancer can access, and that returns a 200 status (not 301 or 401 or anything else), so the load balancers can determine if the backend servers are up.
- Apart from the health check, redirect port 80 requests to the https URL of the server (non-SSL to SSL), so that any incoming requests that aren't over SSL will be redirected to SSL.
- Get the data about the request's origin from the headers where the load balancer puts it, and pass it along to Django in the headers that our Django configuration is expecting. This lets Django tell whether a request came in securely.
- The load balancer must be in the same region as the servers (AWS requirement).
- Keep the elastic IP on our backend server so we can use that to get to it for administration. Deploys and other administrative tasks can no longer use the site domain name to access the backend server, since it now points at the load balancer.
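On the "pass the request's origin along" piece: the load balancer sets `X-Forwarded-Proto` (and `X-Forwarded-For`) on the requests it forwards. A sketch of passing those through Nginx to the application, where the upstream address is illustrative:

```nginx
location / {
    proxy_pass http://127.0.0.1:8000;  # gunicorn upstream (illustrative)
    proxy_set_header Host $host;
    # The load balancer sets X-Forwarded-Proto to "https" when it
    # terminated SSL; forward it so Django can tell the request was secure.
    proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```

On the Django side, setting `SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")` tells `request.is_secure()` to trust that header.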
Where we started
Before adding the load balancer, our site was running on EC2 servers with Ubuntu. Nginx was accepting incoming requests on ports 80 and 443, redirecting all port 80 requests to https, adding basic auth on port 443 on some servers, proxying some port 443 requests to gunicorn with our Django application, and serving static files for the rest.
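As an Nginx sketch, the pre-change setup looked roughly like this (hostnames, paths, and the upstream address are illustrative, not our actual configuration):

```nginx
# Port 80: redirect everything to https
server {
    listen 80;
    server_name www.example.com;
    return 301 https://www.example.com$request_uri;
}

# Port 443: terminate SSL, add HSTS, proxy to gunicorn, serve static files
server {
    listen 443 ssl;
    server_name www.example.com;
    ssl_certificate     /etc/nginx/ssl/site.crt;   # illustrative paths
    ssl_certificate_key /etc/nginx/ssl/site.key;
    add_header Strict-Transport-Security "max-age=31536000" always;

    location /static/ {
        alias /srv/site/static/;           # static files served directly
    }
    location / {
        proxy_pass http://127.0.0.1:8000;  # gunicorn running Django
    }
}
```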
To summarize our backend server configuration before and after the change:

Before:

- Port 80 redirects all requests to https://server_URL
- Port 443 terminates SSL and processes requests
- Server firewall and AWS security group allow all incoming connections on ports 80 and 443

After:

- Port 80 redirects all requests to https://server_URL
- Port 8088 processes requests
- Server firewall and AWS security group allow port 80 and 8088 connections from the load balancer only, and no port 443 connections at all
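Putting those pieces together, the new port 8088 server ends up looking roughly like this (a sketch; hostnames, paths, and the upstream address are illustrative):

```nginx
server {
    listen 8088;
    server_name www.example.com;

    # SSL terminates at the load balancer, so no ssl directives here,
    # but we keep sending the HSTS header.
    add_header Strict-Transport-Security "max-age=31536000" always;

    # Basic auth on unreleased sites...
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/htpasswd;  # illustrative path

    # ...except for the health check, which the load balancer must be
    # able to reach without credentials and get a plain 200 from.
    location /health/ {
        auth_basic off;
        proxy_pass http://127.0.0.1:8000;  # gunicorn (illustrative)
    }

    location / {
        proxy_pass http://127.0.0.1:8000;  # gunicorn (illustrative)
    }
}
```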
Steps in order
- DNS: shorten the DNS cache time for the site domain names to something like 5 minutes, so when we start changing them later, clients will pick up the change quickly. We'll lengthen these again when we're done.
- Django: if needed, create a new view for health checks. We made one at `/health/` that simply returned a response with status 200, bypassing all authentication. We can enhance that view later to do more checking, such as making sure the database is accessible.
- Nginx: We added a new port 8088 server, copying the configuration from our existing port 443 server, but removing the ssl directives. We did keep the line that added the `Strict-Transport-Security` header.
- Nginx: Added configuration in our new port 8088 server to bypass basic auth for the `/health/` URL.
- Ufw: opened port 8088 in the Linux firewall.
- AWS: opened port 8088 in the servers' security group - for now, from all source addresses so we can test easily as we go.
- AWS: add the SSL certificate in IAM
- AWS: create a new load balancer in the same region as the servers
- AWS: configure the new load balancer:
- configure to use the SSL certificate
- set up a security group for the load balancer. It needs to accept incoming connections from the internet on ports 80 and 443.
- instances: the backend servers this load balancer will forward to
- health check: port 8088, URL `/health/`. Set the period and number of checks small for now, e.g. 30 seconds and 2 checks.
- listeners: 80->80, 443 ssl -> 8088 non-ssl
- Tests: Now stop to make sure things are working right so far:
- The load balancer should show the instance in service (after the health check period has passed).
- With the site domain set in your local `/etc/hosts` file to point at one of the load balancer's IP addresses, the site should work on ports 80 and 443.
- Afterward, undo your local `/etc/hosts` changes, since the load balancer IPs will change over time!
- AWS: update the backend servers' security group to only accept 8088 traffic from the load balancer's security group.
- Test: the health check should still pass, since it's coming in on port 8088 from the load balancer.
- DNS: update DNS to make the site domain a CNAME for the load balancer's DNS name.
- wait for DNS propagation
- test: site should still work when accessed using its hostname.
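The health-check view in the steps above is deliberately trivial: return a 200, nothing else. Ours is a Django view, but the contract is simple enough to show framework-neutrally; here's a minimal WSGI sketch (illustrative only, not our actual code):

```python
def health(environ, start_response):
    """Minimal WSGI health-check app (sketch; the real view is Django)."""
    # The load balancer's health check only looks at the status code,
    # so return a plain 200 with a tiny body: no auth, no redirects.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]
```

In Django terms, the equivalent is a view that returns `HttpResponse("OK")`, wired up at `/health/` ahead of any authentication.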
These steps should now be safe to do, but it doesn't hurt to test again after each step, just to be sure.
- Nginx: remove port 443 server from nginx configuration.
- AWS: remove port 443 from backend servers’ security group. Configure ports 80 and 8088 to only accept incoming connections from the load balancer's security group.
- Ufw: block port 443 in server firewall
- AWS: in the load balancer health check configuration, lengthen the time between health checks, and optionally require more passing checks before treating an instance as live.
- docs: Update your deploy docs with the changes to how the servers are deployed!
- DNS: lengthen the cache time for the server domain name(s) if you had shortened it before.