Optimize your EC2 connection to S3 within the same VPC with a Gateway VPC endpoint
When an EC2 instance sends data to S3, the traffic by default routes through the internet. This can raise security (or regulatory) concerns, because the traffic leaves the VPC. The more obvious issue is the additional cost. When traffic reaches the internet through a NAT Gateway, there are data processing charges (billed per GB processed). For example, sending 100 GB of traffic to S3 in the Asia Pacific (Singapore) region would cost $5.90. These are extra costs that can be avoided.
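The arithmetic behind that figure can be sketched as follows. The per-GB rate of $0.059 is an assumption inferred from the $5.9 per 100 GB figure above, so check the current NAT Gateway pricing for your region. The sketch also ignores the NAT Gateway's hourly charge.

```python
# Assumed rate, inferred from the $5.9 per 100 GB figure quoted above.
NAT_PROCESSING_USD_PER_GB = 0.059


def nat_processing_cost(gigabytes, usd_per_gb=NAT_PROCESSING_USD_PER_GB):
    """Return the NAT Gateway data processing charge in USD, rounded to cents."""
    return round(gigabytes * usd_per_gb, 2)


if __name__ == "__main__":
    # Cost of pushing 100 GB through the NAT Gateway at the assumed rate
    print(nat_processing_cost(100))
```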
Gateway VPC endpoint
This feature was announced for S3 in May 2015. With a Gateway VPC endpoint, traffic from EC2 (or other resources within a VPC) to S3 does not have to go through the internet. When EC2 sends data to S3, a route-table entry routes the request directly to S3 instead of through the NAT Gateway.
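One way to confirm that S3 traffic will take the endpoint is to look for the endpoint's entry in the subnet's route table. The sketch below assumes the route table is given in the shape returned by boto3's `describe_route_tables`, where a gateway endpoint route carries a `DestinationPrefixListId` (a `pl-...` prefix list) and a `GatewayId` starting with `vpce-`. The helper name and all IDs in the sample are made up for illustration.

```python
def gateway_endpoint_routes(route_table):
    """Return the routes that send a prefix list (e.g. S3) to a gateway endpoint."""
    return [
        route
        for route in route_table.get("Routes", [])
        if route.get("DestinationPrefixListId", "").startswith("pl-")
        and route.get("GatewayId", "").startswith("vpce-")
    ]


# Hypothetical route table; the IDs are placeholders, not real resources.
sample_route_table = {
    "RouteTableId": "rtb-0123456789abcdef0",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
        {"DestinationPrefixListId": "pl-0123abcd", "GatewayId": "vpce-0123456789abcdef0"},
    ],
}

if __name__ == "__main__":
    for route in gateway_endpoint_routes(sample_route_table):
        print(route["DestinationPrefixListId"], "->", route["GatewayId"])
```

If the function returns an empty list for a subnet's route table, S3 requests from that subnet would still follow the default route, which in this sample points at the NAT Gateway.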
What about performance
If EC2 does not have to route traffic through a NAT Gateway, there is one less dependency: NAT Gateway bandwidth. In theory, traffic that stays entirely within the AWS network should perform better. Let’s test this:
Test set-up
- Lambda function with a varying timeout
- Download an 80MB file from S3 repeatedly
- Measure the number of times the file is downloaded over the timeout window
Code on Python 3.7
import boto3

# Created once per execution environment and reused across invocations
s3 = boto3.client('s3', region_name='ap-southeast-1')


def lambda_handler(event, context):
    total_download = 0
    # Infinite loop: download the file and print the running count.
    # The Lambda timeout ends the loop, so the last count printed is the
    # number of downloads completed within the timeout window.
    while True:
        s3.download_file('bucket_name', 'file_name', '/tmp/file_name')
        total_download += 1
        print('totalDownload:', total_download)
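A variant of the loop above stops on its own shortly before the timeout, so the final count can be returned instead of read from the last log line. This is a sketch, not the code used for the test: `count_downloads` and `safety_margin_ms` are hypothetical names, and `remaining_ms` can be any zero-argument callable, such as the Lambda context's `get_remaining_time_in_millis`.

```python
def count_downloads(download, remaining_ms, safety_margin_ms=2000):
    """Call download() until safety_margin_ms or less remains; return the count."""
    total = 0
    while remaining_ms() > safety_margin_ms:
        download()
        total += 1
    return total


def lambda_handler(event, context):
    # Imported here so count_downloads can be tried locally without boto3
    import boto3

    s3 = boto3.client("s3", region_name="ap-southeast-1")
    total = count_downloads(
        lambda: s3.download_file("bucket_name", "file_name", "/tmp/file_name"),
        context.get_remaining_time_in_millis,
    )
    return {"totalDownload": total}
```

The safety margin needs to be longer than a single download takes. Otherwise the last download can still run into the timeout.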
Result
Lambda functions with different specs and varying timeouts were used. In every case, the download count improved when the traffic was routed internally.
How can I set this up
You can refer to this well-written doc by AWS for step-by-step instructions. Gateway VPC endpoints have since expanded to include DynamoDB as well.
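For a scripted alternative to following the doc by hand, the endpoint can also be created with boto3's `create_vpc_endpoint`. This is a minimal sketch under assumptions: the VPC and route table IDs are placeholders, and `build_endpoint_request` and `create_gateway_endpoint` are hypothetical helper names. The service name follows the `com.amazonaws.<region>.s3` pattern, and passing `service="dynamodb"` would target DynamoDB instead.

```python
def build_endpoint_request(vpc_id, route_table_ids, region="ap-southeast-1", service="s3"):
    """Build the keyword arguments for EC2 create_vpc_endpoint (gateway type)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        # Route tables that should receive the route to the endpoint
        "RouteTableIds": list(route_table_ids),
    }


def create_gateway_endpoint(vpc_id, route_table_ids, region="ap-southeast-1"):
    """Create the endpoint and return its ID. Needs boto3 and AWS credentials."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    request = build_endpoint_request(vpc_id, route_table_ids, region)
    response = ec2.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]


if __name__ == "__main__":
    # Placeholder IDs; replace with your own VPC and route table
    print(build_endpoint_request("vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"]))
```

Only `build_endpoint_request` runs without AWS access. `create_gateway_endpoint` makes a real API call and changes the listed route tables.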