TLDR version: Ensure the MTU settings on your instances are set to 1500 if you’re using Redshift!
Have you ever had connectivity issues running SQL queries against Amazon’s Redshift? Are you using one of the newer EC2 instance types available (C3, R3, etc)? Then this article might help you out!
We recently discovered an issue running queries from EC2-Classic instances onto a Redshift cluster in EC2-VPC. We ran through a number of troubleshooting steps including:
- Testing different PSQL client versions
straceon the process
- Trying simple and common queries
After running through the above steps, we discovered:
- Testing different PSQL client versions => Made no difference
- When running a strace on the process => After the query was input, it was stuck on a WAIT state
- Trying simpler queries => Very simple queries that returned smaller amount of rows, ie. SELECT count(*) from tablename, showed no issue.
Then what could the cause be? From our deduction, we realized it had to be further into the stack and nothing on the application side. We then ran into a question on ServerFault (http://serverfault.com/questions/542083/why-i-cant-execute-sql-query-on-amazon-redshift-from-vpc-private-instance) that seemed to be the same issue we were having!
The answers talk about Maximum Transmission Unit (MTU) and they talk about instances being configured with different MTU settings depending on which environment and/or instance you were using.
What is MTU? MTU stands for Maximum Transmission Unit. It is the maximum size of a data unit measured in bytes. Usually, when dealing with MTU, the default setting for Ethernet is 1500. You can check it by running this (ie. Linux server example using device eth0):
$ ip addr show eth0 2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
As you can see from the above, one of my instances were set at MTU of 1500. But, what happens when I run this on a new c3.large:
$ ip addr show eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP qlen 1000
MTU of 9001? What is this? Wasn’t the default MTU settings for Ethernet set to 1500?
Here is the answer: Newer EC2 instances (c3, m3, etc) come configured with a MTU setting categorized as “Jumbo Frames."
What are Jumbo Frames? “Jumbo Frames” is essentially a setting of MTU set to greater than 1500. Depending on your application/stack, it can be very useful by making your data transmission more efficient. For each frame, it will send a larger payload of data. This is great for things like large file transfers because each frame takes up CPU time. If you can send a bigger data set in the same (or lower) number of frames, this will significantly help the transfer rate between the two devices.
So, if my c3.large instance is set at MTU 9001, that’s great, right? Unfortunately, that’s not always the case. The reason is that both endpoints should be talking at the same rate, in this case the EC2 instance(s) and the Redshift cluster. How do I know what rate I should set my EC2 instances? Fortunately, AWS has provided a couple of useful links on this very subject:
From the articles, AWS recommends that if you are in a mixed environment and/or using one of the newer of EC2 instance types, we should set the MTU size to 1500:
$ ip link set dev eth0 mtu 1500 #(for device eth0)
Once we had applied this setting, all of our queries ran perfectly and without any issues. This was fantastic as it was such an amazingly simple fix (also very easy to add to your Chef recipes)! While I understand why Jumbo Frames were set by default on the AWS AMI's, I wish this was an additional option rather than a default setting.
Hopefully this helps you when setting up your environment!
Having problems loading data to Redshift? We offer an automated solution to transfer data continuously to your Amazon Redshift cluster. Automate your data loading to Redshift today.