Handling UTF-8 Characters in Redshift

in Redshift
 • Updated on 
utf8

# Amazon Redshift & UTF-8

One of the things to be aware of when transferring data to Amazon Redshift is related to the handling of UTF-8 characters. When moving data from external data sources into Redshift, you may encounter an error that looks like below.

ERROR: Load into table 'alex_test_table_char' failed. Check stl_load_errors system table for details.

And in your stl_load_errors it will say:

err_code  | err_reason
----------+------------------------------------------------------------------------------------------------------
1220      | Multibyte character not supported for CHAR (Hint: try using VARCHAR). Invalid char: e6 97 ad

This is because we were using a column that was a CHAR type, which only supports single-byte ASCII characters. When the data that you want to load contains UTF-8 characters, you will want a column of the VARCHAR type.

# Using a VARCHAR Column Instead

Now let’s try this again using a VARCHAR(20). The results are:

INFO: Load into table 'alex_test_table_varchar' completed, 3 record(s) loaded successfully.

So, finally we were able to complete the load successfully.

select * from alex_test_table_varchar;

id | name
---+------------
2  | 旭川清
1  | John Doe
3  | (´・ω・`)

Things have gotten a lot more convenient compared to before, when Amazon Redshift was just released. Redshift now supports up to 4 byte UTF-8 characters, which covers many of the use cases.

# Handling NUL Values

One thing to note about Amazon Redshift is that if you have a string that contains NUL(U+0000) values, then those NUL values will cause a load error when you’re try to copy that data. Let’s say your source data is:

1,aaaa,bbbb

where,

  • The table schema is (int, varchar, varchar)
  • `` is the NUL character

When this record is copied, you will see a load error as follows:

err_reason | raw_line
Missing newline: Unexpected character 0x61 found at location 3 | 1,aa

The reason why raw_line is 1,aa is because the NUL character is regarded as the end-of-line. As a result, Redshift fails to load the data due to the missing 3rd column value. To avoid this, you have to replace NUL values before running the COPY command. For example, escaping NUL characters like "\x00" is a durable workaround.

# Using ACCEPTINVCHARS

It is not rare to have invalid UTF-8 characters in your data, causing the load error when copying. In case of invalid characters, Redshift provides ACCEPTINVCHARS option for the COPY command, which replaces invalid characters with a replacement character.

copy <your_table_name> from <your_data_source.csv>
credentials 'aws_access_key_id=<your_key_id>;aws_secret_access_key=<your_secret_access_key>'
delimiter ‘,’
acceptinvchars;

For example, if your data contains an invalid character,

1,aa\xc3aa,bbbb

where,

  • \xc3 is an invalid character

You will see the following log message when copying:

INFO: Load into table 'utf8test' completed, 1 record(s) loaded successfully.
INFO: Load into table 'utf8test' completed, 1 record(s) were loaded with replacements made for ACCEPTINVCHARS. Check 'stl_replacements' system table for details.

And the record will be:

id  | val_1 | val_2
----+-------+-------
1   | aa?aa | bbbb

"?" is the default replacement character. You can change the replacement character by setting ACCEPTINVCHARS as .

Reference Links

http://docs.aws.amazon.com/redshift/latest/dg/r_Character_types.html#r_Character_types-storage-and-ranges

http://docs.aws.amazon.com/redshift/latest/dg/t_preparing-input-data.html

https://aws.amazon.com/releasenotes/Amazon-Redshift/9126302638893128

# How FlyData Can Help

FlyData provides continuous, near real-time replication between RDS, MySQL and PostgreSQL databases to Amazon Redshift. The FlyData Sync tool is an intuitive, powerful, cost-effective way to integrate your transactional databases and analytical data stores in a single interface with no manual scripting.

You can start a 14-day Free Trial and begin syncing your data within minutes. For questions about FlyData and how we can help accelerate your use-case and journey on Amazon Redshift, connect with us at support@flydata.com.

Worry-free replication from source to Redshift & Snowflake
Unlimited sync during trial
No credit-card required
World class support
Try FlyData for free
Amazon Partner Logo Certified AWS
Redshift partner
Get started. Try FlyData.
Quick setup. No credit card required. Unlimited sync during trial.
Fast and secure Enterprise-grade security and near real-time sync.
World-class support Proactive monitoring from technical experts, 24/7.
Straightforward pricing Pay for the rows you use, and nothing you don’t.