Sync RDS MySQL to Redshift
To take advantage of the free trial with FlyData for Amazon Redshift, you first need to register with Amazon Redshift. Do this while they’re offering as well a free trial account. Be sure to take note of your login and other information on Amazon Redshift. You’ll need that for when you register and configure FlyData.Once you’ve done that, launch an instance of Amazon Redshift on your system to begin the process of setting up a system with FlyData.
Authorize FlyData on Amazon Redshift
First, you will need to provide information to FlyData regarding your Amazon Redshift account for the two systems to be able to interact. This will require you also to change the security settings on Amazon Redshift to authorize FlyData. This is particularly necessary for FlyData to be able to upload data to your Amazon Redshift cluster.
The IP addresses shown on FlyData Settings will be used to connect to your cluster.
You will have to provide these IP addresses so that FlyData’s servers can access to your cluster’s endpoint. How this is set, though, depends on whether you’re using Amazon’s Virtual Private Cloud. You can accomplish this by doing one of the following, whichever one applies to your situation.
With Virtual Private Cloud (VPC)
If you are using Amazon’s Virtual Private Cloud, you will need first to identify the security group to which you would like to add the FlyData IP addresses. On your Amazon Redshift Cluster AWS Console, select the VPC Dashboard, then the link labeled VPC Security Groups. When there, search for the security group by entering its group ID in the search box (e.g., sg-8cde8f18). When you locate it, click on the box to the left of it to highlight. That will show you the details for that security group, as shown in the screenshot below:
Next you will need to authorize the FlyData IP addresses. Click on the Inbound Rules tab in the middle of the page for the selected security group, and then the Edit button. A dialog box will open: click on Edit Inbound Rules in that box and then on Add Rule to add the first IP address, and again to add another when done with the first. Set the parameters for these two rules as follows:
- Type: Custom TCP Rule
- Port Range: port
[IP Addresses on FlyData Settings Page]
The type is the same for both rules. For the port range, you would use your Amazon Redshift Cluster (e.g., 5439). For the source enter the first IP address for the first rule and the second address for the second rule. When you’re finished, click Save.
That’s all you need to do at this stage to authorize FlyData access to your cluster. Skip to the section below called, Register Cluster with FlyData.
Without Virtual Private Cloud(VPC)
If you’re not using Amazon’s Virtual Private Cloud, you will still need to register FlyData’s IP addresses with Amazon Redshift. To do this, go to your Amazon Redshift Cluster AWS Console page, and click on the Security Groups table. Then select the security group specified as the Cluster Security Groups. At this point you will be able to enter the needed parameters (see screenshot below).
Set the parameters for the two IP addresses as follows, clicking Authorize after adding the first IP address, and again to add the second:
- Connection Type:
- CIDR/IP to Authorize:
[IP Addresses on FlyData Settings Page]
That’s all you need to do at this stage to grant FlyData access to your Amazon Redshift cluster.
Register Cluster with FlyData
In the previous section, you were asked to enter the information about FlyData’s servers in your security setttings with Amazon Redshift. This way FlyData is authorized to interact with your cluster. Now you need to enter information about your Amazon Redshift cluster connection on the settings page with Flydata.
To get your Redshift cluster information, go to your Amazon Redshift Cluster AWS Console page, and click on the name of the cluster you want to synchronize. Information appears under Cluster Database Properties (see screenshot below)
To register the cluster, go to your FlyData Console, and click on the Settings menu. Then enter the Redshift cluster information in Amazon Redshift Cluster Setting (see screenshot below)
You will need to provide the endpoint and other parameters:
- Endpoint : Redshift Cluster endpoint.
- Port: Redshift Cluster port (default 5439).
- Database Name: Redshift database name.
- Username: Redshift user name.
- Password: Redshift user’s password.
You will need to use your endpoint and port. For example, the endpoint might be set to something like, myredshift.cywlg1ayhkjv.us-east-1.redshift.amazonaws.com, with the port set to 5439. For the database name, you’ll have to enter the name of the database that you want replicated on your Amazon Redshift cluster. You’ll also have to provide the MySQL user name and password for accessing the database.
Setup and Register RDS MySQL with FlyData
The point of FlyData Sync in particular, is to replicate selected databases and tables on Amazon Redshift. To do this, you will need to register Amazon RDS MySQL on FlyData. You will also need to make sure the binary logs are enabled on your MySQL server. Let’s do that first.
Create a Read Replica
Next you need to create a read replica. Do this in your Amazon RDS Dashboard. Select Instance Actions and then Create Read Replica.
Next click on Parameter Groups and then Create DB Parameter Group, as shown in the screenshot below:
A dialog box will appear asking you to set parameters for the group.
Set these parameters as follows:
- DB Parameter Group Family:
- DB Parameter Group Name:
- DB Parameter Group Description:
MySQL5.6 for Sync
When you’re finished, click on Yes, Create. Then select the icon next to the newly created parameter group,
mysql56sync. Click on Edit Parameters. You should then see a box like the one shown below:
You will need to set just a few of the parameters, the ones that follow:
* If your Read Replica is MySQL 5.7, you also need to edit following parameter.
When you’re finished, click Save Changes to save your changes.
Now you will need to configure the read replica. To do this, click on Instance Actions and then Modify, as shown in the screenshot below:
When you do so, you will see a dialog box like the one in the screenshot below:
For most of the values, the default may be used or they may be left blank. However, you will need to set a few of the values as follows:
- DB Engine Version:
- New Master Password: password
- Parameter Group:
- Apply Immediately:
You need to enter a password for the New Master Password so as to reset the default parameters. User may use the same password as you used previously for the master password. You don’t have to change it, but it needs to be entered again, although it says New. For the DB Parameter Group, enter the name of the one you just created.
When you’re finished entering these parameters, Select Apply Immediately and click on Continue. When asked for confirmation, click on the button labeled, Modify DB Instance (see screenshot below).
You will need to wait for the instance to become available before proceeding. When it does, select Instance Actions and then Modify. And Set :
- Backup Retention Period:
- Apply Immediately:
Now that you’ve modified the instance, wait until it becomes available again before proceeding.
Create a User for Sync
Having created a new read replica, you will now need to create a MySQL user on it. Log into the read replica from a mysql client (e.g., MySQL Workbench) with the master account. Then create a user to be used only by FlyData Sync. You can do this in MySQL like so:
mysql> CREATE USER 'sync'@'%' IDENTIFIED BY 'your_password';
mysql> GRANT REPLICATION SLAVE, SELECT, RELOAD, REPLICATION CLIENT, LOCK TABLES, EXECUTE ON *.* TO 'sync'@'%';
mysql> FLUSH PRIVILEGES;
For your_password, enter the password you want to use for this new user.
Retain binary log
To ensure proper replication, it is highly recommended to retain binary logs for at least 4 days. To do this, you can enter the following in mysql from your server:
mysql> call mysql.rds_set_configuration('binlog retention hours', 96);
Register the Read Replica
Enter the connection information related to the read replica in data entry page on the FlyData Dashboard. To do this, go to the New Data Entries table in the FlyData Dashboard. Then choose MySQL as Data Source. Be sure to include the name of a database and tables to synchronize.
- Host: RDS DB instance endpoing (MySQL Server host).
- Port: RDS DB instance port (default 3306).
- Username: the name of RDS user (in this document, `sync`).
- Password: RDS user’s password.
- Database: the name of a database that contains tables you want to replicate on Amazon Redshift.
Table: the name of the tables that you would like to replicate.
Full Sync: The “Full Sync” mode will sync any INSERT/UPDATE/DELETE/ALTER TABLEs made to the MySQL table(s) you specify.
Append Only: The “Append Only” mode will sync everything except DELETEs made to your MySQL table.
Schema: If you use a non-default Amazon Redshift schema, be sure to enter the Amazon Redshift schema name.
When you’re finished, click Update to save your changes.
Fetching Changes from your MySQL
To fetch changes from your MySQL tables, you can have FlyData directly monitor changes to your database. Alternatively, you can install the FlyData Agent onto a Linux server that has access to your MySQL server. While the first choice requires you to open your MySQL port to FlyData, it simplifies your process and you can save on server costs. Otherwise, please follow the steps below for the second option (installing the Agent).
FlyData Agent Model
The FlyData Agent is what monitors the changes made to your MySQL tables. Specifically, it monitors the MySQL binary log (i.e., the binlog), and sends these changes to the FlyData Cloud, in order to load it into Amazon Redshift.
To install the FlyData Agent, you will need a Linux server, which has access to your MySQL server. If you don’t have a server already, you can launch a new EC2 instance in your AWS account with the following steps. If you already have a server on which you can install FlyData Agent, you may skip to the next section.
Launch an Instance for the FlyData Agent
You will need an encryption key pair to log into the EC2 instance. You may use an existing one for the server or you could create a new key pair. Once you have one, though, you will need to know where it’s located on your server.
Once you have your encryption key ready, launch an EC2 instance that the FlyData Agent will use. To do this, follow these steps:
- From the AWS Console, select EC2.
- Click on Launch Instance and then click on Select to choose an Amazon Machine Image (AMI).
- Choose the Instance Type. This will affect initial-sync processing speed. For reference, below are the processing speeds when using a single instance of the following instance types (Run multiple instances to increase throughput.):
c3.large: 10GB/hour (*Please contact us for more information on how to increase throughput.)
- To finish, click on Review and Launch and then Launch.
You may have to wait a bit for the instance to be available. Once it is available, log into the instance by entering the following from the command-line:
$ ssh ec2-user@your-instance-public-DNS -i your_public_key
You would replace public_key with the path and file name for your public key. It might read something like,
~/.ssh/foobar.pem. The instance-public-DNS would be the specific internet address for your instance. It would look something like,
Install FlyData Agent
The specific install command for the FlyData Agent is available in your FlyData Dashboard. It should look something like this:
$ bash <(curl -L https://console.flydata.com/i/your-unique-identity)
$ source ~/.bashrc
You will need the exact command for your system. So copy the install command from your Dashboard and execute it on your Linux server.
Generate Redshift table definition script and run on Redshift
To start the FlyData Agent synchronizing, you will need to generate a table definition script that will be run on Amazon Redshift. You can do this by entering the following from the command-line on server:
$ flydata sync:generate_table_ddl > create_table.sql
$ cat create_table.sql
This uses the FlyData Agent to provide the needed table definition script. Copy the output and run it from FlyData’s Amazon Redshift Access Console.
The installation and configuration is finished. You need only to start the FlyData Agent on your server. This agent will extract data from the MySQL binlog on your server, and send it to Amazon Redshift for storing and processing.
To start the FlyData Agent, enter the following on your server:
$ flydata start
At this point, whatever tables you’ve created on Amazon Redshift’s cluster should be populated and kept in sync. If you want to be sure, run some
SELECT statements on some of the tables on both systems and compare the results. Add or change data on your server and see if it updates on Amazon Redshift. It may take as much as five minutes to synchronize, but it should work just fine.