FlyData Sync
Amazon Redshift Setting
Amazon Redshift Tips

# Getting Started Guide for FlyData Sync with MySQL

FlyData Sync is used to take data from MySQL to send to Amazon Redshift. It utilizes the MySQL binary log (i.e., binlog) to replicate data from your databases to your Amazon Redshift cluster. You can even synchronize data from multiple MySQL databases to Amazon Redshift. Data can be uploaded as often as every 5 minutes so that you can always query fresh data.

The configuration process is simple: You enable binary logging on your MySQL, you provide FlyData with access information on your server and with Amazon Redshift, and then recreate the tables that are on your server that you want to replicate on Amazon Redshift. When you’re ready, FlyData Sync will copy the data from the tables on your server to the cloud. After that, whenever a change is made in the binlog related to the data or the schema of the replicated tables, those changes are made shortly afterwards on your Amazon Redshift cluster.

This brief guide describes the steps for setting up FlyData Sync to work well with Amazon Redshift.

# Authorize FlyData on Amazon Redshift

First, you will need to provide information to FlyData regarding your Amazon Redshift account for the two systems to be able to interact. This will require you also to change the security settings on Amazon Redshift to authorize FlyData. This is particularly necessary for FlyData to be able to upload data to your Amazon Redshift cluster.

The IP addresses shown on FlyData Settings will be used to connect to your cluster.

Redshift Security IP Addresses

You will have to provide these IP addresses so that FlyData’s servers can access to your cluster’s endpoint. How this is set, though, depends on whether you’re using Amazon’s Virtual Private Cloud. You can accomplish this by doing one of the following, whichever one applies to your situation.

With Virtual Private Cloud (VPC)

If you are using Amazon’s Virtual Private Cloud, you will need first to identify the security group to which you would like to add the FlyData IP addresses. On your Amazon Redshift Cluster AWS Console, select the VPC Dashboard, then the link labeled VPC Security Groups. When there, search for the security group by entering its group ID in the search box (e.g., sg-8cde8f18). When you locate it, click on the box to the left of it to highlight. That will show you the details for that security group, as shown in the screenshot below:

Next you will need to authorize the FlyData IP addresses. Click on the Inbound Rules tab in the middle of the page for the selected security group, and then the Edit button. A dialog box will open: click on Edit Inbound Rules in that box and then on Add Rule to add the first IP address, and again to add another when done with the first. Set the parameters for these two rules as follows:

  • Type: Custom TCP Rule
  • Port Range: port
  • Source: [IP Addresses on FlyData Settings Page]

The type is the same for both rules. For the port range, you would use your Amazon Redshift Cluster (e.g., 5439). For the source enter the first IP address for the first rule and the second address for the second rule. When you’re finished, click Save.

That’s all you need to do at this stage to authorize FlyData access to your cluster. Skip to the section below called, Preparing MySQL.

Without Virtual Private Cloud(VPC)

If you’re not using Amazon’s Virtual Private Cloud, you will still need to register FlyData’s IP addresses with Amazon Redshift. To do this, go to your Amazon Redshift Cluster AWS Console page, and click on the Security Groups table. Then select the security group specified as the Cluster Security Groups. At this point you will be able to enter the needed parameters (see screenshot below).

Set the parameters for the two IP addresses as follows, clicking Authorize after adding the first IP address, and again to add the second:

  • Connection Type: CIDR/IP
  • CIDR/IP to Authorize: [IP Addresses on FlyData Settings Page]

That’s all you need to do at this stage to grant FlyData access to your Amazon Redshift cluster.

# Preparing MySQL

The next step is to prepare your MySQL server by enabling the binary log. To do this, add the following parameters to the [mysqld] section of your MySQL server’s configuration file (i.e., my.cnf or my.ini, depending on your system):

server-id = 888888

The value of log_bin should be the path and name of the binary log file. For the server-id, provide a unique number to identify it among databases that are replicated. It’s necessary to enable log-slave-updates if the database server is a slave of another database.

Depending on your MySQL Server version, you may also need to add the following parameters.

log-bin-use-v1-row-events = TRUE

When you have finished editing the MySQL configuration file, restart the MySQL daemon for the changes to take effect.

# Create a User for Sync

Having updated MySQL Server parameters, you will now need to create a MySQL user on it. Log into the server from a mysql client (e.g., MySQL Workbench) with the master account. Then create a user to be used only by FlyData Sync. You can do this in MySQL like so:

mysql> CREATE USER 'sync'@'%' IDENTIFIED BY 'your_password';

For your_password, enter the password you want to use for this new user.

SUPER privilege is necessary only if your MySQL Server version is 5.1.63 or below.

# Provide FlyData with Amazon Redshift Settings

The next step is to provide FlyData with your Amazon Redshift account access information, so that FlyData can transfer to Amazon Redshift the MySQL data you want to replicate. To do this, log into FlyData through a web browser at this address:

Once on that page, click on Settings -> Amazon Redshift Cluster Setting. You will be asked to provide information about your Amazon Redshift account. The access information will be visible only to you and will be secure. The default settings for any optional settings should be fine to start.

# Provide FlyData with MySQL Information

For FlyData Sync to work properly, you will need to provide FlyData access information to your MySQL server and the name of the databases to replicate on Amazon Redshift, and the table schema for the tables within the databases you want to include.

Within the FlyData web interface, click on Dashboard -> Your Data Entries to get to the page for data entries. Once on that page, create a new MySQL data entry. Be sure to give a recognizable data entry name in the Name field. Select MySQL as the data source.

Now you should enter your MySQL credentials.

  • Host: MySQL Server hostname.
  • Port: default 3306.
  • Username: the name of MySQL user.
  • Password: MySQL user’s password.
  • Database: the name of a database that contains tables you want to replicate on Amazon Redshift.

Table: the name of the tables that you would like to replicate.

  • Full Sync: The “Full Sync” mode will sync any INSERT/UPDATE/DELETE/ALTER TABLEs made to the MySQL table(s) you specify.

  • Append Only: The “Append Only” mode will sync everything except DELETEs made to your MySQL table.

  • Schema: If you use a non-default Amazon Redshift schema, be sure to enter the Amazon Redshift schema name.

When you’re finished, click Update to save your changes.

# Fetching Changes from your MySQL

Direct Model

To fetch changes from your MySQL tables, you can have FlyData directly monitor changes to your database. Alternatively, you can install the FlyData Agent onto a Linux server that has access to your MySQL server. While the first choice requires you to open your MySQL port to FlyData, it simplifies your process and you can save on server costs. Otherwise, please follow the steps below for the second option (installing the Agent).

FlyData Agent Model

The FlyData Agent is what monitors the changes made to your MySQL tables. Specifically, it monitors the MySQL binary log (i.e., the binlog), and sends these changes to the FlyData Cloud, in order to load it into Amazon Redshift.

To install the FlyData Agent, you will need a Linux server, which has access to your MySQL server.

To get the specific command for installing the FlyData Agent, go to Dashboard -> Install Command. Select either the Development or the Production tab, depending on your environment. Text for installing the FlyData Agent will then be presented.

Copy the install command with its options and other values. Log into your Linux server where you want to run the FlyData Agent, and paste or type the command at the command-line as root or an appropriate administrator. You may want to execute the following to enable FlyData commands:

 $ source ~/.bashrc 

# SQL Statements for Creating Tables

You will need initially to create the tables on Amazon Redshift that will be replicated. To make it easier, you can generate an SQL dump file which can be used to create these tables on Amazon Redshift. To do this, from the server in which you installed the FlyData Agent, execute the following from the command-line:

$ flydata sync:generate_table_ddl > create_table.sql

This will retrieve the MySQL table definitions and generate CREATE TABLE statements which can then be used to create the same tables on Amazon Redshift. It won’t contain the data; that comes later. You need to prepare the tables first to receive the data. The results of this script with be a simple text file, a dump file that can be used to do that.

# Create Tables on Amazon Redshift

Now that you have a text file containing all of the SQL statements for creating the tables on Amazon Redshift, you’re ready to implement it.

Copy the contents of create_table.sql and paste it into your FlyData Access Redshift console. When you’ve done that, execute that batch of SQL statements to create the tables on your Amazon Redshift cluster.

# Initialize FlyData Sync

Now that you’ve primed everything–your MySQL server, FlyData’s server, and Amazon Redshift–you’re ready to do an initial synchronization. This will copy all of the data from your MySQL server for the database and tables you specified, to Amazon Redshift.

$ flydata start

It’s simple, but it will start the initial synchronization. It could take from a few minutes to several hours, depending on the amount of data contained in the tables you’re replicating on Amazon Redshift.

When the initial synchronization is finished, your FlyData Sync setup is completed. From that point on, any updates to the data in the replicated tables on your MySQL server (i.e., the master) will be made on your Amazon Redshift cluster from five to fifteen minutes later. You can do your do your analysis on Amazon at any time.