FlyData Anatomy Series: The FlyData Cloud, Part 1

# The FlyData Cloud

This is the third article in the FlyData Anatomy Series. The first article covered the Data Extraction Process, and the second covered our approach to data security. This time, we will walk through the FlyData Cloud, which does most of the heavy lifting for data processing. Loading data into Amazon Redshift involves several considerations, such as reformatting data, parallelizing the workload, and re-sequencing incoming data. In this article, we will shed some light on how FlyData handles these steps.

Before we jump into the FlyData Cloud, if you haven’t already done so, please check out the first article in the series, FlyData Anatomy Series: The Data Extraction Process. It explains the first stage of the data flow, and this article will make much more sense if you follow the data flow in order.

# From Data Extraction to Data Processing

In the Data Extraction Process article, we left off after describing how data is extracted from the data source. Let’s continue our journey from there. Once we have a way to continually extract data from a data source, we need a way to process that data continuously, so that it is uploaded to Amazon Redshift on an ongoing basis. We refer to the place where this processing happens as the FlyData Cloud. In the FlyData Cloud, we mainly take care of the following processes:

  • Allocation of workload
  • Validation of data types
  • Conversion of data to an Amazon Redshift compatible format (we use TSV)
  • Saving data to S3
  • Any tracking related to the upload and transformation process
  • Handling and management of COPY commands
  • Error handling
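
To make a few of the steps above more concrete, here is a minimal Python sketch of type validation, conversion to TSV, and construction of a COPY statement. It is an illustration only and not FlyData's actual implementation, which this article does not describe. The function names, the schema format, the bucket, the table, and the IAM role are all hypothetical. The sketch also assumes that `None` is written as an empty field and that special characters are backslash-escaped to match the COPY `ESCAPE` option; how empty fields are loaded depends on the COPY options you choose.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical schema format: an ordered list of (column name, Python type).
Schema = Sequence[Tuple[str, type]]


def validate_row(row: Dict[str, Any], schema: Schema) -> List[Any]:
    """Check each value against the schema and return values in column order."""
    values = []
    for name, expected in schema:
        value = row.get(name)
        # None is allowed here and treated as a missing value.
        if value is not None and not isinstance(value, expected):
            raise ValueError(
                f"column {name!r}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        values.append(value)
    return values


def escape_field(value: Any) -> str:
    """Render one value as a TSV field (assumption: None becomes empty)."""
    if value is None:
        return ""
    text = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    text = text.replace("\\", "\\\\")
    # Prefix delimiter and line-break characters with a backslash.
    for special in ("\t", "\n", "\r"):
        text = text.replace(special, "\\" + special)
    return text


def to_tsv_line(values: Sequence[Any]) -> str:
    """Join escaped fields with tabs to form one TSV record."""
    return "\t".join(escape_field(v) for v in values)


def build_copy_sql(table: str, s3_uri: str, iam_role: str) -> str:
    """Build a COPY statement for a tab-delimited file stored in S3."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' "
        "DELIMITER '\\t' ESCAPE"
    )


if __name__ == "__main__":
    schema = [("id", int), ("name", str)]
    row = {"id": 1, "name": "first\tsecond"}
    print(to_tsv_line(validate_row(row, schema)))
    # Placeholder bucket, table, and role; replace with real values.
    print(build_copy_sql(
        "events",
        "s3://example-bucket/events.tsv",
        "arn:aws:iam::123456789012:role/ExampleRole",
    ))
```

In a real pipeline, the TSV lines would be written to a file in S3 before the COPY statement is run against Amazon Redshift, and failures at any step would feed into tracking and error handling.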

In the next part of this article, we will touch on each aspect in more detail. Stay tuned!

