GlueSync - RDBMS and NoSQL data replication

GlueSync is a software product for real-time, event-based data replication from RDBMS to NoSQL databases and vice versa. You can replicate data between relational and non-relational databases in real time using native technologies that are officially supported and maintained by each database vendor. GlueSync can be deployed in any cloud, virtual or containerized environment as well as on-premises, even on bare-metal servers.

You can read more about GlueSync’s native approach in our blog article The GlueSync Journey.

This documentation covers the installation and configuration steps needed to set up GlueSync in your infrastructure and connect an RDBMS instance with a NoSQL database. Before jumping into the details, let’s go over a few core concepts.

Core concept behind GlueSync’s architecture

One of the main considerations when designing the GlueSync architecture was its native ability to be scaled and deployed with ease, just as you do with your container-based applications: pull-config-deploy-enjoy. That’s the motto. You keep full control of what happens under the hood, without being confined to GUIs that would not let you harness the full potential of this data replication suite.

GlueSync is shipped as Docker containers. If you are not yet familiar with Docker containers, you can have a look at the official Docker homepage.

This doesn’t mean that you cannot run GlueSync if you are unable to run a Docker environment in your infrastructure. On the contrary: you can ask our team to provide the package for your specific target platform so that you can run it on-premises, even on bare-metal servers.

The following diagram gives an architectural overview of a GlueSync environment.

a diagram illustrating the architectural overview of GlueSync

Design

The design follows the purpose of each core functionality provided by the suite: replicating data from a relational database to a non-relational database, and vice versa. These two functionalities are called "ways" or "directions".

So, rather than building a single monolithic piece of general-purpose software, we decoupled its functionalities into one self-contained, highly resilient and specialized service per "direction". Each service is capable of replicating, logging, alerting and being monitored on its own, without a central master authority that would only have increased complexity and introduced a single point of failure into the overall architecture.

As a result, you decide what to deploy for each use case: you can deploy only the module that replicates data from MS SQL Server to MongoDB, or only the one for the opposite direction, depending on your needs. This gives you the fine-grained control over permissions, security and performance that you expect from a product made for real-world production use cases.

Understanding CDC vs GDC

When sourcing changes from a relational database, there are only a few ways to audit the writes, updates and deletions performed at row and field level. The most common approach, but also the most challenging, is reading from the database transaction logs. Nowadays these are usually wrapped by an API layer called CDC (Change Data Capture), which lets application developers and DBAs read through the logs and understand the entire history of the changes made since a certain point in time.

We use the term "challenging" because every database vendor has implemented its own way of exposing these logs, so building a tool compatible with all of them is a challenge in itself. In addition, some vendors or database versions (especially older ones) provide neither CDC nor low-level APIs for reading the transaction logs.

In order to provide wider compatibility when capturing real-time change streams from the vast majority of relational databases in the field, we developed a fine-tuned set of UDFs (user-defined functions) that, together with a set of triggers, helps GlueSync extend its compatibility with relational databases while keeping a safe, fast and secure approach whose entire lifecycle is managed by the GlueSync engine itself. We called this feature GDC (GlueSync Data Capture). It is used for database brands, versions or editions that do not currently (or at all) support the native CDC technique, and for those that we decided to make compatible through GDC first, with out-of-the-box CDC support planned for the future.
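To make the trigger-based idea more concrete, here is a minimal sketch of change capture through triggers, using Python's built-in sqlite3 module. It is not GlueSync's actual GDC implementation: the `customers` and `change_log` tables, the trigger names and the `read_changes` helper are all hypothetical and only illustrate the general technique of recording every write, update and deletion in a log table that a replication process can poll.

```python
import sqlite3

# Hypothetical schema: a "customers" table plus a change-log table filled by triggers.
# This only illustrates trigger-based capture; it is not GlueSync's GDC schema.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE change_log (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL,
    operation  TEXT NOT NULL,
    row_id     INTEGER NOT NULL
);
CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO change_log (table_name, operation, row_id)
    VALUES ('customers', 'I', NEW.id);
END;
CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO change_log (table_name, operation, row_id)
    VALUES ('customers', 'U', NEW.id);
END;
CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO change_log (table_name, operation, row_id)
    VALUES ('customers', 'D', OLD.id);
END;
"""


def read_changes(conn, after_seq=0):
    """Return change-log rows whose sequence number is greater than after_seq."""
    cur = conn.execute(
        "SELECT seq, table_name, operation, row_id "
        "FROM change_log WHERE seq > ? ORDER BY seq",
        (after_seq,),
    )
    return cur.fetchall()


def demo():
    """Run one insert, one update and one delete, then return the captured changes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
    conn.execute("DELETE FROM customers WHERE id = 1")
    conn.commit()
    return read_changes(conn)


if __name__ == "__main__":
    for change in demo():
        print(change)
```

A consumer would remember the last `seq` it processed and pass it as `after_seq` on the next poll, so each change is picked up once and in order.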

Naming conventions

To name each type of GlueSync "direction" we adopted the following nomenclature:

  • replication from relational (RDBMS) to non-relational databases (NoSQL) has been labeled "SQL to NoSQL"

  • replication from non-relational (NoSQL) to relational databases (RDBMS) has been labeled "NoSQL to SQL"

What you’re going to need

Before proceeding, for each GlueSync instance that you want to deploy, please check that you have the following information:

  • RDBMS connection details, like:

    • Username

    • Password

    • Connection string (IP address / port)

    • Table names

  • NoSQL database connection details, like:

    • Destination, either bucket or database

    • Connection string (IP address / port)

    • Username

    • Password
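The checklist above can be turned into a small pre-flight check. The following sketch assumes hypothetical class and field names (`RdbmsDetails`, `NoSqlDetails`, `missing_fields`); they are not part of any GlueSync configuration format and only mirror the items listed above.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class RdbmsDetails:
    """Hypothetical container for the RDBMS items in the checklist."""
    username: str = ""
    password: str = ""
    host: str = ""          # IP address part of the connection string
    port: int = 0           # port part of the connection string
    table_names: List[str] = field(default_factory=list)


@dataclass
class NoSqlDetails:
    """Hypothetical container for the NoSQL items in the checklist."""
    destination: str = ""   # either a bucket or a database
    host: str = ""
    port: int = 0
    username: str = ""
    password: str = ""


def missing_fields(details) -> List[str]:
    """Return the names of fields that are still empty ('', 0 or an empty list)."""
    return [f.name for f in fields(details) if not getattr(details, f.name)]
```

For example, calling `missing_fields(NoSqlDetails(destination="orders", host="10.0.0.8"))` reports `port`, `username` and `password` as still missing.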

If you are involved in the implementation of GlueSync or just trying it out via our trial program, we highly suggest having a visual query editor tool so that you can set up multiple data source connections and easily import, edit and display data. The tools tested by the GlueSync product and QA teams are:

As MOLO17 we neither provide support for these specific tools nor advertise them. You are free to use whichever toolset you prefer to connect to and query your database(s).

Also, do not miss our section dedicated to tutorials and use cases of GlueSync.

Compatibility matrix

Non-relational databases (NoSQL)

| NoSQL vendor / edition / version | GlueSync compatibility | Technology used |
|---|---|---|
| Aerospike | ✅ from GlueSync v1.3.4, starting from version 5.X and above, all editions | Writes performed through the official SDK |
| Amazon AWS S3 | ✅ from GlueSync v1.3.4 | Writes performed through the official SDK |
| Couchbase | ✅ from GlueSync v1.0, starting from version 5.5 and above, all editions | Native CDC via Eventing service, writes performed through the official SDK |
| MongoDB | ✅ from GlueSync v1.3, starting from version 3.6 and above, all editions | Native CDC via Change Streams, writes performed through the official SDK |

Relational databases (RDBMS)

| RDBMS vendor / edition / version | GlueSync compatibility | Technology used |
|---|---|---|
| Microsoft SQL Server and Microsoft SQL Azure | ✅ from GlueSync v1.0, all editions | Native CDC via Change Tracking from version 2016, and via GlueSync Data Capture (GDC) for all versions |
| Oracle Database and OracleDB on OCI | ✅ from GlueSync v1.2, all editions | Native CDC via XStream APIs from 11.2g, and via GlueSync Data Capture (GDC) for all versions |
| DB2 for series i (AS/400) and DB2 for z/OS | Writing (as a target) is fully supported, ⏱ CDC support coming soon | Writes performed with native JDBC drivers |
| PostgreSQL | ✅ from GlueSync v1.3.3, tested from version 9.0 and above | Via GlueSync Data Capture (GDC) |
| Sybase SQL (Adaptive Server Enterprise, ASE, SAP Sybase) | Writing (as a target) is fully supported, ⏱ CDC support coming soon | Writes performed with native JDBC drivers |
| MariaDB | ✅ from GlueSync v1.3.3, tested from version 10.0 and above | Via GlueSync Data Capture (GDC) |
| MySQL | ✅ from GlueSync v1.3.3, tested from version 8.0 and above | Via GlueSync Data Capture (GDC) |

Tested means that every version from the mentioned tag upwards is currently covered by the integration test suite and by performance benchmarks, which run on a per-commit basis to ensure no regressions and the best quality outcomes. Versions older than those included in our test suites might work but are not currently battle-tested for production use. If you would like to test a specific database version that is not yet listed as compatible, you are more than welcome to join our beta program: drop us a line at this email address and tell us which database version you would like to cover in the beta program.
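As a rough aid for reading the matrix, the sketch below checks a database version string against the minimum versions listed above. The `MIN_VERSION` table and the helper names are hypothetical; the values are copied from the matrix, and vendors for which the matrix states "all versions" or no version at all are deliberately left out.

```python
from typing import Tuple

# Minimum versions copied from the compatibility matrix above.
# Names and structure are illustrative, not a GlueSync API.
MIN_VERSION = {
    "aerospike": (5, 0),     # "5.X and above"
    "couchbase": (5, 5),
    "mongodb": (3, 6),
    "postgresql": (9, 0),
    "mariadb": (10, 0),
    "mysql": (8, 0),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '10.6.12' into (10, 6, 12), ignoring suffixes such as '-log'."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(vendor: str, version: str) -> bool:
    """True if the version is at or above the minimum listed for the vendor."""
    minimum = MIN_VERSION.get(vendor.lower())
    if minimum is None:
        raise KeyError(f"no minimum version listed for {vendor!r}")
    parsed = parse_version(version)
    # Pad short versions such as '9' so that (9,) compares like (9, 0).
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded >= minimum
```

With this table, `meets_minimum("MySQL", "5.7.44")` is false while `meets_minimum("MariaDB", "10.6.12")` is true. A false result only means the version is below the listed range, which the paragraph above describes as possibly working but not battle-tested.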

Minimum system requirements

  • a docker environment;

  • 1 vCPU and 2GB of RAM;

  • 1 GB free disk space, used for logging redaction.
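The requirements above can be checked with a short script before deploying. This is only a sketch under stated assumptions: the function names are hypothetical, 1 GB is treated as 1024³ bytes, and the amount of RAM has to be supplied by the caller because the Python standard library has no portable way to read it.

```python
import os
import shutil

# Thresholds taken from the list above.
MIN_VCPUS = 1
MIN_RAM_GB = 2
MIN_FREE_DISK_GB = 1


def unmet_requirements(vcpus, ram_gb, free_disk_gb, docker_available=True):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if not docker_available:
        problems.append("no docker executable found on PATH")
    if vcpus < MIN_VCPUS:
        problems.append(f"needs at least {MIN_VCPUS} vCPU, found {vcpus}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"needs at least {MIN_RAM_GB} GB of RAM, found {ram_gb}")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"needs at least {MIN_FREE_DISK_GB} GB of free disk, found {free_disk_gb:.2f}"
        )
    return problems


def probe_host(path="."):
    """Best-effort probe of the local host: CPU count, free disk space, docker on PATH."""
    return {
        "vcpus": os.cpu_count() or 0,
        "free_disk_gb": shutil.disk_usage(path).free / 1024 ** 3,
        "docker_available": shutil.which("docker") is not None,
    }
```

A typical use is `unmet_requirements(ram_gb=4, **probe_host())`, where the RAM figure comes from your own knowledge of the host.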