Data modeling in GlueSync

Data modeling is an essential toolkit that we provide in order to let developers and DBAs achieve the best out from their destination target database especially when it comes to map and model data in JSON documents accordigly to the application business logics / presentation layer point of view. This means that with that feature you will be able to implement different document design strategies that will result in a boost of performances and productivity at the destination application level.

The options described below may require additional resources to be processed. We suggest having on board our professional services to fine tune and properly size the containers and number of nodes required to satisfy your production workloads.

Core principles

It is important to mention that every operation that GlueSync does when modeling your data do not involve data introspection at the GlueSync core modules-level that because of the security design approach that our engineers adopted when crafting the core replicators modules.

Being that said every operation that involves data modeling is performed against the source or destination database engine, leveraging its full set of features and achieving the best out of compatibility from it, talking about SQL language stadards support, performances and data types.

Is on these solid principles that we refer to it with the term on-the-fly data modeling when talking about the GlueSync data modeling capabilities, bacause is indeed designed for in-memory-only computation without the need to store or persist no bit of data either while transfering or applying modeling-related business logics.

Data modeling 101

These are the current supported group of different features that GlueSync offers in terms of data modeling on-the-fly:

  • field skipping, ability to skip certain fields avoiding its replication;

  • field renaming, ability to rename fields;

  • SQL queries JSON modeling, apply SQL statements on your datasource to output a tailored JSON document data model.

Each of these are meant to support you on the journey of replicating data from a relational database to a NoSQL database or viceversa, overcoming challenges like post-commit document manipulation or UDFs (user defined functions) that could slow down the time-to-data need that you might have, making real-time data replication challenging.

Every data modeling function that GlueSync supports works both while capturing changes through CDC or GDC and also when the initial snapshot of data is performed. This means that you don’t have to handle any manual data modeling task.

Field skipping

This functionality is available both for SQL-to-NoSQL replication and viceversa following the respective links:

You should use this functionality in order to avoid unwanted fields to being replicated on the counter side, preserving space: don’t forget that each key in a JSON document is repeated per each document in your NoSQL database, meaning that if there are certain colums that you are not considering to use in NoSQL are easily avoidable using that feature in order to save space and resources.

Field Renaming

This functionality is available both for SQL-to-NoSQL replication and viceversa following the respective links:

You should use this functionality in order to rename long column names or just making them more use-case oriented in order to let them being replicated on the counter side with a new name. A common use case is preserving space: don’t forget that each key in a JSON document is repeated per each document in your NoSQL database, meaning that if there are certain colums that have long names you might be considering to rename these in order to save space and resources.

SQL queries JSON modeling

an example of SQL query to JSON output result

This feature enables you to define a SQL query statement, the same that you are used to use while performing queries against your relational database, in order to aggregate, map and design an output structure that will be then transformed by GlueSync data modeling engine as a JSON document output. It works both while initially moving your entire designed dataset (tables and columns defined in the statement) and also while performing CDC incremental replication.

To learn more about this functionality please visit the related page at this link: SQL queries JSON modeling.

As for now this functionality is only supported when replicating changes from a relational database to a NoSQL database, more extended support is on our developement roadmap and will be soon made available.