A blog post in which, first, I would like to keep a note of the most interesting things, approaches, techniques used during the conversion of a product database into one managed by Liquibase
from my personal experience in order to return to them in the future, and, second, share them with you, hoping they can be helpful to you as well.
Welcome to my first blog post. The reason for creating it is significant enough:
just a few weeks ago my colleagues and me finished a migration a product database from using an in-house solution
to Liquibase. Because the process turned out to be unexpectedly comfortable and enjoyable
even despite the complexity of the initial task, I would gladly like to share it with you.
This article is about using Liquibase or how the library can replace a custom/in-house solution in a product.
In this particular case, it is assumed that the database will be installed on an Oracle and MySQL instance.
The article consists of the following parts:
The original state
Memorable moments from the migration process on Liquibase
2.1 Dealing with schema names in lowercase
2.2 The target file structure
2.3 Using contexts to install test data
2.4 Using properties to substitute particular database specific expressions
2.5 Performance issues
1. The original state
From the beginning the project is designed for installation with multiple database schemas,
which should be installed on an Oracle and MySQL instance.
Based on my previous experience and the information available to me about the experience of migration a database on Liquibase in other companies,
the migration should be kind of normal or “conventional” in such cases:
the project is already quite mature and has a long history in the production,
so the current state could be installed as initial constant snapshot, and all new changes as regular Liquibase
changesets on top of that.
2. Memorable moments from the migration process on Liquibase
2.1 Dealing with schema names in lowercase
One of the problem points was tackling schema names written in lowercase.
Actually, it is not possible to get connected to the database host from Liquibase (using JAR-file or programmatically)
because, while establishing a connection, Oracle does not differentiate between a username written in lowercase or in uppercase.
Normally such a problem does not happen when someone connects with a username surrounded by double quotes,
such as how it works with sqlplus
. Liquibase, however, ignores double quotes (v.4.8.0).
For this incorrect behavior there is a corresponding bug
It is worth mentioning, that the described problem has been found in case of connecting to Oracle, and
there is no such issue in dealing with MySQL.
2.2 The target file structure
The main ideas the chosen file structure based on are:
- Supporting for efficient way of storing data for multiple targets (Oracle, MySQL)
- Ability to separate the layer of database with structure and data, which were there before Liquibase
So the target structure looks like the following:
1 | ├── common |
2.3 Using contexts to install test data
In the previous section we have seen test data files in the structure.
It is hard to imagine a software project without tests, so project test data is a regular case
and contexts
is a regular tool in the arsenal of Liquibase to manage test data in a project.
One common use is to mark changeSets that insert test data as
context=”test”
so that in your development and QA environments you can run liquibase with–contexts=test
to get the test data and in production you run with–context=prod
to not have test data.
In the case of this particular project,
the test data changelogs have the parameter contexts
with the value test-env
.
In order to have test data installed the update
command is called with the contexts test-env
.
And just like it says in the official Liquibase documentation,
for installation in production we need to pass to Liquibase something like:
“Install all changesets without context, but don’t install test data.”
So to have Liquibase install in accordance with the above statement, we can use the only context="production-env"
or context="!test-env"
.
1 | databaseChangeLog: |
I think it makes sense to mention once again, the rule of using contexts
:
If you do not specify any contexts in the CLI at runtime, every changeset in your changelog runs,
even if they have contexts attached.
If you add a context to a changeset, it only runs when you specify that context in the CLI,
but unmarked changesets still run.
2.4 Using properties to substitute particular database specific expressions
One of the benefits Liquibase brings is a special, unified format to describe SQL instructions.
Using it a developer is able to define changes once for many RDBMS.
However, despite using that universal format of the Liquibase change types we still need to handle some particular specific
of those two RDBMSes and there are still a lot of them and this is pretty much normal:
the same action can be expressed in different keywords and with functions which have different names from RDBMS to RDBMS.
On the other hand the large numbers of available changes
Liquibase is able to apply to the database is also limited,
so for some particular special needs some extension tools are required.
For example, when we create a table with a VARCHAR2 column and
we want to specify the maximum string length in characters we must put a keyword CHAR
:
1 | CREATE TABLE table01 ( |
Otherwise, if you don’t explicitly specify BYTE or CHAR after the max_size, by default, Oracle uses BYTE.
MySQL, in turn, by default expects characters after the max_size value:
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store.
For example,CHAR(30)
can hold up to 30 characters.
So we have the difference in column definition and should consider it during installation.
A possible and quite obvious workaround would be writing one changeset per DBMS or in other words to duplicate changes.
Clearly, that would make changelogs longer and in most cases the duplication leads to more complicated maintenance:
all of that makes the solution error-prone in the end.
Another way is substitution of such names of functions and operators with the substituting
properties and the parameter dbms
in Changelogs:
Liquibase allows a dynamic substitution of properties in your changelog.
The tokens to replace in your changelog are described using the${property-name}
syntax.
By substituting values for replacement tokens in the format of${property-name}
,
you can use the same changesets to reflect small environmental changes.
For example, your tablespace name in Oracle may differ from environment to environment,
but you want to only write one create table changeset that can be used in all your environments.
so finally we have the following:
define a substitution in advance;
1
2
3
4
5
6
7
8
9databaseChangeLog:
- property:
name: def.char
dbms: mysql
value: ""
- property:
name: def.char
dbms: oracle
value: CHARinclude a changelog with use of the substitutions:
the root changelog:
1 | databaseChangeLog: |
schema-1/create-table-01.yml:
1 | databaseChangeLog: |
schema-1/create-table-01.sql:
1 | CREATE TABLE table01 ( |
2.5 Performance issues
2.5.1 Multiple inserts
The two DBMSes: Oracle and MySQL the database migrated under Liquibase meant to be installed on
have many peculiarities.
One of them is multiple INSERT
SQL statement from MySQL.
1 | INSERT INTO tbl_name (a,b,c) |
When we run such statements from Liquibase changelogs, it takes almost the same time to apply a single row
or a thousand of them.
Oracle, in turn, does not support that quite convenient and compact syntax of the statement,
however it also provides the INSERT ALL
option:
1 | INSERT ALL |
So, the timing of insertion such as the insert
statement via Liquibase does have that performance advantage,
which we have with MySQL, so Liquibase by itself does not provide any approach in this regard.
Generally speaking this is not a problem of Liquibase, of course.
However, for example, Oracle SQLcl offers a possible solution for that.
Concepts: SQLcl is free. We have taken the community edition of Liquibase, and enhanced the Oracle support. It’s using the 4.6.x library, but also has access to ALL of our features in SQLcl, a more modern take on SQLPlus.
More details in the related question on StackOverflow.
2.5.2 Insertion of data from CSV files
Another important point from the perspective of performance: an option of inserting
data from CSV files. Internally Liquibase applies PreparedStatements
which significantly speed up the insertion in comparison with a usual insertion by single row SQL inserts.
2.5.3 Prefilled Database Images approach
One of the great tricks which could be used with Liquibase is applying simple caching of an installed database.
For example: there is a need to roll out a huge database instance almost immediately during a build on the CI server.
The key point here: the database content is constant. So for each such build the CI server runner executes
the same operations and installs the same data over and over.
So the obvious question here: why not save a state of deployed database somewhere? For example in a docker image.
Doing that we get a pre-filled database image, and later we just pull the instance from the image.
Conclusion
My personal experience proved that Liquibase is an ultimately simple, flexible and powerful tool.
However, even if the big variety of the predefined change types is still not enough a developer
can use some extensions which Liquibase comes with.
The given article is just tiny set of things I noticed during my experience with Liquibase. Over time,
some new details and interesting hints will come to my mind, so the article is not finalised and meant to get an update.
Stay tuned ✋