Database migration and upgrade are a way of life for any DBA. There can be many reasons for a data migration: a data center move, a hardware upgrade, a software upgrade, etc. The most ridiculous justification I’ve heard is IT security compliance. Whatever the reasons may be, DBAs can get really busy really fast, swayed by management’s business decisions.
Among the different types of databases, public-facing OLTP databases require the most work and due diligence. Because many application server pools may be hitting the database, the migration must be carefully managed and coordinated to minimize risk.
The Split Brain Situation
One very notable pitfall I’ve experienced in my two-decade career is getting the database and its applications into a Split Brain Situation. This happens during an out-of-place database migration or upgrade when the applications are not migrated all at once: some applications continue to access the old database while others have already started using the new one. Once changes are made to BOTH the old database and the new database during a migration, you’ve fallen into the trap.
Remediation
If you do not have a rollback plan in place, the only remediation left to you may be to fix the data manually. If you wish to move forward with the migration, you have to figure out what data changed on the old database and apply the same changes to the new one. If you intend to abort the migration, you have to identify the changes made on the new database and apply them back onto the old one. This remediation can be disastrous.
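To make the manual fix concrete, here is a minimal sketch of the "move forward" direction as a three-way comparison. It assumes, hypothetically, that each affected table can be pulled into memory as a dict keyed by primary key, and that you have a snapshot of the data as of the cutover (for example via an Oracle flashback query as of the switchover SCN, if undo retention allows). The names `changes_since` and `replay` are illustrative only. The sketch shows why the fix is so painful: a row changed on both databases has no correct automatic answer and must go to a human.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a table: primary key -> row values.
Table = Dict[int, Tuple]


def changes_since(cutover: Table, current: Table) -> Dict[int, object]:
    """Rows that differ from the cutover snapshot.

    A value of None means the row was deleted after the cutover.
    """
    changed: Dict[int, object] = {}
    for key in cutover.keys() | current.keys():
        if cutover.get(key) != current.get(key):
            changed[key] = current.get(key)
    return changed


def replay(cutover: Table, old_now: Table, new_now: Table) -> Tuple[Table, List[int]]:
    """Replay changes made on the old database after cutover onto the new one.

    Returns the merged table plus the keys changed differently on BOTH sides;
    those conflicts are left as they are on the new side for manual review.
    """
    old_changes = changes_since(cutover, old_now)
    new_changes = changes_since(cutover, new_now)
    merged = dict(new_now)
    conflicts: List[int] = []
    for key, row in old_changes.items():
        if key in new_changes and new_changes[key] != row:
            conflicts.append(key)  # split brain: both databases changed this row
            continue
        if row is None:
            merged.pop(key, None)  # row was deleted on the old database
        else:
            merged[key] = row  # insert or update made on the old database
    return merged, sorted(conflicts)


if __name__ == "__main__":
    cutover = {1: ("alice", 100), 2: ("bob", 50)}
    old_now = {1: ("alice", 120), 2: ("bob", 50), 3: ("carol", 10)}  # pool A kept writing here
    new_now = {1: ("alice", 90), 2: ("bob", 50)}  # pool B already wrote here
    merged, conflicts = replay(cutover, old_now, new_now)
    # merged == {1: ("alice", 90), 2: ("bob", 50), 3: ("carol", 10)}; conflicts == [1]
    print(merged, conflicts)
```

The abort direction is the same call with the roles swapped, `replay(cutover, new_now, old_now)`. Either way, every key in `conflicts` is a business decision, not a script.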
Costs
Quite a few years ago, I was involved in migrating a high-profile, public-facing OLTP database onto Oracle Exadata. Three or four DBAs worked on the migration, and the split brain happened only because of a very unfortunate miscommunication with the application developers. It happened so easily that it’s hard to blame anybody. Ultimately, the migration was called off and rolled back. The costs associated with it include, but are not limited to:
- Man hours lost by the DBAs, application developers, project managers, independent contractors, etc.
- Hours and revenue lost on the business side, as certain business functions may have been impacted during the downtime.
Principles
To avoid the split brain situation during a database migration, follow the simple principles below:
- Have an “All-or-Nothing” mindset.
- Changes only happen to one database at any given time.
- Once the switchover takes place, changes cannot happen on the old database.
- Have a back-out plan. Backing out is often much easier than fixing data manually.
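The second and third principles can also be checked after the fact. The sketch below is a hedged illustration, assuming you have an audit trail of committed changes tagged with the database that received them (from database auditing or application logs, for instance). The `Write` record and `split_brain_violations` function are hypothetical names, not part of any Oracle tool. Before the cutover only the old database may change; from the cutover on, only the new one may.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Write:
    ts: float  # commit time of the change (hypothetical audit record)
    db: str    # which database received the change


def split_brain_violations(writes: Iterable[Write], old_db: str, new_db: str,
                           cutover_ts: float) -> List[Write]:
    """Return writes that break "changes only happen to one database at a time".

    Assumes `writes` covers only the old and new databases of this migration.
    """
    violations = []
    for w in writes:
        allowed = old_db if w.ts < cutover_ts else new_db
        if w.db != allowed:
            violations.append(w)
    return violations
```

An empty result means the switchover respected the principles; any write to `old_db` at or after `cutover_ts` is exactly the split brain described above.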
Steps
I now follow these steps consistently every time I do an out-of-place upgrade or migration. They are written in Oracle terminology, but apply sensibly to other database platforms as well.
- Lock all non-internal Oracle users on the current primary database.
- Kill any client sessions still connected to the current primary, on all instances (for a RAC database).
- Allow Oracle Data Guard to replicate the latest changes to the standby database.
- Once the standby database is up-to-date, issue the switchover from primary to standby.
- Unlock, on the new primary, the users locked in the first step. (Reuse the same script, changing “account lock” to “account unlock”.)
- Announce GO-LIVE, so that applications can start connecting to the new primary.
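The lock, kill, and unlock steps lend themselves to a generated script. Below is a minimal Python sketch that turns lists of users and sessions into SQL statements. It assumes Oracle 12c or later (for the `ORACLE_MAINTAINED` column of `DBA_USERS`) and that you have already run the two queries shown in the comments; the function names are hypothetical. The Data Guard catch-up and the switchover itself are deliberately not generated.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed inputs, collected on the current primary beforehand (Oracle 12c+):
#   SELECT username FROM dba_users
#    WHERE oracle_maintained = 'N' AND account_status = 'OPEN';
#   SELECT username, sid, serial#, inst_id FROM gv$session WHERE type = 'USER';
Session = Tuple[str, int, int, int]  # (username, sid, serial#, inst_id)


def account_statements(users: Iterable[str], lock: bool) -> List[str]:
    """Lock (first step) or unlock (after switchover) with one generator."""
    action = "ACCOUNT LOCK" if lock else "ACCOUNT UNLOCK"
    # Quoted identifiers keep the exact case stored in DBA_USERS.
    return [f'ALTER USER "{u}" {action};' for u in users]


def kill_statements(sessions: Iterable[Session], users: Iterable[str]) -> List[str]:
    """Kill sessions of the locked users on every RAC instance (@inst_id).

    Sessions of other accounts, such as the DBA's own SYS session, are skipped.
    """
    targets = set(users)
    return [
        f"ALTER SYSTEM KILL SESSION '{sid},{serial},@{inst}' IMMEDIATE;"
        for user, sid, serial, inst in sessions
        if user in targets
    ]


def cutover_scripts(users: List[str], sessions: Iterable[Session]) -> Dict[str, List[str]]:
    """Statements for the old primary (before switchover) and new primary (after)."""
    return {
        "old_primary": account_statements(users, lock=True) + kill_statements(sessions, users),
        # In between: let Data Guard apply the last changes, then switch over
        # (for example with the broker's SWITCHOVER TO command). Not generated here.
        "new_primary": account_statements(users, lock=False),
    }


if __name__ == "__main__":
    scripts = cutover_scripts(
        ["APP_OWNER", "WEB_USER"],
        [("WEB_USER", 123, 4567, 2), ("SYS", 10, 1, 1)],
    )
    for line in scripts["old_primary"]:
        print(line)
    # ALTER USER "APP_OWNER" ACCOUNT LOCK;
    # ALTER USER "WEB_USER" ACCOUNT LOCK;
    # ALTER SYSTEM KILL SESSION '123,4567,@2' IMMEDIATE;
```

Capturing only accounts whose status is `OPEN` is a deliberate choice in this sketch: because the unlock list is the same list, accounts that were already locked before the migration stay locked on the new primary.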
Alternatives to these steps exist. For example, you can shut down the old database during the switchover, or put it in restricted session mode. Either approach serves the same objective: keeping applications from connecting to the old database.
Avoid the Split Brain Situation ahead of time and at all costs!