SQL Scripts that can be Rerun

This guide deals with writing database scripts which can be rerun (‘rerunnable’). Creating rerunnable scripts makes deployments easier and avoids any unwanted behavior if the code were to run more than once. When modifying or updating a database, running a normal script once will work as expected without errors. However, if the script were to run again, the database may return an error message, halt execution or even worse, create unwanted duplicates. This could drastically slow down the deployment process.

This play will show you how to add simple checks to your database scripts so that they can be run more than once without errors. 

The types of statements that can have such checks to ensure they are rerunnable are:

  1. CREATE TABLE
  2. CREATE/REPLACE View
  3. DROP TABLE/DROP VIEW/DROP Stored Procedure/Function/Trigger
  4. CREATE Stored Procedure/Function/Trigger
  5. ALTER TABLE
  6. INSERT INTO with Primary Keys
  7. INSERT INTO without Primary Keys

Most of these steps should be taken on every script and every project because they are such simple steps. However, ‘ALTER TABLE’ (5) and ‘INSERT INTO’ without Primary Keys (7) require significant additional work. As such, they are more appropriate for large projects where the need to make deployments as easy as possible outweighs the additional development effort.

CREATE TABLE

When creating a table, using “IF NOT EXISTS” in the script will check to see if the table you are trying to create already exists before executing the create statement.

For Example:

CREATE TABLE `yourtable`
    (
        `id`      INT NOT NULL                  ,
        `column1` VARCHAR(255) NULL DEFAULT NULL,
        `column2` VARCHAR(255) NULL DEFAULT NULL
    )
    ENGINE = InnoDB
;

can be rewritten as: 

CREATE TABLE
    IF NOT EXISTS `yourtable`
    (
        `id`      INT NOT NULL                  ,
        `column1` VARCHAR(255) NULL DEFAULT NULL,
        `column2` VARCHAR(255) NULL DEFAULT NULL
    )
    ENGINE = InnoDB
;

CREATE/REPLACE VIEW

A rerunnable view creation script can be written using “CREATE OR REPLACE VIEW” instead of “CREATE VIEW”

For Example:

CREATE VIEW `yourview` AS
SELECT
    `column1`
FROM
    `yourtable`
;

can be rewritten as:

CREATE OR REPLACE VIEW `yourview` AS
SELECT
    `column1`
FROM
    `yourtable`
;

DROP TABLE / View / Stored Procedure / Functions / Triggers

Using “IF EXISTS” when dropping a table or a view from will drop the table/view/stored procedure only if it exists. If the table/view/stored procedure has already been dropped, no error message will be thrown.

For example, the following code for dropping table,

DROP TABLE `yourtable`
;

can be rewritten as:

DROP TABLE IF EXISTS `yourtable`
;

Similarly, the SQL for dropping a view, 

DROP VIEW `yourview`
;

can be rewritten as:

DROP VIEW IF EXISTS `yourview`
;

For a stored procedure,

DROP PROCEDURE `yourprocedure`
;  

can be rewritten as:

DROP PROCEDURE IF EXISTS `yourprocedure`
;

Finally, for triggers and functions you can write

DROP TRIGGER IF EXISTS `yourtrigger`
;

and

DROP FUNCTION IF EXISTS `yourfunction`
;

CREATE Stored Procedure/Function/Trigger

Creating a stored procedure works a little differently than creating a table. You will first have to drop a stored procedure if it exists and then create another one with the same name. You would take the same approach to alter a stored procedure as well.

For Example:

CREATE PROCEDURE
    `yourprocedure`...

can be rewritten as:

DROP PROCEDURE IF EXISTS `yourprocedure`
;

CREATE PROCEDURE
    `yourprocedure`...

The same approach works for creating and modifying functions and triggers. You can add a DROP IF EXISTS in front of the code which creates the function/trigger:

DROP TRIGGER IF EXISTS `yourtrigger`
;
DROP FUNCTION IF EXISTS `yourfunction`
;

ALTER TABLE

There is no single approach to creating a rerunnable ALTER TABLE script. Some ALTER TABLE scripts are always rerunnable but in other cases you have to use a stored procedure to make the query rerunnable (as shown in the example below).

For example, the following line modifies the datatype of a column and will always be possible to rerun:

ALTER TABLE `yourtable` MODIFY `column1` VARCHAR(255)
;

However for adding a column the situation is more difficult. You can use “IF NOT EXISTS” inside a temporary stored procedure to make the script possible to rerun.

Below are the high level steps for running a rerunnable ALTER statement:

  1. Create a stored procedure
  2. Check if modification already exists using INFORMATION_SCHEMA
  3. Execute the alter statement conditionally (IF, THEN, ELSE)
  4. Close the stored procedure
  5. Execute the stored procedure
  6. Drop the stored procedure

We also use all the rerunning techniques we have already learned when carrying out each step. For example, we drop the procedure if it exists before creating it. 

DROP PROCEDURE IF EXISTS sp_alter_table
;

DELIMITER $$
CREATE PROCEDURE
    sp_alter_table()
BEGIN
    DECLARE _count INT;
    SET _count =
    (
        SELECT
            count(1)
        FROM
            information_schema.columns
        WHERE
            table_name       = 'yourtable'
            AND table_schema = 'Appian'
            AND column_name  = 'column3'
    )
    ;
    IF _count = 0 THEN
        ALTER TABLE yourtable ADD `column3` VARCHAR(255)
        ;
    
    END IF;
END $$
DELIMITER ;
CALL sp_alter_table();
DROP PROCEDURE sp_alter_table
;

Note: Using “IF NOT EXISTS” on the INFORMATION_SCHEMA.COLUMNS table will check to see if the column exists before adding the column.

You may be wondering about how you could use this to drop columns. A similar method would work, but Appian recommends that you do NOT drop columns from a table to avoid any issues that may arise due to backward compatibility.

Adding or removing constraints follows the same high level steps as adding a column, as explained above. However, you should check for the constraint by querying INFORMATION_SCHEMA.STATISTICS instead of INFORMATION_SCHEMA.COLUMNS. See the example below:

DROP PROCEDURE IF EXISTS sp_addconstraint
;

DELIMITER $$
CREATE PROCEDURE
    sp_addconstraint()
begin
    DECLARE _count INT;
    SET _count =
    (
        SELECT
            count(1)
        FROM
            information_schema.statistics
        WHERE
            table_name       = 'yourtable'
            AND table_schema = 'Appian'
            AND column_name  = 'id'
            AND index_name   = 'PRIMARY'
    )
    ;
    IF _count = 0 THEN
        ALTER TABLE `yourtable` ADD PRIMARY KEY (id)
        ;
    
    END IF;
END $$
DELIMITER ;
CALL sp_addconstraint();
DROP PROCEDURE sp_addconstraint
;

INSERT INTO with primary keys

For inserts, if you are inserting values where you are providing the primary key (or unique identifiers), you can use the IGNORE statement to make the insert rerunnable.

For Example:

INSERT INTO `yourtable`
    (`id`        ,
        `column1`,
        `column2`,
        `column3`
    )
    VALUES
    ('1'   ,
        'A',
        'B',
        'C'
    )
    ,
    ('2'   ,
        'D',
        'E',
        'F'
    )
;

can be rewritten as:

INSERT IGNORE
INTO
    `yourtable`
    (`id`        ,
        `column1`,
        `column2`,
        `column3`
    )
    VALUES
    ('1'   ,
        'A',
        'B',
        'C'
    )
    ,
    ('2'   ,
        'D',
        'E',
        'F'
    )
;

Assuming that `id` is the primary key, if a value with `1` or `2` in `id` already exists, this statement will not insert this row of data in the table. However, if you want to update the row to the values in the query, the statement can be rewritten as:

INSERT INTO `yourtable`
    (`id`        ,
        `column1`,
        `column2`,
        `column3`
    )
    VALUES
    ('1'    ,
        'A2',
        'B2',
        'C2'
    )
    ,
    ('2'    ,
        'D2',
        'E2',
        'F2'
    )
ON
    DUPLICATE KEY
UPDATE
    column1 = VALUES
    (column1
    )
    ,
    column2 = VALUES
    (column2
    )
    ,
    column3 = VALUES
    (column3
    )
;

INSERT INTO without primary keys:

If you are not providing any primary keys with the values you wish to insert, you will have to use another method for inserting while avoiding duplicate inserts on reruns. By using a temporary table, we can define what values already exist in the target table before inserting them (use the LIKE argument to construct a mirror of the target table). Each row that already exists in the target table can be flagged as duplicate and avoided when inserting. Below is an example of using a temporary table to insert values in a way that can be rerun.

High Level Steps

  1. Create temporary table with extra column for flagging duplicates
  2. Insert values into temporary table
  3. Update the temporary table: flag any row that already exists in the target table
  4. Insert into target from temporary table where not flagged as existing
  5. Drop the temp table

Code to achieve this is included below - if you have been following along, the table `yourtable` will need to have auto increment added to the `id` column for this script to run (in addition to the other changes from the ALTER TABLE section). This is a longer example, so we will break it down in a moment. Note that temporary tables only exist for the duration of the transaction, so this code cannot be run a bit at a time, you must run the whole thing in one go.

-- create temporary table with extra column to flag duplicates
CREATE TEMPORARY TABLE `tmp_yourtable` LIKE `yourtable`
;

ALTER TABLE `tmp_yourtable` ADD COLUMN duplicate tinyint(1) DEFAULT 0
;

-- insert values into temp table
INSERT INTO `tmp_yourtable`
    ( `column1`  ,
        `column2`,
        `column3`
    )
    VALUES
    ( 'A2'  ,
        'B2',
        'C2'
    )
    ,
    ( 'D2'  ,
        'E2',
        'F2'
    )
    ,
    ( 'G'  ,
        'H',
        'I'
    )
;

-- join on compound keys and update matches as existing
UPDATE
    `tmp_yourtable` tmp
    INNER JOIN
        `yourtable` your
        ON
            your.column1     = tmp.column1
            AND your.column2 = tmp.column2
            AND your.column3 = tmp.column3
        SET tmp.duplicate    = 1
;

-- insert only those that do not exist already
INSERT INTO `yourtable`
    (column1   ,
        column2,
        column3
    )
SELECT
    column1,
    column2,
    column3
FROM
    `tmp_yourtable`
WHERE
    duplicate <> 1
;

DROP TEMPORARY TABLE `tmp_yourtable`
;
 

Firstly, we are creating a temporary table, which has an extra column in it called ‘duplicate` which we use to mark out duplicate rows. The additional column is a tinyint(1) which is what we use in Appian for booleans, and defaults to 0 (false). We will later set duplicate rows to 1 (true).

CREATE TEMPORARY TABLE `tmp_yourtable` LIKE `yourtable`
;

ALTER TABLE `tmp_yourtable` ADD COLUMN duplicate tinyint(1) DEFAULT 0
;

Next, we insert the rows we want to add into the temporary table first. Like so:

INSERT INTO `tmp_yourtable`
    ( `column1`  ,
        `column2`,
        `column3`
    )
    VALUES
    ( 'A2'  ,
        'B2',
        'C2'
    )
    ,
    ( 'D2'  ,
        'E2',
        'F2'
    )
    ,
    ( 'G'  ,
        'H',
        'I'
    )
;

In the next stage, we look for duplicates by comparing the temporary table with our actual table. For now, a duplicate is any row where all columns contain the same value except for the primary key.

UPDATE
    `tmp_yourtable` tmp
    INNER JOIN
        `yourtable` your
        ON
            your.column1     = tmp.column1
            AND your.column2 = tmp.column2
            AND your.column3 = tmp.column3
        SET tmp.duplicate    = 1
;

Now we have a temporary table containing all the data we wish to insert and we know which rows are duplicates because they have a 1 in the duplicate column. All that remains is copy the non-duplicate rows into our actual table - and of course to drop the temporary table in case another developer needs one like yours when this change goes to your production environment.

INSERT INTO `yourtable`
    (column1   ,
        column2,
        column3
    )
SELECT
    column1,
    column2,
    column3
FROM
    `tmp_yourtable`
WHERE
    duplicate <> 1
;

DROP TEMPORARY TABLE `tmp_yourtable`
;

What is a duplicated row? It is tempting to say any row in the table which contains the same data in every column apart from the primary key is a duplicate. However, the situation may not be as simple as this. Consider a table of books. Suppose we store the author, title and the artist who designed the cover for the current edition. In this case, we likely consider a row to be a duplicate if the author and title are the same. If we have different artists in data we want to insert, we will want to update the artist.  

To handle this type of situation, we modify step 3 to do an update of existing values in the target table as well as mark rows as duplicate. We will mark the rows as duplicate if the appropriate columns contain the same value. In the book example, this will only be the author and title. If we find a match here we would update the cover artist. 

For our abstract table, we will suppose rows are the same if columns 1 and 3 are the same, and update column 2 if a match is found. So the code for the third step becomes:

UPDATE
    `tmp_yourtable` tmp
    INNER JOIN
        `yourtable` your
        ON
            your.column1     = tmp.column1
            AND your.column3 = tmp.column3
        SET tmp.duplicate    = 1,
            your.column2     = tmp.column2
;