Skip to content

How to retrieve missing Task History entries after 1.47 upgrade

Overview

Following a Postgres update, from version 1.47 of Matillion ETL and beyond, only a subset of the whole task history will be immediately available without user action. This subset is a maximum of the most recent 500,000 rows.

For example, if you had one million task history items, upgrading to version 1.47 of Matillion ETL would lead to only the most recent 500,000 entries being shown under ProjectTask History or via the API. Older entries are hidden from view and will eventually be cleaned up by the scheduler. Newer entries will not have this problem.

After a user runs the advised script below, Matillion ETL will "un-hide" the older entries, making these entries visible via the UI or via the API.

  • This article specifically refers to Postgres users (version 9.6).
  • The Postgres database can be either on-board Matillion ETL or external.
  • This is a one-time task. Users do not need to run this script more than once.

How to retrieve missing Task History entries

  1. SSH to the Matillion server.
  2. Initiate the psql command line tool, passing the hostname, port, dbname, and username; for example:
psql -h localhost -p 5432 -d emerald -U postgres

Or:

psql -h 127.0.0.1 -p 5432 -d emerald -U postgres
-h -p -d -U
hostname port dbname username

Provide a password if required when connecting to an external database.

For additional help with the PostgreSQL command line tool, click here.

  1. Once connected, execute the below script:
DO $$
    DECLARE
        package RECORD;
        rows_updated INTEGER := 0;
    BEGIN

        FOR package IN SELECT * FROM task_package_history WHERE org_id IS NULL
            LOOP
                UPDATE task_package_history
                SET org_id = 'none'
                WHERE taskid = package.taskid AND batchid = package.batchid;
                rows_updated := rows_updated + 1;
                IF mod(rows_updated, 500000) = 0 THEN
                    RAISE NOTICE 'Rows updated: %', rows_updated;
                END IF;
            END LOOP;

        RAISE NOTICE 'Total rows updated: %', rows_updated;
    END; $$;

Users can perform this task by either copying and pasting the block of code into their command line, or saving it as a .sql file.

If you opt to save the file, pass that file in the command line using the psql flag -f. For example:

psql -h localhost -p 5432 -d emerald -U postgres -f filename.sql
  1. This script will print out a message for every 500,000 rows updated, to keep users updated on progress.

The script is all-or-nothing. If the process crashes while running, nothing is updated.