How to retrieve missing Task History entries after 1.47 upgrade
Overview
Following a Postgres update, from version 1.47 of Matillion ETL and beyond, only a subset of the whole task history will be immediately available without user action. This subset is a maximum of the most recent 500,000 rows.
For example, if you had one million task history items, upgrading to version 1.47 of Matillion ETL would lead to only the most recent 500,000 entries being shown under Project → Task History or via the API. Older entries are hidden from view and will eventually be cleaned up by the scheduler. Newer entries will not have this problem.
After a user runs the advised script below, Matillion ETL will "un-hide" the older entries, making these entries visible via the UI or via the API.
- This article specifically refers to Postgres users (version 9.6).
- The Postgres database can be either on-board Matillion ETL or external.
- This is a one-time task. Users do not need to run this script more than once.
How to retrieve missing Task History entries
- SSH to the Matillion server.
- Initiate the psql command line tool, passing the hostname, port, dbname, and username; for example:
psql -h localhost -p 5432 -d emerald -U postgres
Or:
psql -h 127.0.0.1 -p 5432 -d emerald -U postgres
-h | -p | -d | -U |
---|---|---|---|
hostname | port | dbname | username |
Provide a password if required when connecting to an external database.
For additional help with the PostgreSQL command line tool, click here.
- Once connected, execute the below script:
DO $$
DECLARE
package RECORD;
rows_updated INTEGER := 0;
BEGIN
FOR package IN SELECT * FROM task_package_history WHERE org_id IS NULL
LOOP
UPDATE task_package_history
SET org_id = 'none'
WHERE taskid = package.taskid AND batchid = package.batchid;
rows_updated := rows_updated + 1;
IF mod(rows_updated, 500000) = 0 THEN
RAISE NOTICE 'Rows updated: %', rows_updated;
END IF;
END LOOP;
RAISE NOTICE 'Total rows updated: %', rows_updated;
END; $$;
Users can perform this task by either copying and pasting the block of code into their command line, or saving it as a .sql file.
If you opt to save the file, pass that file in the command line using the psql flag -f. For example:
psql -h localhost -p 5432 -d emerald -U postgres -f filename.sql
- This script will print out a message for every 500,000 rows updated, to keep users updated on progress.
The script is all-or-nothing. If the process crashes while running, nothing is updated.