Skip to content

Transactions

A database transaction is a series of one or more operations, such as reading, writing, or updating data, that are performed as a single unit of work. Transactions are a core concept in databases, designed to maintain data integrity and consistency. They comply with the following ACID properties:

  • Atomicity: A transaction is all-or-nothing. Either all operations within the transaction are completed successfully, or none are. If any operation fails, the transaction is rolled back to its initial state.
  • Consistency: A transaction ensures that the database transitions from one valid state to another, preserving the defined rules and constraints.
  • Isolation: Transactions are executed independently, and the intermediate states of one transaction are invisible to others. This prevents conflicts in concurrent transactions.
  • Durability: Once a transaction is successfully committed, its results are permanent, even in the event of a system failure.

A database transaction, initiated using the BEGIN statement, encapsulates multiple queries. The transaction is only committed when all the queries within it are executed successfully. If any query fails, the transaction is rolled back to ensure data integrity.

Below is an example of a database transaction:

-- Start the transaction
BEGIN;

-- Deduct £100 from Account A
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 'A';

-- Add £100 to Account B
UPDATE accounts
SET balance = balance + 100
WHERE account_id = 'B';

-- Commit the transaction to save changes permanently
COMMIT;

For more information about the core concepts of database transactions, read Database transaction.


Transactions in the Data Productivity Cloud

By using the Begin, Rollback, and Commit components, a user can create a database transaction. Within a transaction, the user can include the SQL Script component to execute SQL commands and the Run Transformation component to run a transformation pipeline. After executing these steps, the user can use the Commit component to finalize the transaction, or the Rollback component to end the transaction, and undo any changes made since the latest Begin component was executed.

Note

  • Database transactions are available in the Data Productivity Cloud for Snowflake and Amazon Redshift.
  • Databricks doesn't use transactions but relies on Delta tables in a similar concept. As a result, database transactions won't be supported for Databricks.

Notes on Transactions

Auto Commit

Auto Commit is the default behavior in a pipeline. Any database changes made during execution are automatically committed as soon as each component completes. For more information, read Implicit and explicit transactions.

However, if transactional control is required, you can use the Begin, Commit, and Rollback components to explicitly manage database transactions. When these components are used, Auto Commit mode is temporarily disabled during the transaction. Changes made within the transaction are not committed until a Commit component is executed. Once the transaction is completed using either a Commit or Rollback component, the pipeline returns to Auto Commit mode until another Begin component is triggered.

Implicit and explicit transactions

Implicit transactions and explicit transactions refer to the ways in which transactions are initiated and managed in a database system. Both are used to maintain data integrity and consistency, but they differ in how they are controlled by the user or system.

When using the Begin component in the Data Productivity Cloud, this is initiating an explicit transaction.

Implicit transactions

Single DML (Data Manipulation Language) statements, such as INSERT, DELETE, and UPDATE, are auto committed when executed outside an explicit transaction.

Explicit transactions

Use BEGIN, COMMIT, and ROLLBACK to manage transactions explicitly.

BEGIN;
INSERT INTO table_name VALUES ('data');
COMMIT;

DML and DDL

Data Manipulation Language (DML)

DML refers to commands used to manipulate data within a database. These commands primarily handle retrieving, inserting, updating, and deleting records in database tables, such as INSERT, DELETE, and UPDATE.

Data Definition Language (DDL)

Note

In Snowflake, DDL statements such as CREATE, DROP, and ALTER are automatically committed and can't be rolled back.

DDL consists of commands used to define and manage the structure of a database, including creating, modifying, or deleting tables and schemas. Examples include CREATE, ALTER, DROP, and TRUNCATE.

Snowflake components using DDL

The following Snowflake components use DDL with the ALTER <object> command:

The following Snowflake components use DDL with the CREATE <object> command:

The following Snowflake components use DDL with the DROP <object> command:

Amazon Redshift components using DDL

Note

TRUNCATE statements are automatically committed and can't be rolled back. However, when these components are used within a transaction, operations that would normally use TRUNCATE are replaced with DELETE FROM to maintain transaction safety. While DELETE FROM is slower than TRUNCATE, this change in logic is required within transactions. Outside of a transaction, the same components use the faster TRUNCATE logic, avoiding the performance impact of DELETE FROM.

The following Amazon Redshift components use DDL with the TRUNCATE command:

To avoid a transaction being ended prematurely, it's recommended that you use these components outside of a transaction.


Open transactions

Transactions without an explicit COMMIT or ROLLBACK will be treated as incomplete transactions.

  • For Snowflake: If a transaction remains open without a COMMIT or ROLLBACK, the data warehouse will automatically manage the situation by timing out and rolling back the transaction by default.
  • For Amazon Redshift: If a transaction remains open without a COMMIT or ROLLBACK, Amazon Redshift doesn't perform automatic rollback. This means that open transactions won't automatically be rolled back after a certain period. Therefore, it's crucial to manage transactions explicitly to ensure they're properly concluded.
    • To maintain optimal database health and performance, consider the following recommendations:

      • Explicit transaction management: Always end transactions with a COMMIT to save changes or a ROLLBACK to discard them.
      • Monitor open transactions: Frequently monitor open transactions by using system tables like SVV_TRANSACTIONS to detect and resolve any that might remain open unintentionally.

      Note

      For more information about managing transactions, read Managing transactions.


Nested transactions

Nested transactions aren't supported.

A second BEGIN within an active transaction is ignored. This is illustrated in the image below.

Nested transactions


Considerations for transactions

Soft Cancel vs Hard Cancel

  • Soft Cancel:
    • Allows the current component to finish execution but no subsequent components are run.
    • When the pipeline is cancelled, the connection to the data warehouse is closed. Your chosen data warehouse then determines how to handle the open transaction, usually by rolling it back.
  • Hard Cancel:
    • Stops the pipeline immediately without waiting for the current component to finish.
    • The transaction is typically rolled back by your data warehouse, assuming the component is not a DDL operation.

Recommendations

  • Ensure that all pipelines explicitly use COMMIT or ROLLBACK for transactions to prevent open transactions.
  • Avoid including DDL commands within explicit transactions. A transaction begins with the Begin component and ends with either the Commit or Rollback component. A pipeline can consist of more than one transaction. If a pipeline requires a DDL component, it can occur outside of a transaction within the same pipeline.
  • If an error occurs during a transaction, the recommended approach is to issue a ROLLBACK.