Transactions
A database transaction is a series of one or more operations, such as reading, writing, or updating data, that are performed as a single unit of work. Transactions are a core concept in databases, designed to maintain data integrity and consistency. They comply with the following ACID properties:
- Atomicity: A transaction is all-or-nothing. Either all operations within the transaction are completed successfully, or none are. If any operation fails, the transaction is rolled back to its initial state.
- Consistency: A transaction ensures that the database transitions from one valid state to another, preserving the defined rules and constraints.
- Isolation: Transactions are executed independently, and the intermediate states of one transaction are invisible to others. This prevents conflicts in concurrent transactions.
- Durability: Once a transaction is successfully committed, its results are permanent, even in the event of a system failure.
A database transaction, initiated using the BEGIN
statement, encapsulates multiple queries. The transaction is only committed when all the queries within it are executed successfully. If any query fails, the transaction is rolled back to ensure data integrity.
Below is an example of a database transaction:
-- Start the transaction
BEGIN;
-- Deduct £100 from Account A
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 'A';
-- Add £100 to Account B
UPDATE accounts
SET balance = balance + 100
WHERE account_id = 'B';
-- Commit the transaction to save changes permanently
COMMIT;
For more information about the core concepts of database transactions, read Database transaction.
Transactions in the Data Productivity Cloud
By using the Begin, Rollback, and Commit components, a user can create a database transaction. Within a transaction, the user can include the SQL Script component to execute SQL commands and the Run Transformation component to run a transformation pipeline. After executing these steps, the user can use the Commit component to finalize the transaction, or the Rollback component to end the transaction, and undo any changes made since the latest Begin component was executed.
Note
- Database transactions are available in the Data Productivity Cloud for Snowflake and Amazon Redshift.
- Databricks doesn't use transactions but relies on Delta tables in a similar concept. As a result, database transactions won't be supported for Databricks.
Notes on Transactions
Auto Commit
Auto Commit is the default behavior in a pipeline. Any database changes made during execution are automatically committed as soon as each component completes. For more information, read Implicit and explicit transactions.
However, if transactional control is required, you can use the Begin, Commit, and Rollback components to explicitly manage database transactions. When these components are used, Auto Commit mode is temporarily disabled during the transaction. Changes made within the transaction are not committed until a Commit component is executed. Once the transaction is completed using either a Commit or Rollback component, the pipeline returns to Auto Commit mode until another Begin component is triggered.
Implicit and explicit transactions
Implicit transactions and explicit transactions refer to the ways in which transactions are initiated and managed in a database system. Both are used to maintain data integrity and consistency, but they differ in how they are controlled by the user or system.
When using the Begin component in the Data Productivity Cloud, this is initiating an explicit transaction.
Implicit transactions
Single DML (Data Manipulation Language) statements, such as INSERT
, DELETE
, and UPDATE
, are auto committed when executed outside an explicit transaction.
Explicit transactions
Use BEGIN
, COMMIT
, and ROLLBACK
to manage transactions explicitly.
BEGIN;
INSERT INTO table_name VALUES ('data');
COMMIT;
DML and DDL
Data Manipulation Language (DML)
DML refers to commands used to manipulate data within a database. These commands primarily handle retrieving, inserting, updating, and deleting records in database tables, such as INSERT
, DELETE
, and UPDATE
.
Data Definition Language (DDL)
Note
In Snowflake, DDL statements such as CREATE
, DROP
, and ALTER
are automatically committed and can't be rolled back.
DDL consists of commands used to define and manage the structure of a database, including creating, modifying, or deleting tables and schemas. Examples include CREATE
, ALTER
, DROP
, and TRUNCATE
.
Snowflake components using DDL
The following Snowflake components use DDL with the ALTER <object>
command:
The following Snowflake components use DDL with the CREATE <object>
command:
- Amazon Bedrock Prompt
- Azure OpenAI Prompt
- OpenAI Prompt
- Snowpark Container Prompt
- Python Pushdown
- Create Table
- Create File Format
- Create Stream
- Salesforce Output
The following Snowflake components use DDL with the DROP <object>
command:
Amazon Redshift components using DDL
Note
TRUNCATE
statements are automatically committed and can't be rolled back. However, when these components are used within a transaction, operations that would normally use TRUNCATE
are replaced with DELETE FROM
to maintain transaction safety. While DELETE FROM
is slower than TRUNCATE
, this change in logic is required within transactions. Outside of a transaction, the same components use the faster TRUNCATE
logic, avoiding the performance impact of DELETE FROM
.
The following Amazon Redshift components use DDL with the TRUNCATE
command:
To avoid a transaction being ended prematurely, it's recommended that you use these components outside of a transaction.
Open transactions
Transactions without an explicit COMMIT
or ROLLBACK
will be treated as incomplete transactions.
- For Snowflake: If a transaction remains open without a
COMMIT
orROLLBACK
, the data warehouse will automatically manage the situation by timing out and rolling back the transaction by default.- The timeout period can be configured in Snowflake settings. For more information, read Snowflake Sessions & Session Policies.
- For Amazon Redshift: If a transaction remains open without a
COMMIT
orROLLBACK
, Amazon Redshift doesn't perform automatic rollback. This means that open transactions won't automatically be rolled back after a certain period. Therefore, it's crucial to manage transactions explicitly to ensure they're properly concluded.-
To maintain optimal database health and performance, consider the following recommendations:
- Explicit transaction management: Always end transactions with a
COMMIT
to save changes or aROLLBACK
to discard them. - Monitor open transactions: Frequently monitor open transactions by using system tables like
SVV_TRANSACTIONS
to detect and resolve any that might remain open unintentionally.
Note
For more information about managing transactions, read Managing transactions.
- Explicit transaction management: Always end transactions with a
-
Nested transactions
Nested transactions aren't supported.
A second BEGIN
within an active transaction is ignored. This is illustrated in the image below.
Considerations for transactions
Soft Cancel vs Hard Cancel
- Soft Cancel:
- Allows the current component to finish execution but no subsequent components are run.
- When the pipeline is cancelled, the connection to the data warehouse is closed. Your chosen data warehouse then determines how to handle the open transaction, usually by rolling it back.
- Hard Cancel:
- Stops the pipeline immediately without waiting for the current component to finish.
- The transaction is typically rolled back by your data warehouse, assuming the component is not a DDL operation.
Recommendations
- Ensure that all pipelines explicitly use
COMMIT
orROLLBACK
for transactions to prevent open transactions. - Avoid including DDL commands within explicit transactions. A transaction begins with the Begin component and ends with either the Commit or Rollback component. A pipeline can consist of more than one transaction. If a pipeline requires a DDL component, it can occur outside of a transaction within the same pipeline.
- If an error occurs during a transaction, the recommended approach is to issue a
ROLLBACK
.