AWS Big Data Blog

Stored procedure enhancements in Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. It supports stored procedures where prepared SQL code is saved and the code can be reused over and over again.

Stored procedures are commonly used to encapsulate logic for data transformation, data validation, and business-specific logic. By combining multiple SQL steps into a stored procedure, you can create reusable code blocks that can run together as a single transaction or multiple individual transactions. You can also schedule stored procedures to automate data processing on Amazon Redshift. For more information, refer to Bringing your stored procedures to Amazon Redshift.

In the Redshift stored procedure default atomic transaction mode, a call to a Redshift stored procedure will create its own transaction when the call starts or is part of the existing transaction if an explicit transaction is opened before the stored procedure is called. All the statements inside a procedure behave as if they are in a single transaction block that ends when the stored procedure call finishes. A nested call to another procedure is treated like any other SQL statement and operates within the context of the same transaction as the caller. Statements for TRUNCATE, COMMIT, and ROLLBACK and the exception handling block with arbitrary SQL statements close the current transaction and start a new transaction implicitly. This behavior can cause challenges in migration to Amazon Redshift from other systems like Teradata.

In this post, we discuss the enhancements to Amazon Redshift stored procedures for non-atomic transaction mode. This mode provides enhanced transaction controls that enable you to automatically commit the statements inside the stored procedure.

Non-atomic transaction mode

The new non-atomic transaction mode feature provides three enhancements on stored procedures in Amazon Redshift:

  • Unless the DML or DDL statements are part of an explicit open transaction, each statement in the stored procedure will run in its own implicit transaction and a new transaction will be opened to handle following statements. If an explicit transaction is opened, then all subsequent statements are run and remain un-committed until an explicit transaction control command (COMMIT or ROLLBACK) is run to end the transaction.
  • Amazon Redshift will not re-raise the exception after the exception handler statements are complete. Therefore, a new RAISE statement without any INFO or EXCEPTION has been provided to re-throw the exception caught by the exception handling block. This RAISE statement without any INFO or EXCEPTION will only be allowed in the exception handling block.
  • Also, the new START TRANSACTION statement begins an explicit transaction inside the non-atomic transaction mode stored procedure. Use the existing transaction control command (COMMIT or ROLLBACK) to end the explicitly started transaction.
    • Amazon Redshift does not support sub-transactions so if there is already an open transaction, then calling this statement again will do nothing, and no error is raised.
    • If an explicit transaction is still open when the nonatomic transaction mode stored procedure call ends, then the explicit transaction remains open until a transaction control command is run in the session.
    • If the session disconnects before running a transaction control command, the whole transaction is automatically rolled back.

Additional restrictions

Some restrictions have also been introduced for Redshift stored procedures:

  • For nesting stored procedure calls, all the procedures must be created in the same transaction mode, no matter if it’s in atomic (default) transaction mode or the new non-atomic transaction mode
  • You can’t nest stored procedures across the two transaction modes (atomic and non-atomic)
  • You can’t set the SECURITY DEFINER option or SET configuration_parameter option for non-atomic transaction mode stored procedures

Impact to cursors

Cursors in non-atomic transaction mode stored procedures will behave differently compared to the default atomic transaction mode:

  • Cursor statements will need an explicit transaction block before beginning the cursor to ensure that each iteration of the cursor loop is not auto-committed.
  • To return a cursor from non-atomic transaction mode stored procedure, you will need an explicit transaction block before beginning the cursor. Otherwise, the cursor will be closed when the SQL statement inside the loop is automatically committed.

Advantages

The following are key advantages of this feature from a user perspective:

  • It provides the capability to lift and shift Teradata stored procedures to run in Teradata session mode. This helps in seamless migrations from data warehouses like Teradata and SQL Server.
  • It enables Amazon Redshift to provide more flexible operations inside of stored procedures when encountering errors and exceptions. Amazon Redshift can now preserve previous action’s state before reaching an exception.

Syntax

The new optional keyword NONATOMIC has been added to the stored procedure definition syntax, as shown in the following code:

CREATE [ OR REPLACE ] PROCEDURE sp_procedure_name
( [ [ argname ] [ argmode ] argtype [, ...] ] )
[ NONATOMIC ]
AS $$
procedure_body
$$ LANGUAGE plpgsql

This optional keyword creates the stored procedure under the non-atomic transaction mode. If you don’t specify the keyword, then the default atomic mode will be the transaction mode when creating the stored procedure.

NONATOMIC means each DML and DDL statement in the procedure will be implicitly committed.

Without non-atomic mode, the procedure will create its own transaction when the call starts or be part of the existing transaction if an explicit transaction is opened before it is called. Every statement within the stored procedure will belong to this one transaction.

Example of NONATOMIC mode

Let’s consider the customer contact table custcontacts, which stores customer primary and secondary contact phone numbers:

CREATE table custcontacts(
custid int4 not null,
primaryphone char(10),
secondaryphone char(10));

We insert three sample customer records with no contact values:

INSERT INTO custcontacts VALUES (101, 'xxxxxxxxxx', 'xxxxxxxxxx');
INSERT INTO custcontacts VALUES (102, 'xxxxxxxxxx', 'xxxxxxxxxx');
INSERT INTO custcontacts VALUES (103, 'xxxxxxxxxx', 'xxxxxxxxxx');

You need to create a stored procedure to update the primary and secondary phone numbers. The requirement is not to roll back updates to the primary contact number if updates to the secondary contact number fail for some reason.

You can achieve this by creating the stored procedure with the NONATOMIC keyword. The NONATOMIC keyword ensures that each statement in the stored procedure runs in its own implicit transaction block. Therefore, if the UPDATE statement for the secondary phone fails, then it won’t roll back the data update made to the primary phone. See the following code:

CREATE PROCEDURE sp_update_custcontacts(cid int4,pphone char(15),sphone char(15)) NONATOMIC AS
$$
BEGIN
UPDATE custcontacts SET primaryphone=pphone WHERE custid=cid;
UPDATE custcontacts SET secondaryphone=sphone WHERE custid=cid;
END;
$$
LANGUAGE plpgsql;

Now let’s call the stored procedure passing the secondary phone number with more than 10 digits, which will fail in the secondaryphone UPDATE statement due to incorrect length:

call sp_update_custcontacts(101,'1234567890','345443345324');

The preceding procedure call will update the primary phone number successfully. The secondary phone number update fails. However, the primaryphone update will not roll back because it ran in its own implicit transaction block due to the NONATOMIC clause in the stored procedure definition.

select * from custcontacts;

custcontacts | primaryphone | secondaryphone
-------------+---------------+---------------
101 | 1234567890 | XXXXXXXXXX
102 | XXXXXXXXXX | XXXXXXXXXX
103 | XXXXXXXXXX | XXXXXXXXXX

Exception handling in NONATOMIC mode

Exceptions are handled in stored procedures differently based on the atomic or non-atomic mode:

  • Atomic (default) – Exceptions are always re-raised
  • Non-atomic – Exceptions are handled and you can choose to re-raise or not

Let’s continue with the previous example to illustrate exception handling in non-atomic mode.

Create the following table to log exceptions raised by stored procedures:

CREATE TABLE procedure_log
(log_timestamp timestamp, procedure_name varchar(100), error_message varchar(255));

Now update the sp_update_custcontacts() procedure to handle exceptions. Note that we’re adding an EXCEPTION block in the procedure definition. It inserts a record in the procedure_log table in the event of an exception.

CREATE OR REPLACE PROCEDURE sp_update_custcontacts(cid int4,pphone char(15),sphone char(15)) NONATOMIC AS
$$
BEGIN
UPDATE custcontacts SET primaryphone=pphone WHERE custid=cid;
UPDATE custcontacts SET secondaryphone=sphone WHERE custid=cid;
EXCEPTION
WHEN OTHERS THEN
INSERT INTO procedure_log VALUES (getdate(), 'sp_update_custcontacts', sqlerrm);
END;
$$
LANGUAGE plpgsql;

Now create one more stored procedure, which will call the preceding procedure. It also has an EXCEPTION block and inserts a record in the procedure_log table in the event of an exception.

CREATE PROCEDURE sp_update_customer() NONATOMIC AS
$$
BEGIN
-- Let us assume you have additional staments here to update other fields. For this example, ommitted them for simplifiction.
-- Nested call to update contacts
call sp_update_custcontacts(101,'1234567890','345443345324');
EXCEPTION
WHEN OTHERS THEN
INSERT INTO procedure_log VALUES (getdate(), 'sp_update_customer', sqlerrm);
END;
$$
LANGUAGE plpgsql;

Let’s call the parent procedure we created:

call sp_update_customer();

This in turn will call the sp_update_custcontacts() procedure. The inner procedure sp_update_custcontacts() will fail because we’re updating the secondary phone with an invalid value. The control will enter the EXCEPTION block of the sp_update_custcontacts() procedure and make an insert into the procedure_log table.

However, it will not re-raise the exception in non-atomic mode. Therefore, the parent procedure sp_update_customer() will not get the exception passed from the sp_update_custcontacts() procedure. The control will not enter the EXCEPTION block of the sp_update_customer() procedure.

If you query the procedure_log table, you will see an entry only for the error handled by the sp_update_custcontacts() procedure:

select * from procedure_log;

Procedure Log Output

Now redefine the sp_update_custcontacts() procedure with the RAISE statement:

CREATE PROCEDURE sp_update_custcontacts(cid int4,pphone char(15),sphone char(15)) NONATOMIC AS
$$
BEGIN
UPDATE custcontacts SET primaryphone=pphone WHERE custid=cid;
UPDATE custcontacts SET secondaryphone=sphone WHERE custid=cid;
EXCEPTION
WHEN OTHERS THEN
INSERT INTO procedure_log VALUES (getdate(), 'sp_update_custcontacts', sqlerrm);
RAISE;
END;
$$
LANGUAGE plpgsql;

Let’s call the parent stored procedure sp_update_customer() again:

call sp_update_customer();

Now the inner procedure sp_update_custcontacts() will re-raise the exception to the parent procedure sp_update_customer() after handling the exception in its own EXCEPTION block. Then the control will reach the EXCEPTION block in the parent procedure and insert another record into the procedure_log table.

If you query the procedure_log table now, you will see two entries: one by the inner procedure sp_update_custcontacts() and another by the parent procedure sp_update_customer(). This demonstrates that the RAISE statement in the inner procedure re-raised the exception.

select * from procedure_log;

Procedure log output

Explicit START TRANSACTION statement in non-atomic mode

You can issue a START TRANSACTION statement to begin a transaction block inside the stored procedure. It will open a new transaction inside the stored procedure. For examples, refer to Nonatomic mode stored procedure transaction management.

Conclusion

In this post, we discussed the enhancements to Redshift stored procedures for non-atomic transaction mode, which provides enhanced transaction controls to enable you to automatically commit the statements inside the stored procedure. This mode also enables easier migration to Amazon Redshift from other systems like Teradata. Try out these enhancements and let us know your experience in comments.


About the Authors

Milind Oke is a Data Warehouse Specialist Solutions Architect based out of New York. He has been building data warehouse solutions for over 15 years and specializes in Amazon Redshift.

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 17 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.

Kiran Chinta is a Software Development Manager at Amazon Redshift. He leads a strong team in query processing, SQL language, data security, and performance. Kiran is passionate about delivering products that seamlessly integrate with customers’ business applications with the right ease of use and performance. In his spare time, he enjoys reading and playing tennis.

Huichen Liu is a software development engineer on the Amazon Redshift query processing team. She focuses on query optimization, statistics and SQL language features. In her spare time, she enjoys hiking and photography.