
Use the Amazon Redshift SQLAlchemy dialect to interact with Amazon Redshift


Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that enables you to analyze your data at scale. You can interact with an Amazon Redshift database in several different ways. One method is using an object-relational mapping (ORM) framework. ORM is widely used by developers as an abstraction layer on top of the database, which allows you to write code in your preferred programming language instead of writing SQL. SQLAlchemy is a popular Python ORM framework that enables the interaction between Python code and databases.
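As a minimal sketch of that abstraction: the same Python expressions compile to SQL for whichever dialect the engine targets. The example below uses an in-memory SQLite database purely so it runs anywhere, and the `users` table is hypothetical; against Amazon Redshift, only the engine URL would change.

```python
import sqlalchemy as sa

# An in-memory SQLite engine stands in for a real database here.
engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
users = sa.Table("users", metadata, sa.Column("name", sa.VARCHAR(50)))
metadata.create_all(engine)

with engine.begin() as conn:
    # Insert and select without writing any SQL strings.
    conn.execute(users.insert().values(name="alice"))
    rows = conn.execute(sa.select(users).where(users.c.name == "alice")).fetchall()

print(rows)  # [('alice',)]
```

The `insert()` and `select()` constructs are rendered into dialect-specific SQL at execution time, which is what lets the Amazon Redshift dialect plug into this same API.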

A SQLAlchemy dialect is the system used to communicate with various types of DBAPI implementations and databases. Previously, the SQLAlchemy dialect for Amazon Redshift used psycopg2 for communication with the database. Because psycopg2 is a Postgres connector, it doesn’t support Amazon Redshift-specific functionality such as AWS Identity and Access Management (IAM) authentication for secure connections and Amazon Redshift-specific data types such as SUPER and GEOMETRY. The new Amazon Redshift SQLAlchemy dialect uses the Amazon Redshift Python driver (redshift_connector) and lets you securely connect to your Amazon Redshift database. It natively supports IAM authentication and single sign-on (SSO). It also supports Amazon Redshift-specific data types such as SUPER, GEOMETRY, TIMESTAMPTZ, and TIMETZ.

In this post, we discuss how you can interact with your Amazon Redshift database using the new Amazon Redshift SQLAlchemy dialect. We demonstrate how to securely connect using Okta and perform various DDL and DML operations. Because the new Amazon Redshift SQLAlchemy dialect uses redshift_connector, users of this package can take full advantage of the connection options provided by redshift_connector, such as authenticating via IAM and identity provider (IdP) plugins. Additionally, we demonstrate the support for IPython SqlMagic, which simplifies running interactive SQL queries directly from a Jupyter notebook.

Prerequisites

The following are the prerequisites for this post:

Get started with the Amazon Redshift SQLAlchemy dialect

It’s easy to get started with the Amazon Redshift SQLAlchemy dialect for Python. You can install the sqlalchemy-redshift library using pip. To demonstrate this, we start with a Jupyter notebook. Complete the following steps:

  1. Create a notebook instance (for this post, we call it redshift-sqlalchemy).
  2. On the Amazon SageMaker console, under Notebook in the navigation pane, choose Notebook instances.
  3. Find the instance you created and choose Open Jupyter.
  4. Open your notebook instance and create a new conda_python3 Jupyter notebook.
  5. Run the following commands to install sqlalchemy-redshift and redshift_connector:
pip install sqlalchemy-redshift
pip install redshift_connector


redshift_connector provides many different connection options that help you customize how you access your Amazon Redshift cluster. For more information, see Connection Parameters.

Connect to your Amazon Redshift cluster

In this step, we show you how to connect to your Amazon Redshift cluster using two different methods: Okta SSO federation, and a direct connection using your database user and password.

Connect with Okta SSO federation

As a prerequisite, set up your Amazon Redshift application in your Okta configuration. For more information, see Federate Amazon Redshift access with Okta as an identity provider.

To establish a connection to the Amazon Redshift cluster, we use the create_engine function. The SQLAlchemy create_engine() function produces an engine object based on a URL. The sqlalchemy-redshift package provides a custom interface for creating an RFC-1738 compliant URL that you can use to establish a connection to an Amazon Redshift cluster.

We build the SQLAlchemy URL as shown in the following code. URL.create() is available for SQLAlchemy version 1.4 and above. When authenticating using IAM, the host and port don’t need to be specified by the user. To connect to Amazon Redshift securely using SSO federation, we use the Okta user name and password in the URL.

import sqlalchemy as sa
from sqlalchemy.engine.url import URL
from sqlalchemy import orm as sa_orm

from sqlalchemy_redshift.dialect import SUPER, TIMESTAMPTZ, TIMETZ


# build the sqlalchemy URL. When authenticating using IAM, the host
# and port do not need to be specified by the user.
url = URL.create(
    drivername="redshift+redshift_connector",  # indicate redshift_connector driver and dialect will be used
    database="dev",  # Amazon Redshift database
    username="johnd@example.com",  # Okta username
    password="<PWD>"  # Okta password
)

# a dictionary is used to store additional connection parameters
# that are specific to redshift_connector or cannot be URL encoded.
conn_params = {
    "iam": True,  # must be enabled when authenticating via IAM
    "credentials_provider": "OktaCredentialsProvider",
    "idp_host": "<prefix>.okta.com",
    "app_id": "<appid>",
    "app_name": "amazon_aws_redshift",
    "region": "<region>",
    "cluster_identifier": "<clusterid>",
    "ssl_insecure": False,  # ensures certificate verification occurs for idp_host
}

engine = sa.create_engine(url, connect_args=conn_params)

Connect with an Amazon Redshift database user and password

You can connect to your Amazon Redshift cluster using your database user and password. We construct a URL and use the URL.create() constructor, as shown in the following code:

import sqlalchemy as sa
from sqlalchemy.engine.url import URL

# build the sqlalchemy URL
url = URL.create(
    drivername="redshift+redshift_connector",  # indicate redshift_connector driver and dialect will be used
    host="<clusterid>.xxxxxx.<aws-region>.redshift.amazonaws.com",  # Amazon Redshift host
    port=5439,  # Amazon Redshift port
    database="dev",  # Amazon Redshift database
    username="awsuser",  # Amazon Redshift username
    password="<pwd>"  # Amazon Redshift password
)

engine = sa.create_engine(url)

Next, we create a session using the engine established above.

Session = sa_orm.sessionmaker()
Session.configure(bind=engine)
session = Session()

# Define session-based metadata
metadata = sa.MetaData(bind=session.bind)

Create a database table using Amazon Redshift data types and insert data

With the new Amazon Redshift SQLAlchemy dialect, you can create tables with Amazon Redshift-specific data types such as SUPER, GEOMETRY, TIMESTAMPTZ, and TIMETZ.

In this step, you create a table with TIMESTAMPTZ, TIMETZ, and SUPER data types.

Optionally, you can define your table’s distribution style, sort key, and compression encoding. See the following code:

import datetime
import uuid
import random

table_name = "product_clickstream_tz"

RedshiftDBTable = sa.Table(
    table_name,
    metadata,
    sa.Column('session_id', sa.VARCHAR(80)),
    sa.Column('click_region', sa.VARCHAR(100), redshift_encode='lzo'),
    sa.Column('product_id', sa.BIGINT),
    sa.Column('click_datetime', TIMESTAMPTZ),
    sa.Column('stream_time', TIMETZ),
    sa.Column('order_detail', SUPER),
    redshift_diststyle='KEY',
    redshift_distkey='session_id',
    redshift_sortkey='click_datetime'
)

# Drop the table if it already exists
if sa.inspect(engine).has_table(table_name):
    RedshiftDBTable.drop(bind=engine)

# Create the table (execute the "CREATE TABLE" SQL statement for "product_clickstream_tz")
RedshiftDBTable.create(bind=engine)

In this step, you populate the table by preparing the insert command.

# create sample data set
# generate a UUID for this row
session_id = str(uuid.uuid1())

# create region information
click_region = "US / New York"

# create product information
product_id = random.randint(1, 100000)

# create a datetime object with timezone
click_datetime = datetime.datetime(year=2021, month=10, day=20, hour=10, minute=12, second=40, tzinfo=datetime.timezone.utc)

# create a time object with timezone
stream_time = datetime.time(hour=10, minute=14, second=56, tzinfo=datetime.timezone.utc)

# create SUPER information
order_detail = '[{"o_orderstatus":"F","o_clerk":"Clerk#0000001991","o_lineitems":[{"l_returnflag":"R","l_tax":0.03,"l_quantity":4,"l_linestatus":"F"}]}]'

# create the insert SQL statement
insert_data_row = RedshiftDBTable.insert().values(
    session_id=session_id,
    click_region=click_region,
    product_id=product_id,
    click_datetime=click_datetime,
    stream_time=stream_time,
    order_detail=order_detail
)

# execute the insert SQL statement
session.execute(insert_data_row)
session.commit()

Query and fetch results from the table

The SELECT statements generated by SQLAlchemy ORM are constructed by a query object. You can use several different methods, such as all(), first(), count(), order_by(), and join(), to retrieve all or some of the rows from the queried table.
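The query-object methods mentioned above can be sketched as follows. So that the sketch runs anywhere, it targets an in-memory SQLite database with a simplified stand-in table; against Amazon Redshift you would bind the sessionmaker to the engine created earlier and query the clickstream table instead.

```python
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

# In-memory SQLite stands in for the Redshift cluster in this sketch.
engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
clicks = sa.Table(
    "product_clickstream",
    metadata,
    sa.Column("session_id", sa.VARCHAR(80)),
    sa.Column("product_id", sa.BIGINT),
)
metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()
session.execute(clicks.insert().values(session_id="a", product_id=2))
session.execute(clicks.insert().values(session_id="b", product_id=1))
session.commit()

# Build a query object over the table's columns.
query = session.query(clicks.c.session_id, clicks.c.product_id)

print(query.count())  # 2 -- number of rows in the table
first = query.order_by(clicks.c.product_id).first()
print(first.product_id)  # 1 -- lowest product_id after ordering
for row in query.all():  # retrieve all rows from the queried table
    print(row.session_id, row.product_id)
```

No SQL text is written anywhere: each method call refines the SELECT statement the dialect eventually emits.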

Use IPython SqlMagic with the Amazon Redshift SQLAlchemy dialect

The Amazon Redshift SQLAlchemy dialect now supports SqlMagic. To establish a connection, you can build the SQLAlchemy URL with the redshift_connector driver. More information about SqlMagic is available on GitHub.

In the next section, we demonstrate how you can use SqlMagic. Make sure that you have the ipython-sql package installed; if not, install it by running the following command:

pip install ipython-sql

Connect to Amazon Redshift and query the data

In this step, you build the SQLAlchemy URL to connect to Amazon Redshift and run a sample SQL query. For this demo, we have prepopulated TPCH data in the cluster from GitHub. See the following code:

import sqlalchemy as sa
from sqlalchemy.engine.url import URL
from sqlalchemy.orm import Session
%reload_ext sql
%config SqlMagic.displaylimit = 25

connect_to_db = URL.create(
    drivername="redshift+redshift_connector",
    host="cluster.xxxxxxxx.region.redshift.amazonaws.com",
    port=5439,
    database="dev",
    username="awsuser",
    password='xxxxxx'
)
%sql $connect_to_db
%sql select current_user, version();

You can view the data in tabular format by using the pandas DataFrame() method.
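Outside SqlMagic, you can achieve the same tabular view by loading query results directly into a pandas DataFrame with pandas.read_sql, reusing the SQLAlchemy engine. In this sketch an in-memory SQLite database with a made-up orders table stands in for the cluster so the code runs anywhere; against Amazon Redshift you would pass the engine created earlier.

```python
import pandas as pd
import sqlalchemy as sa

# In-memory SQLite stands in for the Redshift cluster in this sketch.
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("create table orders (o_orderstatus text)"))
    conn.execute(sa.text("insert into orders values ('F'), ('O'), ('F')"))

# read_sql runs the query and returns the result set as a DataFrame.
df = pd.read_sql(
    "select o_orderstatus, count(*) as cnt from orders "
    "group by o_orderstatus order by o_orderstatus",
    engine,
)
print(df)
```

From here the full pandas API (filtering, aggregation, plotting) is available on the result.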

If you installed matplotlib, you can use the result set’s .plot(), .pie(), and .bar() methods for quick plotting.

Clean up

Make sure that SQLAlchemy resources are closed and cleaned up when you’re done with them. SQLAlchemy uses a connection pool to provide access to an Amazon Redshift cluster. Once opened, the default behavior leaves these connections open. If not properly cleaned up, this can lead to connectivity issues with your cluster. Use the following code to clean up your resources:

session.close()

# If the connection was accessed directly, ensure it is invalidated
conn = engine.connect()
conn.invalidate()

# Clean up the engine
engine.dispose()

Summary

In this post, we discussed the new Amazon Redshift SQLAlchemy dialect. We demonstrated how it lets you securely connect to your Amazon Redshift database using SSO as well as a direct connection using the SQLAlchemy URL. We also demonstrated how SQLAlchemy supports the TIMESTAMPTZ, TIMETZ, and SUPER data types without explicit casting, and showcased how redshift_connector and the dialect support SqlMagic with Jupyter notebooks, which enables you to run interactive queries against Amazon Redshift.


About the Authors

Sumeet Joshi is an Analytics Specialist Solutions Architect based out of New York. He specializes in building large-scale data warehousing solutions. He has over 16 years of experience in the data warehousing and analytical space.

Brooke White is a Software Development Engineer at AWS. She enables customers to get the most out of their data through her work on Amazon Redshift drivers. Prior to AWS, she built ETL pipelines and analytics APIs at a San Francisco Bay Area startup.
