In this series, we discuss Swisscom's journey of automating Amazon Redshift provisioning as part of the Swisscom One Data Platform (ODP) solution using the AWS Cloud Development Kit (AWS CDK), and we provide code snippets and other useful references.
In Part 1, we did a deep dive on provisioning a secure and compliant Redshift cluster using the AWS CDK and on the best practices of secret rotation. We also explained how Swisscom used AWS CDK custom resources to automate the creation of dynamic user groups that are relevant for the AWS Identity and Access Management (IAM) roles matching different job functions.
In this post, we explore, using the AWS CDK, some of the key topics for self-service usage of the provisioned Redshift cluster by end-users as well as by other managed services and applications. These topics include federation with the Swisscom identity provider (IdP), JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.
Scheduled actions
To optimize cost-efficiency for provisioned Redshift cluster deployments, Swisscom implemented a scheduling mechanism. This functionality is driven by the user configuration of the cluster, as described in Part 1 of this series, in which the user can enable dynamic pausing and resuming of clusters based on specified cron expressions:
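As an illustration only, the following TypeScript sketch shows one way such a scheduling section could be typed and checked before synthesis. The field names `enabled`, `pauseCron`, and `resumeCron` are hypothetical and not Swisscom's schema; the six-field `cron(...)` format is the one Amazon Redshift scheduled actions accept.

```typescript
// Hypothetical scheduling section of a data product's cluster configuration.
// Field names are illustrative and not taken from Swisscom's actual schema.
interface ClusterScheduleConfig {
  enabled: boolean;
  // Six-field Redshift cron: minutes hours day-of-month month day-of-week year.
  pauseCron?: string;
  resumeCron?: string;
}

// Shape check only: "cron(" + exactly six whitespace-separated fields + ")".
// It does not validate the value ranges inside each field.
function isSixFieldCron(expr: string): boolean {
  const match = /^cron\((.+)\)$/.exec(expr.trim());
  if (match === null) {
    return false;
  }
  return match[1].trim().split(/\s+/).length === 6;
}

// Returns a list of problems so a deployment can fail before synthesis.
function validateSchedule(config: ClusterScheduleConfig): string[] {
  const problems: string[] = [];
  if (!config.enabled) {
    return problems; // scheduling switched off: nothing to check
  }
  if (config.pauseCron === undefined && config.resumeCron === undefined) {
    problems.push("scheduling is enabled but no cron expression is set");
  }
  if (config.pauseCron !== undefined && !isSixFieldCron(config.pauseCron)) {
    problems.push(`invalid pauseCron: ${config.pauseCron}`);
  }
  if (config.resumeCron !== undefined && !isSixFieldCron(config.resumeCron)) {
    problems.push(`invalid resumeCron: ${config.resumeCron}`);
  }
  return problems;
}

// Example: pause at 19:00 and resume at 06:00 (UTC) on weekdays.
const exampleSchedule: ClusterScheduleConfig = {
  enabled: true,
  pauseCron: "cron(0 19 ? * MON-FRI *)",
  resumeCron: "cron(0 6 ? * MON-FRI *)",
};
```

Returning a list of problems rather than throwing on the first one lets a pipeline report every misconfigured field of a data product at once.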
This feature allows Swisscom to reduce operational costs by suspending cluster activity during off-peak hours: pausing clusters when they are not needed and resuming them at appropriate times leads to significant cost savings. The scheduling is achieved using CfnScheduledAction, the AWS CDK construct for the AWS CloudFormation resource AWS::Redshift::ScheduledAction. The following code illustrates how Swisscom implemented this scheduling:
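The following sketch is illustrative rather than Swisscom's actual code. It builds the properties for one pause and one resume action per cluster. The `ScheduledActionProps` interface is a local mirror of a subset of the CDK's `CfnScheduledActionProps`, so the example runs without `aws-cdk-lib`. The role ARN is assumed to belong to an IAM role that `scheduler.redshift.amazonaws.com` can assume and that is allowed to call `redshift:PauseCluster` and `redshift:ResumeCluster`.

```typescript
// Local mirror of the subset of CDK `CfnScheduledActionProps`
// (aws-cdk-lib/aws-redshift) used here, so the sketch runs on its own.
interface ScheduledActionProps {
  scheduledActionName: string;
  scheduledActionDescription: string;
  schedule: string;
  iamRole: string;
  enable: boolean;
  targetAction: {
    pauseCluster?: { clusterIdentifier: string };
    resumeCluster?: { clusterIdentifier: string };
  };
}

// Builds a pause action and/or a resume action for one cluster.
// Actions are only created for the cron expressions that are provided.
function buildPauseResumeActions(
  clusterIdentifier: string,
  schedulerRoleArn: string,
  pauseCron?: string,
  resumeCron?: string,
): ScheduledActionProps[] {
  const actions: ScheduledActionProps[] = [];
  if (pauseCron !== undefined) {
    actions.push({
      scheduledActionName: `${clusterIdentifier}-pause`,
      scheduledActionDescription: `Pause ${clusterIdentifier} during off-peak hours`,
      schedule: pauseCron,
      iamRole: schedulerRoleArn,
      enable: true,
      targetAction: { pauseCluster: { clusterIdentifier } },
    });
  }
  if (resumeCron !== undefined) {
    actions.push({
      scheduledActionName: `${clusterIdentifier}-resume`,
      scheduledActionDescription: `Resume ${clusterIdentifier} before working hours`,
      schedule: resumeCron,
      iamRole: schedulerRoleArn,
      enable: true,
      targetAction: { resumeCluster: { clusterIdentifier } },
    });
  }
  return actions;
}
```

In a CDK stack, each returned entry could then be passed to `new redshift.CfnScheduledAction(this, props.scheduledActionName, props)`.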
JDBC connections
JDBC connectivity for Amazon Redshift clusters is also flexible, adapting to the subnet types and security groups that the user defines in the configuration:
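The connectivity part of such a configuration could be modeled as in the following sketch. The `subnetType` values, the `SubnetInfo` shape, and `resolveJdbcNetwork` are hypothetical names chosen for this illustration. The function picks one subnet of the requested type, because an AWS Glue connection references a single subnet, and flags when a default security group has to be created because none was configured.

```typescript
// Hypothetical subnet categories a user can choose from in the configuration.
type SubnetType = "private" | "isolated";

// Hypothetical JDBC connectivity section of the cluster configuration.
interface JdbcNetworkConfig {
  subnetType: SubnetType;
  securityGroupIds?: string[];
}

interface SubnetInfo {
  subnetId: string;
  availabilityZone: string;
}

interface ResolvedNetwork {
  subnet: SubnetInfo;
  securityGroupIds: string[];
  // True when the stack should create a default security group.
  createDefaultSecurityGroup: boolean;
}

function resolveJdbcNetwork(
  config: JdbcNetworkConfig,
  subnetsByType: Record<SubnetType, SubnetInfo[]>,
): ResolvedNetwork {
  const candidates = subnetsByType[config.subnetType];
  if (candidates.length === 0) {
    throw new Error(`no ${config.subnetType} subnet available in the VPC`);
  }
  const groups = config.securityGroupIds !== undefined ? config.securityGroupIds : [];
  return {
    // An AWS Glue connection references exactly one subnet; take the first.
    subnet: candidates[0],
    securityGroupIds: groups,
    createDefaultSecurityGroup: groups.length === 0,
  };
}
```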
As illustrated in the ODP architecture diagram in Part 1 of this series, a considerable part of the extract, transform, and load (ETL) processes is expected to run outside of Amazon Redshift, in the serverless AWS Glue environment. Swisscom therefore needed a mechanism for AWS Glue to connect to Amazon Redshift. This connectivity is provided through JDBC: the AWS CDK code creates an AWS Glue connection that ETL processes use to interact with the Redshift cluster. The subnet and security groups defined in the user configuration determine how the JDBC connection is set up; if no security groups are defined in the configuration, a default one is created. The connection is configured with details of the data product from which the Redshift cluster is being provisioned, such as the ETL user and default database, along with network elements such as the cluster endpoint, security group, and subnet to use, providing secure and efficient data transfer. The following code snippet demonstrates how this was achieved:
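The following sketch is illustrative rather than Swisscom's code. It assembles the `connectionInput` that a CDK `glue.CfnConnection` expects, using a local interface that mirrors the subset of fields used. It assumes that the ETL user's credentials are stored in an AWS Secrets Manager secret referenced through the Glue `SECRET_ID` connection property instead of an inline password. The settings names are hypothetical.

```typescript
// Local mirror of the subset of CDK `CfnConnection.ConnectionInputProperty`
// (aws-cdk-lib/aws-glue) used in this sketch.
interface GlueConnectionInput {
  name: string;
  description: string;
  connectionType: string;
  connectionProperties: Record<string, string>;
  physicalConnectionRequirements: {
    subnetId: string;
    availabilityZone: string;
    securityGroupIdList: string[];
  };
}

// Hypothetical inputs: data product details plus the cluster's network settings.
interface RedshiftConnectionSettings {
  dataProduct: string;
  clusterEndpoint: string;
  port: number;
  defaultDatabase: string;
  etlUserSecretId: string; // secret assumed to hold the ETL user's username and password
  subnetId: string;
  availabilityZone: string;
  securityGroupIds: string[];
}

function buildGlueConnectionInput(s: RedshiftConnectionSettings): GlueConnectionInput {
  return {
    name: `${s.dataProduct}-redshift-jdbc`,
    description: `JDBC connection from AWS Glue to the Redshift cluster of ${s.dataProduct}`,
    connectionType: "JDBC",
    connectionProperties: {
      // Redshift JDBC URL format: jdbc:redshift://<endpoint>:<port>/<database>
      JDBC_CONNECTION_URL: `jdbc:redshift://${s.clusterEndpoint}:${s.port}/${s.defaultDatabase}`,
      // Glue reads the credentials from Secrets Manager instead of the template.
      SECRET_ID: s.etlUserSecretId,
    },
    physicalConnectionRequirements: {
      subnetId: s.subnetId,
      availabilityZone: s.availabilityZone,
      securityGroupIdList: s.securityGroupIds,
    },
  };
}
```

In a stack, the result would be passed as `new glue.CfnConnection(this, "RedshiftJdbc", { catalogId: this.account, connectionInput: buildGlueConnectionInput(settings) })`. Keep in mind that AWS Glue also expects the connection's security group to have a self-referencing inbound rule so that Glue workers inside the VPC can communicate with each other.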
By doing this, Swisscom made sure that serverless ETL workflows in AWS Glue can securely communicate with the newly provisioned Redshift cluster running within a secured virtual private cloud (VPC).
Identity federation
Identity federation allows a centralized system (the IdP) to be used for authenticating users in order to access a service provider like Amazon Redshift. A more general overview of the topic can be found in Identity Federation in AWS.
Identity federation not only enhances security through centralized user lifecycle management and a centralized authentication mechanism (for example, supporting multi-factor authentication), but also improves the user experience and reduces the overall complexity of identity and access management, and thereby also of its governance.
In Swisscom's setup, Microsoft Active Directory Services are used for identity and access management. During the initial build stages of ODP, Amazon Redshift offered two different options for identity federation:
During the initial implementation, Swisscom opted for IAM-based SAML 2.0 IdP federation because it is a more general approach that can also be used for other AWS services, such as Amazon QuickSight (see Setting up IdP federation using IAM and QuickSight).
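With IAM-based SAML 2.0 federation, the IdP's SAML assertion is exchanged for temporary credentials of an IAM role, and that role is then allowed to request temporary database credentials. The following sketch builds the two policy documents such a role typically needs. It is an illustration under assumptions, not Swisscom's configuration: the SAML provider ARN, cluster name, and database group are placeholders, and depending on the client flow a `SAML:aud` condition is usually added to the trust policy.

```typescript
// Minimal local types for IAM policy documents (illustrative subset).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource?: string | string[];
  Principal?: { Federated: string };
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Trust policy: users authenticated by the SAML IdP may assume the role.
function samlTrustPolicy(samlProviderArn: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Federated: samlProviderArn },
        Action: "sts:AssumeRoleWithSAML",
      },
    ],
  };
}

// Permission policy: obtain temporary database credentials for the database
// user requested by the client and join a single database group.
function redshiftFederationPolicy(
  region: string,
  account: string,
  clusterIdentifier: string,
  dbGroup: string,
): PolicyDocument {
  const arnPrefix = `arn:aws:redshift:${region}:${account}`;
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["redshift:GetClusterCredentials", "redshift:CreateClusterUser"],
        Resource: [
          // The escaped placeholder stays literal: it is an IAM policy variable.
          `${arnPrefix}:dbuser:${clusterIdentifier}/\${redshift:DbUser}`,
          `${arnPrefix}:dbname:${clusterIdentifier}/*`,
        ],
      },
      {
        Effect: "Allow",
        Action: "redshift:JoinGroup",
        Resource: `${arnPrefix}:dbgroup:${clusterIdentifier}/${dbGroup}`,
      },
    ],
  };
}
```

One role per job function, each limited to its own database group, would line up with the role-to-group mapping described in Part 1.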
At AWS re:Invent 2023, AWS announced a new connection option to Amazon Redshift based on AWS IAM Identity Center. IAM Identity Center provides a single place for workforce identities in AWS, allowing users and groups to be created directly within it or through federation with standard IdPs like Okta, PingOne, Microsoft Entra ID (Azure AD), or any IdP that supports SAML 2.0 and SCIM. It also provides a single sign-on (SSO) experience for Redshift features and other analytics services such as Amazon Redshift Query Editor V2 (see Integrate Identity Provider (IdP) with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On), QuickSight, and AWS Lake Formation. Moreover, a single IAM Identity Center instance can be shared with multiple Redshift clusters and workgroups through a simple auto-discovery and connect capability, which makes sure all Redshift clusters and workgroups have a consistent view of users, their attributes, and groups. This setup fits well with ODP's vision of providing self-service analytics across the Swisscom workforce with the necessary security controls in place. At the time of writing, Swisscom is actively working towards using IAM Identity Center as the standard federation solution for ODP. The following diagram illustrates the high-level architecture for the work in progress.
Audit logging
Amazon Redshift audit logging is useful for security auditing, monitoring, and troubleshooting. The logs provide information such as the IP address of the user's computer, the type of authentication used by the user, and the timestamp of the request. Amazon Redshift logs SQL operations, including connection attempts, queries, and changes, which makes it straightforward to track changes. These logs can be accessed through SQL queries against system tables, saved to a secure Amazon Simple Storage Service (Amazon S3) location, or exported to Amazon CloudWatch.
Amazon Redshift logs information in the following log files:
- Connection log – Provides information to monitor users connecting to the database and related connection information, such as their IP address.
- User log – Logs information about changes to database user definitions.
- User activity log – Tracks information about the types of queries that both the users and the system perform in the database. It's useful primarily for troubleshooting purposes.
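When these logs are exported to CloudWatch, each one is identified by a short name in the Redshift logging API and is written to its own log group. The following sketch maps the three logs to those identifiers. The log group naming pattern `/aws/redshift/cluster/<cluster>/<log>` is an assumption based on the Amazon Redshift documentation and should be verified for your setup. The user activity log is only written when the `enable_user_activity_logging` parameter is set to `true` in the cluster's parameter group.

```typescript
// Identifiers used for the three audit logs in the Redshift EnableLogging API
// (LogExports parameter).
type LogExport = "connectionlog" | "userlog" | "useractivitylog";

const AUDIT_LOGS: { exportName: LogExport; purpose: string }[] = [
  { exportName: "connectionlog", purpose: "connection attempts, connections, and disconnections" },
  { exportName: "userlog", purpose: "changes to database user definitions" },
  // Requires enable_user_activity_logging = true in the parameter group.
  { exportName: "useractivitylog", purpose: "queries run in the database" },
];

// CloudWatch Logs group that receives one exported log for one cluster
// (naming pattern assumed from the Amazon Redshift documentation).
function auditLogGroupName(clusterIdentifier: string, log: LogExport): string {
  return `/aws/redshift/cluster/${clusterIdentifier}/${log}`;
}

// All log groups for a cluster, for example to attach retention settings.
function auditLogGroupNames(clusterIdentifier: string): string[] {
  return AUDIT_LOGS.map((entry) => auditLogGroupName(clusterIdentifier, entry.exportName));
}
```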
With the ODP solution, Swisscom wanted to write all Amazon Redshift logs to CloudWatch. At the time of writing, this is not directly supported by the AWS CDK, so Swisscom implemented a workaround using an AWS CDK custom resource that invokes the Redshift enableLogging action through the AWS SDK. See the following code:
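As an illustrative sketch of this workaround (not Swisscom's code), the following builds the SDK call descriptions that a CDK `AwsCustomResource` could run: `enableLogging` on creation and `disableLogging` on deletion. The `SdkCall` interface is a local stand-in for the CDK's `AwsSdkCall` type so the example runs without `aws-cdk-lib`. The parameter names are those of the Redshift `EnableLogging` and `DisableLogging` API actions.

```typescript
// Local stand-in for the fields of CDK's `AwsSdkCall` used here.
interface SdkCall {
  service: string;
  action: string;
  parameters: Record<string, unknown>;
}

// Sends all three audit logs to CloudWatch Logs instead of Amazon S3.
function enableCloudWatchLoggingCall(clusterIdentifier: string): SdkCall {
  return {
    service: "Redshift",
    action: "enableLogging",
    parameters: {
      ClusterIdentifier: clusterIdentifier,
      LogDestinationType: "cloudwatch",
      LogExports: ["connectionlog", "userlog", "useractivitylog"],
    },
  };
}

// Turns audit logging off again when the custom resource is deleted.
function disableLoggingCall(clusterIdentifier: string): SdkCall {
  return {
    service: "Redshift",
    action: "disableLogging",
    parameters: { ClusterIdentifier: clusterIdentifier },
  };
}
```

With `aws-cdk-lib/custom-resources`, these objects would go into the `onCreate` and `onDelete` properties of `new AwsCustomResource(...)`, together with `physicalResourceId: PhysicalResourceId.of(clusterIdentifier)` and a policy created with `AwsCustomResourcePolicy.fromSdkCalls(...)`.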
AWS Config guidelines and remediation
After a Redshift cluster has been deployed, Swisscom needed to make sure that the cluster meets the defined governance rules at every point in time after its creation. For that, Swisscom decided to use AWS Config.
AWS Config provides a detailed view of the configuration of AWS resources in your AWS account. This includes how the resources are related to one another and how they were configured in the past, so you can see how the configurations and relationships change over time.
An AWS resource is an entity you can work with in AWS, such as an Amazon Elastic Compute Cloud (Amazon EC2) instance, an Amazon Elastic Block Store (Amazon EBS) volume, a security group, or an Amazon VPC.
The following diagram illustrates the process Swisscom implemented.
If an AWS Config rule isn't compliant, a remediation can be applied. Swisscom defined pausing the cluster as the default action for a non-compliant cluster (depending on your requirements, other remediation actions are possible). This is implemented using an AWS Systems Manager Automation document (SSM document).
Automation, a capability of Systems Manager, simplifies common maintenance, deployment, and remediation tasks for AWS services like Amazon EC2, Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon S3, and many more.
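Attaching such a remediation to a rule could look like the following sketch, which builds the properties of an AWS Config remediation configuration (the CDK construct `config.CfnRemediationConfiguration`). It is illustrative, not Swisscom's code: the rule name, document name, role ARN, and retry values are placeholders. `RESOURCE_ID` is the value AWS Config replaces with the ID of the non-compliant resource, which for a Redshift cluster is assumed here to be its cluster identifier.

```typescript
// Local mirror of the subset of CDK `CfnRemediationConfigurationProps`
// (aws-cdk-lib/aws-config) used here.
interface RemediationConfigurationProps {
  configRuleName: string;
  targetType: string;
  targetId: string;
  automatic: boolean;
  maximumAutomaticAttempts: number;
  retryAttemptSeconds: number;
  parameters: Record<string, unknown>;
}

// Runs the given SSM Automation document automatically whenever the rule
// reports a cluster as non-compliant.
function pauseClusterRemediation(
  configRuleName: string,
  documentName: string,
  automationRoleArn: string,
): RemediationConfigurationProps {
  return {
    configRuleName,
    targetType: "SSM_DOCUMENT",
    targetId: documentName,
    automatic: true,
    maximumAutomaticAttempts: 3,
    retryAttemptSeconds: 60,
    parameters: {
      // Filled in by AWS Config with the ID of the non-compliant resource.
      ClusterIdentifier: { ResourceValue: { Value: "RESOURCE_ID" } },
      // Role the Automation runbook assumes to call redshift:PauseCluster.
      AutomationAssumeRole: { StaticValue: { Values: [automationRoleArn] } },
    },
  };
}
```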
The SSM document is based on the AWS document AWSConfigRemediation-DeleteRedshiftCluster. It looks like the following code:
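A minimal Automation runbook that pauses the cluster instead of deleting it could look like the following sketch, expressed as a TypeScript object so it can be handed to the CDK as document content. It is written for this illustration and is neither Swisscom's document nor a copy of AWSConfigRemediation-DeleteRedshiftCluster. It calls the Redshift `PauseCluster` API through `aws:executeAwsApi` and then waits with `aws:waitForAwsResourceProperty` until `DescribeClusters` reports the status `paused`. The step names and the 30-minute timeout are assumptions.

```typescript
interface AutomationStep {
  name: string;
  action: string;
  timeoutSeconds?: number;
  inputs: Record<string, unknown>;
}

interface AutomationDocument {
  schemaVersion: string;
  description: string;
  assumeRole: string;
  parameters: Record<string, { type: string; description: string }>;
  mainSteps: AutomationStep[];
}

function pauseClusterAutomationDocument(): AutomationDocument {
  return {
    schemaVersion: "0.3", // schema version used by Automation runbooks
    description: "Pauses an Amazon Redshift cluster reported as non-compliant by AWS Config.",
    assumeRole: "{{ AutomationAssumeRole }}",
    parameters: {
      ClusterIdentifier: { type: "String", description: "Identifier of the cluster to pause." },
      AutomationAssumeRole: { type: "String", description: "ARN of the role the runbook assumes." },
    },
    mainSteps: [
      {
        // Calls the Redshift PauseCluster API for the cluster passed in.
        name: "PauseCluster",
        action: "aws:executeAwsApi",
        inputs: { Service: "redshift", Api: "PauseCluster", ClusterIdentifier: "{{ ClusterIdentifier }}" },
      },
      {
        // Polls DescribeClusters until the cluster status becomes "paused".
        name: "WaitUntilPaused",
        action: "aws:waitForAwsResourceProperty",
        timeoutSeconds: 1800,
        inputs: {
          Service: "redshift",
          Api: "DescribeClusters",
          ClusterIdentifier: "{{ ClusterIdentifier }}",
          PropertySelector: "$.Clusters[0].ClusterStatus",
          DesiredValues: ["paused"],
        },
      },
    ],
  };
}
```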
The SSM Automation document is deployed with the AWS CDK:
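Deploying a runbook like this comes down to one `ssm.CfnDocument` with `documentType: "Automation"`. The following sketch builds those properties; the name check and the choice of `NewVersion` are assumptions for illustration, not Swisscom's settings. Updating a named document with `updateMethod: "NewVersion"` instead of replacing it is intended to avoid a name clash during stack updates. Custom document names also cannot start with the prefixes `aws`, `amazon`, or `amzn`, which AWS reserves.

```typescript
// Local mirror of the subset of CDK `ssm.CfnDocumentProps` used here.
interface SsmDocumentProps {
  name: string;
  documentType: string;
  documentFormat: string;
  updateMethod: string;
  content: object;
}

// Prefixes reserved by AWS for its own SSM documents.
const RESERVED_PREFIXES = ["aws", "amazon", "amzn"];

function automationDocumentProps(documentName: string, content: object): SsmDocumentProps {
  const lower = documentName.toLowerCase();
  RESERVED_PREFIXES.forEach((prefix) => {
    if (lower.indexOf(prefix) === 0) {
      throw new Error(`document name must not start with "${prefix}": ${documentName}`);
    }
  });
  return {
    name: documentName,
    documentType: "Automation",
    documentFormat: "JSON",
    // Publish a new version on change instead of replacing the named document.
    updateMethod: "NewVersion",
    content,
  };
}
```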
Swisscom defined the rules to be applied following AWS best practices (see Security Best Practices for Amazon Redshift). These are deployed as AWS Config conformance packs. A conformance pack is a collection of AWS Config rules and remediation actions that can be quickly deployed as a single entity in an AWS account and AWS Region, or across an organization in AWS Organizations.
Conformance packs are created by authoring YAML templates that contain the list of AWS Config managed or custom rules and remediation actions. You can also use SSM documents to store your conformance pack templates on AWS and directly deploy conformance packs using SSM document names.
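As a concrete illustration, the following sketch renders a small conformance pack template from a list of AWS managed rules. The three rules are examples of managed rules that cover Redshift recommendations (no public access, enhanced VPC routing, AWS KMS encryption). They are not necessarily the set Swisscom uses, and remediation entries (`AWS::Config::RemediationConfiguration`) could be appended to the template in the same way.

```typescript
interface ManagedRule {
  ruleName: string; // name shown in AWS Config
  sourceIdentifier: string; // identifier of the AWS managed rule
}

// Example selection of AWS managed rules that evaluate Redshift clusters.
const REDSHIFT_RULES: ManagedRule[] = [
  { ruleName: "redshift-cluster-public-access-check", sourceIdentifier: "REDSHIFT_CLUSTER_PUBLIC_ACCESS_CHECK" },
  { ruleName: "redshift-enhanced-vpc-routing-enabled", sourceIdentifier: "REDSHIFT_ENHANCED_VPC_ROUTING_ENABLED" },
  { ruleName: "redshift-cluster-kms-enabled", sourceIdentifier: "REDSHIFT_CLUSTER_KMS_ENABLED" },
];

// Logical IDs must be alphanumeric: "redshift-cluster-kms-enabled" becomes "RedshiftClusterKmsEnabled".
function logicalId(ruleName: string): string {
  return ruleName
    .split("-")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// Renders a conformance pack template (YAML) with one AWS::Config::ConfigRule per rule.
function renderConformancePackTemplate(rules: ManagedRule[]): string {
  const lines: string[] = ["Resources:"];
  rules.forEach((rule) => {
    lines.push(
      `  ${logicalId(rule.ruleName)}:`,
      "    Type: AWS::Config::ConfigRule",
      "    Properties:",
      `      ConfigRuleName: ${rule.ruleName}`,
      "      Scope:",
      "        ComplianceResourceTypes:",
      "          - AWS::Redshift::Cluster",
      "      Source:",
      "        Owner: AWS",
      `        SourceIdentifier: ${rule.sourceIdentifier}`,
    );
  });
  return lines.join("\n") + "\n";
}
```

Generating the template from a typed list keeps the rule set reviewable in code, while the output stays a plain YAML conformance pack template.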
This AWS Config conformance pack can be deployed using the AWS CDK:
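The following is a minimal sketch of the deployment step, assuming the template is passed inline through the `templateBody` property of the CDK construct `config.CfnConformancePack` (an S3 URI or an SSM document are the alternatives). The local props interface and the checks are assumptions for illustration: pack names are checked against the pattern of a leading letter followed by letters, digits, and hyphens, and inline templates are limited to 51,200 bytes.

```typescript
// Local mirror of the subset of CDK `config.CfnConformancePackProps` used here.
interface ConformancePackProps {
  conformancePackName: string;
  templateBody: string;
}

// Inline templates are capped at 51,200 bytes; larger ones belong in Amazon S3.
const MAX_TEMPLATE_BODY_BYTES = 51200;

function conformancePackProps(packName: string, templateBody: string): ConformancePackProps {
  if (!/^[a-zA-Z][-a-zA-Z0-9]*$/.test(packName) || packName.length > 256) {
    throw new Error(`invalid conformance pack name: ${packName}`);
  }
  // Approximate check: string length equals the byte count only for ASCII templates.
  if (templateBody.length > MAX_TEMPLATE_BODY_BYTES) {
    throw new Error("template too large for templateBody; use templateS3Uri instead");
  }
  return { conformancePackName: packName, templateBody };
}
```

In a stack, this would be used as `new config.CfnConformancePack(this, "RedshiftPack", conformancePackProps("odp-redshift-pack", templateYaml))`.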
Conclusion
Swisscom is building its next-generation data-as-a-service platform through a combination of automated provisioning processes, advanced security features, and user-configurable options that cater to the needs of diverse data handling and data products. The integration of the Amazon Redshift construct into the ODP framework is a significant stride in Swisscom's journey towards a more connected and data-driven enterprise landscape.
In Part 1 of this series, we demonstrated how to provision a secure and compliant Redshift cluster using the AWS CDK, as well as how to apply the best practices of secret rotation. We also showed how to use AWS CDK custom resources to automate the creation of dynamic user groups that are relevant for the IAM roles matching different job functions.
In this post, we showed how to use the AWS CDK to address key Redshift cluster usage topics such as federation with the Swisscom IdP, JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.
The code snippets in this post are provided as is and will need to be adapted to your specific use cases. Before you get started, we highly recommend speaking to an Amazon Redshift specialist.
Concerning the Authors
Asad bin Imtiaz is an Expert Data Engineer at Swisscom, with over 17 years of experience in architecting and implementing enterprise-level data solutions.
Jesús Montelongo Hernández is an Expert Cloud Data Engineer at Swisscom. He has over 20 years of experience in IT systems, data warehousing, and data engineering.
Samuel Bucheli is a Lead Cloud Architect at Zühlke Engineering AG. He has over 20 years of experience in software engineering, software architecture, and cloud architecture.
Srikanth Potu is a Senior Consultant in EMEA, part of the Professional Services organization at Amazon Web Services. He has over 25 years of experience in enterprise data architecture, databases, and data warehousing.