Use AWS Glue Information Catalog views to research knowledge


On this submit, we present you easy methods to use the brand new views function the AWS Glue Information Catalog. SQL views are a robust object used throughout relational databases. You should utilize views to lower the time to insights of information by tailoring the information that’s queried. Moreover, you should utilize the ability of SQL in a view to precise complicated boundaries in knowledge throughout a number of tables that may’t be expressed with less complicated permissions. Information lakes present clients the flexibleness required to derive helpful insights from knowledge throughout many sources and plenty of use circumstances. Information shoppers can devour knowledge the place they should throughout strains of enterprise, rising the speed of insights era.

Clients use many alternative processing engines of their knowledge lakes, every of which have their very own model of views with completely different capabilities. The AWS Glue Information Catalog and AWS Lake Formation present a central location to handle your knowledge throughout knowledge lake engines.

AWS Glue has launched a new function, SQL views, which lets you handle a single view object within the Information Catalog that may be queried from SQL engines. You possibly can create a single view object with a special SQL model for every engine you need to question, resembling Amazon Athena, Amazon Redshift, and Spark SQL on Amazon EMR. You possibly can then handle entry to those sources utilizing the identical Lake Formation permissions which are used to regulate tables within the knowledge lake.

Answer overview

For this submit, we use the Ladies’s E-Commerce Clothes Evaluation. The target is to create views within the Information Catalog so you’ll be able to create a single frequent view schema and metadata object to make use of throughout engines (on this case, Athena). Doing so permits you to use the identical views throughout your knowledge lakes to suit your use case. We create a view to masks the customer_id column on this dataset, then we’ll share this view to a different consumer in order that they’ll question this masked view.

Stipulations

Earlier than you’ll be able to create a view within the AWS Glue Information Catalog, just remember to have an AWS Identification and Entry Administration (IAM) function with the next configuration:

  • The next belief coverage:
    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": [
               "glue.amazonaws.com",
               "lakeformation.amazonaws.com"
            ]
          },
          "Motion": "sts:AssumeRole"
        }
      ]
    }

  • The next cross function coverage:
    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Sid": "Stmt1",
          "Action": [
            "iam:PassRole"
          ],
          "Impact": "Enable",
          "Useful resource": "*",
          "Situation": {
             "StringEquals": {
               "iam:PassedToService": [
                 "glue.amazonaws.com",
                 "lakeformation.amazonaws.com"
               ]
             }
           }
         }
       ]
    }

  • Lastly, additionally, you will want the next permissions:
    • "Glue:GetDatabase",
    • "Glue:GetDatabases",
    • "Glue:CreateTable",
    • "Glue:GetTable",
    • "Glue:UpdateTable",
    • "Glue:DeleteTable",
    • "Glue:GetTables",
    • "Glue:SearchTables",
    • "Glue:BatchGetPartition",
    • "Glue:GetPartitions",
    • "Glue:GetPartition",
    • "Glue:GetTableVersion",
    • "Glue:GetTableVersions"

Run the AWS CloudFormation template

You possibly can deploy the AWS CloudFormation template glueviewsblog.yaml to create the Lake Formation database and desk. The dataset might be loaded into an Amazon Easy Storage Service (Amazon S3) bucket.

For step-by-step directions, discuss with Making a stack on the AWS CloudFormation console.

When the stack is full, you’ll be able to see a desk known as clothing_parquet on the Lake Formation console, as proven within the following screenshot.

Create a view on the Athena console

Now that you’ve your Lake Formation managed desk, you’ll be able to open the Athena console and create a Information Catalog view. Full the next steps:

  1. Within the Athena question editor, run the next question on the Parquet dataset:
SELECT * FROM "clothing_reviews"."clothing_parquet" restrict 10;

Within the question outcomes, the customer_id column is at present seen.

Subsequent, you create a view known as hidden_customerID and masks the customer_id column.

  1. Create a view known as hidden_customerID:
CREATE PROTECTED MULTI DIALECT VIEW clothing_reviews.hidden_customerid SECURITY DEFINER AS 
SELECT * FROM clothing_reviews.clothing_parquet

Within the following screenshot, you’ll be able to see a view known as hidden_customerID was efficiently created.

  1. Run the next question to masks the primary 4 characters of the customer_id column for the newly generated view:
ALTER VIEW clothing_reviews.hidden_customerid UPDATE DIALECT AS
SELECT '****' || substring(customer_id, 4) as customer_id,clothing_id,age,title,review_text,ranking,recommend_ind,positive_feedback,division_name,department_name,class_name 
FROM clothing_reviews.clothing_parquet

You possibly can see within the following screenshot that the view hidden_customerID has the customer_id column’s first 4 characters masked.

The unique desk clothing_parquet stays the identical unmasked.

Grant entry of the view to a different consumer to question

Information Catalog views assist you to use Lake Formation to regulate entry. On this step, you grant this view to a different consumer known as amazon_business_analyst after which question from that consumer.

  1. Sign up to the Lake Formation console as admin.
  2. Within the navigation pane, select Views.

As proven within the following screenshot, you’ll be able to see the hidden_customerid view.

  1. Sign up because the amazon_business_analyst consumer and navigate to the Views web page.

This consumer has no visibility to the view.

  1. Grant permission to the amazon_business_analyst consumer from the information lake admin.
  1. Sign up once more as amazon_business_analyst and navigate to the Views web page.

  1. On the Athena console, question the hidden_customerid view.

You’ve gotten efficiently shared a view to the consumer and queried it from the Athena console.

Clear up

To keep away from incurring future expenses, delete the CloudFormation stack. For directions, discuss with Deleting a stack on the AWS CloudFormation console.

Conclusion

On this submit, we demonstrated easy methods to use the AWS Glue Information Catalog to create views. We then confirmed easy methods to alter the views and masks the information. You possibly can share the view with completely different customers to question utilizing Athena. For extra details about this new function, discuss with Utilizing AWS Glue Information Catalog views.


Concerning the Authors

Leonardo Gomez is a Principal Analytics Specialist Options Architect at AWS. He has over a decade of expertise in knowledge administration, serving to clients across the globe handle their enterprise and technical wants. Join with him on LinkedIn

Michael Chess – is a Product Supervisor on the AWS Lake Formation workforce based mostly out of Palo Alto, CA. He makes a speciality of permissions and knowledge catalog options within the knowledge lake.

Derek Liu – is a Senior Options Architect based mostly out of Vancouver, BC. He enjoys serving to clients remedy huge knowledge challenges by AWS analytic companies.

Recent Articles

Related Stories

Leave A Reply

Please enter your comment!
Please enter your name here