Most organizations have adopted a diverse set of software as a service (SaaS) platforms to support various applications. This rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows.
More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations and gain valuable insights from it. Using AWS Glue, a serverless data integration service, companies can streamline this process, integrating data from internal and external sources into a centralized AWS data lake. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.
This post introduces the new HubSpot managed connector for AWS Glue, and demonstrates how you can integrate HubSpot data into your existing data lake on AWS. By consolidating HubSpot data with data from your AWS accounts and from other SaaS services, you can enhance, analyze, and optionally write the data back to HubSpot, creating a seamless and integrated data experience.
Solution overview
In this example, we use AWS Glue to extract, transform, and load (ETL) data from your HubSpot account into a transactional data lake on Amazon Simple Storage Service (Amazon S3), using Apache Iceberg format. We register the schema in the AWS Glue Data Catalog to make your data discoverable. We then use Amazon Athena to validate that the HubSpot data has been successfully loaded to Amazon S3. The following diagram illustrates the solution architecture.
The following are key components and steps in the integration:
- Configure your HubSpot account and app to enable access to your HubSpot data.
- Prepare for data movement by securely storing your HubSpot OAuth credentials in AWS Secrets Manager, creating an S3 bucket to store your ingested data, and creating an AWS Identity and Access Management (IAM) role for AWS Glue.
- Create an AWS Glue job to extract and load data from HubSpot to Amazon S3. AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports complex data transformations, enabling efficient data integration and preparation to meet your needs.
- Schema and other metadata are registered in the AWS Glue Data Catalog, a centralized metadata repository for all your data assets. This simplifies schema management and makes the data discoverable by other services.
- Run the AWS Glue job to extract data from HubSpot and write it to Amazon S3 using Iceberg format. Apache Iceberg is an open source, high-performance open table format designed for large-scale analytics, providing transactional consistency and seamless schema evolution. Although we use Iceberg in this example, AWS Glue supports various data formats, including other transactional formats such as Apache Hudi and Delta Lake.
- The data loaded to Amazon S3 is organized into partitioned folders to optimize query performance and management. Amazon S3 also stores the AWS Glue scripts, logs, and other temporary data required during the ETL process.
- Finally, Amazon Athena is used to query the data loaded from HubSpot to Amazon S3, validating that all changes in the source system have been captured successfully.
- Optionally, you can regularly synchronize HubSpot data to Amazon S3 and analyze data updates over time.
Set up your HubSpot account
This example requires you to create a HubSpot public app for AWS Glue in a HubSpot developer account, and connect it to an associated HubSpot account. A HubSpot public app is a type of integration that can be installed in your HubSpot accounts or listed in the HubSpot Marketplace. In this example, you create a HubSpot app for the AWS Glue integration, and install it in a new test account. Although HubSpot calls it a public app, it will not be listed in the Marketplace and will only have access to your test account.
- If you don’t already have one, sign up for a free HubSpot developer account.
- Log in to your HubSpot developer account, where you’ll see options to create apps and test accounts.
- Choose Create a test account and follow the instructions.
HubSpot test accounts have Enterprise versions of the HubSpot Marketing, Sales, and Service Hubs along with sample data, so you can test most HubSpot tools, create CRM data, and access it through APIs with AWS Glue. For more information about creating a test account, refer to Create a developer test account.
Create a HubSpot app
Complete the following steps to create a HubSpot app:
- Switch back to your HubSpot developer account, and choose Create an app.
- Fill in the App Info section with the name AWS Glue and a brief description.
- Choose the Auth tab.
- For Redirect URLs, enter the redirect URL for AWS Glue in the form https://<region>.console.aws.amazon.com/gluestudio/oauth. Be sure to replace <region> with the AWS Region where you run AWS Glue. For instance, the code for the US East (N. Virginia) Region is us-east-1, so the AWS Glue redirect URL is https://us-east-1.console.aws.amazon.com/gluestudio/oauth.
- In the Scopes section, choose Add new scope and select the following permissions:
  - automation
  - content
  - crm.lists.read
  - crm.lists.write
  - crm.objects.companies.read
  - crm.objects.companies.write
  - crm.objects.contacts.read
  - crm.objects.contacts.write
  - crm.objects.custom.read
  - crm.objects.custom.write
  - crm.objects.deals.read
  - crm.objects.deals.write
  - crm.objects.owners.read
  - crm.schemas.custom.read
  - e-commerce
  - forms
  - oauth
  - sales-email-read
  - tickets
- Review the Scopes and Redirect URL settings, then choose Create app.
- Navigate back to your app’s Auth tab.
- Note the values for Client ID, Client secret, and Install URL (OAuth). You will need these later to connect your AWS Glue instance.
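Because the redirect URL in the steps above depends only on the Region code, it can be derived rather than typed by hand. The following is a minimal Python sketch of that idea; the function name `glue_oauth_redirect_url` and the Region pattern are our own illustrative assumptions, not part of any AWS or HubSpot SDK.

```python
import re

# Region codes look like "us-east-1" or "ap-southeast-2". This pattern is a
# loose sanity check chosen for this sketch, not an official AWS rule.
_REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def glue_oauth_redirect_url(region: str) -> str:
    """Return the AWS Glue OAuth redirect URL for the given Region code."""
    if not _REGION_PATTERN.match(region):
        raise ValueError(f"Unexpected Region code: {region!r}")
    return f"https://{region}.console.aws.amazon.com/gluestudio/oauth"


if __name__ == "__main__":
    # Prints the URL to paste into the HubSpot Redirect URLs field.
    print(glue_oauth_redirect_url("us-east-1"))
```

Validating the Region code up front catches a common mistake (pasting the Region display name instead of its code) before HubSpot rejects the OAuth redirect.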
Select or create an Amazon S3 bucket where your HubSpot data will reside
Select an existing Amazon S3 bucket in your account, or create a new bucket to store your HubSpot data, as well as scripts, logs, and so on. For this example, the bucket name follows the format aws-glue-hubspot-<account>-<region>, where <account> is the AWS account number and <region> is the operating Region. The bucket is configured with all defaults: public access disabled, versioning disabled, and server-side encryption with Amazon S3 managed keys (SSE-S3).
If you use AWSGlueServiceRole in your IAM role as shown in this example, it provides access to S3 buckets with names starting with aws-glue-.
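If you prefer to create the bucket programmatically, the naming convention above can be captured in a small helper. This is a sketch under stated assumptions: the helper names are hypothetical, and `s3_client` is expected to be a boto3 S3 client (for example, `boto3.client("s3", region_name=region)`), which requires boto3 to be installed and credentials to be configured.

```python
def hubspot_bucket_name(account_id: str, region: str) -> str:
    """Bucket name in the aws-glue-hubspot-<account>-<region> format."""
    return f"aws-glue-hubspot-{account_id}-{region}"


def create_bucket_request(account_id: str, region: str) -> dict:
    """Build keyword arguments for the S3 CreateBucket call."""
    request = {"Bucket": hubspot_bucket_name(account_id, region)}
    # us-east-1 is the S3 default location and must not be sent as a
    # LocationConstraint; every other Region needs it.
    if region != "us-east-1":
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def create_bucket(s3_client, account_id: str, region: str) -> str:
    """Create the bucket with default settings and return its name."""
    request = create_bucket_request(account_id, region)
    s3_client.create_bucket(**request)
    return request["Bucket"]
```

Keeping the `aws-glue-` prefix matters here because the AWSGlueServiceRole managed policy scopes its S3 access to bucket names with that prefix.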
Create an IAM role for AWS Glue
Create an IAM role with permissions for the AWS Glue job. AWS Glue assumes this role when calling other services on your behalf.
- On the IAM console, choose Roles in the navigation pane.
- Choose Create role.
- For Trusted entity type, choose AWS service.
- For Use case, choose Glue.
- Add the following AWS managed policies to the role:
  - AWSGlueServiceRole for accessing related services such as Amazon S3, Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudWatch, and IAM. This policy enables access to S3 buckets with names starting with aws-glue-.
  - SecretsManagerReadWrite for read/write access to AWS Secrets Manager.
- Give the role a name, for example AWSGlueServiceRole_blog.
For more information, see Getting started with AWS Glue and Create an IAM role for AWS Glue.
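The console steps above amount to a trust policy that lets AWS Glue assume the role, plus two managed policy attachments. The following Python sketch shows the equivalent calls; the function name is hypothetical, and `iam_client` is assumed to be a boto3 IAM client (`boto3.client("iam")`).

```python
import json

# Trust policy that allows the AWS Glue service to assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# The two AWS managed policies named in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
]


def create_glue_role(iam_client, role_name: str = "AWSGlueServiceRole_blog") -> str:
    """Create the role, attach both managed policies, and return the role name."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role_name
```

For production use, you would likely replace SecretsManagerReadWrite with a narrower policy that grants read access to the single HubSpot secret.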
Create an AWS Secrets Manager secret
AWS Secrets Manager is used to securely store your HubSpot OAuth credentials. Complete the following steps to create a secret:
- On the AWS Secrets Manager console, choose Secrets in the navigation pane.
- Choose Store a new secret.
- For Secret type, select Other type of secret.
- Under Key/value pairs, enter the HubSpot client secret with the key USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET.
- Choose Next.
- Enter the secret name, such as HubSpot-Blog, and a description, then choose Next.
- Leave the secret rotation as default, and choose Next.
- Review the secret configuration, and choose Store.
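The secret created above is a single key/value pair stored as JSON. As a rough sketch, the same secret could be created from Python as follows; the helper names are hypothetical, and `secrets_client` is assumed to be a boto3 Secrets Manager client (`boto3.client("secretsmanager")`).

```python
import json

# Key name that the connector expects, as given in the steps above.
SECRET_KEY = "USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET"


def build_secret_string(client_secret: str) -> str:
    """Serialize the HubSpot client secret as the JSON key/value pair."""
    if not client_secret:
        raise ValueError("client_secret must not be empty")
    return json.dumps({SECRET_KEY: client_secret})


def store_hubspot_secret(secrets_client, client_secret: str,
                         name: str = "HubSpot-Blog") -> str:
    """Create the secret and return its ARN."""
    response = secrets_client.create_secret(
        Name=name,
        Description="HubSpot OAuth client secret for AWS Glue",
        SecretString=build_secret_string(client_secret),
    )
    return response["ARN"]
```

Avoid hardcoding the client secret in source files; read it from a prompt or an environment variable when calling `store_hubspot_secret`.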
Create an AWS Glue connection
Complete the following steps to create an AWS Glue connection to your HubSpot account:
- On the AWS Glue console, choose Data connections in the navigation pane.
- Choose Create connection.
- For Data sources, search for and select HubSpot.
- Choose Next.
- On the Configure connection page, fill in the required information:
  - For IAM service role, choose the service role created previously. In this example, we use the role AWSGlueServiceRole_blog.
  - For Authentication URL, leave as default.
  - For User Managed Client Application ClientId, enter the OAuth client ID from HubSpot.
  - For AWS Secret, choose the OAuth client secret name configured previously in AWS Secrets Manager.
  - Choose Next.
- Choose Test Connection to validate the connection to HubSpot.
  - This brings up a new HubSpot connection window. Be sure to select your HubSpot test account (not your developer account) to test the connection.
  - If this is your first connection attempt, you are redirected to another page where you are asked to confirm the access level granted to AWS Glue. Choose Connect App.
If successful, the HubSpot window closes and your AWS connection window says Connection test successful.
- Under Set properties, for Name, enter a name (for example, HubSpot_Connection_blog).
- Choose Next.
- Under Review and create, review your settings and then create the connection.
Create a database in AWS Glue Data Catalog
Complete the following steps to create a database in AWS Glue Data Catalog to organize your HubSpot data:
- On the AWS Glue console, choose Databases in the navigation pane.
- Create a new database.
- Enter a name (for example, hubspot).
- You can leave the location field blank.
- Choose Create database.
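The database can also be created with a single API call. The following is a minimal sketch; the helper names and the description text are our own, and `glue_client` is assumed to be a boto3 AWS Glue client (`boto3.client("glue")`).

```python
def database_input(name: str = "hubspot",
                   description: str = "HubSpot data ingested by AWS Glue") -> dict:
    """Build the DatabaseInput structure; the location is left unset."""
    return {"Name": name, "Description": description}


def create_database(glue_client, name: str = "hubspot") -> str:
    """Create the Data Catalog database and return its name."""
    glue_client.create_database(DatabaseInput=database_input(name))
    return name
```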
Create an AWS Glue ETL job
Now that you have an AWS Glue data connection to your HubSpot account, you can create an AWS Glue ETL job to ingest HubSpot data into your AWS data lake. AWS Glue provides both visual and code-based interfaces to simplify data integration, depending on your expertise. In this example, we use the Script interface to ingest HubSpot data into the Amazon S3 location. Complete the following steps:
- On the AWS Glue console, choose ETL jobs in the navigation pane.
- Choose the Script editor.
- Choose Spark as the engine, and add the script for the job.
The AWS Glue Spark job reads the HubSpot data and merges it into the S3 bucket in Iceberg format.
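As an illustration of what such a read-and-merge job might look like, the following is a hedged Python sketch rather than the exact script used for this post. It makes several assumptions that you should verify against your environment: the connector is read with `connection_type="hubspot"` and the options `connectionName`, `ENTITY_NAME`, and `API_VERSION` (set to `v3` here), each record has an `id` column that can serve as the merge key, and the temporary view name `hubspot_updates` and all helper names are hypothetical. The sketch upserts new and changed records only; it does not remove rows that were deleted in HubSpot.

```python
import sys


def hubspot_connection_options(connection_name, entity_name, import_deleted=False):
    """Connection options for the HubSpot read (option names are assumptions)."""
    options = {
        "connectionName": connection_name,
        "ENTITY_NAME": entity_name,
        "API_VERSION": "v3",  # assumed API version
    }
    if import_deleted:
        options["IMPORT_DELETED_RECORDS"] = "true"
    return options


def create_table_statement(db_name, table_name, bucket, source_view="hubspot_updates"):
    """CTAS that creates an empty Iceberg table with the source schema on S3."""
    return (
        f"CREATE TABLE IF NOT EXISTS glue_catalog.{db_name}.{table_name} "
        f"USING iceberg LOCATION 's3://{bucket}/{db_name}/{table_name}' "
        f"AS SELECT * FROM {source_view} WHERE 1 = 0"
    )


def merge_statement(db_name, table_name, source_view="hubspot_updates", key="id"):
    """Upsert the source view into the Iceberg table on the assumed key."""
    return (
        f"MERGE INTO glue_catalog.{db_name}.{table_name} AS t "
        f"USING {source_view} AS s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def run_job(argv):
    # Imported lazily: these modules exist only inside the AWS Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(
        argv,
        ["JOB_NAME", "db_name", "table_name", "s3_bucket_name", "connection_name"],
    )
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the HubSpot entity through the AWS Glue connection.
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="hubspot",
        connection_options=hubspot_connection_options(
            args["connection_name"], args["table_name"]
        ),
    )
    frame.toDF().createOrReplaceTempView("hubspot_updates")

    # Create the table on first run, then merge the latest records into it.
    spark.sql(create_table_statement(
        args["db_name"], args["table_name"], args["s3_bucket_name"]))
    spark.sql(merge_statement(args["db_name"], args["table_name"]))
    job.commit()
```

To use the sketch as a job script, add `run_job(sys.argv)` as the last line. The explicit `LOCATION` clause places the table in your S3 bucket regardless of the catalog warehouse setting passed in the job parameters.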
- On the Job details tab, provide the following information:
  - For Name, enter a name, such as HubSpot_to_S3_blog.
  - For Description, enter a meaningful description of the job.
  - For IAM Role, choose the IAM role you created previously (for this post, AWSGlueServiceRole_blog).
- Expand Advanced properties.
- Under Connections, enter your HubSpot connection from the previous section (for this post, HubSpot_Connection_blog).
- Under Job parameters, enter the following parameters:
  - For --conf, enter spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.warehouse=file:///tmp/spark-warehouse
  - For --datalake-formats, enter iceberg
  - For --db_name, enter the AWS Glue database to store your data lake (for this post, hubspot)
  - For --table_name, enter the HubSpot table to be ingested (for this post, company)
  - For --s3_bucket_name, enter where the ingested Iceberg table is stored, in this case aws-glue-hubspot-<account>-<region>
  - For --connection_name, enter the AWS Glue connection name created, in this case HubSpot_Connection_blog
- Choose Save to save the job, then choose Run.
Depending on the amount of data in your HubSpot account, the job can take a few minutes to complete. After a successful job run, you can choose Run details to see the job specifications and logs.
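The --conf value in the job parameters is easy to mistype because it chains several Spark settings into one string. As a sketch, the following Python helper assembles the same parameters from a list of key/value pairs; the helper names are hypothetical, and the resulting dictionary is shaped so it could be passed as job arguments (for example, as the `Arguments` of a boto3 `start_job_run` call).

```python
# Spark settings for the Iceberg catalog, copied from the job parameters above.
ICEBERG_SPARK_CONF = [
    ("spark.sql.extensions",
     "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"),
    ("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog"),
    ("spark.sql.catalog.glue_catalog.catalog-impl",
     "org.apache.iceberg.aws.glue.GlueCatalog"),
    ("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO"),
    ("spark.sql.catalog.glue_catalog.warehouse", "file:///tmp/spark-warehouse"),
]


def spark_conf_value(pairs=ICEBERG_SPARK_CONF) -> str:
    """Join settings into the single value expected by the --conf parameter."""
    # The first setting has no prefix; each later one is introduced by --conf.
    return " --conf ".join(f"{key}={value}" for key, value in pairs)


def job_parameters(db_name, table_name, s3_bucket_name, connection_name) -> dict:
    """Return the full set of job parameters used in this post."""
    return {
        "--conf": spark_conf_value(),
        "--datalake-formats": "iceberg",
        "--db_name": db_name,
        "--table_name": table_name,
        "--s3_bucket_name": s3_bucket_name,
        "--connection_name": connection_name,
    }
```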
Use Athena to query data
Athena is an interactive and serverless query service that makes it simple to analyze data directly in Amazon S3 using standard SQL. In this example, we query the results of the HubSpot data ingested into Amazon S3.
- On the Athena console, choose Query editor.
- For Database, choose hubspot, and you should see your company table.
- Select entries from the hubspot.company table to view the data captured from HubSpot.
You can then try various queries on the HubSpot data.
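As a starting point, the following Python sketch builds two queries that do not depend on any particular column names (a ten-row preview and a row count) and submits them through the Athena API. The helper names and the `total_rows` alias are our own; `athena_client` is assumed to be a boto3 Athena client (`boto3.client("athena")`), and `output_location` is an S3 path of your choosing for query results.

```python
def sample_queries(database: str = "hubspot", table: str = "company") -> dict:
    """Return schema-agnostic validation queries for the ingested table."""
    qualified = f'"{database}"."{table}"'
    return {
        "preview": f"SELECT * FROM {qualified} LIMIT 10",
        "row_count": f"SELECT COUNT(*) AS total_rows FROM {qualified}",
    }


def start_query(athena_client, sql: str, database: str, output_location: str) -> str:
    """Submit a query to Athena and return its execution ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Comparing the row count before and after a rerun of the ETL job is a quick way to confirm that new HubSpot records were merged.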
Over time, your HubSpot data might change. You can rerun your ETL job periodically, and the Iceberg data lake table will capture your changes. You can verify this by adding, removing, and changing companies in your HubSpot account, and then rerunning the ETL job. Your data lake should match your latest HubSpot data. With this capability, you can schedule the ETL job to run as often as you need.
Extending the HubSpot connector with AWS services
The HubSpot connector for AWS Glue provides a strong foundation for building comprehensive data pipelines and analytics workflows. By integrating HubSpot data into your AWS environment, you can use additional services like Amazon Redshift, Amazon QuickSight, and Amazon SageMaker to further process, transform, and analyze the data. This allows you to build sophisticated, end-to-end data architectures that unlock the full value of your HubSpot data, without the need to manage complex infrastructure. The integration between these AWS services makes it simple to build scalable analytics pipelines tailored to your specific requirements.
Considerations
You can set up AWS Glue job triggers to run the ETL jobs on a schedule, so that the data is regularly synchronized between HubSpot and Amazon S3. You can also integrate the ETL jobs with other AWS services, including AWS Step Functions, Amazon MWAA (Amazon Managed Workflows for Apache Airflow), AWS Lambda, Amazon EventBridge, and Amazon Bedrock to create a more advanced data processing pipeline.
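For example, a scheduled AWS Glue trigger can start the job once a day. The following Python sketch builds such a trigger; the trigger name and the 02:00 UTC schedule are arbitrary choices for illustration, the helper names are hypothetical, and `glue_client` is assumed to be a boto3 AWS Glue client.

```python
def scheduled_trigger_request(job_name: str,
                              trigger_name: str = "HubSpot_to_S3_daily",
                              schedule: str = "cron(0 2 * * ? *)") -> dict:
    """Build a CreateTrigger request that runs the job on a cron schedule."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        # AWS Glue cron expressions are evaluated in UTC; this one fires
        # every day at 02:00.
        "Schedule": schedule,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_schedule(glue_client, job_name: str) -> str:
    """Create the scheduled trigger and return its name."""
    request = scheduled_trigger_request(job_name)
    glue_client.create_trigger(**request)
    return request["Name"]
```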
By default, the HubSpot connector doesn’t import deleted records. However, you can set the IMPORT_DELETED_RECORDS option to true to import all records, including the deleted ones.
Clean up
To avoid incurring charges, clean up the resources used in this post from your AWS account, including the AWS Glue jobs, HubSpot connection, AWS Secrets Manager secret, IAM role, and Amazon S3 bucket.
Conclusion
With the introduction of the AWS Glue connector for HubSpot, integrating HubSpot data with information from other data sources has become more streamlined than ever. This feature enables you to set up ongoing data integration from HubSpot to AWS, providing a unified view of data from across platforms and enabling more comprehensive analytics. The serverless nature of AWS Glue means there is no infrastructure to manage, and you only pay for the resources consumed. By following the steps outlined in this post, you can make sure that up-to-date data from HubSpot is captured in your data lake, allowing teams to make faster data-driven decisions and uncover complex insights from across data sources.
To learn more about the AWS Glue connector for HubSpot, refer to Connecting to HubSpot in AWS Glue. This guide walks through the entire process, from setting up the connection to running the data transfer flow. For more information on AWS Glue, visit AWS Glue.
About the Authors
Eric Bomarsi is a Senior Solutions Architect in the ISV group at AWS, where he focuses on building scalable solutions for large customers. As a member of the AWS analytics group, he helps customers get strategic insights from their data. Outside of work, he enjoys playing ice hockey and traveling with his family.
Annie Nelson is a Senior Solutions Architect at AWS. She is a data enthusiast who enjoys problem solving and tackling complex architectural challenges with customers.
Kartikay Khator is a Solutions Architect within Global Life Sciences at AWS, where he dedicates his efforts to developing innovative and scalable solutions that cater to the evolving needs of customers. His expertise lies in harnessing the capabilities of AWS analytics services. Beyond his professional pursuits, he finds joy and fulfillment in running and hiking. Having already completed several marathons, he is currently preparing for his next marathon challenge.
Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect, Amazon MWAA and AWS Glue ETL expert. He’s on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest Amazon MWAA and AWS Glue features and news!