This entry-level Hands-On Lab exercise is designed to help you master the basics of Coalesce. In this lab, you'll explore the Coalesce interface, learn how to easily transform and model your data with our core capabilities, and understand how to deploy and refresh version-controlled data pipelines.

What you'll need

What you'll build

What you'll learn

By completing the steps we've outlined in this guide, you'll have mastered the basics of Coalesce and can venture into our more advanced features.

Coalesce is the first cloud-native, visual data transformation platform built for Snowflake. Coalesce enables data teams to develop and manage data pipelines in a sustainable way at enterprise scale and collaboratively transform data without the traditional headaches of manual, code-only approaches.

What can I do with Coalesce?

With Coalesce, you can:

How is Coalesce different?

Coalesce's unique architecture is built on the concept of column-aware metadata, meaning that the platform collects, manages, and uses column- and table-level information to help users design and deploy data warehouses more effectively. This architectural difference gives data teams the best that legacy ETL and code-first solutions have to offer in terms of flexibility, scalability and efficiency.

Data teams can define data warehouses with column-level understanding, standardize transformations with data patterns (templates) and model data at the column level.

Coalesce also uses column metadata to track past, current, and desired deployment states of data warehouses over time. This provides unparalleled visibility and control of change management workflows, allowing data teams to build and review plans before deploying changes to data warehouses.

About this guide

The exercises in this guide use the sample data provided with Snowflake trial accounts. This data is focused on fictitious commercial business data including supplier, customer, and orders information. For a full overview of the TPC-H schema, please visit Snowflake's documentation.

To complete this lab, please create free trial accounts with Snowflake and Coalesce by following the steps below. You have the option of setting up Git-based version control for your lab, but this is not required to perform the majority of the lab exercises. Please note that none of your work will be committed to a repository unless you set Git up before developing.

To set up version control for your lab, be sure to complete Step 3 (Create a Git Repo for Your Lab Project) and Step 5 (Set Up Your Coalesce Trial Account With Git) in this section.

We recommend using Google Chrome as your browser for the best experience.

Step 1: Create a Snowflake Trial Account

  1. Fill out the Snowflake trial account form here. Use an email address that is not associated with an existing Snowflake account.
  2. When signing up for your Snowflake account, select the region that is physically closest to you and choose Enterprise as your Snowflake edition. Please note that the Snowflake edition, cloud provider, and region used when following this guide do not matter.

  1. After registering, you will receive an email from Snowflake with an activation link and URL for accessing your trial account. Finish setting up your account following the instructions in the email.

Step 2: Create a Coalesce Trial Account with Snowflake Partner Connect

Once you are logged into your Snowflake account, sign up for a free Coalesce trial account using Snowflake Partner Connect. Check your Snowflake account profile to make sure that it contains your fist and last name.

  1. Select Data Products > Partner Connect in the navigation bar on the left hand side of your screen and search for Coalesce in the search bar.

  1. Review the connection information and then click Connect.
  1. When prompted, click Activate to activate your account. You can also activate your account later using the activation link emailed to your address.
  2. Once you've activated your account, fill in your information to complete the activation process.

Congratulations! You've successfully created your Coalesce trial account. Next, you'll be taken to the Coalesce UI. Proceed to optional Step 3 (Create a Git Repo for Your Lab Project) in this section if you plan on using git-based version control for your lab. If you do not plan to use git for this lab exercise, scroll down to Step 4 (Set Up Your Coalesce Trial Account Without Git) in this section.

Step 3: Create a Git Repo for Your Lab Project (Optional)

Coalesce leverages Git for version control, which you can access once your remote Git repository is connected to your Coalesce account. You will need access to a free or existing Git account from your preferred service (GitHub, Bitbucket, GitLab or Azure DevOps Git) to set up version control. For the purposes of this lab, we recommend using Github.

  1. Sign in to GitHub at https://github.com/ with your credentials. Click on the "New" button on the left side of the screen to create a new repository for your lab.

  1. For your repository name, enter: [yourlastname]-hands-on-lab and accept all other defaults. Then click the "Create Repository" button.

  1. Copy your repository URL as you will need it in a later step of this lab.

  1. Click on your profile picture on the right side and then click on Settings:

  1. In the left sidebar, scroll to the bottom and click on Developer Settings:

  1. In the left sidebar, click on Personal Access Tokens > Tokens (classic).

  1. Click on "Generate new token" and then click "Generate new token (classic)".

  1. Add "COA HOL" to the Note field and then select the box next to "repo." Then scroll down to click the "Generate token" button.

  1. Copy your token, as you will need it in a later step of this lab. Now scroll down to Step 5 (Set Up Your Coalesce Trial Account With Git) in this section to complete your setup.

Step 4: Set Up Your Coalesce Trial Account (Without Git)

Once you've activated your Coalesce trial account and logged in, you will land in your Projects dashboard. Projects are a useful way of organizing your development work by a specific purpose or team goal, similar to how folders help you organize documents in Google Drive.

Your trial account includes a default Project to help you get started. For this lab, you will create a new Project to work in.

  1. Click on the + icon next to "Projects" in the upper left corner of the interface. Name your Project "Hands-On Lab" and then click Next.

  1. As you are not using Git for this lab, click "Skip and Create" on the next screen to create your Project without a Git connection. The pop-up wizard will prompt you to configure version control. Click "Skip and Create" to complete the process and then scroll down to Step 5 (Create Your Workspace) in this section.

  1. Once you've created your Hands-On Lab Project, click the "Create Workspace" button in the upper right corner.

Workspaces are dedicated development areas within Coalesce where you can build out your data warehouse and apply transformations to data.

  1. Name your new Workspace "Lab1." Then click the Create button.
  2. Click the Launch button to open your Workspace. Then proceed to Section 5 of this guide (Navigating the Coalesce User Interface).

Congratulations - you've completed your setup! Proceed to Section 5 (Navigating the Coalesce User Interface) in this guide.

Step 5: Set Up Your Coalesce Trial Account (With Git)

  1. Navigate to your Snowflake account and create a new Worksheet. Replace LASTNAME in the code below with your actual last name to create your own database and two schemas for your Development and Test environments. You will need these schemas when deploying to different environments in optional portions of this lab.

CREATE DATABASE LASTNAME;

CREATE SCHEMA LASTNAME.WORKDEV;

CREATE SCHEMA LASTNAME.WORKTEST;

  1. Ensure that you've activated your Coalesce trial account by clicking the button in your activation email. Log into your Coalesce account and you will see your Projects dashboard. Projects are a useful way of organizing your development work in Coalesce by a specific purpose or team goal, similar to how folders help you organize documents in Google Drive.

  1. Your trial account includes a default Project to help you get started. For this lab, you will create a new Project to work in.

Click on the + icon next to "Projects" in the upper left corner of the interface. Name your Project "Hands-On Lab" and then click Next.

  1. As you are using Git for this lab, paste your git repository URL from Step 3 (Create a Git Repo for Your Lab Project) in the Git Repository URL field. Then click Next.

  1. Click the "Add New Account" button on the Configure Git Account screen.

  1. Add your Git account credentials into the open fields, including the token you copied from Step 3 (Create a Git Repo for Your Lab Project).

  1. Click the dropdown menu under Git Account to add your new account and click the Finish button.
  1. As you have set up Git, your Project can contain multiple development Workspaces and deployable Environments (e.g QA, Production) that are each tied to their own Git branch. For the purposes of this lab, we will just create one Workspace to work with.

  1. Click "Create Workspace" and name it "Lab1." Then click the Next button to proceed.

  1. Select the Main Git branch from the dropdown menu and press the button next to "Initial commit". Name your New branch "Lab1" – this Git branch will be associated with this particular Workspace. Then click "Create" to finish creating your Workspace.

  1. Click the Launch button to open your Workspace.

  1. Navigate back to your Snowflake trial account. Copy your Snowflake URL, which you can find in the lower left of Snowflake's UI upon login. You will need this in later steps.

  1. Return to your Coalesce Workspace and click the pencil icon next to "Lab1" in the upper left corner to open your Workspace Settings. Then paste your Snowflake URL identifier from your trial account into the "Snowflake Account" field and click Save.

  1. Click the User Credentials tab in your Workspace Settings and fill out the form with your Snowflake username and password. Be sure to click "Test Connection" and ensure your credentials work as expected. Then click the Save button and close your Workspace Settings.

  1. Once in the Build interface, click on the gear icon in the lower left corner to open your Build Settings.

  1. Under "Storage Locations," click on "New Storage Location" and create a new location named SRC. A storage location is a logical name you provide to represent a database and schema in Snowflake. This location will be mapped to the Snowflake sample (TPCH_SF1) dataset.

  1. Click "New Storage Location" again and create a new location named WORK. Make sure to set WORK as the default Storage Location. This is where you will write new tables as part of this lab.

  1. Now click the pencil icon in the upper left corner of the Build interface to reopen your Workspace Settings.

Under Storage Mappings, enter the actual databases and schemas that will correspond to each Storage Location. For this lab, we will use Snowflake's sample schema TPCH_SF1 as our data source (SRC) and your LASTNAME database and WORKDEV schema that we created earlier in this section.

  1. Return to the Browser tab of your Build interface. Congratulations - you've completed your setup! Proceed to Section 5 (Navigating the Coalesce User Interface) in this guide.

If you are using existing Snowflake and Coalesce accounts for this lab instead of trial accounts, please follow the steps below to set up your lab environment. If you are using trial accounts for your lab, proceed to Section 5 (Navigating the Coalesce User Interface).

Step 1: Log Into Your Snowflake Account

  1. Log into your Snowflake account using your existing credentials.

If you have an existing Coalesce account, log into our US or EU app locations based on the region of your Snowflake account. If your company has a Coalesce license and uses single sign-on, please use your company-specific URL to log in.

  1. Copy your Snowflake URL, which you can find in the lower left of Snowflake's UI upon login. You will need this in later steps.

  1. Create a new Worksheet within Snowflake. Replace LASTNAME in the code below with your actual last name to create your own database and two schemas for your Development and Test environments. You will need these schemas when deploying to different environments in optional portions of this lab.

CREATE DATABASE LASTNAME;

CREATE SCHEMA LASTNAME.WORKDEV;

CREATE SCHEMA LASTNAME.WORKTEST;

  1. Proceed to optional Step 2 (Create a Git Repo for Your Lab Project) in this section if you plan on using git-based version control for your lab. If you do not plan to use git for this lab exercise, scroll down to Step 3 (Set Up Your Existing Coalesce Account Without Git) in this section.

Step 2: Create a Git Repo for Your Lab Project (Optional)

Coalesce leverages Git version control, which you can access once your remote Git repository is connected to your Coalesce account. You will need access to a free or existing Git account from your preferred service (GitHub, Bitbucket, GitLab or Azure DevOps Git) to set up version control.

  1. Sign in to GitHub at https://github.com/ with your credentials. Click on the "New" button on the left side of the screen to create a new repository for your lab.

  1. For your repository name, enter: [yourlastname]-hands-on-lab and accept all other defaults. Then click the "Create Repository" button.

  1. Copy your repository URL as you will need it in a later step of this lab.

  1. Click on your profile picture on the right side and then click on Settings:

  1. In the left sidebar, scroll to the bottom and click on Developer Settings:

  1. In the left sidebar, click on Personal Access Tokens > Tokens (classic).

  1. Click on "Generate new token" and then click "Generate new token (classic)".

  1. Add "COA HOL" to the Note field and then select the box next to "repo." Then scroll down to click the "Generate token" button.

  1. Copy your token, as you will need it in a later step of this lab. Scroll down to Step 4 (Set Up Your Existing Coalesce Account With Git) in this section to complete your setup.

Step 3: Set Up Your Existing Coalesce Account (Without Git)

These instructions will help you set up your existing Coalesce account without using Git for version control.

  1. Log into your Coalesce account using your existing credentials. You will land in the Coalesce UI and see your Projects dashboard. Projects are a way of organizing your development work by a specific purpose or team goal, similar to how folders help you organize documents in Google Drive.

Click on the + icon next to "Projects" in the upper left corner of the interface to create a new Project for this lab. Name your Project "Hands-On Lab" and then click Next.

  1. As you are not using Git for this lab, click "Skip and Create" on the next screen to create your Project without a Git connection.

  1. The pop-up wizard will prompt you to configure version control. Click "Skip and Create" to complete the process.

  1. In your new Project, click the "Create Workspace" button. Workspaces are dedicated development areas within Coalesce where you can build out your data warehouse and apply transformations to data.

  1. Name your new Workspace "Lab1" and click the Create button.

  1. Click the Launch button to open your Workspace, and then scroll down to Step 5 in this section (Connect Your Snowflake and Coalesce Accounts).

Step 4: Set Up Your Existing Coalesce Account (with Git)

These instructions will help you set up your existing Coalesce account using the Git repository that you created in Step 2.

  1. Log into your Coalesce account using your existing credentials. You will land in the Coalesce UI and see your Projects dashboard. Projects are a way of organizing your development work by a specific purpose or team goal, similar to how folders help you organize documents in Google Drive.

Click on the + icon next to "Projects" in the upper left corner of the interface to create a new Project for this lab. Name your Project "Hands-On Lab" and then click Next.

  1. Paste your git repository URL from Step 2 in the open field and click Next.

  1. Click the "Add New Account" button on the Configure Git Account screen.
  2. Add your Git account credentials including the token you copied from Step 2.

  1. Click the dropdown menu under Git Account to add your new account and then click the Finish button.

As you have Git set up, your Project can contain multiple development Workspaces and deployable Environments (e.g QA, Production). For the purposes of this lab, we will just create one Workspace to work with.

  1. Click "Create Workspace" and name it "Lab1". Then click the Next button to proceed.

  1. Select the Main Git branch from the dropdown menu and the button next to "Initial commit". Name your New branch "Lab1" – this Git branch will be associated with this particular Workspace. Then click "Create" to finish creating your Workspace.

  1. Click the Launch button to open your Workspace, and then scroll down to Step 5 in this section (Connect Your Snowflake and Coalesce Accounts).

Step 5: Connect Your Snowflake and Coalesce Accounts

  1. Click the Launch button to open your Lab1 Workspace.

  1. In your Workspace, click the pencil icon next to "Lab1" in the upper left corner to open your Workspace Settings. Then paste your Snowflake URL identifier from Step 1 into the "Snowflake Account" field and click Save.

  1. Click the User Credentials tab in your Workspace Settings and fill out the form with your Snowflake username and password. Be sure to click "Test Connection" and ensure your credentials work as expected. Then click the Save button and close your Workspace Settings.

  1. Once in the Build interface, click on the gear icon in the lower left corner to open your Build Settings.

  1. Under "Storage Locations," click on "New Storage Location" and create a new location named SRC. A storage location is a logical name you provide to represent a database and schema in Snowflake. This location will be mapped to the Snowflake sample (TPCH_SF1) dataset.

  1. Click "New Storage Location" again and create a new location named WORK. Make sure to set WORK as the default Storage Location. This is where you will write new tables as part of this lab.

  1. Now click the pencil icon in the upper left corner of the Build interface to reopen your Workspace Settings.

Under Storage Mappings, enter the actual databases and schemas that will correspond to each Storage Location. For this lab, we will use Snowflake's sample schema TPCH_SF1 as our data source (SRC) and your LASTNAME database and WORKDEV schema that we created earlier in this section.

  1. Return to the Browser tab of your Build interface.

Congratulations - you've completed your setup! Proceed to Section 5 (Navigating the Coalesce User Interface) in this guide.

Let's get familiar with Coalesce by walking through the basic components of the user interface.

Upon launching your Development Workspace, you'll be taken to the Build Interface of the Coalesce UI. The Build Interface is where you can create, modify and publish nodes that will transform your data.

Nodes are visual representations of objects in Snowflake that can be arranged to create a Directed Acyclic Graph (DAG or Graph). A DAG is a conceptual representation of the nodes within your data pipeline and their relationships to and dependencies on each other.

  1. In the Browser tab of the Build interface, you can visualize your node graph using the Graph, Node Grid, and Column Grid views. In the upper right hand corner, there is a person-shaped icon where you can manage your account and user settings.

  1. By clicking the question mark icon, you can access the Resource Center to find product guides, view announcements, request support, and share feedback.

  1. The Browser tab of the Build interface is where you'll build your pipelines using nodes. You can visualize your graph of these nodes using the Graph, Node Grid, and Column Grid views. While this area is currently empty, we will build out nodes and our Graph in subsequent steps of this lab.

  1. Next to the Build interface is the Deploy interface. This is where you will push your pipeline to other environments such as Testing or Production. Like the Browser tab, this area is currently empty but will populate as we will build out Nodes and our Graph in subsequent steps of this lab.

Let's start to build a Graph by adding data in the form of Source nodes.

  1. Start in the Build interface and click on Nodes in the left sidebar (if not already open). Click the + icon and select Add Sources.

  1. Click to expand the tables within COALESCE_SAMPLE DATABASE.TPCH_SF001 and select all of the corresponding source tables underneath (8 tables total). Click Add Sources on the bottom right to add the sources to your pipeline.

  1. You'll now see your graph populated with your Source nodes. Note that they are designated in red. Each node type in Coalesce has its own color associated with it which helps with visual organization when viewing a Graph.

Now that you've added your Source nodes, let's prepare the data by adding business logic with Stage nodes. Stage nodes represent staging tables within Snowflake where transformations can be previewed and performed.

Let's start by adding a standardized "stage layer" for all sources.

  1. Press and hold the Shift key to multi-select all of your Source nodes. Then right click and select Add Node > Stage from the drop down menu.

  1. At the top right side of the screen, press the ellipsis to select Create All from the menu and then press the Run All button to populate your Stage nodes.

Now that we have a standard layer for performing basic transformations, let's add a more specific transformation by joining two of our Stage nodes. We'll complete this in the Node Editor, which is used to edit the node object, the columns within it, and its relationship to other nodes

  1. Press the Shift key and select the STG_LINEITEM and STG_ORDERS nodes. Once selected, right click and select Join Nodes > Stage to create a Stage node.

This action will cause a separate Node Editor tab to open up, which will appear next to your Browser tab. This is the Node Editor for the STG_LINEITEM_ORDERS node you just created. There are a few different components to this tab:

  1. The first section is the Mapping grid, where you can see the structure of your node along with column metadata like transformations, data types, and sources.

  1. On the right hand side of the Node Editor is the Config section, where you can view and set configurations based on the type of node you're using.

  1. At the bottom of the Node Editor, press the Create button to view the Data Preview pane.

Let's now join your STG_ORDERS data with your STG_LINEITEM data.

  1. Select the Join tab of your STG_LINEITEM_ORDERS node in the Node Editor. This is where you can define the logic to join data from multiple nodes, with the help of a generated Join statement.

  1. Add column relationships in the last line of the existing Join statement using the bolded code below:

FROM {{ ref('WORK', 'STG_LINEITEM') }} "STG_LINEITEM"

INNER JOIN {{ ref('WORK', 'STG_ORDERS') }} "STG_ORDERS"

ON "STG_LINEITEM"."L_ORDERKEY" = "STG_ORDERS"."O_ORDERKEY"

  1. To check if your statement is syntactically correct, press the white Validate Select button in the lower half of the screen. The Results panel should show a green checkmark, confirming that the SQL is accepted. If not, you will see a database message indicating the error.

  1. To create your Stage Node, click the Create button in the lower half of the screen. Then click the Run button to populate your STG_LINEITEM_ORDERS node and preview the contents of your node by clicking Fetch Data.

Let's apply a single column transformation in your STG_LINEITEM_ORDERS node. Click back to Mapping and scroll down in the Mapping grid to select the O_ORDERPRIORITY column.

  1. Right click on O_ORDERPRIORITY and select Duplicate Column from the drop down menu.

  1. Rename the original O_ORDERPRIORITY column as O_ORDERPRIORITY_NUM and the duplicate column as O_ORDERPRIORITY_DESC. You will need to double-click on the column name to trigger the renaming.

  1. In the transform field for O_ORDERPRIORITY_NUM, enter the transformation SPLIT_PART("STG_ORDERS"."O_ORDERPRIORITY", '-', 1 ). This transformation formula takes the original O_ORDERPRIORITY column in STG_ORDERS, splits its value at instances of '-', and returns the first portion of the split.

  1. In the transform field for O_ORDERPRIORITY_DESC, enter the transformation SPLIT_PART("STG_ORDERS"."O_ORDERPRIORITY", '-', 2 ) . Notice that this is the same transformation you entered previously, except this will now return the second portion of the split.
  1. Click the ellipsis on the Create button and select Validate Create to validate that your create statement will run. On confirmation, click Create. Click the ellipsis on the Run button and select Validate Run to validate that your create statement will run. On confirmation, click Run. Once the Run has completed, scroll to the right in the Data Preview pane to preview your new O_ORDERPRIORITY_NUM and O_ORDERPRIORITY_DESC columns.

Now let's apply another single column transformation by concatenating two different columns from different Stage nodes.

  1. Scroll to the bottom of the Mapping grid and double click on the gray column named Column Name.

  1. Name this column DAYS TO SHIP and press the Enter key to create a new column. Under Data Type, enter NUMBER.

  1. In the Transform field, enter the following transformation to calculate the difference between order dates and shipping dates listed in the STG_ORDERS and STG_LINEITEM nodes.

DATEDIFF('DAY',"STG_ORDERS"."O_ORDERDATE","STG_LINEITEM"."L_SHIPDATE")

  1. Click the Create and Run buttons to execute the statements. Once the Run has completed, scroll to the right in the Data Preview pane to view your newly created DAYS TO SHIP column.

Now let's experiment with creating Dimension nodes. These nodes are generally descriptive in nature and can be used to track particular aspects of data over time (such as time or location). Coalesce currently supports Type 1 and Type 2 slowly changing dimensions, which we will explore in this section.

  1. In your STG_LINEITEM_ORDERS node, navigate to the Mapping grid. Select the O_CUSTKEY column and drag it to the top of the Mapping grid. Repeat this motion by dragging the L_SUPPKEY and L_PARTKEY columns to the top of the Mapping grid.

Now let's create a Dimension node for each of these foreign key columns by navigating back to the Browser tab.

  1. Press the Shift key and select STG_CUSTOMER, STG_SUPPLIER and STG_PART source nodes on the graph. Then right click and select Add Node > Dimension.

  1. A Dimension node will appear for each of your selected Stage nodes. Under the Run All button, click the first icon to optimize the layout of your graph.
  1. Double click on your new DIM_PART node to open up the Node Editor. In the Config section on the right hand side of the screen, click to expand Options. Under Business Key, check the box next to P_PARTKEY and press the > button to select it as your business key. This will remain a Type 1 dimension, so we will not select columns for change tracking over time.

  1. Return to the Browser tab and double click on the DIM_CUSTOMER node. In the Config section of your editor, expand Options and select C_CUSTKEY as your business key.

To create a Type 2 dimension, select the C_NAME and C_ADDRESS columns under Change Tracking to track changes to these columns over time.

  1. Return to the Browser tab once more and double click to open up the DIM_SUPPLIER node. Select S_SUPPKEY as your business key to create a Type 1 dimension.

  1. Return to your graph and click Create All and Run All in the upper right hand corner.

Now let's apply a bulk transformation to some of the columns in your DIM_CUSTOMER node. Double click on DIM_CUSTOMER in your graph to open up your Node Editor.

  1. In the Mapping grid, click twice on Data Type to bring all VARCHAR columns to the top of the grid. Press and hold the Shift key to multi-select all the VARCHAR columns beginning with the C_ prefix. Right click on your selections and click on Bulk Edit in the dropdown menu.

  1. In the Column Editor on the right hand side of the screen, select Data Type and Transform under Attributes. Clarify the Data Type as STRING and enter the transformation UPPER({{SRC}})in the Transform field.

  1. Click the Preview button and then press Update to apply the bulk transformation.

  1. Return to the Browser tab and right click the STG_LINEITEM_ORDERS node to create a new Stage node, which will be a lookup table for our Dimension nodes.
  1. Open the new Stage node and rename it STG_LINEITEM_ORDERS_DIM_LOOKUP in the Node Editor. On the left hand side of the screen in the Node and Column selector, select the DIM_CUSTOMER node. Drag the DIM_CUSTOMER_KEY column from the column selector to the Mapping grid.

  1. Repeat this action by selecting the DIM_PART node and dragging the DIM_PART_KEY column into the Mapping grid.
  1. Repeat this action once more by selecting the DIM_SUPPLIER node and dragging the DIM_SUPPLIER_KEY column into the Mapping grid.

  1. Select the Join tab and delete the existing code shown.
  1. Ordinarily, you can click Generate Join and then Copy to Editor to automatically import your join statement. For this particular step, copy and paste the code below into the editor instead.

FROM {{ ref('WORK', 'STG_LINEITEM_ORDERS') }} "STG_LINEITEM_ORDERS"

LEFT JOIN {{ ref('WORK', 'DIM_CUSTOMER') }} "DIM_CUSTOMER"

ON "STG_LINEITEM_ORDERS"."O_CUSTKEY" = "DIM_CUSTOMER"."C_CUSTKEY"

AND "STG_LINEITEM_ORDERS"."O_ORDERDATE" BETWEEN "DIM_CUSTOMER"."SYSTEM_START_DATE" AND "DIM_CUSTOMER"."SYSTEM_END_DATE"

/* The dimension lookup will not yield any matches due to a lack of historical data in the customer dimension. */

LEFT JOIN {{ ref('WORK', 'DIM_PART') }} "DIM_PART"

ON "STG_LINEITEM_ORDERS"."L_PARTKEY" = "DIM_PART"."P_PARTKEY"

LEFT JOIN {{ ref('WORK', 'DIM_SUPPLIER') }} "DIM_SUPPLIER"

ON "STG_LINEITEM_ORDERS"."L_SUPPKEY" = "DIM_SUPPLIER"."S_SUPPKEY"

LEFT JOIN {{ ref('WORK', 'STG_PARTSUPP') }} "STG_PARTSUPP"

ON "STG_LINEITEM_ORDERS"."L_PARTKEY" = "STG_PARTSUPP"."PS_PARTKEY" AND

"STG_LINEITEM_ORDERS"."L_SUPPKEY" = "STG_PARTSUPP"."PS_SUPPKEY"

  1. Then click the Create and Run buttons to create and populate your STG_LINEITEM_ORDERS_DIM_LOOKUP node.

Note that the join did not yield any matches for DIM_CUSTOMER_KEY due to a lack of historical data in the customer dimension node. With proper historical data, this would result in a valid dimension lookup.

Now let's create a Fact node. Fact nodes represent Coalesce's implementation of a Kimball Fact Table, which consists of the measurements, metrics, or facts of a business process and is typically located at the center of a star or Snowflake schema surrounded by dimension tables.

  1. Return to the Browser tab and right click on the STG_LINEITEM_ORDERS_DIM_LOOKUP node to create a Fact node. Then click Click Create All and Run All in the upper right hand corner.

  1. Let's rename some of the columns in our Fact node for readability. Double click to open your FCT_LINEITEM_ORDERS_DIM_LOOKUP node. Once you are in the Node Editor, click Column Name to sort. Press and hold the Shift key to multi-select all columns that begin with the L_ prefix. Then right click and select Bulk Edit from the drop down menu.

  1. Select Column Name as your attribute and enter the code LINEITEM_{{column.name[2:]}} to rename the columns. This particular Jinja code snippet will concatenate the prefix and remove the first 2 characters from the original column name.

  1. Click Preview to check your renaming rule, and then press Update to apply it to your selected columns.

  1. Repeat this process by bulk renaming all columns with an O_ prefix using the code ORDER_{{column.name[2:]}}

  1. Click Preview and then Update to apply your transformations, and then return to your Browser tab. Click Create and then Run to create your Fact table and populate it with data. You have now built out a small data mart.

Of course, business requirements are always changing and the business has now decided that they want to include supply costs as part of their analysis. Let's explore how you can manage your pipeline under these evolving circumstances.

  1. First, open the STG_LINEITEM_ORDERS node. Once in your Node Editor, select the STG_PARTSUPP node in the left hand side of the screen. Drag the PS_AVAILQTY and PS_SUPPLYCOST columns into the Mapping grid.

  1. Switch to the Join tab. Replace the existing join with the code below, and then click the Create and Run buttons.

FROM {{ ref('WORK', 'STG_ORDERS') }} "STG_ORDERS"

INNER JOIN {{ ref('WORK', 'STG_LINEITEM') }} "STG_LINEITEM"

ON "STG_ORDERS"."O_ORDERKEY" = "STG_LINEITEM"."L_ORDERKEY"

INNER JOIN {{ ref('WORK', 'STG_PARTSUPP') }} "STG_PARTSUPP"

ON "STG_LINEITEM"."L_PARTKEY" = "STG_PARTSUPP"."PS_PARTKEY" AND "STG_LINEITEM"."L_SUPPKEY" = "STG_PARTSUPP"."PS_SUPPKEY"

Now let's take a look at our column lineage and ensure that all of our columns are carried through our pipeline.

  1. Switch back to the Mapping tab in the STG_LINEITEM_ORDERS node editor. Select the PS_AVAILQTY and PS_SUPPLYCOST columns and right click to select View Column Lineage from the drop down menu.

  1. Once in the Column Lineage view, select both columns and click the ellipses to select Propagate Addition. This will allow you to propagate these columns through your pipeline.

  1. Check the boxes to add these columns to the last two successor nodes in your pipeline.
  1. Click the Preview button to confirm that your two columns have been added to the bottom of each successor node. Then Click Apply and Confirm to propagate your additions.

  1. By clicking on the DAYS TO SHIP column, you can view the concatenation and split lineage of this column.

  1. Return to the Browser tab and click the Run All button to refresh your pipeline.

Return to the Browser tab of the Build Interface.

  1. Click the icon under Run All to visually optimize the organization of your nodes.

  1. From the Run All dropdown, select Validate Create All and confirm there are no errors with the create statements in your pipeline. Upon confirmation, execute a Create All. Repeat these validation and execution steps with Run All.
  1. Switch to your Snowflake account and view the tables and data within the Snowflake platform to ensure everything is as intended. Please note that if you are using trial Snowflake and Coalesce accounts and are not using Git, you will find your build in PC_COALESCE_DB.PUBLIC (database.schema) as shown in the screenshot below.

If you are using an existing Snowflake account with your trial Coalesce account, or using non-trial Snowflake/Coalesce accounts, you will find your build in your LASTNAME.WORKDEV (database.schema) that you created as part of your Snowflake setup earlier in this lab.

Jobs Overview

Although we've built our data pipeline at this point and validated the build in our Snowflake account, we haven't yet created one or more Jobs to refresh it in an ongoing manner.

A Job is a subset of Nodes that is defined by a Selector query and run in isolation of a broader pipeline. You can run Jobs manually in the Coalesce UI in your development Workspace, or refresh them in an API call or with a CLI for deployed environments. Jobs are the only way to refresh a deployed environment, so it is essential that we create the needed Jobs to refresh our nodes before we commit and deploy our build.

  1. In the Browser tab of the Build interface, click the Jobs icon on the left hand side of the screen (under Sub Graphs). Then click the + icon next to the search bar to create a new Job. Double click your new Job to open it, and then click the pencil icon to name it Job1.

  1. Jobs can be edited by dragging and dropping nodes into the Browser interface regardless of whether it is viewed as a Graph, Node Grid, or Column Grid. By adding nodes to your Job, a Selector query will be created based on the search syntax that is generated or entered in the Include and Exclude fields.
  1. In the left sidebar, click on Nodes and scroll down to select your Fact node. Drag the Fact node to the right to add it to your Job. Note that the search syntax has been generated for you in the Include field.
  1. Jobs can be modified by adjusting the search syntax in the Include and Exclude fields. As an example, adding a + before or after the specified attribute(s) will add preceding or subsequent nodes to your Job. To learn more about Selector query syntax and available attributes, view our search syntax documentation.
  1. Add +{location: WORK name: FCT_LINEITEM_ORDERS_DIM_LOOKUP} to the Include field to add our full data mart to the Job, and then click Done to lock the fields.

Now that you've finished your build, it is time to commit your work to Git.

  1. Click the Git icon in the lower left hand corner above the Settings icon. This will bring up Coalesce's built-in Git interface. Review your Current Branch in the top left to ensure it is the branch you created earlier in this lab.

  1. Review the list of YAML files in the list of changed files. These represent the objects you've built within your data pipeline. Click on a file if you wish to to view its contents.

  1. In your Commit Message, enter "Initial build of data mart" and then click Commit and Push.

  1. The interface should now read "No changes to show"

  1. Click Branches to view your commit, then close out of the Git Interface.

Congratulations! You have finished the build of your data pipeline.

An Environment in Coalesce represents a non-development Snowflake environment where your data pipeline is deployed and refreshed (e.g. for Test or Production). Your Environments track the state of Coalesce objects and help you execute efficient deployment strategies. Once an Environment has been deployed, it can be refreshed via API call or using a command line interface (CLI) on an ad-hoc, scheduled, or triggered basis.

Before you can create a new Environment in Coalesce, you will need to create two new schemas in Snowflake for your Production and Testing environments.

  1. If you are using a trial Snowflake account without Git, you will be using the same PC_COALESCE_DB database used for development earlier in this lab and creating two new schemas, one for Test and one for Production. Login to your Snowflake account and execute the code below to switch roles and create the two new schemas:

USE ROLE PC_COALESCE_ROLE;

CREATE SCHEMA PC_COALESCE_DB.PROD;

CREATE SCHEMA PC_COALESCE_DB.TEST;

If you are using trial Snowflake/Coalesce accounts (with Git) or non-trial Snowflake/Coalesce accounts (with or without Git), you do not need to run the code above.

If you are using your existing Snowflake and Coalesce accounts, enter the following code to create a schema for Production environment:

CREATE SCHEMA LASTNAME.WORKPROD;

  1. Navigate back to the Browser tab of the Build Interface in the Coalesce UI. Click the Settings icon in the lower left hand corner and then click Environments to view your Testing and Production environments that were automatically generated in your account.

  1. Click the pencil icon on your Testing environment to open the details. Then click Storage Mappings to map WORK to your newly created TEST schema in Snowflake and click Save. You should receive a confirmation that your Environment was updated.

If you are using non-trial Snowflake and Coalesce accounts, use your LASTNAME database and your WORKTEST schema.

  1. Now click User Credentials. Ensure your credentials are entered and working by clicking the Test Connection button. You should receive a Testing Successful confirmation.

  1. Close out of your Testing environment details and click the pencil icon on your Production environment. Repeat the Storage Mappings configuration–mapping to your Production schema–and the connection testing that you completed in your Testing environment. Close out of your Production environment settings.
  1. Once you've made updates to your environments, click on the Git icon in the lower left side of your Build interface. You will need to commit your changes to Git so that you can deploy properly to your Test and Production environments. Enter a note in the field (such as "Env_Update" and then click the Commit and Push button to commit your changes.

Now that your Environments are configured, we'll now deploy, first to your Testing Environment and then optionally to your Production Environment.

In a Deploy operation, database objects for each node are created in the target database and schema, but they are not loaded with data. This is because the Deploy is responsible for the Responsible for structural changes to your data warehouse, or Data Definition Language (DDL), but not for the load and transformations on the actual data, or Data Manipulation Language (DML).

  1. Launch the Deploy interface by clicking the Deploy icon in the top center of your screen. The Deploy Interface is used to promote your pipeline to your Environments as well as view the status of both your Deploys and your Refreshes.

Click the Deploy button next to Testing. This will launch the built-in Deployment Wizard.

  1. In the first step of the Deployment Wizard, you will be prompted to select the Git branch and specific commit, known as a Package, that you want to deploy. Select the dev- branch you created earlier in this lab, and your Initial build of data mart commit and click Next.

  1. We are not making use of parameters in this pipeline or deployment, so click Next on the Parameters screen. However, if you would like to learn about Parameters in Coalesce, see the Parameters section of our documentation.

  1. On the Review Plan screen of the Deployment Wizard, expand the object types to see what will be Deployed. Once confirming everything is as intended, click Deploy.

  1. The Deployment Wizard will close automatically, and you will see your deployment now listed in the Activity Feed. Monitor the Deployment to confirm it was successful.

  1. Click on your deployment in the Activity Feed to open the details and review what was completed during the deployment. Note that there are several steps of different types (e.g. Validating Source Exists, Creating Stage Table, Creating Dimension Table, Creating Fact Table, etc).

  1. Log into your Snowflake account and confirm that all of the objects have been created in the TEST schema, but that they are empty. Remember, this is because while a Deploy executes the DDL to create the objects you've defined in your data pipeline, it doesn't complete the DML to load or refresh the them. This load / refresh of the objects will come in the Refresh exercise we will complete later in this lab.

Congratulations! You've successfully deployed your data pipeline to test.

It is now time to refresh your Testing Environment, meaning that you will complete the load and transformation of data for the data pipeline you've deployed in Test.

In Coalesce, refreshes are completed at the Job level, and can be run via the Command Line Interface (CLI), REST API, or triggered via a service like a DevOps platform. In this exercise, we will use the REST API to run the Job we created earlier in the lab. See the Refreshing an Environment section of our documentation for more details on refresh methods.

Before you are able to refresh an Environment, you will need to generate an Access Token that will need to be passed with your refresh calls.

  1. Navigate to the Deploy Interface of the Coalesce UI and click Generate Access Token. Copy the token, you will need to provide it in the API call you make to refresh your environment in the later steps of this lab.

Open your preferred API platform. If you do not have a preferred API platform, we recommend the built-in terminal on your machine.

  1. Navigate to the Coalesce API: Start Refresh / Run documentation. This documentation will help you execute a Job refresh by generating the code you need to execute for your API call.

  1. On the right hand side of the screen, paste your access token you generated in the previous step of this Lab into the Query field.

  1. Scroll down and expand the runDetails object section. Enter your environmentID into the first field. This ID can be found in the Deploy Interface of the Coalesce UI next to the name of your Testing Environment.

  1. Next, enter the jobID for Job1 in the runDetails object section. Your jobID can be found within the Browser tab of your Build Interface in the Coalesce UI, located next to the naming field for Job1.

  1. Navigate back to the Start Refresh / Run documentation and scroll down to open the userCredentials object section. Enter your username and password for your Snowflake account in the snowflakeUsername and snowflakePassword fields of the userCredentials object.

If you do not have your credentials, you can find them within the Environment Settings for your Testing environment in either the Build or the Deploy interface of the Coalesce UI.

  1. Copy the resulting code from the code generator in the documentation in your required language (if you are using the built in terminal on your machine this language is Shell), and execute that code on your preferred API platform to refresh your Job.

Navigate back to the Deploy Interface of the Coalesce UI after you have executed this code. You will see the Refresh now listed in the Activity Feed with its status.

  1. When complete, click on the Refresh to review the details of the steps it completed.
  1. Navigate back to your Snowflake account and confirm that the objects in the TEST schema now contain data.

Congratulations! You've successfully refreshed your data pipeline.

Throughout this lab, we have used the out-of-the-box objects and options that ship with any Coalesce account. However, Coalesce offers a number of ways to control and enhance the options in and/or behavior of your account. While no customization is required beyond the out-of-the-box offerings, these customization options introduce flexibility and scale for your data pipeline requirements without having to write extensive custom code.

As an example, Coalesce allows you can build your own User-Defined Nodes (UDNs) and reuse logical transformations without writing extensive custom code. UDNs are repeatable data patterns that standardize and govern how transformations are applied, enabling data teams to build pipelines more efficiently.

  1. To make your own node, click the Settings icon in the lower left hand side of the Build tab. Then select Node Types. To create a new node from scratch, click on the New Node Type button or create a copy of an existing node by selecting Duplicate next to it.

  1. By clicking the New Node Type button, you will see the Node Definition section, where you can specify the behavior, options, and fields that will be configured for your new node type. These specifications are available to the Create and Run Templates next to the Node Definition.

  1. We recommend that you dive deeper into UDNs after this session. While we won't go through hands-on exercises around customization in this entry-level Lab, you can explore the documentation sections for customization methods for each of the methods you can use to customize your data pipeline.

Congratulations on completing this entry-level lab exercise! You've mastered the basics of Coalesce and are now equipped to venture into our more advanced features. Be sure to reference this exercise if you ever need a refresher.

We encourage you to continue working with Coalesce by using it with your own data and use cases and by using some of the more advanced capabilities not covered in this lab.

What we've covered

Continue with your free trial by loading your own sample or production data and exploring more of Coalesce's capabilities with our documentation and resources.

Additional Resources