This Hands-On Lab exercise is designed to help you learn how to build and manage Snowflake Cortex LLM nodes within Coalesce. In this exercise, you'll explore the Coalesce interface, learn how to easily transform and model your data with Coalesce's core capabilities, and use the Cortex LLM functions node that Coalesce provides.
By completing the steps we've outlined in this guide, you'll have mastered the basics of Coalesce and can venture into our more advanced features.
Coalesce is the first cloud-native, visual data transformation platform built for Snowflake. Coalesce enables data teams to develop and manage data pipelines in a sustainable way at enterprise scale and collaboratively transform data without the traditional headaches of manual, code-only approaches.
With Coalesce, you can define data warehouses with column-level understanding, standardize transformations with data patterns (templates), and model data at the column level.
Coalesce's unique architecture is built on the concept of column-aware metadata, meaning that the platform collects, manages, and uses column- and table-level information to help users design and deploy data warehouses more effectively. This architectural difference gives data teams the best that legacy ETL and code-first solutions have to offer in terms of flexibility, scalability, and efficiency.
Coalesce also uses column metadata to track past, current, and desired deployment states of data warehouses over time. This provides unparalleled visibility and control of change management workflows, allowing data teams to build and review plans before deploying changes to data warehouses.
Coalesce currently supports only Snowflake as its target database. Because you will be using a trial Coalesce account created via Partner Connect, your basic database settings will be configured automatically and you can start building immediately.
A Coalesce organization is a single instance of the UI, set up specifically for a single prospect or customer. It is set up by a Coalesce administrator and is accessed via a username and password. By default, an organization will contain a single Project and a single user with administrative rights to create further users.
Projects provide a way of completely segregating elements of a build, including the source and target locations of data, the individual pipelines and ultimately the git repository where the code is committed. Therefore teams of users can work completely independently from other teams who are working in a different Coalesce Project.
Each Project requires access to a git repository and a Snowflake account to be fully functional. A Project defaults to containing a single Workspace, but will ultimately contain several as code is branched.
A Coalesce Workspace is an area where data pipelines are developed that point to a single git branch and a development set of Snowflake schemas. One or more users can access a single Workspace. Typically there are several Workspaces within a Project, each with a specific purpose (such as building different features). Workspaces can be duplicated (branched) or merged together.
A Coalesce Environment is a target area where code and job definitions are deployed to. Examples of an environment would include QA, PreProd and Production.
It isn't possible to develop code directly in an Environment; you can only deploy to one from a particular Workspace (branch). Job definitions in Environments can only be run via the CLI or API (not the UI). Environments are shared across an entire Project, so their definitions are accessible from all Workspaces. Environments should always point to different target schemas (and ideally different databases) than any Workspace.
You will be stepping into the shoes of the lead Data & Analytics manager for the fictional Snowflake Ski Store brand. You are responsible for building and managing data pipelines that deliver insights to the rest of the company in a timely and actionable manner. Management has asked that they receive insight into how customers feel about product defects, and which customers are the most unhappy when defects are received. The data you will be analyzing is from the Snowflake Ski Store call center, where customers call in with questions or complaints. Each conversation is recorded and stored as a call transcript within a table of your database. This is what you will use to uncover the insights that management is seeking.
Management is hoping you can build and operationalize a pipeline that uses Large Language Model functions to give them consistent insight into customer sentiment around the brand and product.
To complete this lab, please create free trial accounts with Snowflake and Coalesce by following the steps below. You have the option of setting up Git-based version control for your lab, but this is not required to perform the following exercises. Please note that none of your work will be committed to a repository unless you set Git up before developing.
We recommend using Google Chrome as your browser for the best experience.
Once you are logged into your Snowflake account, sign up for a free Coalesce trial account using Snowflake Partner Connect. Check your Snowflake account profile to make sure that it contains your first and last name.
Congratulations! You've successfully created your Coalesce trial account.
-- Create a schema to hold the raw call transcript data
CREATE or REPLACE schema pc_coalesce_db.calls;
-- Set the context so the objects below are created in (and resolved from) the new schema
USE SCHEMA pc_coalesce_db.calls;
-- Define a CSV file format for parsing the source files
CREATE or REPLACE file format csvformat
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
type = 'CSV';
-- Create an external stage pointing at the public S3 bucket that hosts the transcripts
CREATE or REPLACE stage pc_coalesce_db.calls.call_transcripts_data_stage
file_format = csvformat
url = 's3://sfquickstarts/misc/call_transcripts/';
-- Create the target table for the raw transcript data
CREATE or REPLACE table pc_coalesce_db.calls.CALL_TRANSCRIPTS (
date_created date,
language varchar(60),
country varchar(60),
product varchar(60),
category varchar(60),
damage_type varchar(90),
transcript varchar
);
-- Load the staged files into the table
COPY into pc_coalesce_db.calls.CALL_TRANSCRIPTS
from @pc_coalesce_db.calls.call_transcripts_data_stage;
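If you'd like to verify the load before moving on (an optional check, not part of the lab steps themselves), a quick count by language should return non-zero rows for German, French, and English:
select language, count(*) as transcript_count
from pc_coalesce_db.calls.CALL_TRANSCRIPTS
group by language;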
You will need to add the Cortex package from the Coalesce Marketplace to your Coalesce Workspace in order to complete this lab; it provides the Cortex Functions node used throughout the following steps.
Let's get familiar with Coalesce by walking through the basic components of the user interface.
Once you've activated your Coalesce trial account and logged in, you will land in your Projects dashboard. Projects are a useful way of organizing your development work by a specific purpose or team goal, similar to how folders help you organize documents in Google Drive.
Your trial account includes a default Project to help you get started.
Nodes are visual representations of objects in Snowflake that can be arranged to create a Directed Acyclic Graph (DAG or Graph). A DAG is a conceptual representation of the nodes within your data pipeline and their relationships to and dependencies on each other.
Before you can begin transforming your data, you will need to configure a storage location. Storage locations represent a logical destination in Snowflake for your database objects such as views and tables.
Create a storage location named CALLS for your call transcript data, and map the CALLS location to the PC_COALESCE_DB database and the CALLS schema. Let's start to build the foundation of your LLM data pipeline by creating a Graph (DAG) and adding data in the form of Source nodes.
Select the CALL_TRANSCRIPTS table from the CALLS source storage location, then click the Add 1 Sources button. Now that you've added your Source node, let's prepare the data by adding business logic with Stage nodes. Stage nodes represent staging tables within Snowflake where transformations can be previewed and performed. Let's start by adding a standardized "stage layer" for the data source.
Select the CALL_TRANSCRIPTS source node, then right click and select Add Node > Stage from the drop down menu. This will create a new Stage node, and Coalesce will automatically open the mapping grid of the node for you to begin transforming your data. Rename the node STG_CALL_TRANSCRIPTS_GERMAN. In the mapping grid you can apply column-level transforms; for example, entering UPPER({{SRC}}) in a column's Transform field uppercases that column's values.
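The {{SRC}} token is Coalesce's placeholder for the column the transform is attached to. As an illustrative example (the column name here is an assumption), applying UPPER({{SRC}}) to a COUNTRY column compiles to SQL along these lines:
UPPER("COUNTRY") AS "COUNTRY"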
To do this, navigate to the Join tab, where you will see the node dependency listed within the Ref function. Within the Join tab, we can write any Snowflake-supported SQL. In this use case, we will add a filter to limit our dataset to German-language records only, using the following code snippet:
WHERE "LANGUAGE" =
'German'
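After adding the filter, the Join tab will read roughly as follows (a sketch; the exact ref arguments depend on your storage location and node names):
FROM {{ ref('CALLS', 'CALL_TRANSCRIPTS') }} "CALL_TRANSCRIPTS"
WHERE "LANGUAGE" = 'German'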
Navigate back to the Build Interface, right click on the STG_CALL_TRANSCRIPTS_GERMAN node, and select Duplicate Node. Rename the duplicated node STG_CALL_TRANSCRIPTS_FRENCH, and in its Join tab change the filter to WHERE "LANGUAGE" IN ('French', 'English'). The transformations from STG_CALL_TRANSCRIPTS_GERMAN carry over into this node, so we don't need to rewrite any of them. Let's change the view back to Graph and select Create All and then Run All. Now that we have prepared the call transcript data by creating nodes for each language, we can process the language of the transcripts and translate them into a single language, in this case English.
Select the STG_CALL_TRANSCRIPTS_GERMAN node, hold the Shift key, and select the STG_CALL_TRANSCRIPTS_FRENCH node. Right click on either node and navigate to Add Node. You should see the Cortex package that you installed from the Coalesce Marketplace. Hovering over the Cortex package shows the available nodes; select Cortex Functions. This will add a Cortex Functions node to each STG node. Double click on the LLM_CALL_TRANSCRIPTS_GERMAN node to open it. Toggle on TRANSLATE, then in the column name selector dropdown select the column to translate, in this case the TRANSCRIPT column. Set the translation languages to 'de', 'en' to translate from German to English, then run the node to process the translations.
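Under the hood, the TRANSLATE toggle uses Snowflake's SNOWFLAKE.CORTEX.TRANSLATE function. You can try it directly in a worksheet (an illustrative one-off query, not the node's generated code):
select SNOWFLAKE.CORTEX.TRANSLATE('Meine Skier sind beschädigt angekommen.', 'de', 'en');
-- returns the English translation, e.g. "My skis arrived damaged."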
While the LLM_CALL_TRANSCRIPTS_GERMAN node is running, you can configure the LLM_CALL_TRANSCRIPTS_FRENCH node. Double click on the LLM_CALL_TRANSCRIPTS_FRENCH node to open it. Toggle on TRANSLATE and select the TRANSCRIPT column as the column you want to translate. Set the translation languages to 'fr', 'en' to translate from French to English.
Any values in the transcript field which do not match the language being translated from will be ignored. In this case, there are both French and English language values in the TRANSCRIPT field. Because the English values are not in French, they will automatically pass through as their original values. Since those values are already in English, they don't require any additional processing.
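You can see this pass-through behavior directly in a worksheet (illustrative queries; exact outputs can vary with the underlying model):
select SNOWFLAKE.CORTEX.TRANSLATE('Mes skis sont arrivés endommagés.', 'fr', 'en'); -- French input is translated to English
select SNOWFLAKE.CORTEX.TRANSLATE('My skis arrived damaged.', 'fr', 'en'); -- English input passes through essentially unchanged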
You have now processed the transcript data by translating the German and French transcripts into English. However, this translated data exists in two different tables, and in order to build an analysis on all of our transcript data, we need to unify the two tables together into one.
Right click on the LLM_CALL_TRANSCRIPTS_GERMAN node and select Add Node > Stage. Rename the new node STG_CALL_TRANSCRIPTS_ALL. In its config, toggle on Multi Source and set the multi source strategy to UNION ALL. Click New Source, then drag the LLM_CALL_TRANSCRIPTS_FRENCH node into the multi source mapping area of the node. This will automatically map the columns to those of the original node, i.e. LLM_CALL_TRANSCRIPTS_GERMAN.
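For intuition, the multi source node's output has roughly the shape of this hand-written SQL (a sketch only; Coalesce generates the real DDL, and the unqualified table names assume your Workspace's target schema):
create or replace table STG_CALL_TRANSCRIPTS_ALL as
select date_created, language, country, product, category, damage_type, transcript
from LLM_CALL_TRANSCRIPTS_GERMAN
union all
select date_created, language, country, product, category, damage_type, transcript
from LLM_CALL_TRANSCRIPTS_FRENCH;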
Now that we have all of our translated transcript data in the same table, we can begin our analysis and extract insights from the data. For our use case, we want to perform a sentiment analysis on the transcript to understand how each of our customers felt during their interaction with our company.
Additionally, our call center reps are trained to ask for the customer's name when someone calls into the call center. Since we have this information, we want to extract the customer name from the transcript so we can associate the sentiment score with our customer to better understand their experience with our company.
To do this, add another Cortex Functions node on top of the STG_CALL_TRANSCRIPTS_ALL node (right click it and select the Cortex Functions node, as before) and open it. You will use its SENTIMENT and EXTRACT ANSWER functions. In the mapping grid, right click on the TRANSCRIPT column and select Duplicate Column, twice, so you have two copies of the transcript to work with. Double click the first duplicated TRANSCRIPT column name and rename the column to TRANSCRIPT_SENTIMENT. Double click the second duplicated TRANSCRIPT column name and rename the column to TRANSCRIPT_CUSTOMER. Next, select the TRANSCRIPT_CUSTOMER column and change its data type to ARRAY. This is necessary because the output of the EXTRACT ANSWER function is an array that contains JSON values from the extraction. In the node config, toggle on SENTIMENT ANALYSIS and select the TRANSCRIPT_SENTIMENT column as the column name. For the EXTRACT ANSWER function, select the TRANSCRIPT_CUSTOMER column as the column name. The EXTRACT ANSWER function accepts a plain text question as a parameter, which it uses to extract values from the text being processed. In this case, we'll ask the question "Who is the customer?"
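These toggles correspond to Snowflake's SNOWFLAKE.CORTEX.SENTIMENT and SNOWFLAKE.CORTEX.EXTRACT_ANSWER functions. You can preview both on a sample string in a worksheet (illustrative queries only; the node's generated SQL operates on the table columns):
select SNOWFLAKE.CORTEX.SENTIMENT('My skis arrived broken and I am very frustrated.');
-- returns a score between -1 (negative) and 1 (positive)
select SNOWFLAKE.CORTEX.EXTRACT_ANSWER('Hello, this is Sarah Miller calling about my damaged skis.', 'Who is the customer?');
-- returns an array of JSON objects, each with an "answer" and a confidence "score"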
"You have now used Cortex LLM Functions to process all of your text data without writing any code to configure the cortex functions, which are now ready for analysis. Let's perform some final transformations to expose for your analysis.
Right click on the TRANSCRIPT_CUSTOMER column and hover over Derive Mappings and select From JSON.
This derives new columns from the JSON in the EXTRACT ANSWER output. Rename the derived answer column CUSTOMER and the derived score column CONFIDENCE_SCORE.
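Deriving mappings from JSON is equivalent to extracting the fields by hand. A hand-written version would look something like this (illustrative; the table name is an assumption, and Coalesce generates the real expressions for you):
select
TRANSCRIPT_CUSTOMER[0]:answer::varchar as CUSTOMER, -- the extracted customer name
TRANSCRIPT_CUSTOMER[0]:score::float as CONFIDENCE_SCORE -- the model's confidence in that answer
from LLM_CALL_TRANSCRIPTS_ALL;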
Finally, expose the results by creating a View node named V_CALL_TRANSCRIPTS_ALL from this node. Now that you have created a view in Snowflake that contains all of the data you need for your analysis, you can query it to learn more about the data.
select customer, transcript_sentiment
from pc_coalesce_db.public.v_call_transcripts_all
where confidence_score > .8
This query shows all of the customers and their associated sentiment where the likelihood of the customer name being correct is 80% or higher.
Additionally, you can uncheck the box next to Limit number of bars to see all of your results.
select damage_type, avg(transcript_sentiment)
from pc_coalesce_db.public.v_call_transcripts_all
where transcript_sentiment < 0
group by damage_type
This query shows the average sentiment aggregated by damage type, limited to transcripts where sentiment is below zero. This lets decision makers understand which damage types are the most frustrating for customers and take action to improve processes and avoid those damages.
Congratulations on completing your lab! You've mastered the basics of building and managing Snowflake Cortex LLM functions in Coalesce and can now continue to build on what you learned. Be sure to reference this exercise if you ever need a refresher.
We encourage you to continue working with Coalesce by using it with your own data and use cases and by using some of the more advanced capabilities not covered in this lab.
Continue with your free trial by loading your own sample or production data and exploring more of Coalesce's capabilities with our documentation and resources. For more technical guides on advanced Coalesce and Snowflake features, check out Snowflake Quickstart guides covering Dynamic Tables and Cortex ML functions.
Reach out to our sales team at coalesce.io or by emailing sales@coalesce.io to learn more!