What is DataOps?

DataOps has emerged in recent years as a new approach to improving the speed, efficiency, and quality of data analytics processes. It is a methodology that combines cross-functional teams, automated data workflows, and standardized data management practices to quickly and effectively process, analyze, and deliver data-driven insights.

In this guide, you will find best practices for applying the principles of DataOps within the context of Coalesce and how to best implement them by fully leveraging our Git integration for version control.

By saving (or committing) any code changes in Coalesce to Git, you will ensure that previous versions of your data warehouse are preserved and accessible. These changes are accessible through a dedicated Git repository, which is a collection of files that store the different versions of your data warehouse.

We strongly recommend setting up Git within your Coalesce instance to perform proper version control and avoid losing your work.

What is Git?

Git is a widely used version control system for storing, branching, and merging developers' work. Git maintains a centralized code "database" (known as a Git repository) on a hosted provider such as GitHub, Bitbucket, GitLab, or Azure DevOps Git. This centralized repository is cloned to a Git client (such as Coalesce) and used to manage development operations.

How Does Git Work?

Git allows developers working on the default / live version of their data warehouse (the main branch) to isolate code changes by creating new branches of work. Branches are then merged together to incorporate new changes into the main branch. All deployments to non-development environments (e.g. Test / Production) are performed from the main branch.
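
As a rough illustration of the branching model described above, the sketch below treats each branch as a named list of commits. It is a toy in-memory model, not how Git or Coalesce actually store data, and the names `Repo`, `branch`, `commit`, and `merge` are hypothetical.

```python
class Repo:
    """Toy model: each branch name maps to an ordered list of commit messages."""

    def __init__(self):
        self.branches = {"main": []}

    def branch(self, new, source="main"):
        # A new branch starts with a copy of the source branch's history.
        self.branches[new] = list(self.branches[source])

    def commit(self, branch, message):
        # Record a change on one branch only; other branches are unaffected.
        self.branches[branch].append(message)

    def merge(self, source, target="main"):
        # Append the commits from source that target does not have yet.
        for message in self.branches[source]:
            if message not in self.branches[target]:
                self.branches[target].append(message)


repo = Repo()
repo.commit("main", "initial configuration")
repo.branch("feature/orders")
repo.commit("feature/orders", "add orders stage node")
repo.merge("feature/orders")
print(repo.branches["main"])
```

Until `merge` is called, the commit on `feature/orders` is isolated from `main`, which is the property that lets deployments keep coming from a stable main branch.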

Branching and Merging

Branching provides flexibility in team environments by separating development from deployment. The branch types used in this guide are the main branch, feature branches, and hotfix branches.

Git Permissions

Git permissions and approvals can be set up directly on your Git platform (not through Coalesce). Any Git pipelines (also known as actions or runners) can be configured along with Coalesce's command-line interface (CLI) utility to automate deployments and executions.

Git operations such as pull requests and merge conflict resolution can be done through Coalesce, or off the platform via Git commands or other clients (e.g. VS Code).
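
A pipeline step could enforce the rule that non-development deployments only come from the main branch. The sketch below is one possible guard. `git rev-parse --abbrev-ref HEAD` is a standard Git command, while the function names and the default branch name `main` are assumptions, and no Coalesce CLI call is shown.

```python
import subprocess


def current_branch(repo_dir="."):
    """Return the checked-out branch name using a standard Git command."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def may_deploy(branch, main_branch="main"):
    """Allow deployments to non-development Environments only from main."""
    return branch == main_branch


# Example with literal branch names (no Git repository needed):
for name in ("main", "feature/orders"):
    print(name, "->", "deploy" if may_deploy(name) else "skip")
```

In a real pipeline, the result of `may_deploy(current_branch())` would decide whether the deployment step runs at all.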

1. Create a Project

The first step in setting up version control is to create a Project in Coalesce. Projects are logical groupings of data warehousing efforts in Coalesce, and a good way to organize your work by a particular initiative or team focus.

When you create a Project, you'll be asked for a Git repository URL. Git repository URLs are used by Git hosting services (e.g. GitHub) as a way of pointing to a Git repository. However, everyone still has a full copy of the repository on their own machine. Coalesce generates and manages metadata which is converted and executed as SQL code – it is this metadata that is stored in Git as YAML files.

After adding the Git repository URL, you'll be asked to add your Git credentials. Your credentials are similar to a username and password, and a way of controlling access to the Git repository.

2. Create a Workspace Within Your Project

Workspaces are dedicated development areas within Coalesce where you can build out your data warehouse and apply transformations to data.

When you create a Workspace, you'll be asked to select a Git branch, and create a new branch from it.

Branches are one of the most powerful features of Git, as they allow different people to work on the same items at the same time, and later combine (merge) their changes. This is particularly useful if you need to work on a hotfix for your data warehouse, or need to add new items to it that shouldn't be exposed to end users just yet.

With feature branches, teams can make changes to the main data warehouse while other people work on hotfixes or new features. Any changes made to the main data warehouse can be synced with the new features, and vice-versa.

3. Transform Data in Your Workspace

Once your Workspace has been created, you can start creating or modifying your data warehouse. Although your Workspace is linked to Git, Coalesce doesn't automatically commit your changes to Git. Your changes are still saved outside of Git, so you won't risk losing your work. Even so, it is a good idea to commit your changes when you reach a good stopping point (e.g. after creating one or more nodes, mapping storage locations, building a subgraph, or building a job). You should also consider committing if you need to work on another Workspace, or if you want to make your work visible to other members of your team.

4. Commit Your Changes to Git and Deploy to Non-Development Environment

A commit is a snapshot in time of your Workspace. You can use commits to go back to a previous version of a workspace, or to troubleshoot recent changes. Commits are also the mechanism Coalesce uses to deploy to non-development Environments.

When you trigger an environment deployment, you'll be asked to select a branch (a Workspace) and a commit (the snapshot in time of that Workspace). When you perform a commit, your work becomes visible to anyone else viewing the Workspace. Before a commit, only you can see those changes - so it is a good idea to commit often.

Anyone who has access to the Project your Workspace is part of can also see those commits; they can use your commit as a starting point for a new Workspace, and build on the work you've done.
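
To make the idea of a commit as a "snapshot in time" concrete, here is a minimal sketch. It only loosely mirrors how Git derives commit IDs from content. The `History` class, its methods, and the node metadata are hypothetical and do not reflect Coalesce's internal format.

```python
import copy
import hashlib
import json


class History:
    """Toy commit log: every commit stores a full copy of the Workspace state."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id, snapshot)
        self.head = None

    def commit(self, workspace_state):
        # The ID depends on both the content and the parent commit.
        payload = json.dumps(
            {"parent": self.head, "state": workspace_state}, sort_keys=True
        )
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]
        # Deep-copy so later edits to the Workspace cannot alter the snapshot.
        self.commits[commit_id] = (self.head, copy.deepcopy(workspace_state))
        self.head = commit_id
        return commit_id

    def snapshot(self, commit_id):
        return self.commits[commit_id][1]


history = History()
workspace = {"STG_ORDERS": {"description": "Orders stage"}}
first = history.commit(workspace)
workspace["STG_ORDERS"]["description"] = "Orders stage, deduplicated"
second = history.commit(workspace)
print(history.snapshot(first))
```

Because each snapshot is immutable, an earlier commit can serve as the starting point for a new Workspace or as the version selected for a deployment.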

Example of a high-level Git workflow in Coalesce

Starting a New Project

  1. Create a new Project with a unique name and description.

  2. Set up version control within your Project by creating and attaching a dedicated Git repository.
  3. Next, link your Git account to your new Project and repository. Your Git credentials can be found in your User Settings if you have an existing account, or you can add a new account altogether.

  4. Click on the question mark icon for instructions on how to add your credentials depending on your Git provider. Make sure to test your account before you finish creating the Project.
  5. To start building, create a new Workspace in your Project.

  6. Create a new Git branch associated with your Workspace. Every Workspace is attached to its own Git branch, meaning that you can have multiple branches of work happening in parallel across different Workspaces within your Project.

  7. Launch your new Workspace and complete your Build settings by connecting to a Snowflake account and adding storage locations.

  8. Switch to your Snowflake account (outside of Coalesce) and set up your Snowflake environment to include the following:
  9. Create corresponding Storage Locations in Coalesce. There should be one for each unique database schema you plan to use in Snowflake.

  10. Click the gear icon in the lower left-hand corner of the interface to open your Build settings and map any Storage Locations to databases / schemas.

  11. Create at least one target Environment and also map the Storage Locations to the corresponding databases / schemas.

  12. Rename the DEV branch to MAIN, fill in the configuration, and commit.

When you first set up your Organization in Coalesce, you'll notice that a single Workspace called DEV is created. Rename this to "Main."

  13. Create feature branch(es) by duplicating the Main branch. Use a meaningful naming convention for your Workspaces and branches with descriptions. Tag colors can be used to provide additional context to the user based on the organization's preferences. For example, all feature Workspaces could be BLUE, main Workspaces GREEN, testing Environments YELLOW, and production Environments RED.

  14. Commit your configuration to Git (see illustrative workflow below).

Your development strategy will vary depending on team size, geographical location, communication frequency, and your organization's overall culture. We recommend developing a baseline strategy that can be used as a starting point and adjusted based on your needs.
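
A baseline convention can be written down so that it is easy to check. The sketch below encodes the example tag colors from the setup steps and adds a branch-name check. The `feature/...` and `hotfix/...` naming pattern is an assumed convention for illustration, not something Coalesce requires.

```python
import re

# Tag colors taken from the example in the setup steps above.
TAG_COLORS = {
    "feature": "BLUE",
    "main": "GREEN",
    "testing": "YELLOW",
    "production": "RED",
}

# Assumed convention: "feature/<words-with-dashes>" or "hotfix/<words-with-dashes>".
BRANCH_PATTERN = re.compile(r"^(feature|hotfix)/[a-z0-9]+(-[a-z0-9]+)*$")


def is_valid_branch_name(name):
    """Accept 'main' or names that follow the assumed prefix/slug pattern."""
    return name == "main" or bool(BRANCH_PATTERN.match(name))


for candidate in ("main", "feature/customer-dim", "My Branch"):
    print(candidate, is_valid_branch_name(candidate))
print(TAG_COLORS["production"])
```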

Step-by-Step: Creating a Feature Branch

  1. Create a feature branch by creating or duplicating a Workspace. Note that each branch is associated with a unique Workspace.

  2. Re-confirm user credentials within your Workspace and test your connectivity before proceeding.

Step-by-Step: Developing in Your Feature Branch

  1. Start developing your graph in the Workspace / feature branch that you just created.

  2. Commit your code using the Git modal. The list of YAML files will include changes made by any developer using the same Workspace.

When the Git modal is opened, a snapshot of the code for a particular Workspace is taken; it is only refreshed by closing and re-opening the Git modal. If changes are made in a different browser while the modal is open, they will not be reflected in the list.

  1. Make an initial commit of data.yml (containing configuration) into the 'Main' branch.
  2. Branch 'Main' into a feature branch and begin development.

  3. Make frequent commits as development continues on the particular feature.

  4. When the feature is complete, merge the code back into 'Main' and discard the feature Workspace.

  5. Merge back to the Main Workspace by opening the target Workspace (associated with 'Main'). Then, from within the Git modal, choose the required commit from the source feature branch and click on 'Merge'.

Alternatively, if the latest commit is being merged, choose the 'Merge Latest' button.

  6. Deploy code into a target Environment. Code can be deployed from a development Workspace into a target Environment by using the GUI, the CLI tool, or the API.
  7. When you choose a particular commit to deploy, Coalesce compares the previous deployment (also considering the state of the target Snowflake environment) with the newer version of the code and generates a 'plan' of the changes. The plan contains all of the SQL steps that will be executed to deploy the changes, and it can be reviewed before the actual deployment takes place.
  8. Once you deploy, the steps are executed and the status of the changes is displayed in the GUI.
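
The comparison that produces a deployment plan can be pictured with a small diff. This sketch is not Coalesce's planner: the `build_plan` function, the input shape, and the step strings are hypothetical placeholders rather than complete SQL.

```python
def build_plan(deployed, candidate):
    """Compare two {table_name: [column, ...]} mappings and list the steps needed."""
    steps = []
    for name in sorted(candidate):
        if name not in deployed:
            steps.append(f"CREATE TABLE {name}")
        elif candidate[name] != deployed[name]:
            steps.append(f"ALTER TABLE {name}")
    for name in sorted(deployed):
        if name not in candidate:
            steps.append(f"DROP TABLE {name}")
    return steps


deployed = {"STG_ORDERS": ["ID"], "STG_OLD": ["ID"]}
candidate = {"STG_ORDERS": ["ID", "AMOUNT"], "DIM_CUSTOMER": ["ID"]}
for step in build_plan(deployed, candidate):
    print(step)
```

Producing the plan as data before anything runs is what makes a review step possible: the list can be inspected, and only then executed.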

Hotfixes are typically released in response to urgent issues that cannot wait for the next scheduled software update or release. They are designed to quickly resolve a specific problem without requiring a full software update or installation.

Hotfixes are usually smaller and less comprehensive than regular software updates, and they are often focused on fixing a single issue or a small set of related issues. They are usually distributed to customers as a standalone patch or a small package that can be easily installed on top of the existing software installation.

We recommend that you follow these steps to apply a hotfix:

  1. Determine the scope of the hotfix.

  2. Branch from Main to create a hotfix branch.

  3. Implement the code for the hotfix.

  4. Deploy to QA for testing and, if everything looks good, continue to deploy to Production.

  5. Merge the main branch into the hotfix branch and verify everything is working properly.

  6. Merge the hotfix branch into the main branch and continue the release to QA and then to Production as usual.
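
If you perform the Git side of these steps off platform, the branch operations map onto a short sequence of standard Git commands. The sketch below only builds and prints that sequence and does not run anything. The `hotfix/<name>` branch naming and the function name are assumptions, and `git switch` requires Git 2.23 or later.

```python
import shlex


def hotfix_commands(name, main_branch="main"):
    """Return the Git commands for the branch and merge steps of a hotfix."""
    branch = f"hotfix/{name}"
    return [
        ["git", "switch", main_branch],
        ["git", "switch", "-c", branch],  # branch from main
        # ... implement the fix, commit it, deploy to QA, then to Production ...
        ["git", "merge", main_branch],    # bring the hotfix branch up to date with main
        ["git", "switch", main_branch],
        ["git", "merge", branch],         # merge the hotfix back into main
    ]


commands = hotfix_commands("fix-null-ids")
for command in commands:
    print(shlex.join(command))
```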

Merging branches allows for the new development and changes in two feature branches to be combined into one branch. If no common objects have been changed, then the merge is straightforward and can be managed in Coalesce.

If the same object has been changed on two different branches that are being merged, then a merge conflict is identified. The developer responsible for the merge needs to decide which version of the change should be integrated and deployed, and resolve the merge conflict.
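
The rule above is essentially a three-way merge: each object is compared against the common starting point, and a conflict arises only when both branches changed it differently. A minimal sketch follows, using hypothetical node names and descriptions as the objects; real merges operate on file contents rather than dictionaries.

```python
def three_way_merge(base, ours, theirs):
    """Merge two {object_name: definition} mappings that share a common base."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o  # both sides agree, or neither changed the object
        elif o == b:
            value = t  # only the other branch changed it
        elif t == b:
            value = o  # only our branch changed it
        else:
            conflicts.append(key)  # both changed it differently
            continue
        if value is not None:  # None means the object was deleted
            merged[key] = value
    return merged, conflicts


base = {"STG_ORDERS": "Orders stage"}
ours = {"STG_ORDERS": "Orders staging table", "DIM_CUSTOMER": "Customers"}
theirs = {"STG_ORDERS": "Raw orders", "STG_ITEMS": "Items"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

Here the two new nodes merge cleanly because each was added on only one branch, while `STG_ORDERS` is reported as a conflict for a developer to resolve.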

The merge conflict should be resolved in the Git host portal or in a Git client such as VS Code. Merges are only allowed when there are no uncommitted changes in the target branch. If there are uncommitted changes, a warning will be displayed, and the changes will need to be either discarded or committed before a merge attempt can take place.

If there have been changes made by another developer while the Git modal is open, merging could reverse their changes without a warning. Often, merge types are dictated by the complexity of changes since the last merge.

These processes are applicable to any target Environments that have been created. Commits in the Git modal will be tagged with the Environments that reflect that particular version of code.

Step-by-Step: Resolving Merge Conflicts (Example Exercise)

In this example, we will purposefully create a merge conflict and demonstrate how to resolve it in GitHub:

  1. Begin in your primary Workspace and create a new Stage node.

  2. Commit all changes in Git.

  3. Now create a new Workspace and branch from the primary Workspace where changes were just committed.

  4. Make a change to your node's description and commit the change to Git.

  5. Switch to the primary Workspace and add something different to the description field of your node. Commit the change. This step will create a merge conflict to resolve off platform.

  6. Now in the primary branch, try to merge the copied branch in. It should result in a merge conflict.

  7. Instead of using the Coalesce merge conflict editor, close the pane and go to GitHub.

  8. Once in GitHub, you will see a prompt to create a pull request. Create the pull request.

  9. After creating the pull request, a message will appear saying that the branches cannot be automatically merged due to conflicts. Click the "Resolve conflicts" button.
  10. In the conflict editor, markers indicate where the two files differ in ways that could not be merged automatically.

  11. Delete the code you don't want to keep, as well as all of the Git-generated conflict markers. After you do this, click the "Mark as Resolved" button in the upper right and then click "Commit Merge."

  12. After you commit the merge, click the green button on the next screen that says "Merge Pull Request."

  13. From here you will be able to see your changes reflected in Coalesce. To do this, go to the Git window and click "Resync Branch."

  14. After the branch is resynced, you can check in your node and verify that the change you selected in Git exists.

Congrats! You just resolved a merge conflict!
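
The conflict markers mentioned in the steps above follow a fixed shape: a `<<<<<<<` line, the first version, a `=======` line, the second version, and a `>>>>>>>` line. The sketch below resolves such a block by keeping one side, which is what you do by hand in the conflict editor. The YAML-like content and branch labels are hypothetical, and the function is an illustration rather than a replacement for reviewing each conflict.

```python
def resolve_conflicts(text, keep="first"):
    """Drop conflict markers and keep either the 'first' or 'second' version."""
    resolved, section = [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            section = "first"
        elif line.startswith("=======") and section == "first":
            section = "second"
        elif line.startswith(">>>>>>>") and section == "second":
            section = None
        elif section is None or section == keep:
            resolved.append(line)
    return "\n".join(resolved)


conflicted = "\n".join([
    "name: STG_ORDERS",
    "<<<<<<< feature-branch",
    "description: Orders loaded hourly",
    "=======",
    "description: Orders loaded nightly",
    ">>>>>>> main",
    "type: Stage",
])
print(resolve_conflicts(conflicted, keep="first"))
```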

Follow these guidelines from your Coalesce team to ensure a smooth and successful development experience:

  1. Never develop directly in your main development Workspace.
  2. Avoid making breaking commits to your main Git branch.
  3. Never merge into a Workspace which has uncommitted changes.
  4. Never change the Git repo settings while you have uncommitted changes in ANY Workspace.
  5. If you cannot commit your changes in a Workspace to Git, contact Coalesce support for help. Potentially overwriting your live metadata with a previous commit will cause you to lose all recent uncommitted development.
  6. Avoid all of the above by committing frequently!
  7. All Environments should be mapped to different schemas, except for source nodes in some cases.
  8. All development Workspaces should be ideally mapped to different schemas (except perhaps source nodes). You can't have a Workspace AND Environment on the same set of Snowflake schemas, unless the Workspace is used as read-only (which cannot be enforced).
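
The last two guidelines can be checked mechanically once the mappings are written down. The sketch below reports any schemas shared between Workspaces and Environments, with an allow-list for shared source schemas. All target names, schema names, and the function itself are hypothetical.

```python
def find_schema_overlaps(mappings, allowed_shared=frozenset()):
    """Return (target_a, target_b, shared_schemas) for every unexpected overlap.

    mappings: {target_name: set of "DATABASE.SCHEMA" strings}
    allowed_shared: schemas that may be shared, such as source schemas
    """
    overlaps = []
    names = sorted(mappings)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = (mappings[first] & mappings[second]) - set(allowed_shared)
            if shared:
                overlaps.append((first, second, sorted(shared)))
    return overlaps


mappings = {
    "Workspace: Main": {"DEV_DB.STAGE", "RAW.SOURCE"},
    "Environment: QA": {"QA_DB.STAGE", "RAW.SOURCE"},
    "Environment: PROD": {"PROD_DB.STAGE"},
}
print(find_schema_overlaps(mappings))
print(find_schema_overlaps(mappings, allowed_shared={"RAW.SOURCE"}))
```

The first call flags the shared source schema; the second treats it as an accepted exception and reports no overlaps.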

By following this guide and using Git consistently in Coalesce, you will set yourself and your team up for success when it comes to version-controlled development and deployments.

We welcome your feedback and suggestions on improving this guide. If you have any questions or need assistance, please reach out to us at support@coalesce.io.

Happy transforming!