Using GitHub with Microsoft SSIS
There was a time when it was common practice to manually manage and track versions of source code across developers. It was tedious, cumbersome, error-prone and, frankly, an unpleasant experience for any developer. Something had to be done.
The speed at which businesses evolve makes it significantly harder to release and manage code at an equivalent velocity.
The Solution
While not perfect, version control systems (VCSs) overcome many of these issues. Most VCSs make it possible to roll back to previous changes, merge new product features and centralize committed source code. They also open the door to continuous deployment.
Git is a popular option in this space thanks to its prominence in open source projects, its strong branching and merging support, and its distributed model of source control. Can the same benefits be applied to the data domain? In particular, can they be applied to extract, transform and load (ETL) routines, which are typically written as code and change rapidly to adapt to business requirements? The answer is YES!
ETL Version Control
In an active analytics pipeline, ETL development code changes rapidly throughout its lifecycle. For a data team, this code is the equivalent of the crown jewels of software development.
Several ETL vendors have recently acknowledged this, either by releasing source control plugins for external tools or by developing in-house version control systems. Tools such as Talend, DataStage and SSIS support this to varying degrees.
We recently worked on a client project built on the Microsoft stack, where we used SSIS with GitHub for source control. By way of tutorial, the next few sections walk you through setting up your SSIS environment with GitHub to enable source control for the ETL portion of any data project. These instructions work for SSDT 2010 or later.
Installing Git Source Control Provider using Visual Studio
Step one: Run Visual Studio
![Using Git with Visual Studio 2019](/uploads/1/0/5/9/105990885/204830767.png)
Step two: Go to Tools | Extension Manager, search the online gallery for “Git Source Control Provider” and install it
Step three: Download the add-in and place it in the Add-ins folder under the Visual Studio user documents folder:
C:\Users\xxxxxxx\Documents\Visual Studio 2010
Step four: Restart Visual Studio
Configuring Git Extensions
Step one: Download Git Extensions and open it. Unless you specify otherwise, it will be installed under C:\Program Files (x86)\GitExtensions
Step two: Go to Tools -> Settings and configure the checklist items as needed
Step three: Enter the name and email address associated with the GitHub account that will be used for source control of the data project
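As far as we know, Git Extensions writes the name and email from this step to Git's standard `user.name` and `user.email` settings. This tutorial sticks to the GUI, but the equivalent command-line step can be sketched in Python as shown below. The sketch assumes `git` is on the PATH. The helpers `identity_commands` and `apply_identity` are hypothetical names used for illustration only.

```python
import subprocess
from typing import List


def identity_commands(name: str, email: str, scope: str = "--global") -> List[List[str]]:
    """Build the `git config` calls that set the commit author identity.

    "--global" writes to the user's ~/.gitconfig; "--local" writes only to
    the .git/config of the repository that git is run from.
    """
    if scope not in ("--global", "--local"):
        raise ValueError("scope must be '--global' or '--local'")
    return [
        ["git", "config", scope, "user.name", name],
        ["git", "config", scope, "user.email", email],
    ]


def apply_identity(name: str, email: str, scope: str = "--global") -> None:
    """Run the commands; needs git on PATH and raises CalledProcessError on failure."""
    for cmd in identity_commands(name, email, scope):
        subprocess.run(cmd, check=True)
```

For example, `apply_identity("Jane Developer", "jane@example.com")` would set a global identity; both values are placeholders. The commands are passed as argument lists rather than a shell string, so names containing spaces need no extra quoting.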
Once Configured, Clone the Repository
Step one: In Git Extensions, click Clone repository and fill in the fields as shown below:
A GitHub window will open and ask you to log in. Enter the GitHub credentials for the account that has access to your repository.
Step two: After logging in, click the “Clone” button.
Once the clone succeeds, open the project in Visual Studio: navigate to the locally cloned directory and double-click the Visual Studio solution file.
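The clone-and-open steps above can also be scripted. The sketch below is minimal and assumes `git` is on the PATH. The helpers `clone_command`, `clone` and `find_solution_files` are hypothetical names, not part of Git Extensions. It clones the repository and then lists the `*.sln` files you could open in Visual Studio.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def clone_command(repo_url: str, target_dir: Optional[str] = None) -> List[str]:
    """Build `git clone <url> [<dir>]`; without a dir, git derives one from the URL."""
    cmd = ["git", "clone", repo_url]
    if target_dir:
        cmd.append(target_dir)
    return cmd


def find_solution_files(clone_dir: str) -> List[Path]:
    """Return every Visual Studio solution file (*.sln) under the clone, sorted."""
    return sorted(Path(clone_dir).rglob("*.sln"))


def clone(repo_url: str, target_dir: str) -> List[Path]:
    """Clone the repository, then list the solution files Visual Studio can open."""
    subprocess.run(clone_command(repo_url, target_dir), check=True)
    return find_solution_files(target_dir)
```

A call such as `clone("https://github.com/your-org/your-etl-repo.git", r"C:\src\your-etl-repo")` uses placeholder values; substitute your own repository URL and local path.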
Everything is now set up to start source controlling your ETL with Git. Going forward, you can use this drop-down menu in Visual Studio to run Git commands through the GUI:
We will refer to these in the next steps. You can also perform the same actions by issuing commands in Git CMD, but we won’t cover those in this tutorial. In our GitHub environment, the Master branch serves as our development branch. demo-stats is a feature branch, which is merged back into Master once its changes are committed.
![Using Git with Visual Studio Code](/uploads/1/0/5/9/105990885/228447433.png)
Example of Pulling from Master
(Master = dev environment in this case)
Step One: Open the local copy of the solution file (*.sln)
Step Two: Once it is open in SSDT, go to GitExt > Pull
Step Three: Set the remote branch to the GitHub (server) version to merge its changes down into the local copy
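On the command line, the pull above corresponds roughly to `git pull <remote> <branch>`. The sketch below assumes the remote is named `origin` and the Master branch is literally named `master`; adjust both to match your repository. `pull_command` and `pull` are hypothetical helpers.

```python
import subprocess
from typing import List


def pull_command(repo_dir: str, remote: str = "origin", branch: str = "master") -> List[str]:
    """Build a `git pull` that runs inside repo_dir (via `git -C`)."""
    return ["git", "-C", repo_dir, "pull", remote, branch]


def pull(repo_dir: str, remote: str = "origin", branch: str = "master") -> str:
    """Fetch the remote branch and integrate it into the checked-out local branch.

    Returns git's standard output; raises CalledProcessError if the pull
    fails, for example because of a merge conflict.
    """
    result = subprocess.run(
        pull_command(repo_dir, remote, branch),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Using `git -C` lets the script run from any working directory. Whether Git merges or rebases during the pull depends on your local `pull` settings.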
Example of merging development into Master
(local development environment in this case)
Step One: Check out the local target branch, for example Master
Step Two: Merge the local branch dev/demo-stats into Master
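The same merge can be sketched as two Git commands: check out the target branch, then merge the feature branch into it. The sketch assumes local branches named `master` and `dev/demo-stats`, matching the example above. `merge_commands` and `merge_feature` are hypothetical names.

```python
import subprocess
from typing import List


def merge_commands(repo_dir: str, feature: str = "dev/demo-stats",
                   target: str = "master") -> List[List[str]]:
    """Step one checks out the target branch; step two merges the feature into it."""
    return [
        ["git", "-C", repo_dir, "checkout", target],
        ["git", "-C", repo_dir, "merge", feature],
    ]


def merge_feature(repo_dir: str, feature: str = "dev/demo-stats",
                  target: str = "master") -> bool:
    """Run both steps; return False as soon as git reports a failure.

    git merge exits with a non-zero status on a conflict and leaves the
    conflicted files (for example the *.dtproj) marked in the working tree.
    """
    for cmd in merge_commands(repo_dir, feature, target):
        if subprocess.run(cmd).returncode != 0:
            return False
    return True
```

The function stops at the first failing command, so a failed checkout never leads to a merge into the wrong branch.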
One Notable Drawback and Solution
Microsoft is still working out the kinks in how SSIS handles merge conflicts. One file in particular, related to the project metadata, requires manual merge resolution from time to time. This drawback applies to the branch merging covered above.
If two developers are developing packages in SSDT within the same ETL project, merge conflicts will occur. The conflicts arise in a file that is always present in SSIS development and that both developers change: the project file, *.dtproj. A merge conflict on *.dtproj has to be resolved manually.
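Before committing a manual resolution, it helps to confirm that no conflict markers remain in the *.dtproj file. A leftover `<<<<<<<` line leaves the file as invalid XML, so SSDT may fail to load the project. The sketch below is minimal. `find_conflict_markers` and `check_file` are hypothetical helpers that look for the seven-character marker lines Git writes around conflicting hunks.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Git surrounds each conflicting hunk with lines that start with seven
# '<', '=' or '>' characters ('|' also appears with the diff3 conflict style).
_MARKER = re.compile(r"^(?:<{7}|={7}|>{7}|\|{7})(?: |$)")


def find_conflict_markers(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every conflict-marker line."""
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if _MARKER.match(line)
    ]


def check_file(path: str) -> List[Tuple[int, str]]:
    """Read a project file (utf-8-sig tolerates a byte-order mark) and scan it."""
    return find_conflict_markers(Path(path).read_text(encoding="utf-8-sig"))
```

An empty list from `check_file` means no markers were found. It does not mean the resolved package entries are correct, so the next check still applies.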
If you’re merging code back, make sure the conflict resolution keeps the package name of every additional package that has been created and merged into Master.
An example is a *.dtproj file package list that needs to include your package name.
Here is the example project metadata file with the quick solution:
For every package you’ve added to this list, you will also need to include it as an entry in the package metadata. Example below:
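The package list and the package metadata described above can drift apart during a manual merge. The sketch below compares them. It assumes the project-deployment layout that SSDT typically writes, which is an assumption you should verify against your own *.dtproj before relying on it:

- Packages are listed as `SSIS:Package` elements.
- Metadata entries are `SSIS:PackageMetaData` elements.
- Both carry an `SSIS:Name` attribute in the `www.microsoft.com/SqlServer/SSIS` namespace.

`check_dtproj` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

# Assumed namespace of the SSIS elements embedded in a *.dtproj file.
SSIS_NS = "www.microsoft.com/SqlServer/SSIS"


def _names(root: ET.Element, tag: str) -> Set[str]:
    """Collect the SSIS:Name attribute of every element with the given SSIS tag."""
    qualified_tag = f"{{{SSIS_NS}}}{tag}"
    name_attr = f"{{{SSIS_NS}}}Name"
    return {
        el.get(name_attr)
        for el in root.iter(qualified_tag)
        if el.get(name_attr)
    }


def check_dtproj(xml_text: str) -> Dict[str, Set[str]]:
    """Compare the package list with the package metadata entries."""
    root = ET.fromstring(xml_text)  # raises ParseError if conflict markers remain
    listed = _names(root, "Package")             # entries in the package list
    described = _names(root, "PackageMetaData")  # entries in the package metadata
    return {
        "missing_metadata": listed - described,
        "missing_from_list": described - listed,
    }
```

Suppose you call `check_dtproj` on the text of your project file, for example read with `read_text(encoding="utf-8-sig")`. A non-empty `missing_metadata` set would then point at a package whose metadata entry was lost during the merge.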
Conclusion
Setting up proper source control for SSIS is well worth the time from a code-manageability perspective. Even with the multi-developer version control issues presented above, it provides a very quick and convenient way to roll back recent changes to an ETL process. That benefit cannot be overstated when unforeseen UAT or production data issues surface after a previous ETL promotion.
What’s been your experience (or lack thereof) with using version control on SSIS or other ETL/data projects? Send me an email or connect with me on LinkedIn to continue the conversation!
We Are Hiring
Indellient is a Canadian Software Development Company that specializes in Data Analytics, Managed IT Solutions, Cloud Application Development, DevOps Services, and Document Process Automation. We are currently hiring for multiple positions. Learn more and apply on our Careers page!
Check out our Best Practices Guide: Migrating SVN to Git