BiOps
BI-Ops – 2/5 – Source Control (Git)

BI-Ops – 2/5 – Source Control (Git)

BI-Ops – 1/5 – What we are trying to do
BI-Ops – 2/5 – Source Control (Git)
BI-Ops – 3/5 – Build (CI)
BI-Ops – 4/5 – Deploy DB, SSIS and SSAS (CD)
BI-Ops – 5/5 – Deploy PowerBI (CD)

Everybody in IT uses source control. There is no more debating why. We’ve all struggled without it for long enough.

For us, it has become the root of our work. If it isn’t in source control, it doesn’t exist. But how do we go about it?

Git

We ran on Team Foundation Server and today run with Azure DevOps Server. And we used to run on TFVC for version control. It was nice, everything was central. But as the years went by, a lot of people started using Git around us. It sounded cool, but we felt it was more for the open source world than for us, the BI team of a private company.

But over time, we realised a few things:

  • New people know Git
  • TFVC seems to be abandoned by Microsoft to the benefit of Git
  • Its nice to be able to commit and branch locally before messing with the central repo

But how do you use it?

So we started to wonder how we would use git. We had a fresh start. We were going to migrate our data and it was the perfect time to get it right. So we looked online, read blogs, looked at Github projects. And in the end, things were pretty easy.

Repos

We create one repo for each major project we have:

  • One repo for each Data Warehouse we manage. Basically that is one per division in our company as their data very rarely intersects.
  • One repo for referentials (stuff all Data Warehouses join on)
    • Dates: You always need date dimensions, day off calendars per country, etc.
    • Currencies: My favorite thing in BI after dates.
    • Users: We put everything regarding company users in one DB to easily manage their information. Especially since RGPD is a thing, it helps us control where the personal information might be and secure it.
  • One for DB Tools. Those are the tools we use to monitor and maintain our environments. I’ll talk about those some other time.
  • One for PowerBI reports that are not part of big projects. Reports we do for users that aren’t proficient enough to do them on their own and that we maintain for them. I could name this one “you-owe-us projects”.
  • One for “other projects”. Small things like “can you please archive data from this machine in the factory?” but that don’t fit in our Data Warehouses or “Can you do KPIs on this IT administration application please?”. Not my favorite repo.
  • One final repo for stuff we cancel or decommission. Just to keep the other ones clean.

Branches

For each repo, we try to have 3 default branches to keep things simples

  • Develop: this is the classic branch that everybody uses to develop. The one for which people bring pastries when it stops compiling after they commit to it.
  • Release: this is a branche we have to merge into when we are ready to go to UAT and then production. We merge dev into release and run the build and release pipelines.
  • Master (or main, or trunk. We’ll wait for the community to settle on a new default to rename it). We merge release into this branch when the business approves UAT and deploying it in production worked.

The philosophy we tried to have is the following:

  • If we need to redeploy production on a new server for whatever reason, we just need to deploy the master branch.
  • If we need to fix something during UAT, work from the release branch and then merge into dev once it is done.
  • If you need to hotfix production, create a new hotfix branch off master, and then merge it into dev (and into release if we have UAT going)
  • Work off dev, and if you have something big just create a custom branch for your feature.

Work tracking

As a manager, I want to be able to tell my clients what is in each release we ship. So we use branch policies to forbid the completion of pull requests into Release (and then into Master) if no work items from our Azure Boards are not linked to the commits being merged.

It allows us to track which requirement are in a release branch. It also allows us to go back and see which merge into master contributes to which feature (or report) and where to look at faulty code when things go wrong. And finally, it brings a feeling of achievement to the developer when doing a pull request is not only pushing code but also delivering a feature to a client and crossing work off the backlog.

BI Project Content

Now that we’ve talked about how we use Git… in what I feel is a very “normal” way even if we apply it to BI, here is how we organise our projects.

All the backend goes into one Visual Studio solution per Data Warehouse:

  • Databases: We have one project for each database in the solution. Usually we have three:
    • Staging: Where we bring in the raw data from source systems
    • Business: this is the core Data Warehouse DB where we hold our data. Its where we store our magic.
    • Views: This DB holds no data but it allows us to shape the data from the model DB into different virtual Datamarts for the business to use in self-BI. It also enables us to cross-reference data from other Data Warehouses if necessary, especially the Referential one.
  • One SSIS project
  • One SSAS project
  • Sometimes one SSRS project when for some reason something wasn’t possible in Power BI or people really, really wanted a scheduled Excel extract.
Here is an example on our C2S Data Warehouse solution having our 3 standard DBs, each with multiple schemas inside for business areas, 1 SSAS and 1 SSIS project.

We don’t include PowerBI reports in the solution as that is not possible, so we just put them in a folder in the repository alongside the rest of the code.

We put everything in the repo like this to have one whole version packaged during our build phase and then shipped into UAT and production. We don’t want to have to chase after which PowerBI report version corresponds to our SSAS cube version if we ever need to rollback or redeploy.

Room for improvement?

There always is. We feel Git is the part of our process we will iterate on the most.

We don’t use many branch policies except for disabling commit directly into Master. We know we could do more there.

Merge or fast forward? For now we merge in release and master. I don’t mind what the team does in dev. But we still need to discuss this.

We should create proper Readme.md files for each repo and each project to make sure people know what we do in each and be able to take over in 3, 5 or 10 years if they ever have to.

We should use branches and tags more.

We should cross-review PRs and not just have the one submitting them approving them as well.

And I’m sure there are things I haven’t though of because I’ve never read about them or simply haven’t hit the next wall next. But for sure, I will be writing more on this topic in the future as the organisation of the repos is an important part of how the team works and really drives our organisation and our daily rythm.

Oh yeah, and we make sure our Git repo is backed up 😎

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: