Governance
My view on self-BI governance

My view on self-BI governance

In my previous post, I pointed out a lot of ways things could go wrong and the necessity to control what users of a service can do because if you don’t, be sure they will abuse every right they have. Not all of them, but it usually only takes a few. And the worst part of it is they are doing it for good reasons!

Data Access

Lets first consider data access. Where are your users getting data from? Most of the time, from databases or referentials they have built. And the most common answer we give them as IT is: “No, you can’t access databases, you’re a user! You’re going to mess up the app.” and “Oh… Excel… oh god no. We’re not ingesting that, you’re going to mess it up and its going to break the ETL package”. And are we wrong? Hell no, we’re IT. But do we offer another solution? Hell no, we’re IT. We need to fix that second bit.

Your users need access to databases? Give it to them! Not to THE database, but replicate it. Aggregate the data in a Data Warehouse. There are tons of solutions but if you don’t implement one, they’ll find a way. Most likely they’ll have somebody, or a function in the app itself, provide an extract. Data isn’t going to be more secure. They’re going to be frustrated that they have to repeat that process daily and that sometimes its going to crash. Same result, just worst service. And if you control the source, you can at least log who has access and recover some control.

Let’s go even further and imagine you spent a bit of time with them. Not only will you grant them access to your system, but you could give them performance. And even if they are analysts, they’re probably not data engineers. Indexes, locks, query plans. Do they know about all these things? If you want to make your users happy, spend some time aggregating and preparing the data for them. They’ll save hours of work and will be grateful for it. If you’re a good salesman you might even get them to throw budget your way to do more of that performance magic you do under the hood.

Now, back to those excel files. Why are they there? Usually because some data isn’t in systems. Mappings, objectives, or simply fields that were never implemented in a core system. How can we expect them to throw that away? Or crossing it in their PowerBI report with database data, forcing them into import mode and the performance hell it will generate for our servers?

On the other hand, we could again help them with a range of solutions.
The easiest? probably an ETL package transforming their file straight to a table. Bulkload, no transformation. Secure it by sending them a mail if it fails and they’ll fix their file for you to load the next time it runs.
The best? build a small web-app. A simple table they can edit online. Its a bit more costly, you need to deploy stuff but if you have a dev team that owes you favors, they can probably work something for you in a day. And it can be re-used for all cases.
The most exotics? I used to like putting an Access front-end with a linked table. Currently, I’m working on deploying SQL Server MDM to allow them to edit their “master” mappings and objectives with the excel add-in. I’ll share how that turns out in a couple of months.

Users just want access to data. And they’ll find a way to get it. The best way to help them and to make sure they don’t kill your backend is just to help them. A simple replication can go a long way. And by replication I mean that a daily snapshot refresh is probably enough when you compare it to daily extracts they are currently doing.

Data Ownserhip

If a person requesting me to work on some data can’t tell me who the owner is, I’m probably not going to touch it. Been there, done that. I’m not going to be responsible for business-owned data and touching it is the best way to become the owner. “Please come back when you will have identified or nominated an owner for this perimeter”. Love. Kisses.

Data Catalog

Your users have access to your data. Great. Now, what about a bit of documentation? I don’t mean a description of every single field but maybe just explicit column tables, explicit columns names. Admins and engineers coming after you will also thank you for those. Adding a description into extended properties can also be nice when describing complex things.
If you do calculations, especially SSAS measures, simply adding the information in the description will serve as documentation but it will also show up when your usersmouse-over the fields in PowerBI. “Hey, how is this KPI really calculated? Can I trust this measure?” Well, now they know… let it go, don’t hold back anymore.

PowerBI

Now this is where the real fun begins. This is where I have gotten it wrong for years. And I’m still probably getting it wrong but I want to believe I’m getting better and my opinion is now decent enough to be shared.

So, you have this great tool you’re paying very expensively. A salesperson told your business it would solve all their issues. What could go wrong?

FYI, I’m on Power BI Report Server because… reasons. Good ones. But no Cloud for now, no PowerBI.com. But its ok, Governance to the rescue.

Who gets access to PowerBI Desktop?

Everybody. Yes. It’s the best way to NOT have to deal with them all wanting to publish stuff on the server. And it’s free, so why care? Remember, when you give it to them, they’re not going on the internet to download it themselves. They’ll be on the same version as your server. You can give them the training and the rules at the same time. You’re the good guy giving them a new tool!

Who gets to publish on the portal?

Nobody! Don’t we wish?
Actually, I started to think it should be everybody. Then I went full 180 like my manager suggested and it was “just my team”. It made sense. But we can’t keep track of thousands of users with just the handful we are. So how do we do it?

PowerBI is a great tool, but it comes in two flavors.
One I love: Direct Query.
One I hate: Import mode.

Direct Query is great. But its limited. Whenever possible, users should use it. Its great as long as you’re connecting just to databases* but isn’t that most of the time the case? Remember those Excel files we talked about earlier? Putting them in a database just got way sexier didn’t it? Now they can do Direct Query. It offloads the report server, its usually faster if its done right. The only downside if the 1 million row limit, but if the queries are properly done and you only get the aggregates back, you should never need to hit it.
Direct Query also allows for Kerberos delegation, meaning you don’t have to give users service accounts or have them use their personal account to access the data when they publish (we’ll discuss that publishing part in a minute). Configure a few SPNs, delegate your PowerBI service account and there you have it: Magic!
*Web services, Active Directory and a few other useful sources also work if my memory serves me right. Just no flat files.

Import mode is nice because it brings awesome features but do those reports really need to be published? Aggregating logs in a folder, does that need to be on the portal or just analyzed on the desktop? Let’s get those out of the way. Remains the few cases where people HAVE to use import mode and HAVE to publish the reports. Well, how do we work with that? How can we make sure we aren’t crashing the servers?

In order to “limit” the impact of import mode, there are tons of solutions. My awesome colleagues have come up with ideas like moving around schedule times, limiting the number of schedule per report per day, removing the schedule of reports nobody looks at, etc. And they have tool for it. But… well… it only takes one report to crash the server when its really badly designed. And you can put up more servers, it just takes one report per server. And we’ll get there eventually. So…? Governance to the rescue is my bet.

I’m convinced self-BI and corporate publication can’t live together. One needs to be fail-proof. The other needs to be free. Setup two instances. Instead of having a huge farm, with boosted machines, just get two, half the size.
On one side, corporate BI. With reports published by IT, tested, with a fixed and controlled schedule or Direct Query. This is what management will look at, this is THE TRUTH.
On the other side, self-BI. Users car publish the stuff they really need to share with everyone. We still need to control it, make it work. But what if it goes down or is slow for a little while? What if we need to be aggressive with some reports? Well, they’re not company critical. We take the time to explain to the business how much it costs to give them more freedom. But we don’t jeopardize Corporate BI and in the end, its all we can do. But it will go a long way. And I have to admit it, we will just probably tell the angry users to take it up with their management who will play their role of reminding them corporate spending priorities. Because in the end, are their reports worth the cost of boosting that PowerBI instance any further?

Don’t forget as well to remind users that they can pbix files. It’s not because its not on the portal that you can’t give it to others. Put it on a network drive or on a Sharepoint. If its Direct Query you can probably even send it by mail because it should be small enough.

Things that come to mind but that I won’t discuss (for now?)

Real time: If you want it, build an app. This is BI. Its Day +1, not T+1sec.

Excel extracts from PowerBI: just query the source. Extracting data from a report based on a database is just not the way to go. Go to the database, I gave you access to it.

Conclusion

In the end, again, it comes to how you want to govern your BI. You can give full access to everything to your users and pay the consequences. You can give them nothing but remember they’re just as smart as you, maybe even more, and they’ll find ways around or even worst, above you, to get what they want.

I hope that what I wrote here is just what everybody knows, and does. But its still not the case in my company, nor in most of the ones I know people who work at. So maybe I’m wrong, or maybe we all need to find the courage to try and do the right thing to bring a better service to our users. I hope its a bit of both, but more of the later.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: