13 things that scare me when working in BI:
Happy Halloween! While I personally have the fear tolerance of a shy toddler, I understand that many people out there (including a literal toddler that I’m very fond of) absolutely *love* getting spooked.
For everyone that loves a good fright, here’s a round-up of 13 common BI mistakes that make me scream in terror whenever I’m taking over and/or reworking a BI tool for a client.
Super important side note: this blog post is for educational and entertainment purposes only, not to guilt or embarrass anyone. If you see something on this list that you do yourself, you’re not alone – I’ve probably made every single one of these mistakes at some point. Please know that we’re all learning and growing here in the BI space together.
1. Using native queries to get your source data in Power Query
Native queries are what happens when you paste a SQL query that you developed in SSMS (SQL Server Management Studio) into that optional “SQL Statement” box when setting up a SQL data source. Power BI will let you do this; it’ll just give you vaguely ominous warnings about it from time to time to express its concern.
Why it’s scary: Other than the cryptic warnings from Microsoft? A few reasons:
First, native queries break query folding, which is basically the thing that makes Power Query an efficient(ish) ETL tool. Can you repair the query folding with some extra know-how and effort? Actually, yes—GuyInACube has a handy video on this that will walk you through it. However, if you and/or your team don’t know about this fix, you’re going to have extra slow, disjointed Power Query loads where the native query gets fully run first then all subsequent steps are chunked through.
Add to this the security risk native queries introduce for query injection. When I’ve trained up report writers in the past, one of the things I’ve emphasized is that performing transformations in Power Query can’t change the source data; native queries are the exception to that rule. You could easily drop or alter a table in SQL via a Power BI native query if you had the permissions to do so.
A more likely (if pernicious) risk is that you’ll proceed to build out your query the same way you usually do in SQL—which is to say, wide and flat—and forget all about your data modeling options. Combine that with the broken query folding issue above and you may end up with a seriously slow report.
A better option: Skip native queries unless your specific use case requires them. Instead, make your transformations upriver (when you can) or in Power Query itself.
2. An abundance of calculated columns
Why they’re scary: As Christopher Wagner beautifully summarized last week on LinkedIn, calculated columns are the enemy of efficient data models, especially when used on a fact table rather than a dimension table. When you add a calculated column to a table, it applies that calculation iteratively to each and every row in that table.
Calculated columns are easy enough to get away with on a smaller table when you only have a few hundred rows, but they’re a total nightmare once your data begins to scale.
A better option Measures are the right tool for the job if you’re trying to get calculated output that uses more than one column. If you’re aiming to get the iterative output from each individual row (say you need different discount rates applied to different base prices throughout the table), you can get the same output as a calculated column by using the SUMX function, or one of its cousins, which include AverageX(), ProductX(), and CountX().
If you need to create summary categories to support a hierarchy in a dimension table, you can get away with making a calculated column. However, it’s still better to follow Roche’s Maxim and apply those groupings in Power Query (or SQL) instead.
3. Unlabeled measures (Measure1, Measure2, Measure3, Measure4…)
Why they’re scary: One or two unnamed measures are merely annoying, but dozens of unnamed measures? Good luck maintaining that report when you have to tease out which measures go with which renamed fields on your report’s various visualizations.
A better option: Literally any name is better than no name. Please label your measures for everyone’s sanity. You get extra brownie points if you organize and provide descriptions for them in the data model using the Properties Pane.
4. & 5. Unlabeled fields and tables
See above. Please don’t do this. Have mercy.
6. ADF pipelines that look more like rollercoasters than logical processes
Why it’s scary: aaaaaaaaaAAAAAAAHHHHHHHH! Buckle in you guys and prepare to follow multiple interlocking pipelines of convoluted if-then logic. I hope you like hieroglyphs and spending your weekends huddled over your work laptop, because you’re not going anywhere for a while. Twenty hours from now, you’ll realize that the resulting output could have been achieved with a simple Copy Activity and you’ll start seriously considering quitting your job to run a goat farm.
It’s an option.
A better option: Plan out what your data transformations should be on paper, before you create some nausea-inducing loop-the-loop that not even you can explain once you’re done with it. The more complex your pipeline needs to be, the more important it is to carefully plan out the design so the logic can be both efficiently applied and plausibly understood.
7. Complex SQL queries with no comments (!)
Why it’s scary: Listen, this view has 800 rows. I don’t know why this 30-step case-when statement here on row 256 was added midway through a subquery, then got unioned onto three other subqueries and joined to six temp tables. Do you?
Even if you build the query with total confidence and have all the business logic straight in your head, there’s a zero percent chance you’re going to remember the specifics three months from now.
A better option: For your own sake and for the sake of any who follow you, please add comments to your views, stored procedures, and saved queries. Any time you’re adding logic for data cleanup (e.g., “this specific value couldn’t be fixed in our ERP system so we’re manually correcting it here”), to meet particular business requirements, or to incorporate more than a couple straight-forward joins, take the time to add those precious green notes to your SQL code. Future you thanks you, as do I.
8. Fact table queries with dozens of columns
Rufus does not deserve this abuse.
Why it’s scary: The platonic ideal of a fact table holds only aggregate-able data and key columns, with all the yummy descriptive information stored in the dimension tables. If your fact table has dozens of columns, something has gone very wrong. As your data scales over time, that puppy is going to have a hard time keeping up.
What to do instead: Think of your dataset as something you want to make as light as possible, like a hiking backpack you’re going to take on an extended trip. Every column adds weight to your dataset, but fact (or transactional) columns tend to be the heaviest because they routinely need to hold millions of rows. You can’t ditch those individual transactions because you need that level of information to accurately sum and calculate the output in your reports. Instead, you’ll want to take all that useful descriptive data and hand it over to the dimension tables for safekeeping. A good, but imperfect, guideline is that any text fields on a fact table (e.g., vendors’ addresses, company names, and employee titles) should be moved to a dimension table whenever possible.
9. Hard-coding individual email addresses as members in Row Level Security (RLS) roles
Why it’s scary & what to do instead: If you’ve only got a few people working at your company, you can get away with manually typing each person’s email into the member list for a RLS role, but anything more than that? Yeah, you’re going to want to use groups in Microsoft Entra ID (née Azure Active Directory). If you try to maintain your RLS member list manually, the effort to keep up with update requests will slowly take over your life until you either go mad or change your approach, whichever comes first.
I recommend the latter. RADACAD’s got a nice write up on setting up Dynamic RLS that may help to save your sanity.
10. Literally anyone who says “Oh you don’t need to worry about maintaining this [pipeline],[dataset], [report], etc. after go-live”.
Run.
Why they’re scary: Business intelligence tools are perpetual works in progress. To be useful, BI reporting and data plumbing must reflect the organizations they support—and those organizations are perpetually growing and changing. If you end a project thinking you can rest on your data laurels indefinitely, you won’t be prepared to support those changing needs.
A better option: Embrace “iterative improvement, forever”, then hire and train your staff accordingly. The alternative is to watch your organization’s BI systems slowly break down into obsolescence, then paying a premium for someone like me to come clear out your technical debt just to get you back to zero. It’s your call, but I know what I’d prefer.
11. “Datasets” that are actually just one ENORMOUS table
Why it’s scary: Listen, pretty much everyone starts by making a big table. Data modeling is intimidating when you’ve spent your time up to this point using lookups to whack things into a single Excel table. Thankfully, Power BI forces each of us to face our data modeling fears by ensuring that our reports spontaneously combust once the data floodgates open. It’s a rite of passage.
A better option: Conquer data modeling basics before that data model conquers you. SQLBI has an excellent—and free—resource to get you started: Introduction to Data Modeling for Power BI Video Course - SQLBI.
12. Rainbow Brite reports
Why it’s scary: My EYES! I’m blind!!!
We had a great speaker at Power BI DC last week, who noted that, when everything’s calling for your attention in a report, nothing can keep it. When putting the visual components of your report together, aim to use color sparingly, strategically, and consistently to get your point across.
A better option: I’m fond of a fairly boring grey and blue color palette, though heavens know my first couple of dashboards were closer to Willy Wonka. Blue is still enough of an attention-getter that it’s useful for drawing the eye and emphasizing key details, but it’s still visible to most color-blind people (unlike red and green, two thirds of the popular “stoplight” color palette.) You can use more colors of course, just make sure they’re earning their ink.
Happy Halloween! (Yikes!)
13. Data sources connected to a employee’s desktop who has already left the company.
Why it’s scary: Ohhh honey, I hope that data source wasn’t too important. Any chance the IT department hasn’t cleared the hard drive yet?
A better option: This might be the scariest thing on this list. Moving forward, you want to store important data centrally, never on anyone’s desktop. Having everything beautifully organized in a data warehouse is ideal, but keeping data in SharePoint will work fine too. Can’t help you much with your old co-worker’s data retroactively though. Maybe a time machine?