Access-ible Data - Part 1
5 July 2018
Welcome back to Power BI Tips!
One of the neat things about Power BI is the large number of connectors to Get & Transform data. We’ve been working with flat files until now but to realise the full potential of Power BI, we’re going to start having to work with REAL data. The most efficient way to store data is in a database.
Databases instead of storing all information in one file (for example like an Excel file), data is structured in tables. Instead of repeating information unnecessarily, tables hold information about certain data and these tables can have relationships to each other.
For example, let’s have a look at some standard sales transaction data:
Notice how the product information is repeated on every line? As a new attribute is added to the product, the sales transaction data will have to add a new column to the data set. The number of transactions can be quite large, meaning our file will need to store “number of product attributes” x “number of sales transactions” pieces of data. That can grow quite large!
Instead, we could store it this way:
- A table full of products, each uniquely identified (in this case with Product ID)
- A table full of sales transactions that has the Product ID tied to the transaction.
This is much more efficient (even though it wouldn’t work in a standard pivot table!).
A schema is a blueprint of the database which shows how the data is structure in tables and how they relate to each other. The schema that would describe this basic database would look like this:
Information is stored efficiently in databases and Power BI makes it trivial to work with it.
For our series let’s work with a real database that is much more complex. The data source for this series will come from “The Integrated Postsecondary Education Data System”(IPEDS). This is a system of interrelated surveys conducted annually by the U.S. Department of Education's National Center for Education Statistic. All their survey data is available for download in MS Access format here. I’ll be using the 2016-2017 Access data set. Download the file (over 50mb) and we’ll start working through it over the next few weeks.
Tune in next time for more Power BI Tips!