Power Query 101: Get NHL Standings Data with Power Query
Last week at the PASS Summit 2014, I attended Matt Masson and Tessa Palmer’s session on Power Query Deep Dive. One of Tessa Palmer’s demo was a shout out to the Vancouver Canucks and how she has kept track of the standings of the Canucks by using Power Query.
That demo really piqued my interest (I like the Canucks too, but they keep on breaking my heart), so I wanted to try it out myself. This is my first stab at it, so I will probably look back at this post a few months from now and see where I could improve the process.
1.First thing to do, after installing and setting up Power Query, is to get the URL for the NHL standings site. The division standings can be found in: http://www.nhl.com/ice/standings.htm?type=DIV#&navid=nav-stn-div
The data is deeply embedded and formatted in HTML tables, as can be seen in the screenshot below. Getting this data pre-Power Query would have been possible, but would have required some nifty programming
2.Open Excel and go to the Power Query tab. Since we have the URL, we can go ahead and use the “From Web” ribbon and paste in the URL http://www.nhl.com/ice/standings.htm?type=DIV#&navid=nav-stn-div
Once you click on OK, you will notice in your Navigator pane on the left a number of tables retrieved by Power Query. There’s Table 0, which corresponds to the first table in the web page for Eastern Conference – Atlantic. Table 1 is also Eastern Conference – Metropolitan. Table 2 is Western Conference – Central, and Table 3 is Western Conference – Pacific.
3.Check the checkbox to select multiple items, and select Table 0 to 3. Click on Load.
4.Once the tables are loaded, click on Append. Set Table 0 as primary, and append the other tables to it, starting with Table 1.
When you click OK, a query editor pops up. Just click on Close & Load to load this to the spreadsheet.
5.To load the next tables, you will need to set the primary table to the final result set that was returned. You should see this result set under the Workbook Queries. In the screenshot below, it’s the “Append1” query.
Once all the tables are loaded, you will see that some of the fields are formatted weird, so we need to do some cleanup.
6.Double click on Append3 from Workbook Queries pane to edit the data set.
6a. Rename the leftmost column. These are the current standings, so we will just call this Standing.
6b. Add two new columns called Conference and Division.
Since in the Query Editor we cannot manually change values, we can temporarily assign a value first for the new columns. After we Load & Close, we can adjust the values from the spreadsheet.
After the columns are added, Load & Close and adjust the values from the spreadsheet.
Table 0 Eastern Conference – Atlantic
Table 1 Eastern Conference – Metropolitan
Table 2 Western Conference – Central
Table 3 Western Conference – Pacific
For the Conference column, the value of the first two sets of 8 would be “Eastern” and the value of the next two sets (one with 8, another with 7) would be “Western”
For the Division column, these have been imported as column headers. We need to use these as values for the rows.
(I am sure there is a better way to automate this, but I will leave that to a later post, once I get more familiar with Power Query and M)
Your final worksheet should look like this:
It is now ready to be visualized!
Matt Masson just posted a Notepad++ language file for M (Power Query’s formula language). Will come in quite handy when working with M.