The Microsoft Big Data Hackathon is happening in Vancouver. (Register here)
April 18-19, 2015 (Saturday and Sunday)
Simon Fraser University – Morris J Wosk Centre for Dialogue (Map)
580 W Hastings St
Vancouver British Columbia V6B 1L6
You will need to register for the event. Register here.
To participate, you will need to be part of a 3-5 member team.
Last week at the PASS Summit 2014, I attended Matt Masson and Tessa Palmer’s session on Power Query Deep Dive. One of Tessa Palmer’s demo was a shout out to the Vancouver Canucks and how she has kept track of the standings of the Canucks by using Power Query.
That demo really piqued my interest (I like the Canucks too, but they keep on breaking my heart), so I wanted to try it out myself. This is my first stab at it, so I will probably look back at this post a few months from now and see where I could improve the process.
1.First thing to do, after installing and setting up Power Query, is to get the URL for the NHL standings site. The division standings can be found in: http://www.nhl.com/ice/standings.htm?type=DIV#&navid=nav-stn-div
The data is deeply embedded and formatted in HTML tables, as can be seen in the screenshot below. Getting this data pre-Power Query would have been possible, but would have required some nifty programming
I always enjoy the PASS Summits, and most of the time there is always a handful of things that I get excited about. I don’t even wait ‘til I get back to work to try these out, I usually try it out right after the session or the night after the session.
Lately I have been more involved in data warehousing, ETL and visualization/dashboarding/analytics. I have not had the chance to look more closely at some of the new packages or tools from any vendor for that matter. At this week’s PASS summit, I attended many of the Power BI and Power Query sessions. After I saw the first session on Power Query, I was hooked. There were some data sets that I’ve been trying to scrape with other tools – SSIS, PowerShell etc – that seemed to have been scraped and cleaned much, much easier (or should I say magically) by Power Query. Power Query – you had me at “hello”.
What is Power Query?
Let me quote Microsoft on this definition.
Microsoft Power Query for Excel, is a new add-in that provides a seamless experience for data discovery, data transformation and enrichment for Information Workers, BI professionals and other Excel users
To put it differently, it is the desktop ETL tool you wish you had, which you now do. I am no Excel guru, and I usually don’t look to Excel if I need to do any kind of data dump, cleanup and transform. Before even thinking of Excel, I would probably think of ways to do the cleanup and transform in T-SQL, SSIS, PowerShell, C# (sometimes in this order, sometimes not). After seeing Power Query, my list and order of preference for these tools have just changed.
So what do you need to start your Power Query adventure?
Download and Install
First off, you will need to download the Power Query add-in. You can download it from http://www.microsoft.com/en-us/download/details.aspx?id=39379. There is an Excel 2010 and Excel 2013 version, and there is also a 32 and a 62 bit version. So just make sure you know which Excel version you have. Should you make a mistake, the installer will tell you anyway, so no harm done.