We have all seen the amazing things Microsoft is doing each month with Microsoft Power BI updates. I’ve been working in the Microsoft ecosystem for 20 years now, and I’ve never seen an update frequency like the one we see with Microsoft Power BI and the Power Platform in general.
These updates are not trivial either and they usually span across several areas of Power BI itself. With the recent August update, Microsoft has now giving us the ability to host, render and use Python script for visuals and within PowerQuery (M) for loading data frames into Power BI data sets. With this, we now have a great choice of either R Script visuals or Python Script visuals to help us craft more advance data stories. This includes such stories that would be descriptive in nature, but also moving more and more into the forecasting and predictive / prescriptive models.
With this new Python visualization option, it is super important to understand now what does this bring to the table for crafting stories? What are the current limits? What is the perceived and known road map for Python being valid target for effective data stories in Power BI?
Now lets dive into Python for Power BI, how to set this up and getting started within using it with Microsoft Power BI. In the following sections we will answer the questions set in the previous paragraphs which should enable you to start using Python and looking to incorporate it into your data story needs.
I. Install and Setup of Python for Power BI
In order to start to take advantage of Python with Microsoft Power BI you will need an acceptable install of Python running on your desktop along side Microsoft Power BI. As of this writing the requirements for executing this then lay on the requirements for running Microsoft Power BI Desktop. Basically a windows machine that can run Microsoft Power BI. The following is the official Microsoft page for requirements: Microsoft Power BI Desktop Hardware and Software requirements.
With the requirements for running and executing Microsoft Power BI in place, next you need the ability to supply the working instance of Python that Microsoft Power BI Desktop will use. This is the compiler, execution and editing of Python code enabled through the targeted use of the new Python visual element.
You can see from the above image I have already installed and I am working with Anaconda3 implementation of Python. This brings with it PyData, which is the major Python library that you will be using for working with data using Python within your targeted data stories.
Before you can reach the point of selecting a Python home directory, as well as a targeted IDE to use, you will need to enable the ‘Python support’ within the File > Options and Settings > Preview Features of your Microsoft Power BI desktop report file. You can see this option enabled in the screen shot below.
Having this enabled you will need to also restart your instance of Power BI for this to take affect. Simply click save, close Microsoft Power BI desktop, and then launch your report file again.
If you do not have an install of Python, or if you want to ensure your using one of the latest implementations of PyData, you can download and install Anaconda3 from the following resource page: Download and Install Anaconda3 for Windows.
After installing Anaconda3, or if you already have your desired implementation of Python installed, then you can proceed with setting up your ‘Python scripting’ options found in File > Options and Settings > Python Settings.
Now that this is in place, you can move to selecting your desired IDE. This can launch from within the coding section of the Python visual you are targeting. You will notice in my screen shot above I have listed Notepad++. I use Microsoft Visual Studio and Notepad++ as my two IDE’s of choice. This is always a preference and what gives you the most productivity targeting.
Having your options set, and having python in place, you are now ready to unleash your Python skills and bring to Power BI all the great options that PyData has to offer.
II. Basic Usage of Python in Power BI
Now that we have our Microsoft Power BI Desktop and Python environment setup, we can start targeting the new ‘Py’ visualization control that you will find within your visualizations options. When you first select to add a new ‘Py’ visual control to your report page, you will most likely get a warning about enabling scripting. A notification similar to the one below is what you should see.
Here simply enable script visuals and you are ready to start tackling python based data stories.
In adding your first Python visual, you will notice a few things. First it is blank, second a scripting window appears that will house your python scripting magic. Further you will see the message to drag fields into the visualization. This is to help setup the PyData, data frames that will be consumed and used within the python data visual.
Once you have a python visual selected, you need to think in terms of what data you will target for your story. For our usage I have sales data set that I’m working with which I will drag the Customer dimension attribute, and two measures called Total Sales and Target Sales.
In adding these as Values to the Python visual you will see some generated code in the gray area in which we can not modify. This is setting up the pandas.DataFrame object, which comes from the PyData library, which is part of the Python implementation that you installed or enabled earlier.
The following screen shot shows the Python generated code, that establishes a pandas.DataFrame. This dataframe will be used within the scope and context so that you can then plot visualizations from your data using Python script.
The Python visual generated code, found in the Python script editor window gives us a clean dataframe in which to proceed with. Breaking down the comments and code you can see that our targeted fields for: Customer, Total Sales and Target Sales are used to load the pandas.DataFrame() with. The object instance in which represents the initialized and loaded dataframe is called dataset.
The next action that the generated Python code does is call the ‘Pandas.DataFrame().drop_duplicates()’ method. This cleans the dataframe so we have only unique records across the three data elements that are initialized within the dataframe itself. You can google: pandas.dataframe.drop_duplicates to find out more around the invocation of this method for a targeted dataframe.
Now that we have some fields to work with, lets jump into some actual Python code that will be used in order to generate a simple line visual. To start we will drop a green line that represents the Total Sales value. The following code snippet achieves just that:
from matplotlib import pyplot; ax = pyplot.gca(); dataset.plot.line(x = 'Customer', y = 'Total Sales', color = 'green', ax = ax);
Now that we have added some python code to plot a line, we should be able to hit play button on the script and see the nice results. In doing this however we get a vague error message. The error message states: “Can’t display this visual. No image was created. The python code didn’t result in creation of any visuals. Make sure your Python script results in a plot to the Python default device.”
In reviewing this, the problem is even though we make a call in our python script to plot a line from our dataframe there is a missing final call to make this actually happen. The following code has been updated in order to include this critical piece of missing code.
from matplotlib import pyplot; ax = pyplot.gca(); dataset.plot.line(x = 'Customer', y = 'Total Sales', color = 'green', ax = ax); pyplot.show();
With adding the highlighted ‘pyplot.show();’ line of code you are able to tell Python to render what was called to-be plotted. We see the measure named ‘Total Sales’ across ‘Customer(s)’ plotted as our desired green line. In adding this we now have what we need in order to start using Python in more meaningful ways.
Its important to point out now that you can easily filter the Python visuals, and the data contained within just like any other visual. For the simple data story I’m using as part of this blog post, I’m also filtering the data set by year, and have the option to filter by specific customer as well.
In the above image you can see I have my date filter top-right hand corner of the report tab. I also have 7 customers selected in the bar chart to the right of the Python based visual. In looking at this, since I have customer as the X-axis then you can see how the 7 different customers are ranking in 2012 data for total sales. I would also like to point out that for a line visual, most likely a better view would be to have ‘Total Sales’ and have a date value, like month, for example, to place along the X-axis. You can do this by adjusting the value fields for the Python visual itself. This will then bring into the rendering capability of that visual, and the underlying dataframe that is built, the month value to work with.
Having now established valid working values, the follow are some examples of plotting visuals using Python that you might find useful. First off the following code will render correctly both ‘Target Sales’ and ‘Total Sales’ as lines within the same visualization.
from matplotlib import pyplot; ax = pyplot.gca(); dataset.plot.line(x = 'Customer', y = 'Total Sales', color = 'green', ax = ax); dataset.plot.line(x = 'Customer', y = 'Target Sales', color = 'red', ax = ax); pyplot.show();
With the above code you now see how you can plot line visuals in Python, and actually have multiple lines appear as part of your visual. Notice that we are plotting using the same DataFrame object named ‘dataset’. We are calling the .plot.line() with passing in the desired parameters to plot via. This includes what should be plotted along the x, y axis as well as color to use.
With PyData and the Pandas.DataFrame class, there are several options in plotting Python based visuals using your data model within Microsoft Power BI desktop. To find out what all is possible, please review the following resource that will give you options and examples on what all can be plotted using an instance of the Pandas.DataFrame class: Pandas API Online Reference.
When using the reference lookup the reference point under ‘Plotting’ for the Panda.Series.plot. This will give you exact method calling signatures and how to implement the various plotting outcomes for your Power BI Python visual.
Now that you have established the basics of plotting using Python, and specifically PyData and Pandas you are ready to fully unleash the power of Python onto your Microsoft Power BI data stories!
III. Limits of Python in Power BI. (as of 9/1/2018)
Like all things new and preview there are some existing restrictions and limits to the current release of Python visuals within Microsoft Power BI Desktop. The largest of these is the fact that Python visuals are *only* supported within the Microsoft Power BI Desktop experience. That means you can not publish your Power BI reports and expect to use Python visuals with your published story lines – yet.
This limitation really puts the brakes on, so to speak, in being able to fully unleash Python to help you craft your data stories. However lets look at Microsoft Power BI Desktop for all that it is. Microsoft Power BI Desktop is *not* a report authoring tool. It is a model crafting tool! Yes you do create Power BI Report files with the tool. However the entire experience of connecting to data, mashing up data from several locations and crafting end-to-end data stories is all centered around the concept of a model file.
The very language in which you connect to data and bring your stories to life via is called ‘M’ or better known these days as PowerQuery. The ‘M’ stands for modeling, and if you look at the evolution of the ‘M’ language you will find out that it was / is built to be a domain specific modeling language. Hence the entire experience of Microsoft Power BI Desktop is a modeling experience.
To this end, and with the ability to cluster, graph, and create deep machine learning and predictive models with Python – then there is plenty you can do now in adding Python scripting to your overall Microsoft Power BI experience. For the data analyst, data architect and data scientist role this means Python is very much ready for use in exploring and training predictive models with.
We have reached the end of this blog post. I hope you have enjoyed the experience and find it useful in helping you getting started with truly using Python with your Microsoft Power BI data stories. With the rate in which Microsoft releases updates for Power BI, you can bet very soon not only will Python help you in data discovery and stories within the Microsoft Power BI Desktop experience but also with being able to publish, share and embed Python based Power BI data stories on PowerBI.com.
As new updates to this great marriage of technologies are released you can also bet I will be creating new and meaningful post that will help you get the most out of your Microsoft Power BI experience.
Until next time!