While not every data analys is a programmer, learning Python can be of great benefit to any analyst who utilizes Power BI for their data processing and visualization, as Python can be used to increase the capacity of Power BI on both of these fronts.
Setup:
To begin using Python in Power BI, you have to have Python installed on your machine. The best approach is to create a virtual environment containing a stable version of Python and any libraires that may be needed for data manipulation and visualization with Pandas being a required library.
Data visualization:
From the Report view, open the “Visualizations” tab and confirm that you have “Python Visual” as an option. With this as an option, you greatly increase the number of visualizations you can create in Power BI. Seaborn, Matplotlib, Plotly, as well as other data visualizations libraries can all output image files that Power BI can then render for an end user.
Data loading:
To load in data using a Python script, click on “Get Data”, select “Other” from the sidebar, then select “Python Script”, then click “Connect”. This will open up a simple text editor for you to copy and paste your Python Script into. A Python script must end with a pandas Dataframe to be used as a valid dataset.
Access Data from more places
There are several benefits to using Python scripts as datasets. First, you are more likely to find Python libraries that enable you to connect to a greater variety of data sources than you are to find a native connector in Power BI. These libraries are also more likely to be updated / maintained.
Access Data more securely
Secondly, even if a native connector does exist for Power BI, using a Python script may allow you to specify extra parameters and utilize extra security features in a more seamless manner. For example, Power BI has a native connector for Snowflake, but if your organization uses a single sign on browser or two factor authentication, you will be out of luck using Power BI alone, but with Python, you can specify authenticator='externalbrowser' when creating your connection engine. Similarly, Power BI has a native connector for online databases (e.g. MySQL, PostgreSQL, MongoDB, etc.), but these connectors have a limited number of security parameters. If you have an SSL certificate, a CA, certificate, or additional .pem files that need to be included when connecting, you may not be able to connect easily though the native connector. However, you can pass all these as arguments into a Python Script.
Greater control over data manipulation
Beyond the ability to pull in more data from more sources, you also have greater flexibility when it comes to manipulating your data. While Power Bi’s M query language has a lot of options, having access to Python’s core language and libraries allows you to programmatically alter data before loading it into Power BI. With Python, you can use complex conditional logic, regular expressions, and mappings to perform more advanced data operations than you would otherwise be able to accomplish in Power BI alone. It is also possible that these data manipulations could be done faster if you are using a library like Polars to modify a very large dataset.
Multi-file consolidation
Many organizations do not have data well organized. For example, you may have data scattered across 1,000 emails, or have a process that creates an excel file every month. Loading in all of these files into Power BI one at a time would be a nightmare. However, having a basic Python script iterate through all of those files and consolidate them all into one dataset can radically improve the speed you get your data and future-proof your script for when new data comes in.
Automated Procedures
If you wanted to run a Python program on a weekly or daily basis, you could create a project with a cloud service provider and use cloud functions that run on a schedule. However, this will result in charges based on how many functions you need to run and the amount of computing resources utilized to complete the function, along with any storage space needed if the function results in a file.
However, if you already have a premium Power BI license with a premium workspace, there is a way you can do something similar at no extra cost. If you are using python as a data source for published dashboards, you will need a local default gateway installed so that Power BI’s webservices can connect to your local machine whenever scheduled refreshes are run. Now we have two tools at our disposal:
Now let’s say you have a python program that you want to run once a week. You can write the Python program and have it end with a pandas Dataframe with fake data in it. Save and Publish this dashboard to your premium workspace. Then, go to scheduled refresh and have the data refresh once a week or once a day. If you need the program to run monthly, you can add additional logic to the program using the datetime library to meet your needed conditions. For example, “If it is the first week of the month, run, otherwise, exit the program / do nothing.”
Conclusion
Python is an incredibly popular and user-friendly programming language to learn, and any analyst using Power BI would benefit from learning it. Being able to pull in data from a larger variety of sources in a more secure and efficient manner, being able to manipulate data with greater flexibility, and visualize your data in ways that Power BI cannot do natively will radically increase your capabilities as an analyst.