How Can I Do Market Research on NHS England Prescribing in Power BI?
11th April 2023
By Tope O
This article describes how a data analyst can conduct analysis and produce prescribing insights using Power BI and an analysis-ready version of the NHS English Prescribing dataset. The analysis-ready dataset is modelled as a star schema (i.e. dimensional model) and published in data file formats that are optimal for analytic workloads. It is also freely and openly available to download through the Open Data Blend Datasets service.
Combining modern BI tools with analytics-optimised open datasets makes it possible to achieve a level of productivity and efficiency that would otherwise be out of reach for most data analysts. In this case, the pairing of Power BI Desktop with the Open Data Blend Prescribing dataset made it possible to download, load, model, analyse, and report on over 380 million rows of NHS England prescribing data with about three hours of total effort on a laptop.
A Power BI report presenting insights derived from over 380 million rows of NHS England prescribing data
The analysis was done in December 2022, and the NHS English Prescribing dataset published by the NHS Business Services Authority (NHSBSA) is always two months behind the current month. This means the latest available data for 2022 at the time was October 2022.
How to Get Data from the Open Data Blend Datasets Service
To begin my analysis I went to the Open Data Blend website to download the required datasets. You can find the link to the datasets page here. I navigated to the 'Prescribing' dataset, and from there, I downloaded the data files listed below:
- English Prescribing
- English Prescribing 2021
- English Prescribing 2022 (with data up till October 2022)
- Prescribing Chemical
- Prescribing Info
- Prescribing Practice
- Primary Care Organisation
It is important to highlight that although the NHSBSA publish one data file per month for the English Prescribing Dataset, the Open Data Blend Prescribing dataset has these consolidated into heavily-compressed yearly data files. Thanks to this, I could get what I needed without downloading many data files, and they took up less storage space on my laptop.
I downloaded the Parquet versions for each data file that I needed because these are column-oriented and designed to provide efficient columnar storage compared to row-based file types such as CSV. However, you can also choose to download the csv.gz files if you want to.
How to access Parquet files using the Open Data Blend Datasets UI
After downloading the data files, I saved them to a OneDrive folder location, ready for loading into Power BI Desktop using the 'Get Data' feature.
Loading the NHS English Prescribing Data into Power BI
Loading Parquet Data Files into Power BI
To load the Parquet datasets into Power BI I followed the steps below:
- Open Power BI desktop and navigate to Get Data.
- Choose Parquet from the list of data sources that you are presented with.
- Copy and paste the file path of the folder where your files are and include the name of the file in the format folder path\filename.extension.
- Click on OK and this will load the data.
- It is good practice to rename each dataset to a more meaningful name so that it is easy to understand when working with your data models and designing your reports.
- Repeat this for all the Parquet datasets that you want to load and then click on the Close & Apply button on the top left side of the Power Query Editor screen.
Loading Gzip Compressed Data Files into Power BI (Without Decompressing Them First)
For scenarios where the Gzip data files are preferred over the Parquet equivalents, there is a way to load them into Power BI without decompressing them first. I was able to do this by downloading them to my laptop and then performing the following steps:
- Open Power BI Desktop and navigate to Get Data.
- Choose Blank Query from the options you are presented with. This will take you to the Power Query Editor page.
- On the Power Query Editor page select Advanced Editor from the top menu ribbon and enter the Power Query M expression below to load the csv.gz file that you have downloaded from the Open Data Blend website:
let Source = Binary.Decompress(File.Contents("C:\Documents\Prescribing Data\ data_fact_english_prescribing_2022_20221220T041225Z_english_prescribing_2022.csv.gz"), Compression.GZip), #"Imported CSV" = Csv.Document(Source,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]) in #"Imported CSV"
- The Power Query M expression explicitly applies Gzip decompression in the 'Source' step. Without this, Power Query cannot read the compressed (Gzip) CSV file directly.
- Click Done on the Advanced Editor window and a preview of the data should appear.
- Apply the Use First Row as Header transformation and rename the query to something appropriate.
- Click Close & Apply to load the data into the data model.
Modelling NHS English Prescribing Data in Power BI
Creating Relationships Between the Tables
Once Power BI finished loading the data from the six data files required for the analysis, relationships were created between all tables using the Model view in Power BI Desktop. I used the English Prescribing Relationship Diagram model that can be found under the Useful Resources section at the bottom of the page here to ensure the relationships were correct.
Where you can find the relationship diagram for an Open Data Blend dataset
English Prescribing relationship diagram
For five out of the six table relationships, Power BI automatically creates the relationships because the names of the columns that should be linked to each other are the same. An exception to this is the Date dimension, so you should only need to create the relationship between the Date table and the English Prescribing table. The English Prescribing table is the fact table in this star schema model, and all the other five tables are dimension tables with their respective primary keys.
Creating Simple Measures (i.e. Calculations) using DAX
Once all the datasets had been loaded and the relationships had been created, I then proceeded to create the measures that will be used on visuals on the report. Some measures that were created include:
- Total Prescription Items: This gives the total quantity of prescription items and is calculated as:
- Total Actual Cost: This gives the aggregated actual cost of prescription items and is calculated as:
- Same Period Last Year Total Prescriptions: This gives the total quantity of prescription items but for the same period last year depending on what date the report user selects as a filter. This measure is calculated as:
CALCULATE( [Total Prescription Items], SAMEPERIODLASTYEAR( 'Date Dimension'[Date]), ALL('Date Dimension') )
- Previous Month Prescription Items: This gives the total quantity of prescription items but for the previous month for any date the report user selects. This measure is calculated as:
CALCULATE( [Total Prescription Items], PREVIOUSMONTH( 'English Prescribing'[drv_date]), ALL('English Prescribing') )
Visualising NHS English Prescribing Data in Power BI
This part of the article focuses on deriving some insights to answer market research questions such as:
- What are the top 10 most prescribed diabetes items in October 2022?
- What are the bottom 10 prescribed diabetes items in October 2022
- What are the average number and average cost of diabetes prescription items in October 2022?
- What is the actual cost and total prescription items trend for the last 18 months leading up to October 2022?
- What is the percentage share of the top 10 diabetes prescription items relative to all diabetes prescription items?
Following data visualisation best practices, bar charts have been used to represent the top 10 and bottom 10 prescription items analysis as this helps to compare the number of items prescribed for each of the diabetes prescription items in the top and bottom 10 categories. Card visuals were used to represent the average cost per diabetes prescription item, average prescription items per day, and percentage share of top 10 prescription items as these are single figures and representing them with cards allow users to spot the information quickly. Lastly, line charts have been used for trend analysis, enabling users to see changes in volumes and cost over the period in question.
Top 10 Diabetes Prescription Items
The top 10 diabetes prescription items for October 2022
Visual above shows the top 10 diabetes prescription items for October 2022. The chart shows that GlucoRx Nexus testing strips is the most popular prescription item prescribed for treating Diabetes in October 2022.
Bottom 10 Diabetes Prescription Items
The bottom 10 diabetes prescription items for October 2022
Visual above shows the bottom 10 diabetes prescription items for October 2022. The chart shows that Glucose 10% infusion 500ml is the least popular prescription item prescribed for treating Diabetes in October 2022.
Average Number, Average Actual Cost of Diabetes Prescription Items and Percentage Share of Top 10 Items in October 2022
Note that for the average number of items per day, two additional measures were created; one to count the number of days in each selected month (Days in Month) and the other to divide the total number of prescription items by the days in month measure (this gives the Average number of prescription items per day for any month selected by the user).
Calculating the Average Number of Prescription Items Per Day
Days in Month is derived using the formula below:
Days in Month = VAR MaxDate = MAX('Date Dimension'[Date]) VAR DaysInMonth = DATEDIFF( EOMONTH(MaxDate,-1), MaxDate, DAY ) RETURN DaysInMonth
Average Prescription items per day is derived using the formula below:
Average No of Items per Day = DIVIDE([Total Prescription Items], [Days In Month])
Calculating the Average Cost Per Prescription Item
Average Cost is derived using the formula below:
Average Cost per Item (£) = DIVIDE([Total Actual Cost], [Total Prescription Items])
Calculating the Percentage Share of the Top 10 Items
Percentage Share of the top 10 items is derived using the formula below:
Percentage Share of Top 10 = VAR TopGroup = TOPN( 10, SUMMARIZE( ALL('Prescribing Chemical'[src_british_national_formulary_description]), 'Prescribing Chemical'[src_british_national_formulary_description], "Prescriptions", SUM('English Prescribing'[src_english_prescribing_items_total_quantity]) ), [Prescriptions] ) VAR Totaltop10 = SUMX(TopGroup, [Prescriptions]) VAR PercentageShare = DIVIDE(Totaltop10, [Total Prescription Items]) RETURN PercentageShare
Combining the three metrics above, we can see that, on average, there were 1.84 million items prescribed for diabetes which cost an average of 16 pence per item in October 2022. We can also see that the top 10 diabetes prescription items accounted for approximately 65% of all diabetes prescription items in the same period.
Diabetes Prescription Trend Analysis
The line graphs below show the number and actual cost of diabetes prescription items over the 18 months leading up to October 2022.
Diabetes prescription items trend for the last 18 months to October 2022
Actual cost of diabetes prescription items over the last 18 months to October 2022
If you look at the total prescription and actual cost graphs, you will notice that the trends for both are similar, which suggests that there might not be economies of scale gained from buying more prescription items, or that the economies of scale have reached an upper limit. As an additional step to improve the quality of trend analysis of the costs, we could adjust the numbers for inflation. However, given the 18 month time frame, the inflation adjusted costs are unlikely to significantly impact the trends.
In this article, you saw how you could quickly and efficiently acquire the English prescribing data using the Open Data Blend Datasets service. You also saw how easy it is to model the data in Power BI Desktop and use appropriate visualisations to produce meaningful insights. Using the Open Data Blend Prescribing dataset, I saved a significant amount of time I would typically have spent transforming the source dataset into facts and dimensions (i.e. a star schema) and creating derived columns where needed.
The results of the analysis are the following:
- GlucoRx Nexus testing strips are the most popular prescription item prescribed for the treatment of Diabetes in October 2022
- Glucose 10% infusion 500ml is the least popular prescription item prescribed for the treatment of Diabetes in October 2022
- The top 10 diabetes prescription items accounted for approximately 65% of all diabetes prescription items in October 2022
These types of insights could be useful to procurement departments that purchase these items for hospitals and pharmacies, or for manufacturers with plans to introduce challenger products into the market.
The transition from data to insights can be further streamlined with the Open Data Blend Analytics and Open Data Blend Insights services. Using Open Data Blend Analytics, I could have connected Power BI Desktop to an analysis-ready data model and produced the same insights in a fraction of the time. That way, I could have spent more time exploring and analysing additional prescribing trends and patterns.
The Open Data Blend Analytics service supports a variety of analytical tools such as Excel, Power BI Desktop, and Tableau Desktop, and users can benefit from the following features when using this service:
In-memory Analytics: This enables users to flexibly summarise 100s of millions of rows in seconds, making it possible to analyse large data volumes interactively.
Semantic Data Model: Direct access to a semantic model made up of tables, columns, relationships, hierarchies, and measures. As the service already knows how the tables are related and what meaningful aggregations are applied, this enables analysts to save time that would have been spent trying to work out how tables are related or creating complex calculations/measures.
Lastly, I could have used the Pharma Market Insights report, one of the many reports available through the Open Data Blend Insights service, to quickly explore relevant trends and patterns before diving deeper into my bespoke analysis.
Follow Us and Stay Up to Date
Keep up to date with Open Data Blend by following us on Twitter and LinkedIn. Be among the first to know when there's something new.
Blog hero image by Towfiqu barbhuiya on Unsplash.