Building a Macroeconomic Dashboard with Python and Power BI

Moving from Beta v1.0 to Release v2.0

In this article, I will be releasing the final version of a macroeconomic dashboard built using Python and Power BI. This is a continuation of my previous v1.0 web scraping project, so do check it out if you are keen! What I did earlier was write Python code that extracts all the economic events for a particular month into a neat Excel sheet for consumption.

In v2.0, it does much more powerful things. For each economic event, I asked Python to click all the way down to the earliest date, then extract the entire history of actual values. This allows us to plot a trend. After all, a single value doesn't tell us much on its own; we have to compare it against historical data for it to be meaningful.

Additionally, I incorporated Power BI into my analysis. There are many countries and many economic metrics, so it would not be practical to plot each graph using Python. Power BI is the best tool for this, as it allows users to interactively filter by the country and economic metric they are interested in.

Similar to the previous post, I will walk through the thought process and explain how to scrape the economic data and load it into Power BI.

Part 1 – Filtering Out Power BI Links

The first problem is that NOT all the high-impact events have values. Speeches, reports and votes have no values; they are just text reports. So we have to narrow the filter to select only quantitative economic events that have historical values.

# Filtering out links that have ACTUAL values
powerbi_links = []
for row in table.find_all('tr', {'data-eventid': True}):
    if row.find_all('span', class_='high'):
        for td in row.find_all('td', class_='calendar__cell calendar__actual actual'):
            if td.text != "":  # skip events with a blank actual value
                powerbi_links.append(url + "#detail=" + row['data-eventid'])

The code to get all the links is the same as in v1.0. The only difference is the if statement I added: if the table row has a blank actual value, we skip it; otherwise, we put the link into our Power BI links list. This selects only the economic events that are quantitative in nature.
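To sanity-check the filter without hitting the live site, here is a self-contained toy version run against a hand-written HTML fragment. The markup below only mimics Forex Factory's class names, and the event ids are made up:

```python
from bs4 import BeautifulSoup

# Toy calendar markup mimicking Forex Factory's structure (ids are made up)
html = """
<table>
  <tr data-eventid="100001">
    <td><span class="high"></span></td>
    <td class="calendar__cell calendar__actual actual">1.5%</td>
  </tr>
  <tr data-eventid="100002">
    <td><span class="high"></span></td>
    <td class="calendar__cell calendar__actual actual"></td>
  </tr>
</table>
"""
table = BeautifulSoup(html, 'html.parser')
url = "https://www.forexfactory.com/calendar"

powerbi_links = []
for row in table.find_all('tr', {'data-eventid': True}):
    if row.find_all('span', class_='high'):
        for td in row.find_all('td', class_='calendar__cell calendar__actual actual'):
            if td.text != "":  # skip events with a blank actual value
                powerbi_links.append(url + "#detail=" + row['data-eventid'])
```

Only the event with a non-blank actual value ends up in `powerbi_links`.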

Part 2 – Clicking “More” using Selenium

The second problem is to work out how to write code that asks Python and Selenium to keep clicking "More".

I wanted Python to grab all the historical actual values as shown in the red box. However, you can see that it only shows the dates up to 2019. To view the earlier dates, I have to click the "More" button. And not just once, I have to click MULTIPLE times until it goes all the way back to the earliest date. So we have to find a way to ask Python to keep clicking that "More" button until it can't click any more. Here is the code to do it.

# Clicking "More" continuously until the last date
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

click_more = True
while click_more:
    try:
        element = WebDriverWait(driver, 3).until(
            EC.element_to_be_clickable((By.XPATH, "//div[@class='flexBox calendarhistory pad']//*[contains(text(),'More')]")))
        driver.execute_script("arguments[0].click();", element)
    except TimeoutException:
        click_more = False

I wrote a while loop so that Python keeps executing the click. Then I asked Selenium to find the button containing the word "More". This is done using an XPath which says: first, find the div tag with the class name "flexBox calendarhistory pad"; within this div tag, find whatever element contains the word "More". After that, click it.

If Selenium is unable to locate the "More" button, there will be a TimeoutException. The moment Selenium can't find the "More" button is the moment it has reached the earliest date. When that happens, the variable click_more becomes False, breaking out of the while loop.

The purpose of this step is to ensure that we get the full table of data displayed before we go about crawling the dates and actual values.

Part 3 – Web Scraping Historical Data

Once Selenium has clicked to the end of the dates, we can tell Python to crawl all the dates and actual values in the table. The steps to do so are exactly the same as described in the earlier v1.0 post.

The only challenge here is getting the event detail name as well as the currency. In v1.0, that was done simply by using a for loop to iterate through each table row and get the values.

However, in this scenario, we are not inside any loop. So we need to figure out a fresh way to get the location of that event and currency, WITHOUT going through a loop.

I found that the title name is located inside a table data tag with the class name "calendar__cell calendar__event event". The parent of that table data tag is a table row, and that table row has a unique id which represents the economic event.

Hence, to get the text of the event detail, we FIRST ask Selenium to find the table row WITH that unique id, then go one level deeper to its children, the table data.

But the problem is that we don't have that id number; we only have the full link URL. The solution is to extract the last six characters of each link above and feed them to Selenium. This is an example of how the code looks.

data_event = link[-6:]
for td in driver.find_elements_by_xpath(
        "//tr[@data-eventid='" + data_event + "']"
        "/td[@class='calendar__cell calendar__event event']"):
    for i in range(count):
        names.append(td.text)  # repeat the event name once per actual value

In the first row, I extracted the last six characters of the link, which gives us the unique ID. Then I asked Selenium to find the table row with that unique id. Each loop iteration therefore gives a different event name.

Another problem is that the actual values differ throughout the historical dates; they go up and down. BUT the event name stays the same. So I have to append the SAME economic event name x number of times.

But how do I know how many actual values there are? One way is to put a counter on top: every time Python appends an actual value, the counter increments by one. The final count then represents the total number of rows. This explains the last two rows in the code.

# Extracting the date, actuals, currency and event detail
count = 0
table = driver.find_element_by_xpath("//table[@class='alternating calendarhistory']")
table_rows = table.find_elements_by_xpath("//tr[@class='odd' or @class='even']")
for row in table_rows:
    for td in row.find_elements_by_xpath(".//td[@class='history calendarhistory__row calendarhistory__row--history']"):
        Dates.append(td.text)
    for td in row.find_elements_by_xpath(".//td[@class='actual calendarhistory__row calendarhistory__row--actual']"):
        Actuals.append(td.text)
        count += 1  # one increment per actual value scraped

Here is the code to extract the dates and the actual values.

The above blocks of code are where the main heavy lifting is done. It took about 10 minutes for Python to run through the entire script and scrape all the information. You can imagine the load, given that we are extracting data going all the way back to 2008, for EACH event.

Once we have retrieved all the data on the dates, actuals, event names and currencies, the rest is just putting it into a data frame. Data analysis is usually the easiest part; the hard part is all the logic behind extracting and organising how you want the information to be presented.

Part 4 – Converting it into a DataFrame

At this point, we have four lists of data: the currencies, the event names, the dates and the actual values. Let's create a data frame and put these lists into four columns.

import pandas as pd

df = pd.DataFrame()
df['Country'] = country
df['Events'] = names
df['Dates'] = Dates
df['Actuals'] = Actuals

After that, we set the index to the event name and drop the MPC Official Bank Rate Votes event, because its values don't make much sense as numbers; they look like 0-2-7, 0-3-6 and so on.
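As a sketch of that step on a toy frame (the column names match the DataFrame built above; the sample values are invented):

```python
import pandas as pd

# Toy frame mirroring the columns built above (sample values are invented)
df = pd.DataFrame({
    'Country': ['GBP', 'GBP'],
    'Events': ['MPC Official Bank Rate Votes', 'CPI y/y'],
    'Dates': ['Nov 1, 2019', 'Nov 1, 2019'],
    'Actuals': ['0-2-7', '1.5%'],
})

# Index on the event name, then drop the vote-count event by its label
df = df.set_index('Events')
df = df.drop('MPC Official Bank Rate Votes')
```

After the drop, only the quantitative events remain in the index.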

Next, we have to convert all the actual values into numbers; right now they are stored as strings. However, some values come with %, and some are in thousands (K), millions (M) and billions (B). To change the whole column into floats, we have to remove all these characters. Here is how I changed the entire Actuals column into float values.

df['Actuals'] = df['Actuals'].str.strip('%')                # remove percentage signs
df['Actuals'] = df['Actuals'].str.strip('<')                # remove '<' from values like '<0.1'
df['Actuals'] = df['Actuals'].map(lambda x: x.rstrip('K'))  # thousands suffix
df['Actuals'] = df['Actuals'].map(lambda x: x.rstrip('B'))  # billions suffix
df['Actuals'] = df['Actuals'].map(lambda x: x.rstrip('M'))  # millions suffix
df['Actuals'] = df['Actuals'].astype(float)                 # finally cast to float

Let’s see how our data frame looks.

Pretty neat! That is exactly how it should look. The last and final step is to export this to an Excel file and feed it into Power BI. Power BI loves crunching huge data sets.

Part 5 – Power BI Macroeconomic Dashboard

This is where the fun part begins. Let's check out some of the visuals. All the charts and figures are based on the data we just extracted from Forex Factory.

Suppose I want to see the macroeconomic outlook of the US. I can select US, and that automatically filters the available economic indicators down to that country.


Let's say I want to look at the ISM Manufacturing PMI. Power BI automatically changes the visual based on whatever filters you select. I can look at oil inventories, the Fed funds rate, the unemployment rate or CPI, and the chart changes entirely. The red dotted line is the average.

What about China? Let's check out China.


China only has three economic indicators. Its previous GDP growth was around 6%. Again, we can select the Caixin Manufacturing PMI or the Manufacturing PMI and the chart changes dynamically.

Or take the unemployment rate in Australia, which seems to be creeping up.


Manufacturing activity in the Eurozone has also been contracting since 2018.


In a single dashboard, you can view a broad spectrum of economic indicators from all the major economies, updated at every month-end in a single click. Using BeautifulSoup and Selenium in Python, we have successfully scraped economic data from Forex Factory and converted it into something meaningful and insightful.


And with that, my web scraping project with Beautiful Soup and Selenium comes to an end. Initially, I had doubts about whether v2.0 would ever be released due to its complexity. It was a rather ambitious one while I was still learning all these new Python libraries. But I just did it anyway without overthinking it. You just have to take the leap of faith onto step 5 ("I will try and do it").

Moving forward, my next topic will be natural language processing (NLP) in Python. Just as when I started with BeautifulSoup, I am completely clueless about it and know nothing yet. But I think it will be an interesting one! So do stay tuned for upcoming posts about NLP in the context of finance.
