Home Artificial Intelligence Running Python via Excel VBA —a Case of Time Series Resampling Time series resampling

Running Python via Excel VBA —a Case of Time Series Resampling Time series resampling

0
Running Python via Excel VBA —a Case of Time Series Resampling
Time series resampling

Recently, I even have been working with Excel-based interface and models ceaselessly. During this time, I became acquainted with the Visual Basic for Application (VBA) for Office, which is a robust programming language to increase Office applications. VBA is used to automate repetitive tasks, extend user interactions, and interact between different Office applications, making the routine tasks more efficient and effective.

With a good programming background in Python, it took me relatively steep learning curve towards the start to learn different features and functionalities of VBA, which steadily turned less steep with time. I even have realized that learning VBA may be very useful because it interacts with Excel workbook directly and may automate the tasks on the workbook itself that otherwise would require coding in a distinct programming language. Nonetheless, if it is feasible to run a script in a distinct programming language (e.g., Python) inside Excel VBA, it might be much more helpful in automating our routine tasks.

On this post, I’ll share my experience of using Excel VBA and Python to perform a sure bet — time series resampling of solar irradiance data. Moreover, I’m going to present how a Python script could be run via Excel VBA to perform the identical task. Let’s start.

Image by Aron Visuals on Unsplash.

Data

The info used is hourly All Sky Surface Shortwave Downward Irradiance (ALLSKY_SFC_SW_DWN) for 2020 (UTC time zone) downloaded from NASA Power website for 4 cities: Chitwan (Nepal), Recent York (USA), Sydney (Australia), and Bonn (Germany). The info was accessed and downloaded using the API service of NASA Power with a Python script, which I intend to document in a distinct post.

Solar irradiance is the facility per unit area (W/m2) obtained from the sun in the shape of electromagnetic radiation within the wavelength range of the measuring instrument. Solar irradiance integrated over a time period gives the solar irradiation, known as solar insolation (Wh/m2).

Based on the definition of NASA Power, the parameter used All Sky Surface Shortwave Downward Irradiance (ALLSKY_SFC_SW_DWN) refers to the overall solar irradiance (direct plus diffused) incident on a horizontal plane on the surface of the earth under all sky conditions. Another term for total solar irradiance is Global Horizontal Irradiance (GHI). For the reason that values are hourly, its unit is Wh/m².

Data in Excel file

Since 2020 was a intercalary year, I obtained the hourly solar irradiance values for 8784 hours for the 4 cities. These values are placed in column B, C, D and E of the Excel file as shown below.

Hourly solar irradiance values for 4 cities in 2020 placed within the Excel file. The formula bar shows the formula utilized in highlighted cell A2. Illustration by Creator.

To get the values in column A in date format, the next formula was utilized in Excel (e.g., in cell A2):

=TEXT(DATE(2020,1,1)+ROW(A1)/24, “yyyy-mm-dd hh:mm:ss”)

ROW(A1)/24 is used to get the worth of hour inside a day (0–23 hr).

Also, I named cell A1 as datetime.

Plotting the unique hourly solar irradiance data for 2020 for the 4 cities in Excel looks as follows:

Plotting hourly solar irradiance values using Excel. Illustration by Creator.

Time series resampling refers to converting the frequency level of time series data. In easy words, resampling is a time-based groupby, followed by a discount method on each of the groups. The info will need to have a datetime-like index for this purpose.

1. Time Series Resampling using Excel VBA

There is no such thing as a default function in Excel for time series resampling. Subsequently, I wrote a few subroutines to get monthly and hourly average values as described within the sections a and b respectively below.

a. VBA code for converting hourly values to average monthly frequency

This section describes the code snippet (below) used to convert hourly value to average monthly value in VBA.

An array is a gaggle of variables in VBA. The default lower sure of an array element in VBA is 0. Mentioning Option Base 1 on the highest of the sub-routine changes the lower sure of array element to 1. I defined an array called columns(4) as a gaggle of strings containing 4 variables. And I passed the strings B, C, D and E inside this array.

Cell A1 in Excel is defined as a named cell datetime. To discuss with this cell in VBA, I declared datetime as a spread and assigned it to the range with same name in Excel.

To discuss with the month from the datetime column in Excel, I used the MONTH() function and assigned it to an integer variable called mnth. To loop through hourly values in each row, I declared one other integer called row. Finally, I declared sum and num_hours to calculate the monthly average values.

'Use Option Base 1 before this subroutine if you would like to start the list from 1 as a substitute of 0.
'https://excelchamps.com/vba/arrays/
Option Base 1
Sub GetMonthlyAverage()

'defining an array for 4 strings
Dim columns(4) As String
columns(1) = "B"
columns(2) = "C"
columns(3) = "D"
columns(4) = "E"

'Check with cell A1
Dim datetime As Range
Set datetime = Range("datetime")

'Definining mnth because Month is a function
Dim mnth As Integer
Dim row As Integer

Dim sum As Double
Dim num_hours As Double

Next, I created a for-loop to iterate through the columns for values for every city. Inside this for-loop, there are two other nested loops to iterate through every month and hour of the yr, respectively. The sum aggregates the hourly solar irradiance values for every month, and the num_hours aggregates the variety of hours in every month. Finally, by dividing the sum with num_hours, monthly average values of solar irradiance are obtained for every month for every city.

'Loop through column for every city
For Each column In columns

'Loop through every month of the yr
For mnth = 1 To 12

sum = 0
num_hours = 0

'Loop through each row
For row = 2 To 8785

If MONTH(Cells(row, datetime.column)) = mnth Then
Range(column & row).Interior.Color = RGB(255, 255, 0)
num_hours = num_hours + 1
sum = sum + Range(column & row).Value

End If

Next row

Range(column & mnth).Offset(1, 7).Value = sum / num_hours

Next mnth

Next column

End Sub

As a note, Range(column & row).Interior.Color = RGB(255, 255, 0) highlights each cell with yellow color while iterating through the row (city) and column (mnth).

b. VBA code for converting hourly values in a yr to average hourly values for every of the 24 hours (0–23 hr) of the day in 2020

The code used to convert hourly values (of a yr) to average hourly values for every of the 24 hours of the day in 2020.

Within the code for this section, first, I extracted the worth for the last_row (8785) in the information table using

Cells(datetime.row, datetime.column).End(xlDown).row

to loop through each row for further processing.

And I used the HOUR() function in Excel to retrieve the corresponding hour in column A in each row using:

Hour(Cells(row, datetime.column).column).Value

The entire code for the aim of this section is given within the snippet below:

Option Base 1
Sub GetHourlyAverage()

‘defining an array for 4 strings
Dim columns(4) As String
columns(1) = “B”
columns(2) = “C”
columns(3) = “D”
columns(4) = “E”

'Definining mnth because Month is a function
Dim mnth As Integer
Dim row As Integer

Dim sum As Double
Dim num_hours As Double

Dim wb As Workbook
Dim ws As Worksheet
Dim datetime As Range
Dim last_row As Integer

Set wb = ThisWorkbook
Set ws = ThisWorkbook.Sheets("Sheet1")
Set datetime = ws.Range("datetime")
last_row = Cells(datetime.row, datetime.column).End(xlDown).row

Debug.Print datetime.Value
Debug.Print "Row: " & datetime.row & " Column: " & datetime.column
Debug.Print "Last row: " & last_row

'Loop through column for every city
For Each column In columns

'Loop through each hour of the day
For hr = 0 To 23

sum = 0
num_hours = 0

'Loop through each row
For row = datetime.row + 1 To last_row

If Hour(Cells(row, datetime.column).Value) = hr Then
Range(column & row).Interior.Color = RGB(255, 255, 0)
num_hours = num_hours + 1
sum = sum + Range(column & row).Value

End If

Next row

Range(column & hr + 2).Offset(0, 14).Value = sum / num_hours

Next hr

Next column

End Sub

Within the above code snippet, Debug.Print command is used to print the intermediate ends in the intermediate window within the VBA developer space as shown below:

Output of Debug.Print as visible in Immediate Window. Illustration by Creator.

2. Time Series Resampling using Pandas

The pandas library in Python provides an in-built method for time series resampling using df.resample() and passing the rule for resampling. For instance, “M” is for monthly, “W” is for weekly, “Q” is for quarterly, “D” is for day by day, “B” is for Business Day, etc. The entire algorithm available for resampling to different frequency levels could be found here.

A pre-requisite for time series resampling is that the dataframe index must be converted right into a datetime type using pd.to_datetime().

Dataframe index must be of type DatetimeIndex as a pre-requisite for time series resampling. Illustration by Creator

Any built-in method available via Groupby is obtainable as a technique of the returned object of df.resample(), including min(), max(), mean(), median(), std(), first(), last(), ohlc(), and sem(). On this post, I’m simply assessing the mean values of the solar irradiance.

Within the Python code below, I allow the user to enter the frequency that they wish to return and display for the solar irradiance values. The choices include Original, Monthly average, Day by day average, Weekly average, Quarterly average, All the above, and Hourly average (for every of the 24 hours inside a day).

import pandas as pd
import matplotlib.pyplot as plt
import os
import sys

#Enter os system to current working directory
os.chdir(sys.path[0])
file = "solar_irradiance.xlsm"

#read all rows and first 5 columns
df = pd.read_excel(file).iloc[:, :5]
df["Datetime"] = pd.to_datetime(df["Datetime"])
df.set_index(["Datetime"], inplace = True)

frequency = input("Enter the frequency you would like to display? n1. Original n2. Monthly averagen3. Day by day average n4. Weekly averagen 5.Quarterly average n 6.All the above n 7. Hourly average n? ")

if frequency == "Original":
print (df)
df.plot()
plt.title("Original solar irradiance in 2020")
plt.ylabel("Wh/m$^2$")
plt.legend()
plt.show()

elif frequency == "Monthly average":
print (df.resample(rule = "M").mean())
df.resample(rule = "M").mean().plot()
plt.title("Monthly average solar irradiance in 2022")
plt.ylabel("Wh/m$^2$")
plt.legend()
plt.show()

elif frequency == "Day by day average":
print (df.resample(rule = "D").mean())
df.resample(rule = "D").mean().plot()
plt.title("Day by day average solar irradiance in 2022")
plt.ylabel("Wh/m$^2$")
plt.show()

elif frequency == "Weekly average":
print (df.resample(rule = "W").mean())
df.resample(rule = "W").mean().plot()
plt.title("Weekly average solar irradiance in 2022")
plt.ylabel("Wh/m$^2$")
plt.legend()
plt.show()

elif frequency == "Quarterly average":
print (df.resample(rule = "Q").mean())
df.resample(rule = "Q").mean().plot()
plt.title("Quarterly average solar irradiance in 2022")
plt.ylabel("Wh/m$^2$")
plt.legend()
plt.show()

elif frequency == "All the above":
fig, axs = plt.subplots(2, 2, figsize = (20, 10), sharex = True, sharey = True)
df.resample(rule = "D").mean().plot(ax = axs[0, 0])
axs[0, 0].set_title("Day by day mean")
axs[0, 0].set_ylabel("Wh/m$^2$")
df.resample(rule = "W").mean().plot(ax = axs[0, 1])
axs[0, 1].set_title("Weekly mean")
df.resample(rule = "M").mean().plot(ax = axs[1, 0])
axs[1, 0].set_title("Monthly mean")
axs[1, 0].set_ylabel("Wh/m$^2$")
df.resample(rule = "Q").mean().plot(ax = axs[1, 1])
axs[1, 1].set_title("Quarterly mean")fig.suptitle("Mean solar irradiance in 4 locations converted to different temporal frequencies")
plt.show()

elif frequency == "Hourly average":
#average value in each hour inside 24 hours of a day
print (df.groupby(df.index.hour).mean())
df.groupby(df.index.hour).mean().plot()
plt.title("Hourly average solar irradiance in 2022")
plt.ylabel("Wh/m$^2$")
plt.legend()
plt.show()

else:
print ("The frequency you entered is wrong.")

This script could be run by going to the terminal/command prompt and typing python -m python_script.py if one is identical path because the script file. To abrubt the run, one can type Ctrl+C.

3. Running Python script via Excel VBA

The above Python script may also be run via Excel VBA. For this purpose, I saved the above script as python_script.py file.

The entire subroutine written in VBA to run the Python script is given in code snippet below.

Sub RunPythonScript()
Dim objShell As Object
Dim PythonExePath As String, PythonScriptPath As String
ActiveWorkbook.Save

'Enter into the trail of given workbook
ChDir Application.ThisWorkbook.Path
Set objShell = VBA.CreateObject(“Wscript.Shell”)

'Enter into the trail of given workbook
ChDir Application.ThisWorkbook.Path

Set objShell = VBA.CreateObject("Wscript.Shell")

'Goto cmd. Type where python to get this path. Note that there are three quotes below.
' The hash symbol # below must be stuffed with the trail in your system.
PythonExePath = """C:Users#######################python.exe"""

'Get the trail of the file.
PythonScriptPath = Application.ThisWorkbook.Path & "python_script.py"

objShell.Run PythonExePath & PythonScriptPath

End Sub

First, I declared objShell variable to discuss with the address of an object (Wscript.shell). This object allows to access the Windows functionality to run the external program (here, Python script).

The PythonExePath refers back to the path of the Python application in the pc system. In a Windows system, this path could be found by typing where python within the Windows command prompt.

The trail of the python script is defined as string in PythonScriptPath. Note that this path shouldn’t have space for the script to run

Finally, the next line is used to run the Python script using the Python application via the Excel VBA interface.

objShell.Run PythonExePath & PythonScriptPath

Output

I assigned the above sub-routine/macro (command written using VBA language) to a button in Excel file as shown below:

Button where the macro is assigned to run the Python script. Illustration by Creator.

Clicking the button runs the Python script as shown below and asks the user to enter the frequency the user desires to display output as:

Python asks the user to enter the choice. Illustration by Creator.

Upon choosing “All the above”, I get the output plot for the day by day, weekly, monthly, and quarterly average values of solar irradiance within the 4 cities in 2020. It’s observed that because the time resolution increases from right to left and top to bottom, the road/curve becomes smoother because the variabilities in lower time resolutions are averaged out.

Output while choosing All the above shows hourly values of solar irradiance averaged out for day by day, weekly, monthly and quarterly levels. Illustration by Creator.

Conclusion

On this post, I presented three techniques for time series resampling of hourly solar irradiance data for 4 cities in 2020:

  1. Using Excel VBA
  2. Using pandas in Python
  3. Running Python script via Excel VBA interface

Excel VBA could be very handy while working with data in Excel since it allows us to perform various operations and have direct interactions with data in several sheets in the identical or different Excel file where the sub-routines/macros are written. Also, by writing macros and assigning them to interactive buttons/userforms or other widgets, it is straightforward to share Excel file with other users to whom only the functionalities matter and never what happens with the code within the backend.

One among the most important benefits of Python is that there are packages with several in-built functionalities, which makes writing a separate code for routine tasks redundant. On this context, combining the strengths of Excel VBA and Python could be very advantageous. That is depicted by the way in which I ran the Python script for time series resampling by clicking a button on an Excel file, which is related to just a few easy lines of VBA code.

The Excel file including the macros, and the Python script can be found within the script folder inside this repository. Thanks for reading!

LEAVE A REPLY

Please enter your comment!
Please enter your name here