The Starter Guide For Transitioning Your Python Projects To R Tutorial: Electric Vehicle Licenses in Canada

R Tutorial

Exploring Electric Vehicle Trends in R

Towards Data Science
Photo by Milad Fakurian on Unsplash

Are you interested in delving into the world of R programming? While Python remains the dominant choice among the data science community, with roughly 60% of developers using it in 2022¹, there are instances where R may pop up from time to time. That's because R is optimized for statistics and data. If you, like me, have a foundation in Python but now encounter job listings and internal company tasks that demand R skills, this article aims to break that down. We'll explore the basic distinctions between Python and R and wrap the project into a data cleaning and visualization tutorial to ensure a smooth transition to R.

Note: If you have a keen interest in green technology and electric vehicles, the tutorial includes some interesting visuals that showcase the popularity of electric and hybrid vehicles in Canada, so feel free to skip ahead to the tutorial section to explore these visuals and associated analyses firsthand!

A brief breakdown of R

R is an open-source programming language with a reputation for being used primarily in the fields of statistical modelling and data visualization. Originally developed in 1993 by statisticians Robert Gentleman and Ross Ihaka, R was designed to handle statistical analysis and data transformation tasks. It still maintains its reputation for being a statistics-focused language. However, thanks to a vast library of over 18,000 packages, it has also evolved over the years to support a wide variety of projects and applications beyond data science.

When it comes to setup and application, R is commonly used within the RStudio environment, which is both free and easy to install. You can find an installation guide here. Now that we have covered some initial explanations, let's move on to our cheatsheet transition guide from Python to R.

Exploring the main differences between Python and R

While it's impossible to capture all the nuances between Python and R in a single image, the diagram below provides a good initial overview of the key differences between the two programming languages:

Python to R Transition Guide (image by author)

Please note that this diagram is not exhaustive and doesn't encompass all distinctions between Python and R. For a more detailed and comprehensive breakdown tailored to your specific projects, MIT has a good conversion resource here.

To summarize the key differences between Python and R, let's highlight a few key items:

  • Syntax: Python adopts a more straightforward and concise syntax, whereas R's syntax tends to involve heavier use of parentheses, brackets, and symbols. This can make R code initially appear more complex, but we will explore this idea later in the tutorial.
  • Data Manipulation: Python relies more on external libraries like NumPy and pandas for complex data manipulation tasks. In contrast, R often provides built-in functions and features specifically tailored for data manipulation.

We will explore these differences in practice to gain a more complete understanding of the contrasting aspects of Python and R. Let's move on to the tutorial section, where we will go through some simple data cleaning and transformation and then explore the data through visuals.

Our R package breakdown

Before we start, let's get familiar with the R packages we will be working with:

  • tidyverse: This package was created to follow the principles of tidy data (as the name suggests) and includes many essential packages. Among them, dplyr is popular for its capabilities in data manipulation and transformation, and ggplot2 offers a powerful suite of tools for data visualization.
  • sqldf: This R package lets you perform SQL queries on R data frames, providing a convenient way to apply SQL syntax for data manipulation and analysis within the R environment.

Photo by Roberto H on Unsplash

In this tutorial, we will focus on examining the popularity of light and zero-emission vehicles following the launch of the Canadian Federal Government's Incentives for Zero-Emission Vehicles (iZEV) Program, a national program that offers financial rebates to Canadians who purchase electric vehicles, including plug-in hybrid vehicles. Luckily for us, we have access to the Government of Canada's data spanning from the program's inception in 2019 up until March 2023.

The analysis will be separated into two sections: data loading and cleaning, which makes some light comparisons between Python and R, followed by the data analysis. First, let's outline some of the questions we want our visuals to answer to help guide the sections, where the primary focus is on understanding the shifts in popularity over time:

  1. How has the number of vehicles registered under the iZEV program evolved over the years?
  2. What changes have occurred in automaker brand preferences since the implementation of the program?
  3. Which vehicle models have experienced the most significant increases and decreases in popularity?

To keep the tutorial concise and focused on the Python to R transition, the analysis will be kept fairly broad. However, we will show the final data visualizations to complete the picture of iZEV licenses registered in Canada to date. For a more in-depth analysis, including the complete R code, additional markdown, and visuals, as well as a link to the dataset, please refer to my GitHub repository here.

Data Loading and Cleaning

To start, we will install and load the R packages we mentioned earlier, which can be carried out with the following code:

Next, let's load the packages into our R environment:
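Loading is done with library(), which plays roughly the same role as import in Python. This sketch assumes both packages are already installed:

```r
library(tidyverse)  # attaches dplyr, tidyr, ggplot2, stringr, readr and others
library(sqldf)      # lets us run SQL queries against R data frames
```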

To load our data, we will use the <- operator to assign values to variables, as opposed to the = operator commonly used in Python. Then, we can quickly use the dim function to retrieve the number of rows and columns of our loaded data set, which is comparable to the numpy.shape() function (or a DataFrame's .shape attribute) in Python.
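Here is a small sketch of that pattern. The real iZEV file isn't reproduced here, so the example writes a tiny stand-in CSV to a temporary file first; the rows and the Year column name are made up for illustration:

```r
# Hypothetical stand-in for the iZEV extract: a tiny CSV in a temp file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Year,Vehicle_Make,Vehicle_Make_and_Model",
  "2019,Tesla,Tesla Model 3",
  "2019,Toyota,Toyota Prius Prime",
  "2022,Hyundai,Hyundai IONIQ 5"
), csv_path)

# `<-` assigns the parsed data frame to the name df
df <- read.csv(csv_path)

# Number of rows, then number of columns
dim(df)
```

With your own data, you would swap csv_path for the path to the downloaded file.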

In R, you can explore the first rows of a data frame using the head() function. This is similar to the df.head() method in Python's pandas library. Here's an example of how to accomplish this in R:
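A short sketch, using a made-up six-row data frame in place of the real data:

```r
library(dplyr)

# Toy data frame standing in for the loaded iZEV data
df <- data.frame(
  Year = c(2019, 2019, 2020, 2021, 2022, 2022),
  Vehicle_Make = c("Tesla", "Toyota", "Tesla", "Hyundai", "Hyundai", "Kia")
)

# Take df, then keep its first 5 rows
first_rows <- df %>% head(5)
first_rows
```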

The %>% operator helps to chain operations. In the code above, it tells R to take the df data frame and then show the first 5 rows, giving you an output like the one below:

After obtaining an initial overview of the data frame and its shape, we can see right away that some irrelevant columns can be removed. At the same time, we can revise some lengthy column names for easier reference later on.
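As a rough sketch of dropping and renaming columns with dplyr: the long column names below are hypothetical placeholders, not the actual headers in the government file, and only Vehicle_Make_and_Model is a name used later in this tutorial.

```r
library(dplyr)

# Toy data; the original column names here are invented for illustration
df <- data.frame(
  `Unused Column` = 1:3,
  `Vehicle Make and Model` = c("Tesla Model 3", "Toyota Prius Prime", "Hyundai IONIQ 5"),
  `Calendar Year` = c(2019, 2019, 2022),
  check.names = FALSE
)

clean_df <- df %>%
  select(-`Unused Column`) %>%            # drop a column we don't need
  rename(                                 # new_name = old_name
    Vehicle_Make_and_Model = `Vehicle Make and Model`,
    Year = `Calendar Year`
  )

names(clean_df)
```

Note that rename() takes new_name = old_name, the reverse of the {old: new} mapping that pandas expects.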

All of these steps can be replicated in Python using the drop(), rename() and map() functions.

The next steps of the data cleaning process often entail removing nulls and duplicate rows. For this particular dataset, however, we will only remove nulls. We do have many duplicate rows, but in the absence of unique row identifiers (i.e. a license ID) we risk losing valuable data if we remove them, so we need to trust that the row inputs are correct. Here's how you would remove nulls in R:
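One way to do this is tidyr's drop_na(), shown here on toy data. Base R's na.omit() would work too, and I'm not claiming either is the exact call used in the full notebook:

```r
library(dplyr)
library(tidyr)

# Toy data with one missing Year and one missing Vehicle_Make
df <- data.frame(
  Year = c(2019, NA, 2022),
  Vehicle_Make = c("Tesla", "Toyota", NA)
)

# Keep only the rows with no missing values in any column
clean_df <- df %>% drop_na()
clean_df
```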

The equivalent in Python for dropping rows with nulls is the pandas dropna() function.

This final step is in preparation for the final question on vehicle make and model popularity, where we make the car model naming conventions in the Vehicle_Make_and_Model column consistent. For example, we would consider 'Hyundai Ioniq PHEV' the same as 'Hyundai Ioniq Plug-In hybrid'. We can do this by creating a list and referring to it with the str_replace_all function.
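A minimal sketch of that idea with stringr, using a named character vector as the lookup. Only the Hyundai pair comes from the example above, and the choice of which spelling to keep is arbitrary:

```r
library(stringr)

models <- c("Hyundai Ioniq PHEV", "Hyundai Ioniq Plug-In hybrid", "Tesla Model 3")

# Named vector: each name is a pattern (a regular expression), each value its replacement
model_fixes <- c("Hyundai Ioniq Plug-In hybrid" = "Hyundai Ioniq PHEV")

cleaned <- str_replace_all(models, model_fixes)
cleaned
```

Inside a pipeline, this would typically sit in a mutate() call that overwrites Vehicle_Make_and_Model.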

Great, that's some simple data cleaning out of the way. Now on to the interesting part, the analysis!

Data Analysis

The following visuals will explore each of the three questions we outlined at the start, and explain how ggplot and matplotlib differ. Let's take a look at our first question:

1. How has the number of vehicles registered under the iZEV program evolved over the years?

Here, we need to visualize years and counts of licenses registered, where the R dplyr package can be used to reshape the clean_df data frame into a suitable format. Each row of the data set counts as one vehicle entry, so summarise(total = n()) is needed to obtain total row counts:
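A small sketch of the grouping step on toy data, assuming the year lives in a column called Year:

```r
library(dplyr)

# Toy stand-in: one row per registered vehicle
clean_df <- data.frame(Year = c(2019, 2019, 2020, 2022, 2022, 2022))

yearly_totals <- clean_df %>%
  group_by(Year) %>%        # one group per year
  summarise(total = n())    # n() counts the rows in each group

yearly_totals
```

This is the dplyr counterpart of a pandas groupby followed by size().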

When it comes to plotting data, there are some differences between R's ggplot and Python's plotting libraries. In ggplot, a layering syntax is used, where different components are added using the + operator.
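To give a feel for the layering, here is a sketch of a yearly bar chart. The two endpoint totals are the 2019 and 2022 figures quoted in the observations further down; the bar geometry, labels and theme are my own guesses rather than the exact styling of the final chart:

```r
library(ggplot2)

yearly_totals <- data.frame(
  Year = c(2019, 2022),
  total = c(33611, 57564)
)

p <- ggplot(yearly_totals, aes(x = factor(Year), y = total)) +  # data and axes
  geom_col() +                                                  # bar layer
  labs(
    title = "iZEV licenses registered per year",
    x = "Year",
    y = "Licenses"
  ) +                                                           # labels layer
  theme_minimal()                                               # styling layer

# Typing p at the console (or calling print(p)) renders the chart
```

Each + adds one more component to the same plot object, which is why ggplot code tends to read as a stack of short lines.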

Let's compare with how you might code this exact plot using matplotlib with Python:

Overall, the length of code is fairly similar between the two, but with R we can see the code looks more condensed with the + operators. Now, let's show this visual:

Data up until March 2023

I also included a second plot illustrating the breakdown of iZEV recipients by province (the full R code can be found in my GitHub repository here):

Data up until March 2023

Observations

So what can we see from these two visuals? There was an overall increase in the number of zero-emission vehicles registered under the iZEV program, from 33,611 licenses in 2019 to 57,564 licenses in 2022, supporting the growing transition to electric vehicles in Canada. Note: the EV market represents a small portion of overall passenger vehicle registrations at ~5%⁴.

Breaking this out by province, we see Québec accounted for the largest share of licenses, surpassing that of BC, Ontario and other provinces combined, likely in part due to stronger financial incentives, as Québec offers an additional rebate of up to $8,000 on top of the federal government scheme (in contrast, BC offers only up to $3,000). In addition, a clear mandate to utility company Hydro-Québec has aided the EV charger infrastructure in the province, helping to ease driver concerns about where to recharge.

2. What changes have occurred in automaker brand preferences since the implementation of the program?

To address our second question, we want to examine changes in popularity among automakers. Instead of focusing on absolute totals as we did in the first question, we will explore relative changes. By analyzing proportions, we can compare the performance of different automakers on a similar scale, allowing for more meaningful comparisons.

Now, bear with me here, since there is a fair amount of R code before we arrive at our next visual. What we are ultimately going for are subplots by brand showing proportional change year over year. All the data we have so far only represents absolute counts, so we need to calculate 'per 1,000 vehicles sold' for the proportional scale.

To start, we will create a table that shows vehicle brand by year and count, which we can do using the sqldf R package:
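A sketch of what such a query can look like on toy data. The Year and total column names are assumptions; Vehicle_Make and query_vehicle_counts are the names used in this tutorial:

```r
library(sqldf)

# Toy stand-in: one row per registered vehicle
clean_df <- data.frame(
  Year = c(2019, 2019, 2019, 2022, 2022, 2022),
  Vehicle_Make = c("Tesla", "Tesla", "Toyota", "Tesla", "Hyundai", "Hyundai")
)

# sqldf treats the data frame as a SQL table with the same name
query_vehicle_counts <- sqldf("
  SELECT Vehicle_Make, Year, COUNT(*) AS total
  FROM clean_df
  GROUP BY Vehicle_Make, Year
  ORDER BY Vehicle_Make, Year
")

query_vehicle_counts
```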

Next, we want to have each year split out into its own column, where we can use the pivot_wider function (fairly similar to the pivot function in Python).
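A minimal sketch of the reshape, continuing with toy counts. The total_ prefix and the zero fill for brands missing in a year are my own choices:

```r
library(tidyr)

query_vehicle_counts <- data.frame(
  Vehicle_Make = c("Hyundai", "Tesla", "Tesla", "Toyota"),
  Year = c(2022, 2019, 2022, 2019),
  total = c(2, 2, 1, 1)
)

# One row per brand, one column per year
wide_counts <- pivot_wider(
  query_vehicle_counts,
  names_from = Year,
  values_from = total,
  names_prefix = "total_",
  values_fill = 0          # brands with no registrations that year get 0
)

wide_counts
```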

After that, we want to calculate the 'per_1K' licenses for each year, which we can do by taking each vehicle count by brand, dividing it by the total vehicles registered that year, and multiplying the result by 1,000.
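The arithmetic can be sketched like this, with invented counts chosen so the results are easy to check by hand:

```r
library(dplyr)

wide_counts <- data.frame(
  Vehicle_Make = c("Tesla", "Toyota", "Hyundai"),
  total_2019 = c(50, 30, 20),   # 100 vehicles in total
  total_2022 = c(80, 30, 90)    # 200 vehicles in total
)

# Brand count / all registrations that year * 1,000
per_1K_wide <- wide_counts %>%
  mutate(
    per_1K_2019 = total_2019 / sum(total_2019) * 1000,
    per_1K_2022 = total_2022 / sum(total_2022) * 1000
  )

per_1K_wide
```

In this toy table, 50 of 100 vehicles in 2019 gives 500 per 1,000, while 80 of 200 in 2022 gives 400 per 1,000.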

Now we want to calculate the difference between 2022 and 2019, looking only at complete years, in terms of proportional change.
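A short sketch of that subtraction, using invented per-1,000 values and a hypothetical change_per_1K column name:

```r
library(dplyr)

per_1K_wide <- data.frame(
  Vehicle_Make = c("Tesla", "Toyota", "Hyundai"),
  per_1K_2019 = c(500, 300, 200),
  per_1K_2022 = c(400, 150, 450)
)

# Positive values mean the brand gained share between 2019 and 2022
per_1K_change <- per_1K_wide %>%
  mutate(change_per_1K = per_1K_2022 - per_1K_2019) %>%
  arrange(desc(change_per_1K))

per_1K_change
```

A change of 250 per 1,000 is the same as a gain of 25 percentage points in share.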

This next step re-pivots the years and the per_1K columns that we calculated into one long table to help prepare them for our graphs. Afterwards, we will join the absolute counts and the per_1K counts into one long table.
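Going back from wide to long can be sketched with pivot_longer, again on invented values and assuming the per_1K_ column prefix:

```r
library(tidyr)

per_1K_wide <- data.frame(
  Vehicle_Make = c("Tesla", "Toyota"),
  per_1K_2019 = c(500, 300),
  per_1K_2022 = c(400, 150)
)

# Stack the yearly columns: one row per brand and year
per_1K_long <- pivot_longer(
  per_1K_wide,
  cols = starts_with("per_1K_"),
  names_to = "Year",
  names_prefix = "per_1K_",                   # strip the prefix to leave the year
  names_transform = list(Year = as.integer),  # "2019" -> 2019
  values_to = "per_1K"
)

per_1K_long
```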

query_vehicle_counts has everything we need already, so we just need to join these two:
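A sketch of the join on toy data. The per_1K_long name and the choice of an inner join are assumptions on my part:

```r
library(dplyr)

query_vehicle_counts <- data.frame(
  Vehicle_Make = c("Tesla", "Tesla", "Toyota", "Toyota"),
  Year = c(2019L, 2022L, 2019L, 2022L),
  total = c(50, 80, 30, 30)
)

per_1K_long <- data.frame(
  Vehicle_Make = c("Tesla", "Tesla", "Toyota", "Toyota"),
  Year = c(2019L, 2022L, 2019L, 2022L),
  per_1K = c(500, 400, 300, 150)
)

# Match rows on brand and year so each row carries both measures
combined <- inner_join(
  query_vehicle_counts,
  per_1K_long,
  by = c("Vehicle_Make", "Year")
)

combined
```

This corresponds to pandas merge() with on=["Vehicle_Make", "Year"].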

Lastly, we want to rank each brand by its total for each year, which is where we can reuse sqldf and use window functions to do this easily.
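A sketch of a window-function ranking through sqldf, on invented totals. It assumes the SQLite version bundled with your RSQLite install supports window functions (3.25 or later); brand_rank is a hypothetical column name:

```r
library(sqldf)

query_vehicle_counts <- data.frame(
  Vehicle_Make = c("Tesla", "Toyota", "Hyundai", "Tesla", "Toyota", "Hyundai"),
  Year = c(2019, 2019, 2019, 2022, 2022, 2022),
  total = c(50, 30, 20, 40, 15, 45)
)

# Rank brands within each year, largest total first
ranked <- sqldf("
  SELECT Vehicle_Make, Year, total,
         RANK() OVER (PARTITION BY Year ORDER BY total DESC) AS brand_rank
  FROM query_vehicle_counts
  ORDER BY Year, brand_rank
")

ranked
```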

Finally, to plot how the proportions of vehicles have changed over time, we can use subplots (in ggplot this is called facet_wrap, which is similar to subplots in the Python matplotlib library).
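Here is a sketch of the faceting idea with invented values. The line-and-point geometry is my assumption, not necessarily what the final chart uses:

```r
library(ggplot2)

# Invented per-1,000 values, one row per brand and year
plot_df <- data.frame(
  Vehicle_Make = rep(c("Hyundai", "Tesla", "Toyota"), each = 2),
  Year = rep(c(2019, 2022), times = 3),
  per_1K = c(200, 450, 500, 400, 300, 150)
)

p <- ggplot(plot_df, aes(x = Year, y = per_1K)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ Vehicle_Make) +   # one small panel per brand
  labs(x = "Year", y = "Licenses per 1,000")

# Typing p at the console renders the grid of panels
```

Unlike plt.subplots(), there is no need to create the axes up front or loop over brands; facet_wrap splits the data by the column you name.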

Here’s our visual:

Observations

The subplots laid out above show us that, when comparing proportions, Tesla has the largest share of cars, accounting for 300 out of 1,000 electric vehicles (EVs) on the road in Canada from 2019–2022. In the past couple of years, it has lost some market share as newer players have arrived on the Canadian market, such as Audi, Jeep, Mazda and Polestar.

3. Which vehicle models have experienced the most significant increases and decreases in popularity?

Our last set of questions focuses on the popularity of specific vehicle models, where we can examine the changes in proportions of models purchased between 2019–2022. The code for this analysis is fairly similar to what we used for the previous question, exchanging Vehicle_Make for Vehicle_Make_and_Model (again, for a more detailed step-by-step guide please refer to my GitHub link here).

Let's code these out and explore why we might be seeing these patterns:
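As a rough sketch of picking out the biggest gainers: the model names and change values below are placeholders invented for illustration, and change_per_1K is a hypothetical column name.

```r
library(dplyr)
library(ggplot2)

# Placeholder models with invented changes in licenses per 1,000
model_change <- data.frame(
  Vehicle_Make_and_Model = paste("Model", LETTERS[1:7]),
  change_per_1K = c(40, -15, 70, 5, -60, 22, 31)
)

# Keep the five largest increases, sorted from largest to smallest
top_increases <- model_change %>%
  slice_max(change_per_1K, n = 5)

p <- ggplot(
  top_increases,
  aes(x = reorder(Vehicle_Make_and_Model, change_per_1K), y = change_per_1K)
) +
  geom_col() +
  coord_flip() +   # horizontal bars keep long model names readable
  labs(x = NULL, y = "Change in licenses per 1,000")
```

Swapping slice_max for slice_min gives the largest decreases instead.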

Observations

Based on the graph above, we can see that the Hyundai IONIQ 5 saw the largest boost in popularity, where for every 1,000 licenses there were 83 more purchased in 2022 versus 2019. In percentage terms, the model's share rose by 8.3 percentage points over this period. It is important to note that the majority of the models that saw growth in popularity are SUVs, as evident from the top five listed above. This matches the preferences of the North American market, where consumers have awaited larger electric vehicle options, shifting away from smaller sedan formats that were previously dominant, such as the Tesla Model 3.

Let's move on to coding and plotting the model decreases:

Observations

While still in the top 5 ranks of most popular automakers, the Toyota Prius Prime has seen the largest drop in popularity, with 114 fewer licenses (per 1,000) from 2019 to 2022. The drop may be due to supply issues and raised prices affecting popularity, but also because, as a plug-in hybrid model, its iZEV financial incentive is reduced: you can only get half of the full available rebate.

The Tesla Model 3 also saw a decrease in popularity, with a decline of 64 per 1,000 licenses, but it is worth noting that Tesla still remains a dominant automaker in the market when looking at absolute totals.

Closing Thoughts

In conclusion, we have embarked on a mini data analysis project to explore the coding differences between Python and R. Hopefully, this has made R feel a little more approachable and has lent some inspiration to create compelling visuals of your own!

As a final, friendly reminder, for access to the full R code, please check out my GitHub repository here. Happy coding!
