An exploration of the advantages that Fortran can bring to Data Science and Machine Learning
Python is widely considered the gold-standard language for Data Science, and almost the entire range of packages, literature, and resources related to Data Science is available in Python. This is not necessarily a bad thing, because it means there are many documented solutions for any data-related problem that you might encounter.
However, with the arrival of larger datasets and the rise of more complex models, it might be time to explore other languages. This is where the old-timer, Fortran, may become popular again. It is therefore worthwhile for today’s Data Scientists to become aware of it and possibly even try implementing some solutions in it.
Fortran, short for Formula Translation, was the first widely used programming language and originated in the 1950s. Despite its age, it remains a high-performance computing language and can be faster than both C and C++ for numerical workloads.
Initially designed for scientists and engineers to run large-scale models and simulations in areas such as fluid dynamics and organic chemistry, Fortran is still frequently used today by physicists. I even learned it during my physics undergrad!
Its specialty lies in modelling and simulation, which are essential to many fields, including Machine Learning. Fortran is therefore well placed to tackle Data Science problems, since numerical modelling is exactly what it was invented for decades ago.
Fortran has several key benefits over other programming languages such as C++ and Python. Here are some of the main points:
- Easy to Read: Fortran is a compact language with only five native data types: INTEGER, REAL, COMPLEX, LOGICAL, and CHARACTER. This simplicity makes it easy to read and understand, especially for scientific applications.
- High Performance: Fortran is commonly used to benchmark the speed of high-performance computers.
- Large Libraries: Fortran has a wide selection of libraries available, mainly for scientific purposes. These libraries provide developers with a vast array of functions and tools for performing complex calculations and simulations.
- Historical Array Support: Fortran has had multi-dimensional array support from the start, which is essential for Machine Learning and Data Science techniques such as Neural Networks.
- Designed for Engineers and Scientists: Fortran was built specifically for pure number crunching, which is different from the more general-purpose use of C/C++ and Python.
However, it is not all sunshine and rainbows. Here are some of Fortran’s drawbacks:
- Text operations: Not ideal for character and text manipulation, so not optimal for natural language processing.
- Python has more packages: Even though Fortran has many libraries, it is far from the total number available in Python.
- Small community: The Fortran language does not have as large a following as other languages. This means it doesn’t have much IDE and plugin support or many Stack Overflow answers!
- Not suitable for many applications: It’s explicitly a scientific language, so don’t try to build a website with it!
Homebrew
Let’s quickly go over how to install Fortran on your computer. First, you need to install Homebrew, which is a package manager for macOS.
To install Homebrew, simply run the command from their website:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
You can confirm Homebrew is installed by running the command brew help. If there are no errors, then Homebrew has been successfully installed on your system.
GCC Compiler
As Fortran is a compiled language, we need a compiler that can compile Fortran source code. Unfortunately, macOS doesn’t ship with a Fortran compiler pre-installed, so we need to install one ourselves.
A popular option is GCC (the GNU Compiler Collection), which you can install through Homebrew: brew install gcc. GCC is a collection of compilers for languages such as C, Go, and of course Fortran. The Fortran compiler in GCC is called gfortran, and it can compile all major versions of Fortran, such as 77, 90, 95, 2003, and 2008. It is recommended to use the .f90 extension for Fortran code files, although there is some discussion on this topic.
To verify that gfortran and GCC have been successfully installed, run the command which gfortran. The output should look something like this:
/opt/homebrew/bin/gfortran
The gfortran compiler is by far the most popular; however, there are several other compilers available. A list of them can be found here.
IDEs & Text Editors
Once we have our Fortran compiler, the next step is to choose an Integrated Development Environment (IDE) or text editor to write our Fortran source code in. This is a matter of personal preference since there are many options available. Personally, I use PyCharm and install the Fortran plugin because I prefer not to have multiple IDEs. Other popular text editors suggested by the Fortran website include Sublime Text, Notepad++, and Emacs.
Running a Program
Before we go on to our first program, it is important to note that I won’t be doing a syntax or command tutorial in this article. Linked here is a short guide that covers all the basic syntax.
Below is a simple program called example.f90:
Here’s how we compile it:
gfortran -o example example.f90
This command compiles the code and creates an executable file named example. You can replace example with any other name you like. If you don’t specify a name using the -o flag, the compiler will use a default name, which is typically a.out on most Unix-based operating systems.
Here’s how to run the example executable:
./example
The ./ prefix is included to indicate that the executable is in the current directory. The output from this command will look like this:
Hello world
1
Now, let’s tackle a more ‘real’ problem!
Overview
The knapsack problem is a widely known combinatorial optimization problem that poses:
A set of items, each with a value and a weight, must be packed into a knapsack so as to maximize the total value whilst respecting the weight constraint of the knapsack.
Although the problem sounds simple, the number of candidate solutions increases exponentially with the number of items, which makes it intractable to solve by brute force beyond a certain number of items.
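To make the exponential growth concrete, here is a minimal sketch in Python. It assumes the 0–1 version of the problem, where each item is either in or out, so n items give 2^n candidate selections. The function name count_candidate_selections and the item counts in the loop are illustrative choices only.

```python
def count_candidate_selections(num_items: int) -> int:
    """Each item is either in or out, so there are 2**num_items selections."""
    return 2 ** num_items


# Illustrative item counts (assumed, not taken from a benchmark).
for n in (10, 22, 30, 40):
    print(f"{n} items -> {count_candidate_selections(n):,} candidate selections")
```

Every extra item doubles the search space, which is why brute force stops being practical fairly quickly.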
Heuristic methods such as genetic algorithms can be used to find a ‘good enough’ or ‘approximate’ solution in a reasonable amount of time. If you’re interested in learning how to solve the knapsack problem using a genetic algorithm, check out my previous post.
The knapsack problem has various applications in Data Science and Operations Research, including inventory management and supply chain efficiency, which makes it important to solve efficiently for business decisions.
In this section, we will see how quickly Fortran can solve the knapsack problem by pure brute force compared to Python.
Note: We will be focusing on the basic version, which is the 0–1 knapsack problem, where each item is either fully in the knapsack or not in it at all.
Python
Let’s start with Python.
The following code solves the knapsack problem for 22 items using a brute-force search. Each item is encoded as a 0 (not in) or 1 (in) in a 22-element array (each element refers to an item). As each item has only 2 possible values, the total number of combinations is 2^(num_items). We utilise the itertools.product function, which computes the Cartesian product of all the possible solutions, and then we iterate through them.
The output of this code:
Items in best solution:
Item 1: weight=10, value=10
Item 6: weight=60, value=68
Item 7: weight=70, value=75
Item 8: weight=80, value=58
Item 17: weight=170, value=200
Item 19: weight=190, value=300
Item 21: weight=210, value=400
Total value: 1111
Time taken: 13.78832197189331 seconds
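The brute-force search described in this section can be sketched as follows. This is a minimal illustration, not the original 22-item script: the four weights, four values, and the capacity are made-up toy data, and brute_force_knapsack is a hypothetical function name, so the result differs from the 22-item output above.

```python
from itertools import product


def brute_force_knapsack(weights, values, capacity):
    """Try every 0/1 selection and return (best_value, best_selection)."""
    best_value = 0
    best_selection = (0,) * len(weights)
    # product((0, 1), repeat=n) yields all 2**n tuples of 0s and 1s.
    for selection in product((0, 1), repeat=len(weights)):
        total_weight = sum(w for w, chosen in zip(weights, selection) if chosen)
        if total_weight > capacity:
            continue  # skip selections that exceed the weight limit
        total_value = sum(v for v, chosen in zip(values, selection) if chosen)
        if total_value > best_value:
            best_value, best_selection = total_value, selection
    return best_value, best_selection


# Hypothetical toy data (assumed for illustration only).
weights = [10, 20, 30, 40]
values = [60, 100, 120, 90]
capacity = 50

best_value, best_selection = brute_force_knapsack(weights, values, capacity)
print("Total value:", best_value)
for index, chosen in enumerate(best_selection, start=1):
    if chosen:
        print(f"Item {index}: weight={weights[index - 1]}, value={values[index - 1]}")
```

With this toy data the best selection is items 2 and 3, which together weigh exactly 50 and are worth 220.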
Fortran
Now, let’s solve the same problem, with exactly the same variables, but in Fortran. Unlike Python, Fortran does not include a package for performing permutation and combination operations.
Our approach is to use the modulo operator to convert the iteration number into a binary representation. For example, if the iteration number is 6, the modulo of 6 by 2 is 0, which means the first item is not selected. We then divide the iteration number by 2 to shift the bits to the right and take the modulo again to get the binary digit for the next item. This is repeated for every item (so 22 times), and looping over all iteration numbers eventually gives us every possible combination.
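The modulo-and-divide decoding can be illustrated in a few lines. The sketch below uses Python rather than Fortran purely for brevity. It is not the Fortran source used for the timings, and decode_selection is a hypothetical helper name.

```python
def decode_selection(iteration: int, num_items: int) -> list[int]:
    """Turn an iteration number into a 0/1 flag per item, first item first."""
    flags = []
    for _ in range(num_items):
        flags.append(iteration % 2)  # lowest bit: is the current item selected?
        iteration //= 2              # integer division shifts the bits right
    return flags


# Iteration 6 is 110 in binary, so with 4 items the first item is skipped.
print(decode_selection(6, 4))

# Looping over 0 .. 2**n - 1 visits every combination exactly once.
all_selections = [decode_selection(i, 3) for i in range(2 ** 3)]
print(len(all_selections))
```

Because each iteration number maps to a distinct bit pattern, a plain counting loop replaces the need for a combinations library.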
Compile and execute the Fortran program (brute_force.f90) using the Linux time command:
time gfortran -o brute brute_force.f90
time ./brute
Output:
Items in best solution:
Item: 1 Weight: 10 Value: 10
Item: 6 Weight: 60 Value: 68
Item: 7 Weight: 70 Value: 75
Item: 8 Weight: 80 Value: 58
Item: 17 Weight: 170 Value: 200
Item: 19 Weight: 190 Value: 300
Item: 21 Weight: 210 Value: 400
Best value found: 1111
./brute 0.26s user 0.01s system 41% cpu 0.645 total
The Fortran code is ~21 times quicker!
Comparison
To get a more visual comparison, we can plot the execution time as a function of the number of items:
Fortran blows Python out of the water!
Even though the compute time for Fortran does increase, its growth is not nearly as large as it is for Python. This truly displays the computational power of Fortran when it comes to solving optimisation problems, which are of critical importance in many areas of Data Science.
Although Python has been the go-to for Data Science, languages like Fortran can still provide significant value, especially when dealing with optimisation problems, thanks to their inherent number-crunching abilities. Fortran outperforms Python in solving the knapsack problem by brute force, and the performance gap widens further as more items are added to the problem. Therefore, as a Data Scientist, you might want to consider investing your time in Fortran if you need an edge in computational power to solve your business and industry problems.
The full code used in this article can be found at my GitHub here: