Single-threaded recursive algorithms
There are algorithms that by design are usually not good candidates for parallelization: recursive algorithms. In recursion, the current value depends on previous values; one simple but clear example is the algorithm for calculating Fibonacci numbers. An example implementation is below. In this case, it is impossible to break the chain of calculations and run them in parallel.
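A minimal sketch of the recursive Fibonacci calculation in Python:

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number (0, 1, 1, 2, 3, 5, ...)."""
    if n < 2:
        return n
    # Each call depends on the results of the two previous calls,
    # so the chain of computations cannot be split up and run in parallel.
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))  # -> 55
```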
Another example of such an algorithm is the recursive calculation of a factorial (see below).
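A recursive factorial shows the same sequential dependency:

```python
def factorial(n: int) -> int:
    """Return n! computed recursively."""
    if n <= 1:
        return 1
    # factorial(n) depends on factorial(n - 1),
    # so the steps form a strictly sequential chain.
    return n * factorial(n - 1)

print(factorial(5))  # -> 120
```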
Memory-Intensive Tasks
There are tasks where memory access time, not the computation itself, is the bottleneck. CPUs normally have larger caches (fast-access memory) than GPUs and lower-latency memory subsystems, which lets them excel at manipulating frequently accessed data. A simple example is the element-wise addition of large arrays.
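A sketch of such a memory-bound task using NumPy (the array size is illustrative):

```python
import numpy as np

# Two large arrays allocated in RAM. The addition touches every element
# exactly once, so memory access dominates the runtime, not arithmetic.
a = np.random.rand(10_000_000)
b = np.random.rand(10_000_000)

c = a + b  # element-wise addition: memory-bandwidth bound, not compute bound
```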
However, in many cases popular frameworks (like PyTorch) will perform such calculations faster on a GPU by moving the objects to the GPU's memory and parallelizing the operations under the hood.
We can create a process where we initialize arrays in RAM and move them to the GPU for the calculations. This extra overhead of transferring data can make the end-to-end processing time longer than running it directly on the CPU.
That is when we usually use so-called CUDA-enabled arrays, in this case via PyTorch. You just need to make sure that your GPU can handle data of this size. To give you an overview: typical consumer GPUs have 2–6 GB of VRAM, while high-end ones have up to 24 GB (GeForce RTX 4090).
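A sketch of this pattern with PyTorch (it assumes a CUDA-capable GPU; without one, only the CPU branch runs):

```python
import torch

# Tensors are created in RAM (CPU tensors by default).
x = torch.rand(10_000_000)
y = torch.rand(10_000_000)

cpu_result = x + y  # computed directly on the CPU, no transfer needed

if torch.cuda.is_available():
    # Moving the tensors to VRAM is the transfer overhead discussed above;
    # for a single cheap operation it can exceed the compute time saved.
    x_gpu = x.to("cuda")
    y_gpu = y.to("cuda")
    gpu_result = (x_gpu + y_gpu).cpu()  # copy the result back to RAM
```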
Other Non-parallelizable Algorithms
There is a group of algorithms that are not recursive but still cannot be parallelized. Some examples are:
- Gradient Descent: used in optimization tasks and machine learning
- Hash chaining: used in cryptography
Gradient Descent cannot be parallelized in its vanilla form, since it is a sequential algorithm: every iteration (called a step) depends on the results of the previous one. There are, however, some studies on how to implement this algorithm in a parallel manner. To learn more, check:
You can find an example of the hash-chaining algorithm here: https://www.geeksforgeeks.org/c-program-hashing-chaining/
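In the cryptographic sense, a hash chain applies a hash function repeatedly, so each link depends on the previous one; a sketch using Python's hashlib:

```python
import hashlib

def hash_chain(seed: bytes, length: int) -> list:
    """Build a hash chain: each element is the SHA-256 of the previous one."""
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(length - 1):
        # Link i+1 cannot be computed before link i exists,
        # so the chain must be built sequentially.
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain

chain = hash_chain(b"secret", 1000)
print(len(chain))  # -> 1000
```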
Small tasks
Another case where CPUs are the better option is when the data size is very small. In such situations, the overhead of transferring data between RAM and GPU memory (VRAM) can outweigh the benefit of GPU parallelism, thanks to the very fast access to the CPU cache. This was mentioned earlier in the section on memory-intensive tasks.
Also, some tasks are simply so small that, even though the calculations could run in parallel, the benefit to the end user is not visible. In such cases, running on a GPU only generates extra hardware-related costs.
That is why GPUs are not commonly used in IoT. Typical IoT tasks are:
- capturing sensor data and sending it onward
- activating other devices (lights, alarms, motors, etc.) after detecting a signal
However, GPUs are still used in this field for so-called edge-computing tasks: situations where you have to acquire and process data directly at its source instead of sending it over the Internet for heavy processing. A good example is the iFACTORY developed by BMW.
Tasks with a small level of parallelization
There are many use cases where you have to run code in parallel, but a multi-core CPU is fast enough to do the job. GPUs excel in situations where you need massive parallelization (hundreds or thousands of parallel operations). If you find that, e.g., a 4x or 6x speed-up is enough, you can reduce costs by running the code on a CPU, with each process on a different core. Nowadays, CPU manufacturers offer chips with between 2 and 18 cores (e.g., the Intel Core i9-9980XE Extreme Edition Processor).
Summary
Overall, the rule of thumb when choosing between a CPU and a GPU is to answer these main questions:
- Can a CPU handle the whole task within the required time?
- Can my code be parallelized?
- Can I fit all the data on a GPU? If not, does that introduce heavy overhead?
To answer these questions, it's crucial to understand both how your algorithms work and what the business requirements are now, and how they may change in the future.