Let us consider a nonlinear function f(x), where x is a continuous variable. We would like to find the minimum value of this function f(x) by changing our decision variable x. The optimization problem can be formulated mathematically in the following way:
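In standard notation, this unconstrained problem reads:

```latex
\min_{x} \; f(x)
```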
Usually, one comes across some constraints that must be considered. That may, for instance, be a minimum required temperature in a room, a limit on how much pressure is placed on a fabric, or the minimum amount of time you wish (or have) to wait until your next coffee.
These constraints come either in the form of equalities or inequalities. For the sake of simplicity, we will assume that we have only inequality constraints, described by g(x)≤0. We can then add this constraint to the optimization problem as follows:
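Adding the inequality constraint yields:

```latex
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{s.t.} \quad & g(x) \leq 0
\end{aligned}
```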
If the functions (f(x) and g(x), or only one of them) are nonlinear, we would need to use nonlinear solvers. We try to avoid this, which is where a linearization step comes into play. So let's pause our discussion about optimization and first look at a method that allows us to approximate a nonlinear function.
To visualize and better understand this part, consider the logarithmic function f(x)=log(x) in the range x=[1,6]:
# Get some packages
import numpy as np
import matplotlib.pyplot as plt

# Create nonlinear function
def f(x):
    return np.log(x)
# Define lower and upper values and an x-range
xlb = 1
xub = 6
numx = 100
x = np.linspace(xlb, xub, numx)
y = f(x)
# Plot the function
plt.figure()
plt.plot(x, y, color='blue', linestyle='--', label='Nonlinear f(x)')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.legend(frameon=False, loc='lower right')
plt.show()
With this piece of code, we can produce the following plot:
Now, we start with the definition of a linear function, which has the following general form, with slope a and intercept b (the code below calls it intersection):
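In symbols:

```latex
f_{\text{lin}}(x) = a\,x + b
```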
Functions are there to describe something, for instance a physical behaviour or a system. Such a system can typically be sampled: we can choose our x and observe what f(x) is (regardless of whether the system is linear or nonlinear). Example: if we cook spaghetti in a pot of water, we could wait 1 minute (x1) and observe how well our spaghetti is cooked (f(x1)). We can wait another 5 minutes (x2) and check again how well the spaghetti is done (f(x2)). I assume you know what I mean. 😄 🍝
If we have an environment from which we can obtain sample points, we can use them to calculate the corresponding slope a and intercept b of the line that connects two such points. Let's say we have two points, f(x1) and f(x2). Since the linear model should hold at both points, we know that these two equations hold:
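Evaluated at the two sample points:

```latex
\begin{aligned}
f(x_1) &= a\,x_1 + b \\
f(x_2) &= a\,x_2 + b
\end{aligned}
```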
We have two unknowns (a and b) and two equations, so we can solve for a and b:
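Subtracting the first equation from the second gives the slope; substituting back gives the intercept:

```latex
a = \frac{f(x_2) - f(x_1)}{x_2 - x_1}, \qquad b = f(x_1) - a\,x_1
```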
Using these expressions, we can try to approximate our function f(x)=log(x) in the range x=[1,6].
Let's try this in Python. First, we take the lower and upper bounds as our two points to define the segment (remember, Python starts indexing at zero, so don't be confused when it says "segment 0", which is just our first segment):
num_segments = 1
x_segments = np.linspace(xlb, xub, num_segments+1)
for i in range(num_segments):
    print(f'[+] Segment {i} goes from x={x_segments[i]:.2f} to x={x_segments[i+1]:.2f}.')
[+] Segment 0 goes from x=1.00 to x=6.00.
Then, we write a function that returns the slope and the intercept given two supporting points (sometimes also called "nodes"), namely x=[x1,x2] and y=[f(x1),f(x2)]:
def calc_slope_and_intersection(x, y):
    slope = (y[1:] - y[:-1])/(x[1:] - x[:-1])  # finite-difference slope
    intersection = y[:-1] - slope*x[:-1]       # intercept through the first node
    return float(slope[0]), float(intersection[0])
We then calculate the slopes and intercepts and store the values in lists:
slope, intersection = [], []
for i in range(num_segments):
    x_segment = x_segments[i:i+2]  # Get the x values for the segment
    y_segment = f(x_segment)       # Sample from the nonlinear environment
    slope_, intersection_ = calc_slope_and_intersection(x_segment, y_segment)
    slope.append(slope_)
    intersection.append(intersection_)
    print(f'[+] Linear function of segment {i}: x*{slope[i]:.2f}+({intersection[i]:.2f}).')
[+] Linear function of segment 0: x*0.36+(-0.36).
The slope is calculated as a=0.36, and the intercept has the same magnitude with opposite sign, b=−0.36. This brings us to the following linear equation as an approximation:
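That is:

```latex
f(x) \approx 0.36\,x - 0.36
```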
We can plot this together with the original logarithmic function:
# Function that creates a linear function from slope and intersection
def get_linear(slope, intersection, xlb, xub):
    x_ = np.array([xlb, xub])
    return slope*x_ + intersection

# Plot the linear functions
plt.figure()
plt.plot(x, y, color='blue', linestyle='--', label='Nonlinear f(x)')  # Nonlinear function
for i in range(num_segments):
    x_segment = x_segments[i:i+2]
    y_segment = get_linear(slope[i], intersection[i], x_segment[0], x_segment[1])
    plt.plot(x_segment, y_segment, color='orange', linestyle='-', label='Linear segment' if i==0 else '__nolabel__')  # Linear segments
    plt.plot(x_segment, y_segment, color='black', linestyle='', marker='x', label='Nodes' if i==0 else '__nolabel__')  # Nodes
plt.xlabel('x')
plt.ylabel('f(x)')
plt.legend(frameon=False, loc='lower right')
plt.show()
Well, that's not very accurate. But it is linear, and it goes through the two nodes we used. So in essence, it does what we want. Now we can sample the function a bit more often (use more of those nodes), which brings us to the piecewise linearization technique.
Let us split the entire x-range into n segments (i=1,2,…,n) and perform the linearization shown above in each segment. In the equations below, N describes the set of these segments, meaning N={1,2,…,n}. For each linear function, we need its individual slope and intercept:
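Each segment gets its own linear function:

```latex
f_i(x) = a_i\,x + b_i, \qquad i \in N = \{1, 2, \ldots, n\}
```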
Since we can define a number of desired values for x and then sample the corresponding values of f(x) from our (unknown) function, we can again calculate the slopes and intercepts. Let us therefore define some more x-values for the different segments (the nodes). Say we use n=3.
The same code snippets from above can be used; just set the variable num_segments to 3. The x-ranges of the segments and their equations are given as follows:
[+] Segment 0 goes from x=1.00 to x=2.67.
[+] Segment 1 goes from x=2.67 to x=4.33.
[+] Segment 2 goes from x=4.33 to x=6.00.
[+] Linear function of segment 0: x*0.59+(-0.59).
[+] Linear function of segment 1: x*0.29+(0.20).
[+] Linear function of segment 2: x*0.20+(0.62).
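For reference, the snippets above can be combined into one self-contained script (same helper names as before, num_segments set to 3, plotting omitted):

```python
import numpy as np

# Nonlinear "environment" we sample from
def f(x):
    return np.log(x)

# Slope and intercept of the line through two nodes
def calc_slope_and_intersection(x, y):
    slope = (y[1:] - y[:-1])/(x[1:] - x[:-1])
    intersection = y[:-1] - slope*x[:-1]
    return float(slope[0]), float(intersection[0])

xlb, xub = 1, 6
num_segments = 3
x_segments = np.linspace(xlb, xub, num_segments + 1)  # 4 nodes -> 3 segments

slope, intersection = [], []
for i in range(num_segments):
    x_segment = x_segments[i:i+2]  # nodes of segment i
    y_segment = f(x_segment)       # sample the nonlinear environment
    slope_, intersection_ = calc_slope_and_intersection(x_segment, y_segment)
    slope.append(slope_)
    intersection.append(intersection_)
    print(f'[+] Linear function of segment {i}: x*{slope[i]:.2f}+({intersection[i]:.2f}).')
```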
Looks a bit better. We could further increase the number of segments, but living at the limits is not always what we want, so let's stick with n=3 for the moment. 😎
With this, we have identified a possible approximation of the nonlinear function f(x). Let's go back and replace this nonlinearity in our original optimization problem.
Again, the goal is to find the minimum of the function f(x), considering that it must be above a given specification K. We can formulate this as shown above, where our inequality constraint g(x)≤0 is simply K−f(x)≤0:
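```latex
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{s.t.} \quad & K - f(x) \leq 0
\end{aligned}
```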
Since we have linearized our function in segments, all segments must be considered in the optimization problem. This is possible by creating an additional variable z for each segment. This variable must be binary (either 0 or 1): it is 0 if the segment is inactive, and 1 if the segment is active (meaning the minimum solution we are looking for is located in this segment).
We can now sum up all the linear functions of the segments and multiply each with its corresponding z variable.
We also consider that our optimal solution can only lie in one segment at a time. In other words, only one segment can be active, meaning the sum of all z variables must be 1:
Or in a slightly different form:
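Putting the segment sum and the one-active-segment condition together, the problem reads:

```latex
\begin{aligned}
\min_{x,\,z} \quad & \sum_{i \in N} z_i \,(a_i\,x + b_i) \\
\text{s.t.} \quad & K - \sum_{i \in N} z_i \,(a_i\,x + b_i) \leq 0 \\
& \sum_{i \in N} z_i = 1, \qquad z_i \in \{0, 1\} \quad \forall i \in N
\end{aligned}
```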
Well, it seems we didn't do a very good job: this problem formulation is now nonlinear (due to the multiplication of x with z in the objective function and the constraint) and even has integer variables (z).
However, there is a simple trick that allows us to rewrite such a "pseudo bilinear term" (which is a nonlinearity). First, we define an auxiliary variable x-tilde, which is the product of the continuous variable (x) and the binary variable (z):
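```latex
\tilde{x}_i = x \cdot z_i \qquad \forall i \in N
```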
Next, we need to define the domain in which x-tilde is valid for the cases when z=0 (inactive segment) and z=1 (active segment). If the segment is inactive, the auxiliary variable must be zero. We achieve this with the following inequalities:
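These conditions can be written as:

```latex
x_{\min,i}\, z_i \;\leq\; \tilde{x}_i \;\leq\; x_{\max,i}\, z_i \qquad \forall i \in N
```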
Here, xmin and xmax are the lower and upper "nodes" of each segment, which we calculated above. You can easily verify that if z is zero, x-tilde is forced to be zero as well.
In addition, if z=1, the auxiliary variable x-tilde must be bounded by the "true" continuous variable x:
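```latex
x - (1 - z_i)\, x_{\mathrm{ub}} \;\leq\; \tilde{x}_i \;\leq\; x \qquad \forall i \in N
```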
It is worth mentioning here that xub is the upper bound of the continuous variable x (not that of the intermediate nodes). With this, we can rewrite our optimization problem from above:
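Collecting all pieces, the linearized problem becomes a mixed-integer linear program (MILP):

```latex
\begin{aligned}
\min_{x,\,\tilde{x},\,z} \quad & \sum_{i \in N} \bigl( a_i\,\tilde{x}_i + b_i\,z_i \bigr) \\
\text{s.t.} \quad & K - \sum_{i \in N} \bigl( a_i\,\tilde{x}_i + b_i\,z_i \bigr) \leq 0 \\
& x_{\min,i}\, z_i \leq \tilde{x}_i \leq x_{\max,i}\, z_i \qquad \forall i \in N \\
& x - (1 - z_i)\, x_{\mathrm{ub}} \leq \tilde{x}_i \leq x \qquad \forall i \in N \\
& \sum_{i \in N} z_i = 1, \qquad z_i \in \{0, 1\} \quad \forall i \in N
\end{aligned}
```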
You may want to go through it again with a cup of coffee ☕
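As a sanity check, the full MILP can be handed to an off-the-shelf solver. The sketch below uses scipy.optimize.milp (available in SciPy ≥ 1.9); the specification K = 1.0 is an assumed example value, not something fixed in the text:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def f(x):
    return np.log(x)

# Segment data (n = 3 segments on x in [1, 6]), as computed before
xlb, xub = 1.0, 6.0
K = 1.0                           # assumed example specification: f(x) >= K
nodes = np.linspace(xlb, xub, 4)
a = (f(nodes[1:]) - f(nodes[:-1])) / (nodes[1:] - nodes[:-1])  # slopes
b = f(nodes[:-1]) - a*nodes[:-1]                               # intercepts
n = len(a)

# Variable vector: [x, xt_1..xt_n, z_1..z_n]
nv = 1 + 2*n
c = np.concatenate(([0.0], a, b))  # objective: sum_i (a_i*xt_i + b_i*z_i)

rows, lo, hi = [], [], []
def add_row(row, lb_, ub_):
    rows.append(row); lo.append(lb_); hi.append(ub_)

add_row(c.copy(), K, np.inf)       # spec: piecewise f(x) >= K
for i in range(n):
    e_x  = np.zeros(nv); e_x[0] = 1.0
    e_xt = np.zeros(nv); e_xt[1 + i] = 1.0
    e_z  = np.zeros(nv); e_z[1 + n + i] = 1.0
    add_row(e_xt - nodes[i]*e_z,   0.0,     np.inf)  # xt_i >= xmin_i*z_i
    add_row(e_xt - nodes[i+1]*e_z, -np.inf, 0.0)     # xt_i <= xmax_i*z_i
    add_row(e_xt - e_x,            -np.inf, 0.0)     # xt_i <= x
    add_row(e_xt - e_x - xub*e_z,  -xub,    np.inf)  # xt_i >= x - (1-z_i)*xub
z_sum = np.zeros(nv); z_sum[1 + n:] = 1.0
add_row(z_sum, 1.0, 1.0)           # exactly one active segment

res = milp(
    c=c,
    constraints=LinearConstraint(np.array(rows), lo, hi),
    integrality=np.concatenate((np.zeros(1 + n), np.ones(n))),  # only z binary
    bounds=Bounds(np.concatenate(([xlb], np.zeros(2*n))),
                  np.concatenate(([xub], xub*np.ones(n), np.ones(n)))),
)
print(f'x* = {res.x[0]:.3f}, piecewise f(x*) = {res.fun:.3f}, z = {np.round(res.x[1+n:])}')
```

With K = 1.0, the minimum sits where the second linear segment crosses K, i.e. the solver should report x* ≈ 2.73 with segment 1 active.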