Let’s start with a simple example and build it up.
Example-1: Symmetric uint8 quantization
Let’s say we want to map the floating point range [0.0 .. 1000.0] to the quantized range [0 .. 255]. The range [0 .. 255] is the set of values that fit in an unsigned 8-bit integer.
To perform this transformation, we need to rescale the floating point range so that the following holds:
Floating point 0.0 = Quantized 0
Floating point 1000.0 = Quantized 255
This is known as symmetric quantization, since the floating point value 0.0 maps to the quantized value 0.
Hence, we define a scale as

scale = (x_max − x_min) / (q_max − q_min)

where x_max and x_min are the limits of the floating point range, and q_max and q_min are the limits of the quantized range.

In this case, scale = (1000.0 − 0.0) / (255 − 0) ≈ 3.9216.
To convert a floating point value to a quantized value, we simply divide the floating point value by the scale and round. For instance, the floating point value 500.0 corresponds to the quantized value round(500.0 / 3.9216) = round(127.5) = 128.
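The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name and parameters are my own.

```python
def quantize_symmetric(x, x_max, q_max=255):
    """Map the float range [0.0 .. x_max] to the integer range [0 .. q_max]."""
    scale = x_max / q_max             # 1000.0 / 255 ≈ 3.9216
    q = round(x / scale)              # rescale and round to the nearest integer
    return max(0, min(q_max, q))      # clamp into the quantized range

print(quantize_symmetric(0.0, 1000.0))     # -> 0
print(quantize_symmetric(500.0, 1000.0))   # -> 128 (500.0 / 3.9216 = 127.5)
print(quantize_symmetric(1000.0, 1000.0))  # -> 255
```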
In this simple example, the 0.0 in the floating point range maps exactly to 0 in the quantized range. This is what makes the quantization symmetric. Let’s see what happens when this isn’t the case.
Example-2: Affine uint8 quantization
Let’s say we want to map the floating point range [-20.0 .. 1000.0] to the quantized range [0 .. 255].
In this case, we have a different scaling factor, since our x_min is different: scale = (1000.0 − (−20.0)) / 255 = 4.0.
Let’s see how the floating point value 0.0 is represented in the quantized range if we apply only the scaling factor: 0.0 / 4 = 0.
Well, this doesn’t quite seem right, since according to the diagram above we would have expected the floating point value -20.0 to map to the quantized value 0, yet -20.0 / 4 = -5 falls outside the quantized range.
This is where the concept of the zero-point comes in. The zero-point acts as a bias that shifts the scaled floating point value, and it corresponds to the value in the quantized range that represents the floating point value 0.0. In our case, the zero-point is the negative of the scaled representation of -20.0, which is -(-5) = 5. The zero-point is always the negative of the representation of the minimum floating point value, because the minimum is always negative or zero. We’ll find out more about why this is the case in the section that explains Example-4.
Whenever we quantize a value, we always add the zero-point to the scaled value to get the actual quantized value in the valid quantized range. To quantize the value -20.0, we compute the scaled value of -20.0 plus the zero-point, which is -5 + 5 = 0. Hence, quantized(-20.0, scale=4, zp=5) = 0.
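Putting the scale and zero-point together, affine quantization can be sketched as follows. Again, the function name and signature are illustrative, not a standard API.

```python
def affine_quantize(x, x_min, x_max, q_min=0, q_max=255):
    """Map [x_min .. x_max] to [q_min .. q_max] using a scale and zero-point."""
    scale = (x_max - x_min) / (q_max - q_min)  # (1000 - (-20)) / 255 = 4.0
    zero_point = q_min - round(x_min / scale)  # 0 - (-5) = 5
    q = round(x / scale) + zero_point          # scale, round, then shift
    return max(q_min, min(q_max, q))           # clamp into the quantized range

print(affine_quantize(-20.0, -20.0, 1000.0))   # -> 0
print(affine_quantize(0.0, -20.0, 1000.0))     # -> 5 (the zero-point)
print(affine_quantize(1000.0, -20.0, 1000.0))  # -> 255
```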
Example-3: Affine int8 quantization
What happens if our quantized range is a signed 8-bit integer instead of an unsigned 8-bit integer? Well, the range is now [-128 .. 127].
In this case, -20.0 in the float range maps to -128 in the quantized range, and 1000.0 in the float range maps to 127 in the quantized range.
The way we calculate the zero-point is to compute it as if the quantized range were [0 .. 255] and then offset the result by -128, so the zero-point in the new range is 5 − 128 = −123. Hence, the zero-point for the new range is -123.
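The same affine formula handles the signed range if we parameterize q_min and q_max; here is a hedged sketch using illustrative names:

```python
def affine_quantize(x, x_min, x_max, q_min, q_max):
    """Affine quantization of x from [x_min .. x_max] into [q_min .. q_max]."""
    scale = (x_max - x_min) / (q_max - q_min)  # 1020 / 255 = 4.0, same as before
    zero_point = q_min - round(x_min / scale)  # -128 - (-5) = -123
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))

# Same float range as Example-2, but a signed int8 target range.
print(affine_quantize(-20.0, -20.0, 1000.0, -128, 127))   # -> -128
print(affine_quantize(0.0, -20.0, 1000.0, -128, 127))     # -> -123 (the zero-point)
print(affine_quantize(1000.0, -20.0, 1000.0, -128, 127))  # -> 127
```

Note that the scale is unchanged, because both ranges span 255 quantized steps; only the zero-point shifts.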
So far, we’ve looked at examples where the floating point range includes the value 0.0. In the next set of examples, we’ll take a look at what happens when the floating point range doesn’t include the value 0.0.
The importance of 0.0
Why is it important for the floating point value 0.0 to be representable in the floating point range?
When using a padded convolution, we expect the border pixels to be padded with the value 0.0 in the most common case. Hence, it’s important for 0.0 to be representable in the floating point range. Similarly, if a value X is going to be used for padding in your network, you need to make sure that X is representable in the floating point range and that quantization is aware of this.
Example-4: The untold story — skewed floating point range
Now, let’s take a look at what happens if 0.0 isn’t part of the floating point range.
In this example, we’re attempting to quantize the floating point range [40.0 .. 1000.0] into the quantized range [0 .. 255].
Since we can’t represent the value 0.0 in the floating point range, we need to extend the lower limit of the range down to 0.0.
We can see that part of the quantized range is wasted. To find out how much, let’s compute the quantized value that the floating point value 40.0 maps to: round(40.0 / 3.9216) = round(10.2) = 10.
Hence, we’re wasting the range [0 .. 9] in the quantized range, which is about 3.92% of the range. This can significantly affect the model’s accuracy post-quantization.
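The wasted range can be checked numerically with a small sketch (the helper below is illustrative):

```python
def quantize(x, scale, zero_point=0, q_min=0, q_max=255):
    """Affine quantization with the given scale and zero-point."""
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))

# The float range [40.0 .. 1000.0] is extended down to [0.0 .. 1000.0]
# so that 0.0 is representable; scale = 1000 / 255 ≈ 3.9216, zero-point = 0.
scale = 1000.0 / 255

# The smallest real input, 40.0, already maps to 10, so the quantized
# values [0 .. 9] can never occur.
print(quantize(40.0, scale))  # -> 10
print(f"{10 / 255:.2%}")      # -> 3.92% of the quantized range is wasted
```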
This skewing is necessary if we wish to guarantee that the value 0.0 in the floating point range can be represented in the quantized range.
Another reason for including the value 0.0 in the floating point range is that it’s very valuable to be able to efficiently check whether a quantized value represents 0.0 in the floating point range. Consider operators such as ReLU, which clip all values below 0.0 in the floating point range to 0.0.
It’s important for us to be able to represent the zero-point using the same data type (signed or unsigned int8) as the quantized values. This allows us to perform these comparisons quickly and efficiently.
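For example, because the zero-point is the quantized representation of float 0.0, ReLU can operate directly on quantized values as a single integer comparison against the zero-point. This is a sketch of the idea, not a production kernel:

```python
def quantized_relu(q, zero_point):
    """ReLU applied directly in the quantized domain: any quantized value
    below the zero-point (i.e. a negative float) is clipped to the
    zero-point (i.e. float 0.0)."""
    return max(q, zero_point)

# Using the uint8 parameters from Example-2 (scale=4, zero_point=5):
# quantized 0 represents float -20.0, and quantized 5 represents float 0.0.
print(quantized_relu(0, 5))    # -> 5   (float -20.0 clips to 0.0)
print(quantized_relu(200, 5))  # -> 200 (positive values pass through)
```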
Next, let’s take a look at how activation normalization helps with model quantization. We’ll specifically focus on how standardizing the activation values allows us to use the entire quantized range effectively.