Principle#
To find the absolute value of a number, you simply convert a negative number to a positive one, which can be achieved by calculating its two's complement (one's complement plus one).
Code#
template <typename T, class = typename std::enable_if_t<!std::is_unsigned_v<T>>>
inline typename std::make_unsigned_t<T> mabs(T _val) {
const T mask = _val >> (sizeof(T) * 8 - 1);
return (_val ^ mask) - mask;
}
Principle Analysis#
Taking -12 as an example: the two's complement of -12 is 10100
(assuming the machine word length is 5 bits).
The steps are as follows:
-
First, obtain the mask
mask
:const T mask = _val >> (sizeof(T) * 8 - 1);
Shifting
10100
left by 4 bits gives11111
. Note that11111
is a very clever design.The reason we choose to right shift to obtain the mask instead of directly right shifting four bits and using & with 0x1 to get the sign bit is that: the mask
11111
is actually entirely composed of the sign bit. If the sign bit is 1, the generated mask can be directly XORed with the original data to get the one's complement, and then adding 1 gives the two's complement.If the sign bit is 0, the resulting mask will be all 0s, and XORing 0 with any number yields the number itself.
-
Calculate the one's complement:
_val ^= mask;
As shown in the previous step,
10100
XOR11111
equals01011
, which gives the one's complement.If a positive number were here, the mask would be
00000
, and XORing would leave it unchanged. -
Add 1:
_val - mask;
01011
-11111
=01100
, resulting in the absolute value of -12, which is 12.The reason adding one becomes subtracting the mask is simple: the mask we obtained earlier is
11111
, which is actually the two's complement of -1. To add 1, we just need to subtract the mask.If a positive number were here, the mask would be
00000
, and subtracting 0 would have no effect.
In this way, we have implemented the algorithm for calculating the absolute value. A negative number will be converted to a positive one, while a positive number remains unchanged. Isn't it simple?
Benchmark#
#include "benchmark/benchmark.h"
template<typename T, class = typename std::enable_if_t<!std::is_unsigned_v<T>>>
inline typename std::make_unsigned_t<T> mabs(T _val) {
const T mask = _val >> (sizeof(T) * 8 - 1);
return (_val + mask) ^ mask;
}
static void BM_mabs(benchmark::State &state) {
for (auto _: state) {
benchmark::DoNotOptimize(mabs(state.range(0)));
}
}
static void BM_std_abs(benchmark::State &state) {
for (auto _: state) {
benchmark::DoNotOptimize(std::abs(state.range(0)));
}
}
BENCHMARK(BM_mabs)->RangeMultiplier(32)->Range(INT64_MIN, INT64_MAX);
BENCHMARK(BM_std_abs)->RangeMultiplier(32)->Range(INT64_MIN, INT64_MAX);
BENCHMARK_MAIN();
Below are the results obtained using a MacBook Air (M1, 2020) and Apple clang 13.1.6
/Users/hominsu/CLionProjects/bit-hacks-bench/cmake-build-release-appleclang/bench/abs
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
2022-03-26T13:00:40+08:00
Running /Users/hominsu/CLionProjects/bit-hacks-bench/cmake-build-release-appleclang/bench/abs
Run on (8 X 24.1207 MHz CPU s)
CPU Caches:
L1 Data 64 KiB (x8)
L1 Instruction 128 KiB (x8)
L2 Unified 4096 KiB (x2)
Load Average: 2.42, 2.09, 2.29
--------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------
BM_mabs/-9223372036854775808 0.340 ns 0.340 ns 1000000000
BM_mabs/-1152921504606846976 0.344 ns 0.344 ns 1000000000
BM_mabs/-36028797018963968 0.342 ns 0.342 ns 1000000000
BM_mabs/-1125899906842624 0.336 ns 0.336 ns 1000000000
BM_mabs/-35184372088832 0.343 ns 0.343 ns 1000000000
BM_mabs/-1099511627776 0.342 ns 0.342 ns 1000000000
BM_mabs/-34359738368 0.343 ns 0.343 ns 1000000000
BM_mabs/-1073741824 0.343 ns 0.343 ns 1000000000
BM_mabs/-33554432 0.344 ns 0.344 ns 1000000000
BM_mabs/-1048576 0.345 ns 0.345 ns 1000000000
BM_mabs/-32768 0.344 ns 0.344 ns 1000000000
BM_mabs/-1024 0.343 ns 0.343 ns 1000000000
BM_mabs/-32 0.347 ns 0.347 ns 1000000000
BM_mabs/-1 0.343 ns 0.343 ns 1000000000
BM_mabs/0 0.344 ns 0.344 ns 1000000000
BM_mabs/1 0.346 ns 0.346 ns 1000000000
BM_mabs/32 0.341 ns 0.341 ns 1000000000
BM_mabs/1024 0.346 ns 0.346 ns 1000000000
BM_mabs/32768 0.347 ns 0.347 ns 1000000000
BM_mabs/1048576 0.348 ns 0.348 ns 1000000000
BM_mabs/33554432 0.344 ns 0.344 ns 1000000000
BM_mabs/1073741824 0.340 ns 0.340 ns 1000000000
BM_mabs/34359738368 0.350 ns 0.350 ns 1000000000
BM_mabs/1099511627776 0.343 ns 0.343 ns 1000000000
BM_mabs/35184372088832 0.339 ns 0.339 ns 1000000000
BM_mabs/1125899906842624 0.341 ns 0.341 ns 1000000000
BM_mabs/36028797018963968 0.341 ns 0.341 ns 1000000000
BM_mabs/1152921504606846976 0.348 ns 0.348 ns 1000000000
BM_mabs/9223372036854775807 0.346 ns 0.346 ns 1000000000
BM_std_abs/-9223372036854775808 0.340 ns 0.340 ns 1000000000
BM_std_abs/-1152921504606846976 0.345 ns 0.345 ns 1000000000
BM_std_abs/-36028797018963968 0.347 ns 0.347 ns 1000000000
BM_std_abs/-1125899906842624 0.340 ns 0.340 ns 1000000000
BM_std_abs/-35184372088832 0.348 ns 0.348 ns 1000000000
BM_std_abs/-1099511627776 0.345 ns 0.345 ns 1000000000
BM_std_abs/-34359738368 0.347 ns 0.347 ns 1000000000
BM_std_abs/-1073741824 0.344 ns 0.344 ns 1000000000
BM_std_abs/-33554432 0.342 ns 0.342 ns 1000000000
BM_std_abs/-1048576 0.347 ns 0.347 ns 1000000000
BM_std_abs/-32768 0.345 ns 0.345 ns 1000000000
BM_std_abs/-1024 0.347 ns 0.347 ns 1000000000
BM_std_abs/-32 0.348 ns 0.348 ns 1000000000
BM_std_abs/-1 0.341 ns 0.341 ns 1000000000
BM_std_abs/0 0.352 ns 0.352 ns 1000000000
BM_std_abs/1 0.346 ns 0.346 ns 1000000000
BM_std_abs/32 0.348 ns 0.348 ns 1000000000
BM_std_abs/1024 0.344 ns 0.344 ns 1000000000
BM_std_abs/32768 0.342 ns 0.342 ns 1000000000
BM_std_abs/1048576 0.348 ns 0.348 ns 1000000000
BM_std_abs/33554432 0.346 ns 0.346 ns 1000000000
BM_std_abs/1073741824 0.346 ns 0.346 ns 1000000000
BM_std_abs/34359738368 0.347 ns 0.347 ns 1000000000
BM_std_abs/1099511627776 0.343 ns 0.343 ns 1000000000
BM_std_abs/35184372088832 0.347 ns 0.347 ns 1000000000
BM_std_abs/1125899906842624 0.351 ns 0.351 ns 1000000000
BM_std_abs/36028797018963968 0.340 ns 0.340 ns 1000000000
BM_std_abs/1152921504606846976 0.346 ns 0.346 ns 1000000000
BM_std_abs/9223372036854775807 0.343 ns 0.343 ns 1000000000
Below are the results obtained using i5-9500 and gcc 8.5.0 (Red Hat 8.5.0-10)
/tmp/tmp.CtmwmpTLjC/cmake-build-release-1104/bench/abs
2022-03-26T13:10:38+08:00
Running /tmp/tmp.CtmwmpTLjC/cmake-build-release-1104/bench/abs
Run on (6 X 4138.24 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x6)
L1 Instruction 32 KiB (x6)
L2 Unified 256 KiB (x6)
L3 Unified 9216 KiB (x1)
Load Average: 0.94, 0.61, 0.49
--------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------
BM_mabs/-9223372036854775808 0.366 ns 0.366 ns 1000000000
BM_mabs/-1152921504606846976 0.367 ns 0.367 ns 1000000000
BM_mabs/-36028797018963968 0.366 ns 0.365 ns 1000000000
BM_mabs/-1125899906842624 0.368 ns 0.367 ns 1000000000
BM_mabs/-35184372088832 0.370 ns 0.370 ns 1000000000
BM_mabs/-1099511627776 0.364 ns 0.364 ns 1000000000
BM_mabs/-34359738368 0.370 ns 0.370 ns 1000000000
BM_mabs/-1073741824 0.364 ns 0.364 ns 1000000000
BM_mabs/-33554432 0.367 ns 0.367 ns 1000000000
BM_mabs/-1048576 0.368 ns 0.368 ns 1000000000
BM_mabs/-32768 0.371 ns 0.370 ns 1000000000
BM_mabs/-1024 0.365 ns 0.365 ns 1000000000
BM_mabs/-32 0.365 ns 0.365 ns 1000000000
BM_mabs/-1 0.364 ns 0.363 ns 1000000000
BM_mabs/0 0.368 ns 0.368 ns 1000000000
BM_mabs/1 0.366 ns 0.365 ns 1000000000
BM_mabs/32 0.364 ns 0.363 ns 1000000000
BM_mabs/1024 0.368 ns 0.368 ns 1000000000
BM_mabs/32768 0.370 ns 0.369 ns 1000000000
BM_mabs/1048576 0.368 ns 0.367 ns 1000000000
BM_mabs/33554432 0.366 ns 0.366 ns 1000000000
BM_mabs/1073741824 0.367 ns 0.366 ns 1000000000
BM_mabs/34359738368 0.371 ns 0.371 ns 1000000000
BM_mabs/1099511627776 0.368 ns 0.367 ns 1000000000
BM_mabs/35184372088832 0.370 ns 0.369 ns 1000000000
BM_mabs/1125899906842624 0.363 ns 0.362 ns 1000000000
BM_mabs/36028797018963968 0.368 ns 0.367 ns 1000000000
BM_mabs/1152921504606846976 0.372 ns 0.372 ns 1000000000
BM_mabs/9223372036854775807 0.367 ns 0.366 ns 1000000000
BM_std_abs/-9223372036854775808 0.491 ns 0.490 ns 1000000000
BM_std_abs/-1152921504606846976 0.484 ns 0.484 ns 1000000000
BM_std_abs/-36028797018963968 0.490 ns 0.490 ns 1000000000
BM_std_abs/-1125899906842624 0.487 ns 0.486 ns 1000000000
BM_std_abs/-35184372088832 0.493 ns 0.493 ns 1000000000
BM_std_abs/-1099511627776 0.486 ns 0.485 ns 1000000000
BM_std_abs/-34359738368 0.491 ns 0.491 ns 1000000000
BM_std_abs/-1073741824 0.487 ns 0.486 ns 1000000000
BM_std_abs/-33554432 0.490 ns 0.490 ns 1000000000
BM_std_abs/-1048576 0.489 ns 0.488 ns 1000000000
BM_std_abs/-32768 0.494 ns 0.493 ns 1000000000
BM_std_abs/-1024 0.490 ns 0.489 ns 1000000000
BM_std_abs/-32 0.491 ns 0.490 ns 1000000000
BM_std_abs/-1 0.486 ns 0.486 ns 1000000000
BM_std_abs/0 0.492 ns 0.492 ns 1000000000
BM_std_abs/1 0.487 ns 0.486 ns 1000000000
BM_std_abs/32 0.487 ns 0.486 ns 1000000000
BM_std_abs/1024 0.493 ns 0.492 ns 1000000000
BM_std_abs/32768 0.489 ns 0.489 ns 1000000000
BM_std_abs/1048576 0.492 ns 0.491 ns 1000000000
BM_std_abs/33554432 0.487 ns 0.486 ns 1000000000
BM_std_abs/1073741824 0.493 ns 0.492 ns 1000000000
BM_std_abs/34359738368 0.486 ns 0.486 ns 1000000000
BM_std_abs/1099511627776 0.493 ns 0.492 ns 1000000000
BM_std_abs/35184372088832 0.489 ns 0.488 ns 1000000000
BM_std_abs/1125899906842624 0.491 ns 0.491 ns 1000000000
BM_std_abs/36028797018963968 0.490 ns 0.490 ns 1000000000
BM_std_abs/1152921504606846976 0.494 ns 0.493 ns 1000000000
BM_std_abs/9223372036854775807 0.491 ns 0.490 ns 1000000000