Homing So

Homing So

🎉 哈喽啊~各位
github

Absolute Value — "C/C++ Bit Manipulation Black Technology 01"

Principle#

To find the absolute value of a number, you simply convert a negative number to a positive one, which can be achieved by calculating its two's complement (one's complement plus one).

Code#

template <typename T, class = typename std::enable_if_t<!std::is_unsigned_v<T>>>
inline typename std::make_unsigned_t<T> mabs(T _val) { 
    const T mask = _val >> (sizeof(T) * 8 - 1);
    return (_val ^ mask) - mask;
}

Principle Analysis#

Taking -12 as an example: the two's complement of -12 is 10100 (assuming the machine word length is 5 bits).

The steps are as follows:

  • First, obtain the mask mask:

    const T mask = _val >> (sizeof(T) * 8 - 1);
    

    Shifting 10100 left by 4 bits gives 11111. Note that 11111 is a very clever design.

    The reason we choose to right shift to obtain the mask instead of directly right shifting four bits and using & with 0x1 to get the sign bit is that: the mask 11111 is actually entirely composed of the sign bit. If the sign bit is 1, the generated mask can be directly XORed with the original data to get the one's complement, and then adding 1 gives the two's complement.

    If the sign bit is 0, the resulting mask will be all 0s, and XORing 0 with any number yields the number itself.

  • Calculate the one's complement:

    _val ^= mask;
    

    As shown in the previous step, 10100 XOR 11111 equals 01011, which gives the one's complement.

    If a positive number were here, the mask would be 00000, and XORing would leave it unchanged.

  • Add 1:

    _val - mask;
    

    01011 - 11111 = 01100, resulting in the absolute value of -12, which is 12.

    The reason adding one becomes subtracting the mask is simple: the mask we obtained earlier is 11111, which is actually the two's complement of -1. To add 1, we just need to subtract the mask.

    If a positive number were here, the mask would be 00000, and subtracting 0 would have no effect.

In this way, we have implemented the algorithm for calculating the absolute value. A negative number will be converted to a positive one, while a positive number remains unchanged. Isn't it simple?

Benchmark#

#include "benchmark/benchmark.h"

template<typename T, class = typename std::enable_if_t<!std::is_unsigned_v<T>>>
inline typename std::make_unsigned_t<T> mabs(T _val) {
  const T mask = _val >> (sizeof(T) * 8 - 1);
  return (_val + mask) ^ mask;
}

static void BM_mabs(benchmark::State &state) {
  for (auto _: state) {
    benchmark::DoNotOptimize(mabs(state.range(0)));
  }
}

static void BM_std_abs(benchmark::State &state) {
  for (auto _: state) {
    benchmark::DoNotOptimize(std::abs(state.range(0)));
  }
}

BENCHMARK(BM_mabs)->RangeMultiplier(32)->Range(INT64_MIN, INT64_MAX);
BENCHMARK(BM_std_abs)->RangeMultiplier(32)->Range(INT64_MIN, INT64_MAX);

BENCHMARK_MAIN();

Below are the results obtained using a MacBook Air (M1, 2020) and Apple clang 13.1.6

/Users/hominsu/CLionProjects/bit-hacks-bench/cmake-build-release-appleclang/bench/abs
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
2022-03-26T13:00:40+08:00
Running /Users/hominsu/CLionProjects/bit-hacks-bench/cmake-build-release-appleclang/bench/abs
Run on (8 X 24.1207 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x8)
  L1 Instruction 128 KiB (x8)
  L2 Unified 4096 KiB (x2)
Load Average: 2.42, 2.09, 2.29
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
BM_mabs/-9223372036854775808         0.340 ns        0.340 ns   1000000000
BM_mabs/-1152921504606846976         0.344 ns        0.344 ns   1000000000
BM_mabs/-36028797018963968           0.342 ns        0.342 ns   1000000000
BM_mabs/-1125899906842624            0.336 ns        0.336 ns   1000000000
BM_mabs/-35184372088832              0.343 ns        0.343 ns   1000000000
BM_mabs/-1099511627776               0.342 ns        0.342 ns   1000000000
BM_mabs/-34359738368                 0.343 ns        0.343 ns   1000000000
BM_mabs/-1073741824                  0.343 ns        0.343 ns   1000000000
BM_mabs/-33554432                    0.344 ns        0.344 ns   1000000000
BM_mabs/-1048576                     0.345 ns        0.345 ns   1000000000
BM_mabs/-32768                       0.344 ns        0.344 ns   1000000000
BM_mabs/-1024                        0.343 ns        0.343 ns   1000000000
BM_mabs/-32                          0.347 ns        0.347 ns   1000000000
BM_mabs/-1                           0.343 ns        0.343 ns   1000000000
BM_mabs/0                            0.344 ns        0.344 ns   1000000000
BM_mabs/1                            0.346 ns        0.346 ns   1000000000
BM_mabs/32                           0.341 ns        0.341 ns   1000000000
BM_mabs/1024                         0.346 ns        0.346 ns   1000000000
BM_mabs/32768                        0.347 ns        0.347 ns   1000000000
BM_mabs/1048576                      0.348 ns        0.348 ns   1000000000
BM_mabs/33554432                     0.344 ns        0.344 ns   1000000000
BM_mabs/1073741824                   0.340 ns        0.340 ns   1000000000
BM_mabs/34359738368                  0.350 ns        0.350 ns   1000000000
BM_mabs/1099511627776                0.343 ns        0.343 ns   1000000000
BM_mabs/35184372088832               0.339 ns        0.339 ns   1000000000
BM_mabs/1125899906842624             0.341 ns        0.341 ns   1000000000
BM_mabs/36028797018963968            0.341 ns        0.341 ns   1000000000
BM_mabs/1152921504606846976          0.348 ns        0.348 ns   1000000000
BM_mabs/9223372036854775807          0.346 ns        0.346 ns   1000000000
BM_std_abs/-9223372036854775808      0.340 ns        0.340 ns   1000000000
BM_std_abs/-1152921504606846976      0.345 ns        0.345 ns   1000000000
BM_std_abs/-36028797018963968        0.347 ns        0.347 ns   1000000000
BM_std_abs/-1125899906842624         0.340 ns        0.340 ns   1000000000
BM_std_abs/-35184372088832           0.348 ns        0.348 ns   1000000000
BM_std_abs/-1099511627776            0.345 ns        0.345 ns   1000000000
BM_std_abs/-34359738368              0.347 ns        0.347 ns   1000000000
BM_std_abs/-1073741824               0.344 ns        0.344 ns   1000000000
BM_std_abs/-33554432                 0.342 ns        0.342 ns   1000000000
BM_std_abs/-1048576                  0.347 ns        0.347 ns   1000000000
BM_std_abs/-32768                    0.345 ns        0.345 ns   1000000000
BM_std_abs/-1024                     0.347 ns        0.347 ns   1000000000
BM_std_abs/-32                       0.348 ns        0.348 ns   1000000000
BM_std_abs/-1                        0.341 ns        0.341 ns   1000000000
BM_std_abs/0                         0.352 ns        0.352 ns   1000000000
BM_std_abs/1                         0.346 ns        0.346 ns   1000000000
BM_std_abs/32                        0.348 ns        0.348 ns   1000000000
BM_std_abs/1024                      0.344 ns        0.344 ns   1000000000
BM_std_abs/32768                     0.342 ns        0.342 ns   1000000000
BM_std_abs/1048576                   0.348 ns        0.348 ns   1000000000
BM_std_abs/33554432                  0.346 ns        0.346 ns   1000000000
BM_std_abs/1073741824                0.346 ns        0.346 ns   1000000000
BM_std_abs/34359738368               0.347 ns        0.347 ns   1000000000
BM_std_abs/1099511627776             0.343 ns        0.343 ns   1000000000
BM_std_abs/35184372088832            0.347 ns        0.347 ns   1000000000
BM_std_abs/1125899906842624          0.351 ns        0.351 ns   1000000000
BM_std_abs/36028797018963968         0.340 ns        0.340 ns   1000000000
BM_std_abs/1152921504606846976       0.346 ns        0.346 ns   1000000000
BM_std_abs/9223372036854775807       0.343 ns        0.343 ns   1000000000

Below are the results obtained using i5-9500 and gcc 8.5.0 (Red Hat 8.5.0-10)

/tmp/tmp.CtmwmpTLjC/cmake-build-release-1104/bench/abs
2022-03-26T13:10:38+08:00
Running /tmp/tmp.CtmwmpTLjC/cmake-build-release-1104/bench/abs
Run on (6 X 4138.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 9216 KiB (x1)
Load Average: 0.94, 0.61, 0.49
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
BM_mabs/-9223372036854775808         0.366 ns        0.366 ns   1000000000
BM_mabs/-1152921504606846976         0.367 ns        0.367 ns   1000000000
BM_mabs/-36028797018963968           0.366 ns        0.365 ns   1000000000
BM_mabs/-1125899906842624            0.368 ns        0.367 ns   1000000000
BM_mabs/-35184372088832              0.370 ns        0.370 ns   1000000000
BM_mabs/-1099511627776               0.364 ns        0.364 ns   1000000000
BM_mabs/-34359738368                 0.370 ns        0.370 ns   1000000000
BM_mabs/-1073741824                  0.364 ns        0.364 ns   1000000000
BM_mabs/-33554432                    0.367 ns        0.367 ns   1000000000
BM_mabs/-1048576                     0.368 ns        0.368 ns   1000000000
BM_mabs/-32768                       0.371 ns        0.370 ns   1000000000
BM_mabs/-1024                        0.365 ns        0.365 ns   1000000000
BM_mabs/-32                          0.365 ns        0.365 ns   1000000000
BM_mabs/-1                           0.364 ns        0.363 ns   1000000000
BM_mabs/0                            0.368 ns        0.368 ns   1000000000
BM_mabs/1                            0.366 ns        0.365 ns   1000000000
BM_mabs/32                           0.364 ns        0.363 ns   1000000000
BM_mabs/1024                         0.368 ns        0.368 ns   1000000000
BM_mabs/32768                        0.370 ns        0.369 ns   1000000000
BM_mabs/1048576                      0.368 ns        0.367 ns   1000000000
BM_mabs/33554432                     0.366 ns        0.366 ns   1000000000
BM_mabs/1073741824                   0.367 ns        0.366 ns   1000000000
BM_mabs/34359738368                  0.371 ns        0.371 ns   1000000000
BM_mabs/1099511627776                0.368 ns        0.367 ns   1000000000
BM_mabs/35184372088832               0.370 ns        0.369 ns   1000000000
BM_mabs/1125899906842624             0.363 ns        0.362 ns   1000000000
BM_mabs/36028797018963968            0.368 ns        0.367 ns   1000000000
BM_mabs/1152921504606846976          0.372 ns        0.372 ns   1000000000
BM_mabs/9223372036854775807          0.367 ns        0.366 ns   1000000000
BM_std_abs/-9223372036854775808      0.491 ns        0.490 ns   1000000000
BM_std_abs/-1152921504606846976      0.484 ns        0.484 ns   1000000000
BM_std_abs/-36028797018963968        0.490 ns        0.490 ns   1000000000
BM_std_abs/-1125899906842624         0.487 ns        0.486 ns   1000000000
BM_std_abs/-35184372088832           0.493 ns        0.493 ns   1000000000
BM_std_abs/-1099511627776            0.486 ns        0.485 ns   1000000000
BM_std_abs/-34359738368              0.491 ns        0.491 ns   1000000000
BM_std_abs/-1073741824               0.487 ns        0.486 ns   1000000000
BM_std_abs/-33554432                 0.490 ns        0.490 ns   1000000000
BM_std_abs/-1048576                  0.489 ns        0.488 ns   1000000000
BM_std_abs/-32768                    0.494 ns        0.493 ns   1000000000
BM_std_abs/-1024                     0.490 ns        0.489 ns   1000000000
BM_std_abs/-32                       0.491 ns        0.490 ns   1000000000
BM_std_abs/-1                        0.486 ns        0.486 ns   1000000000
BM_std_abs/0                         0.492 ns        0.492 ns   1000000000
BM_std_abs/1                         0.487 ns        0.486 ns   1000000000
BM_std_abs/32                        0.487 ns        0.486 ns   1000000000
BM_std_abs/1024                      0.493 ns        0.492 ns   1000000000
BM_std_abs/32768                     0.489 ns        0.489 ns   1000000000
BM_std_abs/1048576                   0.492 ns        0.491 ns   1000000000
BM_std_abs/33554432                  0.487 ns        0.486 ns   1000000000
BM_std_abs/1073741824                0.493 ns        0.492 ns   1000000000
BM_std_abs/34359738368               0.486 ns        0.486 ns   1000000000
BM_std_abs/1099511627776             0.493 ns        0.492 ns   1000000000
BM_std_abs/35184372088832            0.489 ns        0.488 ns   1000000000
BM_std_abs/1125899906842624          0.491 ns        0.491 ns   1000000000
BM_std_abs/36028797018963968         0.490 ns        0.490 ns   1000000000
BM_std_abs/1152921504606846976       0.494 ns        0.493 ns   1000000000
BM_std_abs/9223372036854775807       0.491 ns        0.490 ns   1000000000
Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.