What Does a Lock Actually Cost? Benchmarking Concurrent Counters in C++
Four concurrent counter implementations, including mutex, atomic, and LongAdder-style striped variants, measured across thread counts on x86 and ARM. False sharing costs 31x. For RMW operations, seq_cst costs nothing over relaxed, and the disassembly explains why. Real numbers, real hardware.






