Atomic Test And Set Of Disk Block Returned False For Equality

Two threads tried to write to the same block at effectively the same time. Thread A won. Thread B performed the test, saw that Thread A had already written its data, and raised the error. This is the mechanism working as intended: it prevents corruption. If the error occurs constantly, however, you have a lock contention problem.

For Pacemaker/Corosync:

pcs status
crm_verify -L -V
pcs cluster cib | grep reservation

Scenario: A node caches disk block values but fails to invalidate the cache after a write from another node.
Result: The node issues a test-and-set based on stale data, causing an unexpected failure.
Solution: Disable aggressive caching for shared block devices. Use O_DIRECT to bypass the local page cache, or O_SYNC where writes must reach the device before the call returns.
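A minimal sketch of an uncached read on Linux, assuming a 4096-byte block size and alignment (check what your device actually requires). The function name read_block_direct is hypothetical. O_DIRECT is Linux-specific and some filesystems reject it, in which case open() fails and the function returns -1.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in glibc; must precede all includes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096  /* assumed block size and alignment */

/*
 * Read block `index` from `path` into `out` (BLOCK_SIZE bytes),
 * bypassing the page cache. Returns 0 on success, -1 on any error.
 */
int read_block_direct(const char *path, off_t index, void *out)
{
    assert(path != NULL && out != NULL);

    /* O_DIRECT needs an aligned buffer, offset, and length. */
    void *buf = NULL;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    ssize_t n = pread(fd, buf, BLOCK_SIZE, index * (off_t)BLOCK_SIZE);
    close(fd);

    int rc = (n == BLOCK_SIZE) ? 0 : -1;
    if (rc == 0)
        memcpy(out, buf, BLOCK_SIZE);
    free(buf);
    return rc;
}
```

Reading the expected value this way immediately before the test-and-set means the comparison uses what is on the device rather than what this node last cached.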

Reorder writes so that the test-and-set (TAS) block is the last write in a critical section. Use fdatasync() or O_SYNC to ensure the TAS write is persisted before proceeding. This prevents a crash from leaving the block in an unexpected state after recovery.
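The ordering can be sketched as follows. The function name commit_with_tas_last and its parameters are hypothetical. Flushing the payload before the TAS write is an extra step assumed here so that the TAS block can never become durable ahead of the data it guards. The plain pwrite() stands in for the final write only; it is not itself an atomic test-and-set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write the payload and flush it, then write the TAS block last and
 * flush again. Returns 0 on success, -1 on error.
 */
int commit_with_tas_last(int fd,
                         const void *data, size_t data_len, off_t data_off,
                         const void *tas, size_t tas_len, off_t tas_off)
{
    assert(data != NULL && tas != NULL);

    /* 1. Payload first; make it durable before touching the TAS block. */
    if (pwrite(fd, data, data_len, data_off) != (ssize_t)data_len) {
        perror("pwrite payload");
        return -1;
    }
    if (fdatasync(fd) != 0) {
        perror("fdatasync payload");
        return -1;
    }

    /* 2. TAS block last, so recovery never finds it without the payload. */
    if (pwrite(fd, tas, tas_len, tas_off) != (ssize_t)tas_len) {
        perror("pwrite TAS block");
        return -1;
    }
    if (fdatasync(fd) != 0) {
        perror("fdatasync TAS block");
        return -1;
    }
    return 0;
}
```

Opening the descriptor with O_SYNC would make each pwrite() durable on return and remove the need for the explicit fdatasync() calls, at the cost of making every write on that descriptor synchronous.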

Scenario: Two processes attempt to acquire the same disk-based lock.
Result: One succeeds; the other receives the "false for equality" error and should retry or fail gracefully.
Solution: Implement exponential backoff and retry logic.
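One way to structure the retry is exponential backoff with random jitter, sketched below. All names here (try_lock_fn, backoff_cap_ms, acquire_with_backoff, succeed_after_countdown) are hypothetical, and the attempt callback is assumed to wrap whatever test-and-set call your system exposes. rand() is used for brevity; seed it or substitute a better generator in real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Caller-supplied attempt, e.g. a wrapper around the test-and-set. */
typedef bool (*try_lock_fn)(void *ctx);

/* Delay cap for attempt n (0-based): base_ms * 2^n, clamped to max_ms. */
unsigned backoff_cap_ms(unsigned attempt, unsigned base_ms, unsigned max_ms)
{
    unsigned delay = base_ms;
    while (attempt-- > 0) {
        if (delay > max_ms / 2)
            return max_ms;  /* doubling would exceed the cap */
        delay *= 2;
    }
    return delay < max_ms ? delay : max_ms;
}

/*
 * Call try_lock up to max_attempts times, sleeping a random interval
 * in [0, cap] milliseconds between failures. Returns true once the
 * lock is acquired, false if every attempt fails.
 */
bool acquire_with_backoff(try_lock_fn try_lock, void *ctx,
                          unsigned max_attempts,
                          unsigned base_ms, unsigned max_ms)
{
    assert(try_lock != NULL);

    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        if (try_lock(ctx))
            return true;
        if (attempt + 1 == max_attempts)
            break;  /* no point sleeping after the final failure */

        unsigned cap = backoff_cap_ms(attempt, base_ms, max_ms);
        unsigned wait_ms =
            (unsigned)((unsigned long long)rand() % ((unsigned long long)cap + 1));
        struct timespec ts = {
            .tv_sec  = wait_ms / 1000,
            .tv_nsec = (long)(wait_ms % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);
    }
    return false;
}

/* Example attempt function: fails until the counter in ctx reaches zero. */
bool succeed_after_countdown(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

The jitter matters because contenders that all failed at the same moment would otherwise retry at the same moment too, reproducing the collision on every round.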

To detect corruption before test-and-set:

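/* Verify the block checksum before attempting the test-and-set. */
uint32_t stored_crc = read_crc(block);
uint32_t computed_crc = crc32(block_data);
if (stored_crc != computed_crc) {
    /* Checksum mismatch: restore the block from a replica first. */
    repair_block_from_replica();
}
int ret = atomic_compare_and_swap(block, expected, new_value);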

Environment: 10-node Ceph cluster, BlueStore backend, NVMe-over-Fabrics.
Error: OSD logs repeated: bluestore/StupidAllocator.cc: atomic test and set of disk block 0x4a20b returned false for equality.
Root cause: A network partition caused two OSDs to believe they held the same allocation bitmap lock. The storage array (NVMe target) correctly rejected the second OSD’s compare-and-write.
Fix: Reduced osd_heartbeat_grace from 20s to 5s, enabled faster fencing, and implemented retry logic with jitter.