path: root/compiler/optimizing/graph_visualizer.cc
author: Hans Boehm <hboehm@google.com>, 2019-08-30 16:14:32 -0700
committer: Hans Boehm <hboehm@google.com>, 2019-11-18 18:06:50 +0000
commit: 0d508a01106746e0d8865752850f4f03bcce1e01
tree: 727e9b6a420c3ff1206e97a9ab0369b9947cd797 /compiler/optimizing/graph_visualizer.cc
parent: 25d536e67fc69e0413298989f1b21b6fdcece682
Add spin loop to mutex, overhaul monitor

Since Linux context switch overhead is typically larger than a microsecond, this may greatly reduce the overhead of waiting for a mutex that is only briefly held by another thread. Rather than going to sleep and having to be woken back up again, at a cost of several microseconds, we just spin, hopefully for much less than a microsecond, until the mutex becomes available. It does waste some CPU cycles when spinning fails, either because the lock is held too long, or because we are being scheduled against the thread holding the lock. But we expect those cases to be unlikely.

We test the lock only a few times, with pauses in between. It's unclear that's beneficial; we should perhaps just loop reading the variable. In general, this needs further tuning.

Add a test that mutual exclusion works, which can also be run as a lock microbenchmark.

The old monitor implementation did not benefit much from this, because it used mutex only as a low-level lock to protect the monitor data structure. Instead use monitor_lock_ as the actual lock providing mutual exclusion for the monitor, i.e. hold onto it while the monitor is fatlocked. Among other things, this requires that the monitor_lock_ always be acquired by, or explicitly on behalf of, the thread holding the monitor. This in turn makes it really hard to deflate a monitor held by another thread. Just stop doing that, since it was unclear whether that's actually beneficial.

The main advantages of the monitor change are:
- Half the number of mutex acquisitions.
- Easier to effectively spin.
- No possibility of blocking while trying to release a monitor.

No longer compute owner method and dex pc values on monitor entry unless we're tracing. This was expensive and increased lock hold times sufficiently that it often made spinning ineffective.

Have mutex acquisition call futex wait in a loop between updating waiter count. The old way resulted in extra futex wakeups in highly contended situations.

Conditionally disable frame size warning for Heap::PreZygoteFork(). Otherwise the platform doesn't build with ART_USE_FUTEXES = 0, which we needed for testing.

Based on the new test, this appears to get us about a decimal order of magnitude in inflated contended locking performance. Single-threaded or scalable applications (i.e. most) should be unaffected. But it should prevent applications that do encounter contention from "falling off a cliff", or at least greatly reduce the height of the cliff. And it should make performance more repeatable by making it less dependent on whether a monitor happens to get inflated.

Bug: 111835365
Bug: 140590186
Test: Successfully built and ran monitor tests. Boots AOSP.
Test: Build platform with ART_USE_FUTEXES = 0.
Test: Check contention messages in log after booting AOSP.
Test: Check systrace output while partially running new test.
Change-Id: Iff7457fff59efcb24e25d35a4ef71b67b8a9082a
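
Illustration (not part of this change): the spin-then-block idea described in the message can be sketched in portable C++ roughly as follows. This is a minimal sketch under stated assumptions, not ART's Mutex. The names SpinThenBlockMutex, kMaxSpins and RunContendedCounter are hypothetical, std::this_thread::yield() stands in for the pause between lock tests, and a std::condition_variable stands in for the futex wait that the real code uses. The counter function mirrors the kind of mutual exclusion test the message mentions.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin-then-block lock; a sketch, not ART's implementation.
class SpinThenBlockMutex {
 public:
  void Lock() {
    // Phase 1: test the lock a few times, pausing between attempts.
    for (int i = 0; i < kMaxSpins; ++i) {
      if (TryLock()) return;
      std::this_thread::yield();  // Assumed stand-in for a CPU pause.
    }
    // Phase 2: spinning failed, so sleep until an unlocker wakes us.
    // The predicate re-tests the lock while holding sleep_mutex_, which
    // rules out a lost wakeup between the test and the wait.
    std::unique_lock<std::mutex> guard(sleep_mutex_);
    cv_.wait(guard, [this] { return TryLock(); });
  }

  void Unlock() {
    assert(locked_.load(std::memory_order_relaxed));  // Must be held.
    locked_.store(false, std::memory_order_release);
    // For simplicity this always takes sleep_mutex_ and notifies; a
    // tuned lock would track a waiter count and skip this when it is 0.
    std::lock_guard<std::mutex> guard(sleep_mutex_);
    cv_.notify_one();
  }

 private:
  bool TryLock() {
    // Read first so failed attempts do not keep writing the cache line.
    return !locked_.load(std::memory_order_relaxed) &&
           !locked_.exchange(true, std::memory_order_acquire);
  }

  static constexpr int kMaxSpins = 50;  // Arbitrary; would need tuning.
  std::atomic<bool> locked_{false};
  std::mutex sleep_mutex_;
  std::condition_variable cv_;
};

// Mutual exclusion check: increments a plain (non-atomic) counter under
// the lock from several threads and returns the final value.
long RunContendedCounter(int num_threads, int iterations) {
  SpinThenBlockMutex mu;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        mu.Lock();
        ++counter;
        mu.Unlock();
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

If the lock provides mutual exclusion, RunContendedCounter(n, k) returns n * k; timing the same call gives a crude contended-lock microbenchmark. The unconditional notify in Unlock() is the simplest correct choice here, at the cost of the fast path that a waiter count would preserve.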
Diffstat (limited to 'compiler/optimizing/graph_visualizer.cc')
0 files changed, 0 insertions, 0 deletions