Age | Commit message | Author |
|
This is a forward-looking change intended to allow simpleperf to
reliably correlate samples and native debug information.
I have added the timestamps to both JIT and DEX, and refactored
the code in the process to avoid code duplication.
Test: testrunner.py -t 137
Change-Id: I45fa4310305aff540e036db9af15a86c5b8b7aff
|
|
This fixes unwinds after recent changes (oob apks; cdex data sharing).
Bug: 72520014
Test: m test-art-host-gtest
Change-Id: Ie2a02657b2afbe899acd2e61f0a57d207e688b99
|
|
Change-Id: I0ffa6fb466b1635e724b0e782702303b92355408
|
|
Make it possible to store more than one method per entry,
and ref-count the number of live methods per entry.
Test: m test-art-host-gtest
Change-Id: I45d69185e85e47fbee88a8d1f549ede9875a3c0a
|
|
Add a compact side table for figuring out the debug info offsets
for a given method index. This reduces dex size by ~1.2%.
The debug table is keyed by method index and stores the debug info
offsets LEB-encoded. This means the table is smaller if debug
infos are encoded in method index order.
To prevent expansion for method indices without debug info, there
is a bitmap that specifies whether a method index has a debug info offset.
Motivation: Reduce code item size and allow more deduping in the
future.
Test: test-art-host
Bug: 63756964
Change-Id: Ib983e85c1727f58c97676bde275f4a9756314da0
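The side table described in this commit message can be illustrated with a minimal sketch. It is not the layout used by the actual change: it assumes one presence bit per method index, ULEB128-encoded deltas between consecutive offsets (which is why method index order keeps the table small), an offset of 0 meaning "no debug info", and a linear scan for lookup. The names DebugOffsetTable, BuildTable and Lookup are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical side table: one presence bit per method index, followed by
// ULEB128-encoded deltas between consecutive debug info offsets.
struct DebugOffsetTable {
  std::vector<uint8_t> bitmap;  // bit i set => method index i has debug info
  std::vector<uint8_t> deltas;  // ULEB128 deltas, in method index order
};

// Append `value` as unsigned LEB128: 7 payload bits per byte, high bit = "more".
inline void PushUleb128(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80u) {
    out.push_back(static_cast<uint8_t>(value | 0x80u));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Decode one ULEB128 value starting at `pos`, advancing `pos` past it.
inline uint32_t ReadUleb128(const std::vector<uint8_t>& in, size_t& pos) {
  uint32_t result = 0;
  for (int shift = 0;; shift += 7) {
    assert(pos < in.size() && shift < 32);
    const uint8_t byte = in[pos++];
    result |= static_cast<uint32_t>(byte & 0x7fu) << shift;
    if ((byte & 0x80u) == 0) return result;
  }
}

// Assumption: offset 0 means "no debug info" and the non-zero offsets are
// non-decreasing in method index order, so every delta is non-negative.
inline DebugOffsetTable BuildTable(const std::vector<uint32_t>& offsets) {
  DebugOffsetTable table;
  table.bitmap.assign((offsets.size() + 7) / 8, 0);
  uint32_t previous = 0;
  for (size_t i = 0; i < offsets.size(); ++i) {
    if (offsets[i] == 0) continue;  // costs only one cleared bit
    assert(offsets[i] >= previous);
    table.bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    PushUleb128(table.deltas, offsets[i] - previous);
    previous = offsets[i];
  }
  return table;
}

// Returns the debug info offset for `method_index`, or 0 if it has none.
inline uint32_t Lookup(const DebugOffsetTable& table, size_t method_index) {
  assert(method_index / 8 < table.bitmap.size());
  if ((table.bitmap[method_index / 8] & (1u << (method_index % 8))) == 0) return 0;
  uint32_t offset = 0;
  size_t pos = 0;
  for (size_t i = 0; i <= method_index; ++i) {
    if (table.bitmap[i / 8] & (1u << (i % 8))) offset += ReadUleb128(table.deltas, pos);
  }
  return offset;
}
```

With offsets {0, 100, 0, 130, 400} this sketch needs one bitmap byte plus four delta bytes, and the two methods without debug info add no bytes beyond their bitmap bits.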
|
|
Bug: 71605148
Bug: 63756964
Test: test-art-target on angler
This reverts commit 6716941120ae9f47ba1b8ef8e79820c4b5640350.
Change-Id: Ic01ea4e8bb2c1de761fab354c5bbe27290538631
|
|
Bug: 71605148
Bug: 63756964
Seems to fail on armv7.
This reverts commit f5245188d9c61f6b90eb30cca0875fbdcc493b15.
Change-Id: I37786c04a8260ae3ec4a2cd73710126783c3ae7e
|
|
Added a table that is indexed by dex method index. To prevent size
overhead, there is only one slot for each 16 method indices. This
means there are up to 15 loop iterations to get the quickening info
for a method. The quickening infos are now prefixed by a leb
encoded length. This allows methods that aren't quickened to only
have 1.25 bytes of space overhead.
The value was picked arbitrarily; there is little advantage to
increasing the value since the table only takes 1 byte per 4 method
indices currently. JIT benchmarks do not regress with the change.
There is a net space saving from removing 8 bytes from each
quickening info since most scenarios have more quickened methods than
compiled methods.
For getting quick access to the table, a 4 byte preheader was added
to each dex in the vdex file.
Removed logic that stored the quickening info in the CodeItem
debug_info_offset field.
The change adds a small quicken table for each method index; this
means that filters that don't quicken will have a slight increase in
size. The worst case scenario is compiling all the methods, which
results in a 0.3% larger vdex. The change also disables
deduping since the quicken infos need to be in dex method index
order.
For filters that don't compile most methods, like quicken and
speed-profile, there are space savings. For quicken, the vdex is 2%
smaller.
Bug: 71605148
Bug: 63756964
Test: test-art-host
Change-Id: I89cb679538811369c36b6ac8c40ea93135f813cd
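The lookup scheme from this commit message can be sketched as follows. This is an illustration under assumptions, not the vdex format itself: it assumes 4-byte slots holding byte offsets into a blob in which every method has a ULEB128 length followed by that many bytes of quickening info. The names QuickenData, Build and Find are hypothetical. A method that is not quickened then costs one length byte plus 4/16 bytes of slot space, which matches the 1.25 bytes quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr size_t kMethodsPerSlot = 16;  // one slot per 16 method indices

struct QuickenData {
  std::vector<uint32_t> slots;  // blob offset of the entry for method 0, 16, 32, ...
  std::vector<uint8_t> blob;    // per method: ULEB128 length, then that many bytes
};

// Append `value` as unsigned LEB128: 7 payload bits per byte, high bit = "more".
inline void PushUleb128(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80u) {
    out.push_back(static_cast<uint8_t>(value | 0x80u));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Decode one ULEB128 value starting at `pos`, advancing `pos` past it.
inline uint32_t ReadUleb128(const std::vector<uint8_t>& in, size_t& pos) {
  uint32_t result = 0;
  for (int shift = 0;; shift += 7) {
    assert(pos < in.size() && shift < 32);
    const uint8_t byte = in[pos++];
    result |= static_cast<uint32_t>(byte & 0x7fu) << shift;
    if ((byte & 0x80u) == 0) return result;
  }
}

// `infos[i]` is the quickening info of method index i (empty if not quickened).
inline QuickenData Build(const std::vector<std::vector<uint8_t>>& infos) {
  QuickenData data;
  for (size_t i = 0; i < infos.size(); ++i) {
    if (i % kMethodsPerSlot == 0) {
      data.slots.push_back(static_cast<uint32_t>(data.blob.size()));
    }
    PushUleb128(data.blob, static_cast<uint32_t>(infos[i].size()));
    data.blob.insert(data.blob.end(), infos[i].begin(), infos[i].end());
  }
  return data;
}

// Returns {payload offset in blob, payload length} for `method_index`.
inline std::pair<size_t, size_t> Find(const QuickenData& data, size_t method_index) {
  assert(method_index / kMethodsPerSlot < data.slots.size());
  size_t pos = data.slots[method_index / kMethodsPerSlot];
  // Skip at most 15 earlier entries by hopping over their length prefixes.
  for (size_t skip = method_index % kMethodsPerSlot; skip != 0; --skip) {
    const uint32_t length = ReadUleb128(data.blob, pos);
    pos += length;
  }
  const uint32_t length = ReadUleb128(data.blob, pos);
  return std::make_pair(pos, static_cast<size_t>(length));
}
```

Because Find walks entries sequentially from the slot, the entries must be stored in dex method index order, which is consistent with the note above that deduping had to be disabled.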
|
|
Change constructor to use a reference to a dex file.
Remove duplicated logic for GetCodeItemSize.
Bug: 63756964
Test: test-art-host
Change-Id: I69af8b93abdf6bdfa4454e16db8f4e75883bca46
|
|
Move all the DexFile related source to a common subdirectory dex/ of
runtime.
Bug: 71361973
Test: make -j 50 test-art-host
Change-Id: I59e984ed660b93e0776556308be3d653722f5223
|
|
Make code item fields private and use accessors. Added a handful of
friend classes to reduce the size of the change.
Changed default to be nullable and removed CreateNullable.
CreateNullable was a bad API since it defaulted to the unsafe
behavior; we may add a CreateNonNullable if it's important for performance.
Motivation:
Have a different layout for code items in cdex.
Bug: 63756964
Test: test-art-host-gtest
Test: test/testrunner/testrunner.py --host
Test: art/tools/run-jdwp-tests.sh '--mode=host' '--variant=X32' --debug
Change-Id: I42bc7435e20358682075cb6de52713b595f95bf9
|
|
This excludes everything that is not needed for backtraces and
compresses the resulting ELF file (wrapped in another ELF file).
This approximately halves the size of the debug data for JIT.
The vast majority of the data is the overhead of the ELF header.
We could amortize this by storing more methods per ELF file.
It also adds NOBITS .text section to all debug ELF files,
as that seems necessary for gdb to find the symbols.
On the other hand, it removes .rodata from debug ELF files.
Test: Manually tested that gdb can use this data to unwind.
Test: m test-art-host-gtest
Test: testrunner.py --optimizing --host
Test: testrunner.py -t 137-cfi
Change-Id: Ic0a2dfa953cb79973a7b2ae99d32018599e61171
|
|
The original CL,
https://android-review.googlesource.com/513417 ,
has a bug fixed in the Revert^2,
https://android-review.googlesource.com/550579 ,
and this Revert^4 adds two more fixes:
- fix obsolete native method getting interpreter
entrypoint in 980-redefine-object,
- fix random JIT GC flakiness in 667-jit-jni-stub.
Test: testrunner.py --host --prebuild --no-relocate \
--no-image --jit -t 980-redefine-object
Bug: 65574695
Bug: 69843562
This reverts commit 056d7756152bb3ced81dd57781be5028428ce2bd.
Change-Id: Ic778686168b90e29816fd526e23141dcbe5ea880
|
|
Still seeing occasional failures on 667-jit-jni-stub
Bug: 65574695
Bug: 69843562
This reverts commit e7441631a11e2e07ce863255a59ee4de29c6a56f.
Change-Id: I3db751679ef7bdf31c933208aaffe4fac749a14b
|
|
The original CL,
https://android-review.googlesource.com/513417 ,
had a bug for class unloading where a read barrier was
executed at the wrong time from
ConcurrentCopying::MarkingPhase() ->
ClassLinker::CleanupClassLoaders() ->
ClassLinker::DeleteClassLoader() ->
JitCodeCache::RemoveMethodsIn() ->
JitCodeCache::JniStubKey::UpdateShorty() ->
ArtMethod::GetShorty().
This has been fixed by removing sources of the read barrier
from ArtMethod::GetShorty().
Test: testrunner.py --host --prebuild --jit --no-relocate \
--no-image -t 998-redefine-use-after-free
Bug: 65574695
Bug: 69843562
This reverts commit 47d31853e16a95393d760e6be2ffeeb0193f94a1.
Change-Id: I06e7a15b09d9ff11cde15a7d1529644bfeca15e0
|
|
|
|
Seems to break 998-redefine-use-after-free in
some --no-image configuration.
Bug: 65574695
Bug: 69843562
This reverts commit 3417eaefe4e714c489a6fb0cb89b4810d81bdf4d.
Change-Id: I2dd157b931c17c791522ea2544c1982ed3519b86
|
|
Remove dump-passes inherited from Quick days,
and move dump-timings and dump-stats to CompilerStats.
Test: test.py
Change-Id: Ie79be858a141e59dc0b2a87d8cb5a5248a5bc7af
|
|
Allow the JIT compiler to compile JNI stubs and make sure
they can be collected once they are not in use anymore.
Test: 667-jit-jni-stub
Test: Pixel 2 XL boots.
Test: m test-art-host-gtest
Test: testrunner.py --host --jit
Test: testrunner.py --target --jit
Bug: 65574695
Change-Id: Idf81f50bcfa68c0c403ad2b49058be62b21b7b1f
|
|
Introduce JniCompiledMethod to avoid JNI compiler dependency
on CompiledMethod. This is done in preparation for compiling
JNI stubs in JIT as the CompiledMethod class should be used
exclusively for AOT compilation.
Test: m test-art-host-gtest
Bug: 65574695
Change-Id: I1d047d4aebc55057efb7ed3d39ea65600f5fb6ab
|
|
Add statistics for intrinsic and native stub compilation
and JIT failing to allocate memory for committing the
code. Clean up recording of compilation statistics.
New statistics when building aosp_taimen-userdebug boot
image with --dump-stats:
Attempted compilation of 94304 methods: 99.99% (94295) compiled.
OptStat#AttemptBytecodeCompilation: 89487
OptStat#AttemptIntrinsicCompilation: 160
OptStat#CompiledNativeStub: 4733
OptStat#CompiledIntrinsic: 84
OptStat#CompiledBytecode: 89478
...
where 94304=89487+4733+84 and 94295=89478+4733+84.
Test: testrunner.py -b --host --optimizing
Test: Manually inspect output of building boot image
with --dump-stats.
Bug: 69627511
Change-Id: I15eb2b062a96f09a7721948bcc77b83ee4f18efd
|
|
Rationale:
Refactors the way we set up optimization passes
in the compiler into a more centralized approach.
The refactoring also found some "holes" in the
existing mechanism (missing string lookup in
the debugging mechanism, or inability to set
alternative name for optimizations that may repeat).
Bug: 64538565
Test: test-art-host test-art-target
Change-Id: Ie5e0b70f67ac5acc706db91f64612dff0e561f83
|
|
When compiling an intrinsic method, generate a graph that
invokes the same method and try to compile it. If the call
is actually intrinsified (or simplified to other HIR) and
yields a leaf method, use the result of this compilation
attempt, otherwise compile the actual code or JNI stub.
Note that CodeGenerator::CreateThrowingSlowPathLocations()
actually marks the locations as kNoCall if the throw is not
in a catch block, thus considering some throwing methods
(for example, String.charAt()) as leaf methods.
We would ideally want to use the intrinsic codegen for all
intrinsics that do not generate a slow-path call to the
default implementation. Relying on the leaf method is
suboptimal as we're missing out on methods that do other
types of calls, for example runtime calls. This shall be
fixed in a subsequent CL.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 67717501
Change-Id: I640fda7c22d4ff494b5ff77ebec3b7f5f75af652
|
|
It is now always assumed there is one.
Test: test.py
Change-Id: I8f3f5c722fb8c4a0f9ad8ea685d1a956bd0ac9ae
|
|
Repurpose the old kAccFastNative flag (which wasn't actually
used for some time) and define a new kAccCriticalNative flag
to record the native method's annotation-based kind. This
avoids repeated determination of the kind from GenericJNI.
And making two transitions to runnable and back (using the
ScopedObjectAccess) from GenericJniMethodEnd() for normal
native methods just to determine that we need to transition
to runnable was really weird.
Since the IsFastNative() function now records the presence
of the @FastNative annotation, synchronized @FastNative
method calls now avoid thread state transitions.
When initializing the Runtime without a boot image, the
WellKnownClasses may not yet be initialized, so relax the
DCheckNativeAnnotation() to take that into account.
Also revert
https://android-review.googlesource.com/509715
as the annotation checks are now much faster.
Bug: 65574695
Bug: 35644369
Test: m test-art-host-gtest
Test: testrunner.py --host
Change-Id: I2fc5ba192b9ce710a0e9202977b4f9543e387efe
|
|
Adding InstructionSet::kLast shall make it easier to encode
the InstructionSet in fewer bits using BitField<>. However,
introducing `kLast` into the `art` namespace is not a good
idea, so we change the InstructionSet to an enum class.
This also uncovered a case of InstructionSet::kNone being
erroneously used instead of vixl32::Condition::None(), so
it's good to remove `kNone` from the `art` namespace.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
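A minimal sketch of the idea in this commit message, under stated assumptions: the enumerator list is illustrative only, and the manual shift-and-mask helpers EncodeIsa and DecodeIsa are hypothetical stand-ins for the BitField<> template mentioned above. The point is that a scoped enum keeps kLast and kNone out of the enclosing namespace while kLast still bounds the number of bits needed.

```cpp
#include <cstddef>
#include <cstdint>

// Scoped enum: the enumerators must be written as InstructionSet::kNone etc.,
// so they cannot be confused with unrelated constants of the same name.
// The list of enumerators here is illustrative.
enum class InstructionSet : uint8_t {
  kNone,
  kArm,
  kArm64,
  kThumb2,
  kX86,
  kX86_64,
  kMips,
  kMips64,
  kLast = kMips64
};

// Number of bits needed to represent `value` (0 needs 0 bits).
constexpr size_t MinimumBitsToStore(uint32_t value) {
  return value == 0 ? 0 : 1 + MinimumBitsToStore(value >> 1);
}

constexpr size_t kIsaBits =
    MinimumBitsToStore(static_cast<uint32_t>(InstructionSet::kLast));
constexpr uint32_t kIsaMask = (1u << kIsaBits) - 1u;

// Store `isa` in `kIsaBits` bits of `word`, starting at bit `shift`.
constexpr uint32_t EncodeIsa(uint32_t word, InstructionSet isa, size_t shift) {
  return (word & ~(kIsaMask << shift)) | (static_cast<uint32_t>(isa) << shift);
}

// Read back the instruction set stored at bit `shift` of `word`.
constexpr InstructionSet DecodeIsa(uint32_t word, size_t shift) {
  return static_cast<InstructionSet>((word >> shift) & kIsaMask);
}
```

With eight enumerators the largest value is 7, so three bits suffice, and the remaining bits of the word stay available for other fields.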
|
|
Reuse the memory previously allocated on the ArenaStack by
optimization passes.
This CL handles only the architecture-independent codegen
and slow paths, architecture-dependent codegen allocations
shall be moved to the ScopedArenaAllocator in a follow-up.
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 19.6MiB -> 18.5MiB (-1189KiB)
BatteryStats.dumpLocked(): 39.3MiB -> 37.0MiB (-2379KiB)
Also move definitions of functions that use bit_vector-inl.h
from bit_vector.h to bit_vector-inl.h.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 64312607
Change-Id: I84688c3a5a95bf90f56bd3a150bc31fedc95f29c
|
|
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 21.1MiB -> 20.2MiB
BatteryStats.dumpLocked(): 42.0MiB -> 40.3MiB
This is because all the memory previously used by the graph
builder is reused by later passes.
And finish the "arena"->"allocator" renaming; make renamed
allocator pointers that are members of classes const when
appropriate (and make a few more members around them const).
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Ia50aafc80c05941ae5b96984ba4f31ed4c78255e
|
|
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 25.1MiB -> 21.1MiB
BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB
This is because all the memory previously used by Scheduler
is reused by the register allocator; the register allocator
has a higher peak usage of the ArenaStack.
And continue the "arena"->"allocator" renaming.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
|
|
Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.
This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
|
For array accesses the element address has the following structure:
Address = CONST_OFFSET + base_addr + (index << ELEM_SHIFT)
The address part (index << ELEM_SHIFT) can be shared across array
accesses with the same data type and index.
For example, in the following loop 5 accesses can share address
computation:
void foo(int[] a, int[] b, int[] c) {
for (i...) {
a[i] = a[i] + 5;
b[i] = b[i] + c[i];
}
}
Test: test-art-host, test-art-target
Change-Id: Id09fa782934aad4ee47669275e7e1a4d7d23b0fa
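The sharing described in this commit message can be modelled with plain integer arithmetic. This is only a sketch of the address computation, not compiler code: kDataOffset and kIntShift are assumed values standing in for CONST_OFFSET and ELEM_SHIFT, and the function names are hypothetical. In the loop above, the five accesses are the load and store of a[i], the load and store of b[i], and the load of c[i]; all use the same element type and the same index.

```cpp
#include <cstdint>

constexpr uint64_t kDataOffset = 12;  // assumed CONST_OFFSET of the array data
constexpr unsigned kIntShift = 2;     // ELEM_SHIFT for 4-byte int elements

// Unshared form: every access recomputes (index << ELEM_SHIFT).
constexpr uint64_t ElementAddress(uint64_t base_addr, uint64_t index) {
  return kDataOffset + base_addr + (index << kIntShift);
}

struct LoopAddresses {
  uint64_t a;
  uint64_t b;
  uint64_t c;
};

// Addresses of a[i], b[i] and c[i] derived from one pre-scaled index.
constexpr LoopAddresses FromScaledIndex(uint64_t a_base, uint64_t b_base,
                                        uint64_t c_base, uint64_t scaled_index) {
  return LoopAddresses{kDataOffset + a_base + scaled_index,
                       kDataOffset + b_base + scaled_index,
                       kDataOffset + c_base + scaled_index};
}

// Shared form: the shift is computed once per iteration and reused.
constexpr LoopAddresses SharedAddresses(uint64_t a_base, uint64_t b_base,
                                        uint64_t c_base, uint64_t index) {
  return FromScaledIndex(a_base, b_base, c_base, index << kIntShift);
}
```

Both forms produce identical addresses; the shared form just replaces repeated shifts with one value that every access in the iteration can reuse.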
|
|
Rationale:
As decided after the MIPS change, this change unifies our
six code generators again a bit (we cannot move it into
the generic path, since arm likes to run the simplifier
first). Generally the GVN does some last minute cleanup
(such as finding CSE in the runtime tests generated
by dynamic BCE). I started a golem run to find impact.
Test: test-art-host test-art-target
Change-Id: Ib4098c5bae2269e71fee95cc31e3662d3aa47f6a
|
|
|
|
Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
|
Test: mma test-art-host-gtest
Test: mma test-art-target-gtest in QEMU
Test: ./testrunner.py --target --optimizing in QEMU
Change-Id: Ie3c6b29b9125ff8aef888c3574bdb0ab96574bd4
|
|
Move LinkerPatch to compiler/linker/linker_patch.h .
Move SrcMapElem to compiler/debug/src_map_elem.h .
Introduce compiled_method-inl.h to reduce the number
of `#include`s in compiled_method.h .
Test: m test-art-host-gtest
Test: testrunner.py --host
Change-Id: Id211cdf94a63ad265bf4709f1a5e06dffbe30f64
|
|
This shifts some code from the libart-compiler.so to dex2oat
and reduces memory needed for JIT. We also avoid loading the
libart-dexlayout.so for JIT but the memory savings are
minimal (one shared clean page, two shared dirty pages and
some per-app kernel mmap data) as the code has never been
needed in memory by JIT.
aosp_angler-userdebug file sizes (stripped):
lib64/libart-compiler.so: 2989112 -> 2671888 (-310KiB)
lib/libart-compiler.so: 2160816 -> 1939276 (-216KiB)
bin/dex2oat: 141868 -> 368808 (+222KiB)
LOAD/executable elf mapping sizes:
lib64/libart-compiler.so: 2866308 -> 2555500 (-304KiB)
lib/libart-compiler.so: 2050960 -> 1834836 (-211KiB)
bin/dex2oat: 129316 -> 345916 (+212KiB)
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: cd art/; mma; cd -
Change-Id: If62f02847a6cbb208eaf7e1f3e91af4663fa4a5f
|
|
Remove unused Quick compiler flag.
Remove support for arm32 soft-float code (which is no longer
supported by our compiler).
Test: m
Change-Id: I38b16291d90094dbf26776923a46afbf8de53f20
|
|
Add debug info for method call thunks (currently unused) and
Baker read barrier thunks. Refactor debug info generation
for trampolines and record their sizes; change their names
to start with upper-case letters, so that they can be easily
generated as `#fn_name`.
This improved debug info must be generated by `dex2oat -g`,
the debug info generated by `oatdump --symbolize` remains
the same as before, except for the renamed trampolines and
an adjustment for "code delta", i.e. the Thumb mode bit.
Cortex-A53 erratum 843419 workaround thunks are not covered
by this CL.
Test: Manual; run-test --gdb -Xcompiler-option -g 160, pull
symbols for gdbclient, break in the introspection
entrypoint, check that gdb knows the new symbols
(and disassembles them) and `backtrace` works when
setting $pc to an address in the thunk.
Bug: 36141117
Change-Id: Id224b72cfa7a0628799c7db65e66e24c8517aabf
|
|
|
|
Introduce a new "Constructor Fence Redundancy Elimination" pass.
The pass currently performs local optimization only, i.e. within instructions
in the same basic block.
All constructor fences preceding a publish (e.g. store, invoke) get
merged into one instruction.
==============
OptStat#ConstructorFenceGeneratedNew: 43825
OptStat#ConstructorFenceGeneratedFinal: 17631 <+++
OptStat#ConstructorFenceRemovedLSE: 164
OptStat#ConstructorFenceRemovedPFRA: 9391
OptStat#ConstructorFenceRemovedCFRE: 16133 <---
Removes ~91.5% of the 'final' constructor fences in RitzBenchmark:
(We do not distinguish the exact reason that a fence was created, so
it's possible some "new" fences were also removed.)
==============
Test: art/test/run-test --host --optimizing 476-checker-ctor-fence-redun-elim
Bug: 36656456
Change-Id: I8020217b448ad96ce9b7640aa312ae784690ad99
|
|
Test: market scan.
Change-Id: I58b23b8d254883f30619ea3602d34bf93618d432
|
|
|
|
* changes:
optimizing: Add statistics for # of constructor fences added/removed
optimizing: Refactor statistics to use OptimizingCompilerStats helper
|
|
Statistics are attributed as follows:
Added because:
* HNewInstance requires a HConstructorFence following it.
* HReturn requires a HConstructorFence (for final fields) preceding it.
Removed because:
* Optimized in Load-Store-Elimination.
* Optimized in Prepare-For-Register-Allocation.
Test: art/test.py
Bug: 36656456
Change-Id: Ic119441c5151a5a840fc6532b411340e2d68e5eb
|
|
Remove all copies of 'MaybeRecordStat', replacing them with a single
OptimizingCompilerStats::MaybeRecordStat helper.
Change-Id: I83b96b41439dccece3eee2e159b18c95336ea933
|
|
Generate run-time code in the Optimizing compiler checking that
the Marking Register's value matches `self.tls32_.is.gc_marking`
in debug mode (on target; and on host with JIT, or with AOT when
compiling the core image). If a check fails, abort.
Test: m test-art-target
Test: m test-art-target with tree built with ART_USE_READ_BARRIER=false
Test: ARM64 device boot test with libartd.
Bug: 37707231
Change-Id: Ie9b322b22b3d26654a06821e1db71dbda3c43061
|
|
The select generator currently only inserts select instructions
if there is a diamond shape with a phi.
This change extends the select generator to also deal with the
pattern:
if (condition) {
movable instruction 0
return value0
} else {
movable instruction 1
return value1
}
which it turns into:
movable instruction 0
movable instruction 1
return select (value0, value1, condition)
Test: 592-checker-regression-bool-input
Change-Id: Iac50fb181dc2c9b7619f28977298662bc09fc0e1
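The transformation in this commit message can be modelled at source level. This is a sketch rather than the compiler pass: the two "movable instructions" are stood in for by arbitrary side-effect-free arithmetic, Select is a hypothetical helper whose operand order is chosen for readability and is not meant to mirror the IR, and the code assumes C++14 or later for the relaxed constexpr rules.

```cpp
// Before: two return sites, each preceded by an instruction that has no side
// effects and can therefore be moved out of its branch.
constexpr int BeforeSelectGeneration(bool condition, int x, int y) {
  if (condition) {
    int value0 = x + 1;  // stands in for "movable instruction 0"
    return value0;
  } else {
    int value1 = y * 2;  // stands in for "movable instruction 1"
    return value1;
  }
}

// Hypothetical model of a select: picks one of two already computed values.
constexpr int Select(bool condition, int if_true, int if_false) {
  return condition ? if_true : if_false;
}

// After: both instructions are hoisted above a single return of a select.
constexpr int AfterSelectGeneration(bool condition, int x, int y) {
  int value0 = x + 1;
  int value1 = y * 2;
  return Select(condition, value0, value1);
}
```

The rewrite is only valid because both hoisted computations are free of side effects, so executing the one that is not needed does not change observable behavior.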
|
|
Flakiness observed on the bots.
Revert "Jit Code Cache instruction pipeline flushing"
This reverts commit 56fe32eecd4f25237e66811fd766355a07908d22.
Revert "ARM64: More JIT Code Cache maintenace"
This reverts commit 17272ab679c9b5f5dac8754ac070b78b15271c27.
Revert "ARM64: JIT Code Cache maintenance"
This reverts commit 3ecac070ad55d433bbcbe11e21f4b44ab178effe.
Revert "Change flush order in JIT code cache"
This reverts commit 43ce5f82dae4dc5eebcf40e54b81ccd96eb5fba3.
Revert "Separate rw from rx views of jit code cache"
This reverts commit d1dbb74e5946fe6c6098a541012932e1e9dd3115.
Test: art/test.py --target --64
Bug: 64527643
Bug: 62356545
Change-Id: Ifa10ac77a60ee96e8cb68881bade4d6b4f828714
|
|
|