Age | Commit message (Collapse) | Author |
|
Rationale:
With simple dex bytecode analysis, the inliner marks methods
that always throw to help subsequent code sinking. This reduces
overhead of non-nullable enforcing calls found in e.g the Kotlin
runtime library (1%-2% improvement on tree microbenchmark, about
5% on Denis' benchmark).
Test: test-art-host test-art-target
Change-Id: I45348f049721476828eb5443738021720d2857c0
|
|
And reverse the order of fields in the Class::status_. This
avoids generated code size increase:
- ClassStatus in high bits allows class initialization
check using "status_high_byte < (kInitialized << 4)"
which is unaffected by the low 4 bits of LHS instead of
needing to extract the status bits,
- the type check bit string in the bottom bits instead of
somewehere in the middle allows the comparison on ARM
to be done using the same code size as with the old
layout in most cases (except when the compared value is
9-16 bits and not a modified immediate: 2 bytes less for
9-12 bits and sometimes 2 bytes more for 13-16 bits; the
latter could be worked around using LDRH if the second
character's boundary is at 16 bits).
Add one of the extra bits to the 2nd character to push its
boundary to 16 bits so that we can test an implementation
using 16-bit loads in a subsequent CL, arbitrarily add the
other three bits to the 3rd character. This CL is only
about making those bits available and allowing testing, the
determination of how to use the additonal bits for the best
impact (whether to have a 4th character or distribute them
differently among the three characters) shall be done later.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: Pixel 2 XL boots.
Test: testrunner.py --target --optimizing
Bug: 64692057
Change-Id: I38c59837e3df3accb813fb1e04dc42e9afcd2d73
|
|
In preparation for extending the type check bit string from
24 to 28 bits, rewrite ClassStatus to fit into 4 bits. Also
perform a proper cleanup of the ClassStatus, i.e. change it
to an enum class, remove the "Status" word from enumerator
names, replace "Max" with "Last" in line with other
enumerations and remove aliases from mirror::Class.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: Pixel 2 XL boots.
Test: testrunner.py --target --optimizing
Bug: 64692057
Bug: 65318848
Change-Id: Iec1610ba5dac2c527b36c12819f132e1a77f2d45
|
|
Replace [d]sll+[d]srl with [d]ins on R2+.
Change-Id: I7587e46c47c8ce413d81a5c6c29d91e32a14d855
|
|
Add support for swaps between two SIMDStackSlots, two
VectorRegisters (extended FpuRegister) and between a
SIMDStackSlot and a VectorRegister.
This fixes test 623-checker-loop-regressions for
MIPS64R6 and MIPS32R6.
Test: ./testrunner.py --optimizing --target in QEMU (MIPS64R6)
Test: ./testrunner.py --optimizing --target in QEMU (MIPS32R6)
Change-Id: I36aa209f79790fb6c08b9a171f810769a6b40afc
|
|
Test: ./testrunner.py --optimizing --target
Change-Id: I35154a85f16b4f46d3b4d5827b130b1e20153461
|
|
Note: All tests were executed on CI20 (MIPS32R2) and in
QEMU (MIPS32R6 and MIPS64R6).
Test: ./testrunner.py --optimizing --target
Test: mma test-art-target-gtest
Change-Id: I012fb1013af43d5669a9b0080d481da28ffa7ef2
|
|
Shift the responsibility for filling Class and String .bss
slots from compiled code to runtime. This reduces the size
of the compiled code.
Make oatdump list .bss slot mappings (ArtMethod, Class and
String) for each dex file.
aosp_taimen-userdebug boot image size:
- before:
arm boot*.oat: 36534524
arm64 boot*.oat: 42723256
- after:
arm boot*.oat: 36431448 (-101KiB, -0.3%)
arm64 boot*.oat: 42645016 (-76KiB, -0.2%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: Pixel 2 XL boots.
Test: testrunner.py --target --optimizing
Test: m dump-oat, manually inspect output.
Bug: 65737953
Change-Id: I1330d070307410107e12c309d4c7f8121baba83c
|
|
Use conditional moves in
InstructionCodeGeneratorMIPS::HandleShift()'s 64-bit variable
shifts to avoid conditional branches (Beqz(TMP, &done)).
Also, on R6 use Beqzc(TMP, &done, /* is_bare */ true) in place of
Beqz(TMP, &done).
Test: Boot & run tests on MIPS32r6 QEMU & on CI-20 hardware (MIPS32r2).
Test: test/testrunner/testrunner.py --target --optimizing
Change-Id: I4d34a51cd2397c845f936af853cb5f30e82de438
|
|
|
|
Integrate the previous CLs into ART Runtime. Subsequent CLs to add
optimizing compiler support.
Use spare 24-bits from "Class#status_" field to
implement faster subtype checking in the runtime. Does not incur any extra memory overhead,
and (when in compiled code) this is always as fast or faster than the original check.
The new subtype checking is O(1) of the form:
src <: target :=
(*src).status >> #imm_target_mask == #imm_target_shifted
Based on the original prototype CL by Zhengkai Wu:
https://android-review.googlesource.com/#/c/platform/art/+/440996/
Test: art/test.py -b -j32 --host
Bug: 64692057
Change-Id: Iec3c54af529055a7f6147eebe5611d9ecd46942b
|
|
|
|
Relax the only back-edge restriction. Implement optimization for
MIPS32/MIPS64 which has already been done for the ARM & x86
architectures in
https://android-review.googlesource.com/#/c/platform/art/+/149370/.
Test: Boot & run tests on 32- & 64-bit version of QEMU.
Test: test/testrunner/testrunner.py --target --optimizing
Test: test-art-host-gtest
Test: test-art-target-gtest
Change-Id: Ie0a4c19ee50ad532fe53933d5808f9d7a4f89b8e
|
|
|
|
Fix a bug in LSA where it doesn't take IntermediateAddress
into account during hunting for original reference.
In following example, original reference i0 can be transformed
by NullCheck, BoundType, IntermediateAddress, etc.
i0 NewArray
i1 HInstruction(i0)
i2 ArrayGet(i1, index)
Test: test-art-host
Test: test-art-target
Test: load_store_analysis_test
Test: 706-checker-scheduler
Change-Id: I162dd8a86fcd31daee3517357c6af638c950b31b
|
|
Adding InstructionSet::kLast shall make it easier to encode
the InstructionSet in fewer bits using BitField<>. However,
introducing `kLast` into the `art` namespace is not a good
idea, so we change the InstructionSet to an enum class.
This also uncovered a case of InstructionSet::kNone being
erroneously used instead of vixl32::Condition::None(), so
it's good to remove `kNone` from the `art` namespace.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
|
|
- Ensure that SP is a multiple of 16 at all times, and
- Use ldc1/sdc1 to load/store FPU registers from/to 8-byte-aligned
locations wherever possible.
Use `export ART_MIPS32_CHECK_ALIGNMENT=true` when building Android
to enable the new runtime alignment checks.
Test: Boot & run tests on 32-bit version of QEMU, and CI-20.
Test: test/testrunner/testrunner.py --target --optimizing --32
Test: test-art-host-gtest
Test: test-art-target-gtest
Change-Id: Ia667004573f419fd006098fcfadf5834239cb485
|
|
This fixes 122-npe test failure in debuggable mode for MIPS32.
Test: ./testrunner.py --target --optimizing --debuggable --ndebuggable on CI20
Change-Id: I7c5c1e72a92f29e750265b612079ab0bac2a1dc0
|
|
|
|
|
|
Some vectorization patterns are not recognized anymore.
This shall be fixed later.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: testrunner.py --target --optimizing on Nexus 5X
Test: Nexus 5X boots.
Bug: 23964345
Bug: 67935418
Change-Id: I587a328d4799529949c86fa8045c6df21e3a8617
|
|
Reuse the memory previously allocated on the ArenaStack by
optimization passes.
This CL handles only the architecture-independent codegen
and slow paths, architecture-dependent codegen allocations
shall be moved to the ScopedArenaAllocator in a follow-up.
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 19.6MiB -> 18.5MiB (-1189KiB)
BatteryStats.dumpLocked(): 39.3MiB -> 37.0MiB (-2379KiB)
Also move definitions of functions that use bit_vector-inl.h
from bit_vector.h also to bit_vector-inl.h .
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 64312607
Change-Id: I84688c3a5a95bf90f56bd3a150bc31fedc95f29c
|
|
Test: test-art-host-gtest
Test: booted MIPS32R2 in QEMU
Test: testrunner.py --target --optimizing --32
Test: repeat all of the above with suppressed generation
of HMipsPackedSwitch
Change-Id: Ic8a27d88cd2d7eebaf5826ce8fd1a5607a024844
|
|
Fixes a bug introduced by
https://android-review.googlesource.com/504041
Test: test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 64312607
Change-Id: I7fd2d55c2a657f736eaed7c94c41d1237ae2ec0b
|
|
Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.
This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
|
For array accesses the element address has the following structure:
Address = CONST_OFFSET + base_addr + index << ELEM_SHIFT
The address part (index << ELEM_SHIFT) can be shared across array
accesses with the same data type and index.
For example, in the following loop 5 accesses can share address
computation:
void foo(int[] a, int[] b, int[] c) {
for (i...) {
a[i] = a[i] + 5;
b[i] = b[i] + c[i];
}
}
Test: test-art-host, test-art-target
Change-Id: Id09fa782934aad4ee47669275e7e1a4d7d23b0fa
|
|
This CL adds all the necessary codegen for the Uint8 type
but does not add code transformations that use that code.
Vectorization codegens are modified to use Uint8 as the
packed type when appropriate. The side effects are now
disconnected from the instruction's type after the graph has
been built to allow changing HArrayGet/H*FieldGet/HVecLoad
to use a type different from the underlying field or array.
Note: HArrayGet for String.charAt() is modified to have
no side effects whatsoever; Strings are immutable.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: testrunner.py --target --optimizing on Nexus 6P
Test: Nexus 6P boots.
Bug: 23964345
Change-Id: If2dfffedcfb1f50db24570a1e9bd517b3f17bfd0
|
|
Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
|
|
|
Force generating nal instruction before PC-relative addressing in the
presence of irreducible loops (HMipsComputeBaseMethodAddress is not
used in those situations).
This patch fixes a lot of JIT tests failures.
Test: ./testrunner.py --target --optimizing --jit (CI20 and QEMU)
Test: mma test-art-target-gtest (CI20 and QEMU)
Change-Id: I1815a6bb5783f439c8263612abff557f797bfef1
|
|
Move LinkerPatch to compiler/linker/linker_patch.h .
Move SrcMapElem to compiler/debug/src_map_elem.h .
Introduce compiled_method-inl.h to reduce the number
of `#include`s in compiled_method.h .
Test: m test-art-host-gtest
Test: testrunner.py --host
Change-Id: Id211cdf94a63ad265bf4709f1a5e06dffbe30f64
|
|
Remove mostly-unused include and move it to its users.
Test: m
Change-Id: Ibb40f919db64a490290c6e18cf1123aaf44199fc
|
|
Make ClassReference, TypeReference, and MethodReference extend
DexFileReference. This enables using all of these types as the key
for AtomicDexRefMap.
Test: test-art-host
Bug: 63851220
Bug: 63756964
Change-Id: Ida3c94cadb53272cb5057e5cebc5971c1ab4d366
|
|
Implement new HLoadClass load kind for boot image classes
referenced by PIC-compiled apps (i.e. prebuilts) that uses
PC-relative load from a boot image ClassTable mmapped into
the apps .bss. This reduces the size of the PIC prebuilts
that reference boot image classes compared to the kBssEntry
as we can completely avoid the slow path and stack map
unless we need to do the class initialization check.
Prebuilt services.odex for aosp_angler-userdebug (arm64):
- before: 20312800
- after: 19775352 (-525KiB)
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: testrunner.py --host --pictest
Test: testrunner.py --target on Nexus 6P.
Test: testrunner.py --target --pictest on Nexus 6P.
Test: Nexus 6P boots.
Bug: 31951624
Change-Id: I13adb19a1fa7d095a72a41f09daa6101876e77a8
|
|
Implement new HLoadString load kind for boot image strings
referenced by PIC-compiled apps (i.e. prebuilts) that uses
PC-relative load from a boot image InternTable mmapped into
the apps .bss. This reduces the size of the PIC prebuilts
that reference boot image strings compared to the kBssEntry
as we can completely avoid the slow path and stack map.
We separate the InternedStrings and ClassTable sections of
the boot image (.art) file from the rest, aligning the
start of the InternedStrings section to a page boundary.
This may actually increase the size of the boot image file
by a page but it also allows mprotecting() these tables as
read-only. The ClassTable section is included in
anticipation of a similar load kind for HLoadClass.
Prebuilt services.odex for aosp_angler-userdebug (arm64):
- before: 20862776
- after: 20308512 (-541KiB)
Note that 92KiB savings could have been achieved by simply
avoiding the read barrier, similar to the HLoadClass flag
IsInBootImage(). Such flag is now unnecessary.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: testrunner.py --host --pictest
Test: testrunner.py --target on Nexus 6P.
Test: testrunner.py --target --pictest on Nexus 6P.
Test: Nexus 6P boots.
Bug: 31951624
Change-Id: I5f2bf1fc0bb36a8483244317cfdfa69e192ef6c5
|
|
Test: test-art-host-gtest
Test: booted MIPS64 (with 2nd arch MIPS32R6) in QEMU
Test: test-art-target-gtest32
Test: testrunner.py --target --optimizing --32
Test: same tests as above on CI20
Test: booted MIPS32R2 in QEMU
Change-Id: I7e1ba59993008014d0115ae20c56e0a71fef0fb0
|
|
The bulk of the change is in the assemblers and their
tests.
The main goal is to introduce "bare" branches to labels
(as opposed to the existing bare branches with relative
offsets, whose direct use we want to eliminate).
These branches' delay/forbidden slots are filled
manually and these branches do not promote to long (the
branch target must be within reach of the individual
branch instruction).
The secondary goal is to add more branch tests (mainly
for bare vs non-bare branches and a few extra) and
refactor and reorganize the branch test code a bit.
The third goal is to improve idiom recognition in the
disassembler, including branch idioms and a few others.
Further details:
- introduce bare branches (R2 and R6) to labels, making
R2 branches available for use on R6
- make use of the above in the code generators
- align beqz/bnez with their GNU assembler encoding to
simplify and shorten the test code
- update the CFI test because of the above
- add trivial tests for bare and non-bare branches
(addressing existing debt as well)
- add MIPS32R6 tests for long beqc/beqzc/bc (debt)
- add MIPS64R6 long beqzc test (debt)
- group branch tests together
- group constant/literal/address-loading tests together
- make the disassembler recognize:
- b/beqz/bnez (beq/bne with $zero reg)
- nal (bltzal with $zero reg)
- bal/bgezal (bal = bgezal with $zero reg)
- move (or with $zero reg)
- li (ori/addiu with $zero reg)
- dli (daddiu with $zero reg)
- disassemble 16-bit immediate operands (in andi, ori,
xori, li, dli) as signed or unsigned as appropriate
- drop unused instructions (bltzl, bltzall, addi) from
the disassembler as there are no plans to use them
Test: test-art-host-gtest
Test: booted MIPS64 (with 2nd arch MIPS32R6) in QEMU
Test: test-art-target-gtest
Test: testrunner.py --target --optimizing
Test: same tests as above on CI20
Test: booted MIPS32R2 in QEMU
Change-Id: I62b74a6c00ce0651528114806ba24a59ba564a73
|
|
Test: booted MIPS64 (with 2nd arch MIPS32R6) in QEMU
Test: test-art-target-gtest
Test: testrunner.py --target --optimizing
Test: same tests as above on CI20
Test: booted MIPS32 and MIPS64 in QEMU with poisoning
in configurations:
- with Baker read barrier thunks
- without Baker read barrier thunks
- ART_READ_BARRIER_TYPE=TABLELOOKUP
Change-Id: I79f320bf8862a04215c76cfeff3118ebc87f7ef2
|
|
Add fast paths for TLAB allocation entrypoints for MIPS32 and MIPS64.
Also improve rosalloc entrypoints.
Note: All tests are executed on CI20 (MIPS32R2) and in QEMU (MIPS32R6
and MIPS64R6), with and without ART_TEST_DEBUG_GC=true.
Test: ./testrunner.py --optimizing --target
Test: mma test-art-target-gtest
Test: mma test-art-host-gtest
Change-Id: I92195d2d318b26a19afc5ac46a1844b13b2d5191
|
|
When generating code for ARM64, maintain the status of
Thread::Current()->GetIsGcMarking() in register X20,
dubbed MR (Marking Register), and check the value of that
register (instead of loading and checking a read barrier
marking entrypoint) in read barriers.
Test: m test-art-target
Test: m test-art-target with tree built with ART_USE_READ_BARRIER=false
Test: ARM64 device boot test
Bug: 37707231
Change-Id: Ibe9bc5c99a2176b0a0476e9e9ad7fcc9f745017b
|
|
We need to save 128 bits of data. This is only done for vector
registers that are live, so overhead is not too big.
Test: mma test-art-host-gtest
Test: ./testrunner.py --optimizing --target in QEMU (MIPS)
Change-Id: I0f792e9c98011be3e24d5fad35a8244faafcb9a0
|
|
|
|
Move32 and Move64 are removed so MoveLocation now handles all cases.
Reason for this are 128-bit (SIMDStackSlot, VectorRegister) moves
which will be added in follow-up patch.
Test: mma test-art-host-gtest
Test: ./testrunner.py --optimizing --target in QEMU
Change-Id: I93496e74874f77337b11b2265aa4b470bc7c6ce2
|
|
This is a follow-up to
https://android-review.googlesource.com/#/c/384033/.
Test: booted MIPS64 (with 2nd arch MIPS32R6) in QEMU
Test: testrunner.py --target --optimizing
Test: same tests as above on CI20
Test: booted MIPS32R2 and MIPS64 in QEMU in configurations:
ART_USE_READ_BARRIER=false,
ART_READ_BARRIER_TYPE=TABLELOOKUP
Change-Id: I4cb2f4ded13c0d9fc960c7eac55396f7931c1e38
|
|
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: testrunner.py --target
Test: Nexus 6P boots.
Test: Build aosp_mips64-userdebug.
Bug: 30627598
Change-Id: I0e54fdd2e91e983d475b7a04d40815ba89ae3d4f
|
|
|
|
In preparation for replacing the dex cache method array
with a hash-based array, get rid of one unnecessary use.
This method load kind is currently used only on mips for
irreducible loops and OSR, so this should have no impact
on x86/x86-64/arm/arm64.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Repeat the above tests with manually changing
kDexCachePcRelative to kRuntimeCall in sharpening.cc.
(Ignore failures in 552-checker-sharpening.)
Bug: 30627598
Change-Id: Ifce42645f2dcc350bbb88c2f4642e88fc5f98152
|
|
The old name does not reflect the actual code anymore.
Test: testrunner.py --host
Change-Id: I2e13cf727bba9d901c4d3fc821bb526d38a775b8
|
|
This makes MIPS32 boot again.
The issue was introduced in commit
6597946d29be9108e2cc51223553d3db9290a3d9:
Static invokes in slow paths would sometimes get
HMipsComputeBaseMethodAddress from the stack into the
same register where the art method pointer would later
be loaded (A0) with the former being overwritten in the
process of loading the latter.
Test: booted MIPS32R2 in QEMU
Change-Id: Ib584cf66795574175650f42b191c797fb3b3965f
|
|
In preparation for adding ArtMethod entries to the .bss
section, add direct PC-relative pointers to methods so that
the number of needed .bss entries for boot image is small.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: testrunner.py --target on Nexus 6P
Test: Nexus 6P boots.
Test: Build aosp_mips64-userdebug
Bug: 30627598
Change-Id: Ia89f5f9975b741ddac2816e1570077ba4b4c020f
|