| Age | Commit message | Author | 
 | 
Test: Run ART test suite on host and Nexus 6.
Change-Id: Ie2ad70f1e3f125eae5dad53a6384d405e0311505
 | 
 | 
 | 
 | 
If possible, use full pass name provided in --run-passes rather
than its base version.
Test: m test-art-host -j32
1. Prepare a run-passes file with content:
dead_code_elimination$initial
instruction_simplifier
x86_memory_operand_generation
2. Run art for a dex file like:
art -Xcompiler-option --run-passes=run-passes -Xcompiler-option
--dump-passes -classpath classes.dex Test
3. Verify that dead_code_elimination$initial string is present in
dump-passes output.
Change-Id: I92d9ed0c8b919ea03f625f549123f546dffe546b
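
As a rough illustration of the naming rule described above, the sketch below prefers the full name from the pass list (for example dead_code_elimination$initial) over the base name. The helper names base_pass_name and name_for_pass are hypothetical and do not come from ART, which is written in C++; Python is used here only for brevity.

```python
def base_pass_name(full_name):
    """Drop an optional '$suffix' tag: 'dead_code_elimination$initial' -> 'dead_code_elimination'."""
    return full_name.split("$", 1)[0]


def name_for_pass(base_name, requested):
    """Return the full requested name whose base matches, else the base name.

    Hypothetical helper: `requested` is the list of names read from the
    --run-passes file, in file order.
    """
    for name in requested:
        if base_pass_name(name) == base_name:
            return name
    return base_name
```

With the three-line file from the test steps, name_for_pass("dead_code_elimination", ...) would report the $initial variant, which is the string step 3 looks for in the dump-passes output.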
 | 
 | 
Use Thumb2Assembler always. This originated from finding out that
the JNI tests are run using the Arm32Assembler however in real
world Thumb2Assembler is used for JNI. Therefore Arm32Assembler
code is dead except its own tests and the illegitimate use in
JNI tests.
Change-Id: I9ca6b83582bf97149a46690518ccb9312b1a3b68
 | 
 | 
 | 
 | 
Fixes compiler not building when some of the codegen paths
are disabled.
Test: mmma -j art ART_TARGET_CODEGEN_ARCHS=svelte
m -j32 test-art-host
BUG=30928847
Change-Id: I52c78e8a4e507f74b1f2a39352970079721b737e
 | 
 | 
Introduce verbose logging of optimization passes run during
compilation.
Test: m test-art-host -j32
art -Xcompiler-option --runtime-arg -Xcompiler-option -verbose:compiler
-classpath classes.dex Test
Change-Id: Iae98ce9dcafc252f2d0eec138aa05b34e424bd2a
 | 
 | 
Adds a faster path for Java methods annotated with
dalvik.annotation.optimization.FastNative.
Intended to replace usage of fast JNI (registering with "!(FOO)BAR" descriptors).
Performance Microbenchmark Results (Angler):
* Regular JNI cost in nanoseconds: 115
* Fast JNI cost in nanoseconds: 60
* @FastNative cost in nanoseconds: 36
Summary: Up to 67% faster (vs fast jni) JNI transition cost
Change-Id: Ic23823ae0f232270c068ec999fd89aa993894b0e
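
The message does not spell out how the 67% figure is derived. One plausible reading, assumed here, is a throughput ratio between fast JNI (60 ns) and @FastNative (36 ns); the same numbers expressed as a per-call cost reduction give a different percentage. The function names below are illustrative only.

```python
# Per-call JNI transition costs in nanoseconds, copied from the results above.
COSTS_NS = {"regular": 115, "fast_jni": 60, "fast_native": 36}


def speedup_percent(old_ns, new_ns):
    """How many more calls fit in the same time, in percent."""
    return (old_ns / new_ns - 1.0) * 100.0


def cost_reduction_percent(old_ns, new_ns):
    """How much less time a single call takes, in percent."""
    return (1.0 - new_ns / old_ns) * 100.0


vs_fast_jni = speedup_percent(COSTS_NS["fast_jni"], COSTS_NS["fast_native"])
saved = cost_reduction_percent(COSTS_NS["fast_jni"], COSTS_NS["fast_native"])
print(f"speedup vs fast JNI: {vs_fast_jni:.0f}%, per-call cost saved: {saved:.0f}%")
```

Under this reading, 60 / 36 is about 1.67, which matches the quoted "up to 67% faster", while each call takes about 40% less time.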
 | 
 | 
This change introduces a new dex2oat switch, --run-passes=. This switch
accepts a path to a text file with the names of passes to run.
The compiler will run the optimization passes specified in the file rather
than the default ones.
There is no verification implemented on the compiler side. It is the user's
responsibility to provide a list of passes that leads to successful
generation of correct code. Care should be taken to prepare a list
that satisfies all dependencies between optimizations.
We only take control of the optional optimizations. Codegen (builder),
and all passes required for register allocation will run unaffected
by this mechanism.
Change-Id: Ic3694e53515fefcc5ce6f28d9371776b5afcbb4f
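
A minimal sketch of the idea, assuming the file holds one pass name per line and that blank lines are ignored (the message does not state the exact file format). The functions read_pass_names and select_passes are hypothetical and stand in for the real C++ logic in the compiler.

```python
def read_pass_names(text):
    """Return pass names from a run-passes file body, one name per line."""
    # Assumption: blank lines and surrounding whitespace are ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_passes(names, available):
    """Build the pipeline in file order from a name -> pass mapping.

    No dependency checking is done, mirroring the note above that the
    correctness of the list is the user's responsibility.
    """
    return [available[name] for name in names]
```

The resulting order is exactly the file order, so a list that violates a dependency between optimizations is passed through unchanged.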
 | 
 | 
Test: manually, on device.
Change-Id: If007a1657dd5769ddef03691e0a19dbbe6ba1a29
 | 
 | 
Tested:
- MIPS32 Android boots in QEMU
- test-art-host-gtest
- test-art-target-run-test-optimizing in QEMU, on CI20
- test-art-target-gtest on CI20
Change-Id: I70fd5d5267f8594c3b29d5a4ccf66b8ca8b09df3
 | 
 | 
 | 
 | 
Make RunOptimizations, MaybeRunInliner and RunArchOptimizations member
functions of OptimizingCompiler class.
Both versions of RunOptimizations are protected in preparation for
bisection bug search CL.
Change-Id: I596efa9ed3fccd1ed3798c6427cc166e2a5d28bd
 | 
 | 
 | 
 | 
After changing the addressing mode for array accesses (in
https://android-review.googlesource.com/248406) the 'add'
instruction that calculates the base address for the array can be
shared across accesses to the same array.
Before https://android-review.googlesource.com/248406:
    add IP, r[Array], r[Index0], LSL #2
    ldr r0, [IP, #12]
    add IP, r[Array], r[Index1], LSL #2
    ldr r0, [IP, #12]
Before this CL:
    add IP, r[Array], #12
    ldr r0, [IP, r[Index0], LSL #2]
    add IP, r[Array], #12
    ldr r0, [IP, r[Index1], LSL #2]
After this CL:
    add IP, r[Array], #12
    ldr r0, [IP, r[Index0], LSL #2]
    ldr r0, [IP, r[Index1], LSL #2]
Link to the original optimization:
    https://android-review.googlesource.com/#/c/127310/
Test: Run ART test suite on Nexus 6.
Change-Id: Iee26f9a0a7ca46abb90e3f60d19d22dc8dee4d8f
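
The address arithmetic behind the listings above can be sketched as follows. The data offset of 12 and the shift of 2 are taken from the '#12' and 'LSL #2' operands in the listings; the function names are illustrative only.

```python
DATA_OFFSET = 12   # '#12' in the listings: offset of the first array element
ELEMENT_SHIFT = 2  # 'LSL #2': elements are 4 bytes wide


def address_old_mode(array, index):
    """add IP, array, index, LSL #2 ; ldr r0, [IP, #12]"""
    ip = array + (index << ELEMENT_SHIFT)
    return ip + DATA_OFFSET


def addresses_shared_base(array, indices):
    """add IP, array, #12 (once) ; ldr r0, [IP, index, LSL #2] per access"""
    ip = array + DATA_OFFSET  # does not depend on the index, so it can be shared
    return [ip + (i << ELEMENT_SHIFT) for i in indices]
```

Both forms load from the same addresses. In the newer form the intermediate value depends only on the array, which is why a single 'add' can serve every access to that array.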
 | 
 | 
Allow alternate register allocation strategies to be implemented
in subclasses of a common register allocation base class.
Test: m test-art-host
Change-Id: I7c5866aa9ddff8f53fcaf721bad47654ab221b4f
 | 
 | 
This will allow a cleaner commit in an upcoming
refactoring of register allocation.
Test: m test-art-host
Change-Id: If420c97b088b3c934411ff83373e024003120746
 | 
 | 
Currently, an HBoundsCheck is fed by an HArrayLength, causing a load of
the array length, followed by a register compare.
Avoid the load when we can by comparing directly with the array length
in memory.  Implement this by marking the HArrayLength as 'emitted at
use site', and then generating the code in the HBoundsCheck.
Only do this replacement when we are the only user of the ArrayLength
and it isn't visible to the environment.
Handle the special case where the array is 'null' and where an implicit
null check can't be eliminated.
This code moves the load of the length to the slow code for the failed
check, which is what we want.
Test: 609-checker-x86-bounds-check
Change-Id: I9cdb183301e048234bb0ffeda940eedcf4a655bd
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
 | 
 | 
Improvements include:
- CodeGeneratorMIPS::GenerateStaticOrDirectCall() supports:
  - MethodLoadKind::kDirectAddressWithFixup (via literals)
  - CodePtrLocation::kCallDirectWithFixup (via literals)
  - MethodLoadKind::kDexCachePcRelative
- 32-bit literals to support the above (not ready for general-
  purpose applications yet because RA is not saved in leaf
  methods, but is clobbered on MIPS32R2 when simulating
  PC-relative addressing (MIPS32R6 is OK because it has
  PC-relative addressing with the lwpc instruction))
- shorter instruction sequences for recursive static/direct
  calls
Tested:
- test-art-host-gtest
- test-art-target-gtest and test-art-target-run-test-optimizing on:
  - MIPS32R2 QEMU
  - CI20 board
  - MIPS32R6 (2nd arch) QEMU
Change-Id: Id5b137ad32d5590487fd154c9a01d3b3e7e044ff
 | 
 | 
bug:29089267
bug:27521545
We were hitting a compiler DCHECK that a class would never require to
do access checks on itself. The reason was that the compiler driver
was not trying to resolve a type, but instead relied on the verifier
for pre-populating the dex cache. However, the verifier doesn't
necessarily run in JIT mode.
This reverts commit 12abcbd950bd0ff4528e2e0d27ca5e881c7b0467.
Change-Id: I59204c16927084f6605a2a3f999ca529f949e1ad
 | 
 | 
bug:29089267
bug:27521545
Fails some DCHECKs.
This reverts commit 808067335b228d7b50ad84123d3c8ecb7aeeb200.
Change-Id: I0e768ce85be593e3f50fd02abc29aa34f2be3562
 | 
 | 
And avoid calling ResolveMethod in the JIT, since it already
knows that method.
bug:29089267
bug:27521545
(cherry picked from commit 2dc77ecf375882f51ff7c09712c05b80e58abb6b)
Change-Id: I36084b1f207317452c42fdfc8ffa4d8c721d2f76
 | 
 | 
To ensure even the JIT will not try to compile methods with
soft failures a runtime_throw.
bug:28293819
bug:28313047
Change-Id: Ie3fd71ded0b77de8dab1c3c825b867cb321b8873
 | 
 | 
And clean up some APIs to return std::unique_ptr<> instead
of raw pointers that don't communicate ownership.
Change-Id: I3017302307a0253d661240750298802fb0d9585e
 | 
 | 
This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.
Saves 5-15% of arena-allocated memory (see bug for more data):
  GMS      20.46MB  =>  19.26MB  (-5.86%)
  Maps     24.12MB  =>  21.47MB  (-10.98%)
  YouTube  28.60MB  =>  26.01MB  (-9.05%)
This CL fixed an issue with parsing quickened instructions.
Bug: 27894376
Bug: 27998571
Bug: 27995065
Change-Id: I20dbe1bf2d0fe296377478db98cb86cba695e694
 | 
 | 
 | 
 | 
Bug: 27995065
This reverts commit e3ff7b293be2a6791fe9d135d660c0cffe4bd73f.
Change-Id: I5363c7ce18f47fd422c15eed5423a345a57249d8
 | 
 | 
This reduces the size of the pre-header by 8 bytes, reducing
oat file size and mmapped .text section size. The memory
needed to store a CompiledMethod by dex2oat is also reduced,
for 32-bit dex2oat by 8B and for 64-bit dex2oat by 16B. The
aosp_flounder-userdebug 32-bit and 64-bit boot.oat are each
about 1.1MiB smaller.
Disable the broken StubTest.IMT, b/27991555.
Change-Id: I05fe45c28c8ffb7a0fa8b1117b969786748b1039
 | 
 | 
This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.
Saves 5-15% of arena-allocated memory (see bug for more data):
  GMS      20.46MB  =>  19.26MB  (-5.86%)
  Maps     24.12MB  =>  21.47MB  (-10.98%)
  YouTube  28.60MB  =>  26.01MB  (-9.05%)
Bug: 27894376
Change-Id: Iefe28d40600c169c5d306fd2c77034ae19476d90
 | 
 | 
Second CL in the series of merging HGraphBuilder and SsaBuilder. This
patch refactors the builders so that dominator tree can be built
before any HInstructions are generated. This puts the SsaBuilder
removal of HLoadLocals/HStoreLocals straight after HGraphBuilder's
HInstruction generation phase. Next CL will therefore be able to
merge them.
This patch also adds util classes for iterating bytecode and switch
tables, which simplified the code.
Bug: 27894376
Change-Id: Ic425d298b2e6e7980481ed697230b1a0b7904526
 | 
 | 
 | 
 | 
Change-Id: I309411b0fffaaed1e218e2c34394bdf6e2f75b48
 | 
 | 
This reverts commit 845e5064580bd37ad5014f7aa0d078be7265464d.
Add an option to change what OatFileManager considers up-to-date.
In our tests we're allowed to write to the dalvik-cache, so it
cannot be kSpeed.
Bug: 27689078
Change-Id: I0c578705a9921114ed1fb00d360cc7448addc93a
 | 
 | 
Bots are red. Tentatively reverting as this is likely the offender.
Bug: 27689078
This reverts commit a62d2f04a6ecf804f8a78e722a6ca8ccb2dfa931.
Change-Id: I3ec6947a5a4be878ff81f26f17dc36a209734e2a
 | 
 | 
Record the compiler filter in the oat header. Use that to determine
when the oat file is up-to-date with respect to a target compiler
filter level.
New xxx-profile filter levels are added to specify if a profile should
be used instead of testing for the presence of a profile file.
This change should allow for different compiler-filters to be set for
different package manager use cases.
Bug: 27689078
Change-Id: Id6706d0ed91b45f307142692ea4316aa9713b023
 | 
 | 
Remove unused allocation types, mostly from removed Quick.
Move logging one level up to capture memory used by stack
maps during AOT compilation. Raise the reporting threshold
to 8MiB to limit the output to the worst offenders.
Change-Id: I8c7a01bfa90bc8ec5eab66187eb6850a022f3543
 | 
 | 
 | 
 | 
Collect data for stack maps, profiling info, and compiled code.
bug:27520994
Change-Id: Ic87361230c96ce0090027a37d750e948d806c597
 | 
 | 
This removes redundant code from the generators and allows for easier
stat recording.
Change-Id: Iccd4368f9e9d87a6fecb863dee4e2145c97851c4
 | 
 | 
bug:27520994
Change-Id: I67b0c5b822001bfde8738a988c1ade69f1a26e3f
 | 
 | 
The debugger needs them to unwind through the trampolines and to
understand what is happening in the call stack.
Change-Id: Ia554058c3796788adcd7336d620a7734eb366905
 | 
 | 
 | 
 | 
 | 
 | 
This includes stack operations and, on x86, call/pop to read PC.
bug=26997690
Rationale:
(1) If a method is fully intrinsified and neither makes calls in the
    slow path nor uses special input, it does not need the current method.
(2) Invoke instructions with HasPcRelativeDexCache() generate code
    that reads the PC (call/pop) on x86. However, if the invoke is
    an intrinsic that is later replaced with actual code, this PC
    reading code may be dead.
Example X86 (before/after):
0x0000108c: 83EC0C      sub esp, 12
0x0000108f: 890424      mov [esp], eax       <-- not needed
0x00001092: E800000000  call +0 (0x00001097)
0x00001097: 58          pop eax              <-- dead code to read PC
0x00001098: F30FB8C1    popcnt eax, ecx
0x0000109c: F30FB8DA    popcnt ebx, edx
0x000010a0: 03D8        add ebx, eax
0x000010a2: 89D8        mov eax, ebx
0x000010a4: 83C40C      add esp, 12          <-- not needed
0x000010a7: C3          ret

0x0000103c: F30FB8C1    popcnt eax, ecx
0x00001040: F30FB8DA    popcnt ebx, edx
0x00001044: 03D8        add ebx, eax
0x00001046: 89D8        mov eax, ebx
0x00001048: C3          ret
Example ARM64 (before/after):
0x0000103c: f81e0fe0      str x0, [sp, #-32]!
0x00001040: f9000ffe      str lr, [sp, #24]
0x00001044: dac01020      clz x0, x1
0x00001048: f9400ffe      ldr lr, [sp, #24]
0x0000104c: 910083ff      add sp, sp, #0x20 (32)
0x00001050: d65f03c0      ret

0x0000103c: dac01020      clz x0, x1
0x00001040: d65f03c0      ret
Change-Id: I8377db80c9a901a08fff4624927cf4a6e585da0c
 | 
 | 
Slight tweak in the API which I will need in the near future.
Change-Id: I45954ae16710bc941a9a06a3a17c70798315ca53
 | 
 | 
Do not pass CompiledMethod pointer through since it is only available
during AOT compile but not during JIT compile or at runtime. Creating
a mock CompiledMethod just to pass data is proving increasingly tricky, so
copy the fields that we need to MethodDebugInfo instead.
Change-Id: I820297b41e769fcac488c0ff2d2ea0492bb13ed8
 | 
 | 
This is a hint to the debugger that breakpoints and stepping
might not function as intended (since we have limited information).
Change-Id: I23c4a816182cc7548fcd69fbd00112225e7b1710
 | 
 | 
This is a subset of CL171665, splitting that change into two.
It will be needed to generate .MIPS.abiflags ELF section.
Change-Id: I5557e7cb98d0fa1dc57c85cf6161e119c6d50a1a
 | 
 | 
This pass does transform the graph so make it part of cfg-dumping.
Change-Id: I42e361382c85c97b974faad8bb0fcf2cb0750355
 | 
 | 
Sharing it with the verifier and the class loader is not ideal,
especially at startup time.
bug:27398183
bug:23128949
Change-Id: I1b91663a13f6c5b33ad3b4be780d93eb7fe445b4
 |