Age | Commit message | Author |
|
Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck, and lets String.charAt() reuse the
infrastructure for HArrayGet, i.e. better handling of
constant indexes than the old intrinsic and the use of
HArm64IntermediateAddress.
Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
|
|
Introduce HInstruction::GetInputRecords(), a new virtual
function that returns an ArrayRef<> to all input records.
Implement all other functions dealing with input records as
wrappers around GetInputRecords(). Rewrite functions that
previously used multiple virtual calls to deal with input
records, especially in loops, to prefetch the ArrayRef<>
only once for each instruction. Besides avoiding all the
extra calls, this also allows the compiler (clang++) to
perform additional optimizations.
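A minimal sketch of the pattern described above, assuming simplified, hypothetical stand-ins (SimpleArrayRef, Instruction, InputRecord, BinaryOp, Phi, CountUsesOf) rather than ART's real ArrayRef<> and HInstruction classes: one virtual GetInputRecords(), non-virtual wrappers built on top of it, and a loop that fetches the view once per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified non-owning view (not ART's ArrayRef<>).
template <typename T>
class SimpleArrayRef {
 public:
  SimpleArrayRef(T* data, std::size_t size) : data_(data), size_(size) {}
  std::size_t size() const { return size_; }
  T& operator[](std::size_t i) const { return data_[i]; }
  T* begin() const { return data_; }
  T* end() const { return data_ + size_; }

 private:
  T* data_;
  std::size_t size_;
};

class Instruction;

struct InputRecord {
  Instruction* instruction;
};

class Instruction {
 public:
  virtual ~Instruction() = default;

  // The single virtual function: each subclass exposes all input records.
  virtual SimpleArrayRef<InputRecord> GetInputRecords() = 0;

  // Non-virtual wrappers implemented on top of GetInputRecords().
  std::size_t InputCount() { return GetInputRecords().size(); }
  Instruction* InputAt(std::size_t i) { return GetInputRecords()[i].instruction; }
};

// Fixed-arity instruction keeping its two inputs inline.
class BinaryOp : public Instruction {
 public:
  BinaryOp(Instruction* lhs, Instruction* rhs) : inputs_{{lhs}, {rhs}} {}
  SimpleArrayRef<InputRecord> GetInputRecords() override {
    return SimpleArrayRef<InputRecord>(inputs_, 2);
  }

 private:
  InputRecord inputs_[2];
};

// Variable-arity, phi-like instruction keeping its inputs in a vector.
class Phi : public Instruction {
 public:
  void AddInput(Instruction* input) { inputs_.push_back(InputRecord{input}); }
  SimpleArrayRef<InputRecord> GetInputRecords() override {
    return SimpleArrayRef<InputRecord>(inputs_.data(), inputs_.size());
  }

 private:
  std::vector<InputRecord> inputs_;
};

// Fetches the view once, instead of going through InputCount()/InputAt()
// (each of which makes a virtual call) on every loop iteration.
std::size_t CountUsesOf(Instruction* user, Instruction* target) {
  std::size_t count = 0;
  for (const InputRecord& record : user->GetInputRecords()) {
    if (record.instruction == target) {
      ++count;
    }
  }
  return count;
}
```

The loop body sees a plain pointer range, which is the kind of code a compiler can optimize more easily than repeated virtual calls.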
This speeds up the Nexus 5 boot image compilation by ~0.5s
(4% of "Compile Dex File", 2% of dex2oat time) on AOSP ToT.
Change-Id: Id8ebe0fb9405e38d918972a11bd724146e4ca578
|
|
This reverts commit 0997d24e67d78f2146ebae2888eda0d7d254789a.
ART_HEAP_POISONING=true mode is fixed.
Change-Id: I83f6d5c101ea6a86802753f81b3e4348a263fb21
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
|
Fails heap poisoning configuration.
This reverts commit afdc97ebcb4e58afb7cf54d846d30314e6499d83.
Change-Id: I50e53756a2b85059b89cfb8950f8c9e2b032743c
|
|
Change-Id: I7a7ac9244847dd80d9fa4e4b5ebc5bf451c628ff
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
|
Use HArrayLength for String.length() in anticipation of
changing the String.charAt() to HBoundsCheck+HArrayGet to
allow the existing BCE to seamlessly work for strings.
Use HArrayLength+HEqual for String.isEmpty().
We previously relied on inlining but we now want to apply
the new intrinsics even when we do not inline, i.e. when
compiling debuggable (as is currently the case for the boot
image) or when we hit inlining limits such as depth, size,
or the number of accumulated dex registers.
Bug: 28330359
Change-Id: Iab9d2f6d2967bdd930a72eb461f27efe8f37c103
|
|
Bug: 27505766
Change-Id: I077465e3d308f4331e7a861902e05865f9d99835
|
|
Change-Id: If8cf0ee43711f6e13171443e3c057ff370ccfbaa
|
|
This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.
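A toy, single-block sketch of what "produces SSA form directly" means: the builder tracks the value that currently defines each dex register and wires uses straight to it, so no load/store-local nodes are ever created. BlockBuilder and Value are hypothetical names, the opcodes are modeled loosely on dex const/add-int/move, and control-flow merges (which need phis) are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy IR value: an operation and the values it uses.
struct Value {
  std::string op;
  std::vector<Value*> inputs;
};

// Builds one basic block in SSA form by remembering, for every dex
// register, the value that currently defines it.
class BlockBuilder {
 public:
  explicit BlockBuilder(std::size_t num_vregs) : current_defs_(num_vregs, nullptr) {}

  // const vDest, #literal
  Value* Constant(std::size_t dest, int literal) {
    return Define(dest, Make("const " + std::to_string(literal), {}));
  }

  // add-int vDest, vA, vB: inputs are the current definitions of vA and vB.
  Value* Add(std::size_t dest, std::size_t a, std::size_t b) {
    Value* lhs = current_defs_[a];
    Value* rhs = current_defs_[b];
    assert(lhs != nullptr && rhs != nullptr);  // registers must be defined first
    return Define(dest, Make("add", {lhs, rhs}));
  }

  // move vDest, vSrc: no instruction is created; vDest now names the same value.
  void Move(std::size_t dest, std::size_t src) { current_defs_[dest] = current_defs_[src]; }

  Value* Current(std::size_t vreg) const { return current_defs_[vreg]; }
  std::size_t NumInstructions() const { return instructions_.size(); }

 private:
  Value* Make(std::string op, std::vector<Value*> inputs) {
    instructions_.push_back(std::make_unique<Value>(Value{std::move(op), std::move(inputs)}));
    return instructions_.back().get();
  }
  Value* Define(std::size_t vreg, Value* value) {
    current_defs_[vreg] = value;
    return value;
  }

  std::vector<Value*> current_defs_;
  std::vector<std::unique_ptr<Value>> instructions_;
};
```

Note that the move creates no instruction at all; later uses of the destination register simply refer to the existing value.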
Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)
This CL fixes an issue with parsing quickened instructions.
Bug: 27894376
Bug: 27998571
Bug: 27995065
Change-Id: I20dbe1bf2d0fe296377478db98cb86cba695e694
|
|
Bug: 27995065
This reverts commit e3ff7b293be2a6791fe9d135d660c0cffe4bd73f.
Change-Id: I5363c7ce18f47fd422c15eed5423a345a57249d8
|
|
This reduces the size of the pre-header by 8 bytes, reducing
oat file size and mmapped .text section size. The memory
needed to store a CompiledMethod by dex2oat is also reduced,
for 32-bit dex2oat by 8B and for 64-bit dex2oat by 16B. The
aosp_flounder-userdebug 32-bit and 64-bit boot.oat are each
about 1.1MiB smaller.
Disable the broken StubTest.IMT, b/27991555.
Change-Id: I05fe45c28c8ffb7a0fa8b1117b969786748b1039
|
|
This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.
Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)
Bug: 27894376
Change-Id: Iefe28d40600c169c5d306fd2c77034ae19476d90
|
|
Second CL in the series of merging HGraphBuilder and SsaBuilder. This
patch refactors the builders so that the dominator tree can be built
before any HInstructions are generated. This puts the SsaBuilder's
removal of HLoadLocals/HStoreLocals straight after HGraphBuilder's
HInstruction generation phase. The next CL will therefore be able to
merge them.
This patch also adds util classes for iterating bytecode and switch
tables, which allowed the code to be simplified.
Bug: 27894376
Change-Id: Ic425d298b2e6e7980481ed697230b1a0b7904526
|
|
Use only the minimum number of bits required to store stack map data.
For example, if native_pc needs 5 bits and dex_pc needs 3 bits, they
will share the first byte of the stack map entry.
The header is changed to store bit offsets of the fields rather than
byte sizes. Offsets also make it easier to access later fields without
calculating the sum of all previous sizes.
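A minimal sketch of this bit packing, using the message's own example (native_pc needing 5 bits, dex_pc needing 3 bits). MinimumBitsToStore, EntryLayout, ComputeLayout, StoreBits and LoadBits are hypothetical names, not ART's stack map code; each field's width is derived from consecutive bit offsets, so a later field is located directly from its offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to represent `value` (0 needs no bits).
uint32_t MinimumBitsToStore(uint32_t value) {
  uint32_t bits = 0;
  while (value != 0) {
    ++bits;
    value >>= 1;
  }
  return bits;
}

// Entry layout described by bit offsets: field i occupies
// [offsets[i], offsets[i + 1]), and offsets[2] is the entry size in bits.
struct EntryLayout {
  std::size_t offsets[3];
  uint32_t FieldBits(std::size_t field) const {
    return static_cast<uint32_t>(offsets[field + 1] - offsets[field]);
  }
};

EntryLayout ComputeLayout(uint32_t max_native_pc, uint32_t max_dex_pc) {
  EntryLayout layout;
  layout.offsets[0] = 0;  // native_pc starts at bit 0
  layout.offsets[1] = layout.offsets[0] + MinimumBitsToStore(max_native_pc);  // dex_pc
  layout.offsets[2] = layout.offsets[1] + MinimumBitsToStore(max_dex_pc);     // end
  return layout;
}

// Writes the low `length` bits of `value` starting at bit `offset` (LSB first).
void StoreBits(std::vector<uint8_t>& region, std::size_t offset, uint32_t length,
               uint32_t value) {
  for (uint32_t i = 0; i < length; ++i) {
    std::size_t bit = offset + i;
    if (bit / 8 >= region.size()) {
      region.resize(bit / 8 + 1, 0);
    }
    if ((value >> i) & 1u) {
      region[bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
    }
  }
}

// Reads `length` bits starting at bit `offset`.
uint32_t LoadBits(const std::vector<uint8_t>& region, std::size_t offset, uint32_t length) {
  uint32_t value = 0;
  for (uint32_t i = 0; i < length; ++i) {
    std::size_t bit = offset + i;
    if ((region[bit / 8] >> (bit % 8)) & 1u) {
      value |= 1u << i;
    }
  }
  return value;
}
```

With max_native_pc = 31 and max_dex_pc = 7, the entry is exactly 8 bits wide, so native_pc = 17 and dex_pc = 5 share a single byte (value 177).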
All of the header fields are byte sized or encoded as ULEB128 instead
of the previous fixed size encoding. This shrinks it by about half.
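ULEB128 is the standard unsigned little-endian base-128 encoding (the dex format uses it as well): 7 payload bits per byte, with the high bit set on every byte except the last. The sketch below shows why it shrinks a header of mostly small values compared with fixed 4-byte fields; AppendUleb128 and ReadUleb128 are hypothetical helpers, not ART's own leb128 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` as ULEB128: least significant 7-bit group first,
// continuation bit (0x80) set on all but the last byte.
void AppendUleb128(std::vector<uint8_t>& out, uint32_t value) {
  do {
    uint8_t byte = static_cast<uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0) {
      byte |= 0x80;
    }
    out.push_back(byte);
  } while (value != 0);
}

// Reads one ULEB128 value starting at `*pos` and advances `*pos` past it.
uint32_t ReadUleb128(const std::vector<uint8_t>& in, std::size_t* pos) {
  uint32_t result = 0;
  uint32_t shift = 0;
  uint8_t byte;
  do {
    byte = in[(*pos)++];
    result |= static_cast<uint32_t>(byte & 0x7f) << shift;
    shift += 7;
  } while ((byte & 0x80) != 0);
  return result;
}
```

A value below 128 takes one byte instead of four; 300, for example, encodes as the two bytes 0xAC 0x02.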
It saves 3.6 MB from non-debuggable boot.oat (AOSP).
It saves 3.1 MB from debuggable boot.oat (AOSP).
It saves 2.8 MB (of 99.4 MB) from /system/framework/arm/ (GOOG).
It saves 1.0 MB (of 27.8 MB) from /system/framework/oat/arm/ (GOOG).
Field loads from stackmaps seem to get around 10% faster
(based on the time it takes to load all stackmap entries from boot.oat).
Bug: 27640410
Change-Id: I8bf0996b4eb24300c1b0dfc6e9d99fe85d04a1b7
|
|
Clean up verifier post-Quick.
Change-Id: I0b05e10dd06edd228fe2068c8afffc4b7d7fdffa
|
|
Also attribute ArenaBitVector allocations to appropriate
passes. This was used to track down the source of the
excessive memory allocations.
Bug: 27690481
Change-Id: Ib895984cb7c04e24cbc7abbd8322079bab8ab100
|
|
The debugger looks up the PC of the call instruction, so the runtime's
stackmap is not sufficient, since it is recorded at the PC after the
instruction.
Change-Id: I0dd06c0b52e8079ea5d064ea10beb12c93584092
|
|
This removes redundant code from the generators and allows for easier
stat recording.
Change-Id: Iccd4368f9e9d87a6fecb863dee4e2145c97851c4
|
|
All our back ends implement all comparisons without making a
runtime call, so we can mark art::HCompare as a side effect
free instruction unconditionally.
Change-Id: I9a9e7c09156c642edb6af1fe84408f887e762f2e
|
|
This includes stack operations and, on x86, call/pop to read the PC.
bug=26997690
Rationale:
(1) If a method is fully intrinsified and neither makes calls in its
slow path nor uses a special input, there is no need to require the
current method.
(2) Invoke instructions with HasPcRelativeDexCache() generate code
that reads the PC (call/pop) on x86. However, if the invoke is
an intrinsic that is later replaced with actual code, this PC
reading code may be dead.
Example X86 (before):
0x0000108c: 83EC0C sub esp, 12
0x0000108f: 890424 mov [esp], eax <-- not needed
0x00001092: E800000000 call +0 (0x00001097)
0x00001097: 58 pop eax <-- dead code to read PC
0x00001098: F30FB8C1 popcnt eax, ecx
0x0000109c: F30FB8DA popcnt ebx, edx
0x000010a0: 03D8 add ebx, eax
0x000010a2: 89D8 mov eax, ebx
0x000010a4: 83C40C add esp, 12 <-- not needed
0x000010a7: C3 ret
Example X86 (after):
0x0000103c: F30FB8C1 popcnt eax, ecx
0x00001040: F30FB8DA popcnt ebx, edx
0x00001044: 03D8 add ebx, eax
0x00001046: 89D8 mov eax, ebx
0x00001048: C3 ret
Example ARM64 (before):
0x0000103c: f81e0fe0 str x0, [sp, #-32]!
0x00001040: f9000ffe str lr, [sp, #24]
0x00001044: dac01020 clz x0, x1
0x00001048: f9400ffe ldr lr, [sp, #24]
0x0000104c: 910083ff add sp, sp, #0x20 (32)
0x00001050: d65f03c0 ret
Example ARM64 (after):
0x0000103c: dac01020 clz x0, x1
0x00001040: d65f03c0 ret
Change-Id: I8377db80c9a901a08fff4624927cf4a6e585da0c
|
|
Almost all slow paths already know the instruction they belong to,
this CL just moves the knowledge to the base class as well.
This is needed to be able to get the corresponding dex pc for a
slow path, which allows us to generate better native line numbers,
which in turn fixes some native debugging stepping issues.
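A sketch of the idea, assuming hypothetical simplified classes (Instruction, SlowPath, BoundsCheckSlowPath, LineTableEntry, kNoDexPc are illustrative names, not ART's code generator types): once the base class stores the instruction, any slow path can report a dex pc for the native line table.

```cpp
#include <cassert>
#include <cstdint>

class Instruction {
 public:
  explicit Instruction(uint32_t dex_pc) : dex_pc_(dex_pc) {}
  uint32_t GetDexPc() const { return dex_pc_; }

 private:
  uint32_t dex_pc_;
};

// Hypothetical marker for "no dex pc available".
constexpr uint32_t kNoDexPc = 0xFFFFFFFFu;

class SlowPath {
 public:
  explicit SlowPath(Instruction* instruction) : instruction_(instruction) {}
  virtual ~SlowPath() = default;
  virtual const char* GetDescription() const = 0;

  // Works for every slow path because the base class knows the instruction.
  uint32_t GetDexPc() const {
    return instruction_ != nullptr ? instruction_->GetDexPc() : kNoDexPc;
  }

 protected:
  Instruction* const instruction_;
};

class BoundsCheckSlowPath : public SlowPath {
 public:
  using SlowPath::SlowPath;
  const char* GetDescription() const override { return "BoundsCheckSlowPath"; }
};

// One native-pc to dex-pc row emitted for a slow path's code.
struct LineTableEntry {
  uint32_t native_pc;
  uint32_t dex_pc;
};

LineTableEntry EntryForSlowPath(const SlowPath& slow_path, uint32_t native_pc) {
  return LineTableEntry{native_pc, slow_path.GetDexPc()};
}
```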
Change-Id: I568dbe78a7cea6a43a4a71a014b3ad135782c270
|
|
We do not require a full environment at the start of a basic block.
The dex pc of the basic block is sufficient for line mapping.
Change-Id: I5ba9e5f5acbc4a783ad544769f9a73bb33e2bafa
|
|
Change-Id: I21b984224370a9ce7a4a13a9652503cfb03c5f03
|
|
This reverts commit bd89a5c556324062b7d841843b039392e84cfaf4.
Change-Id: I08d190431520baa7fcec8fbdb444519f25ac8d44
|
|
We don't need Baseline any more and it hasn't been maintained for
a while anyway. Let's remove it.
Change-Id: I442ed26855527be2df3c79935403a25b1ee55df6
|
|
Change-Id: I35beab2777a8c83bd508d56966afa1ceff9ee24f
|
|
Change-Id: I5740ec958a20d236634b66df0e675382ed5c16fc
|
|
Stack maps contain the native pc to dex pc mapping.
Reuse them instead of maintaining a separate map.
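To illustrate answering a native pc to dex pc query from stack map data, here is a sketch that assumes a hypothetical, uncompressed StackMapEntry table sorted by native pc offset; ART's real stack maps are encoded far more compactly. FindDexPc is an illustrative name, and the code needs C++17 for std::optional.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, uncompressed stack map entry.
struct StackMapEntry {
  uint32_t native_pc_offset;
  uint32_t dex_pc;
};

// Returns the dex pc recorded at exactly `native_pc_offset`, if any.
// `entries` must be sorted by native_pc_offset.
std::optional<uint32_t> FindDexPc(const std::vector<StackMapEntry>& entries,
                                  uint32_t native_pc_offset) {
  auto it = std::lower_bound(
      entries.begin(), entries.end(), native_pc_offset,
      [](const StackMapEntry& entry, uint32_t pc) { return entry.native_pc_offset < pc; });
  if (it != entries.end() && it->native_pc_offset == native_pc_offset) {
    return it->dex_pc;
  }
  return std::nullopt;
}
```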
Change-Id: Iaaec9a6bd2603eace1dfc8f4344087883d88cce3
|
|
Change-Id: I4042aefbdac1a8c236d00e2e7145349a64f6486b
|
|
This first implementation uses slow paths to instrument heap
reference loads and GC root loads for the concurrent copying
collector, respectively calling the artReadBarrierSlow and
artReadBarrierForRootSlow (new) runtime entry points.
Notes:
- This implementation does not instrument HInvokeVirtual
nor HInvokeInterface instructions (for class reference
loads), as the corresponding read barriers are not strictly
required with the current concurrent copying collector.
- Intrinsics which may eventually call (on slow path) are
disabled when read barriers are enabled, as the current
slow path infrastructure does not support this case.
- When read barriers are enabled, the code generated for a
HArraySet instruction always goes into the array set slow
path for object arrays (delegating the operation to the
runtime), as we lack a mechanism to keep a
temporary register live across a runtime call (needed for
the instrumentation of type checking code, which requires
two successive read barriers).
Bug: 12687968
Change-Id: I14cd6107233c326389120336f93955b28ffbb329
|
|
Add PC-relative dex cache array addressing for X86 and use
it for better invoke-static/-direct dispatch. Also delay
the initialization of the PC-relative base until needed.
Change-Id: Ib8634d5edce4920cd70172fd13211809cf6948d1
|
|
Avoids allocating a CompiledMethod.
Change-Id: I35b4aa0d7c74daba68e827a01e71c300fce3b3bf
|
|
Determine the dispatch type of invoke-static/-direct in a
special pass right after the type inference. This allows the
inliner to pass the "needs dex cache" check and inline more.
It also allows the code generator to avoid requesting a
register location for the ArtMethod* for kDexCachePcRelative
and direct methods.
The supported dispatch check also handles situations that
the CompilerDriver currently doesn't allow. The cleanup of
the CompilerDriver and required changes to Quick will come
in a separate change.
Change-Id: I3f8e903a119949e95871d8ab0a995f4731a13a07
|
|
Change-Id: I370388e8d5de52c7001552b513877ef5833aa621
|
|
Change-Id: I58ae1af5103e281fe59fbe022b718d6d8f293a5e
|
|
Implement dchecked_vector<> template that DCHECK()s element
access and insert()/emplace()/erase() positions. Change the
ArenaVector<> and ScopedArenaVector<> aliases to use the new
template instead of std::vector<>. Remove DCHECK()s that
have now become unnecessary from the Optimizing compiler.
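A self-contained approximation of such a wrapper, assuming assert() as a stand-in for DCHECK(). checked_vector and SKETCH_DCHECK are hypothetical names, and only a subset of the std::vector interface is forwarded; private inheritance hides the unchecked members, so each exposed element accessor and position-taking modifier goes through a check.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for DCHECK(): plain assert(), active unless NDEBUG is defined.
#define SKETCH_DCHECK(condition) assert(condition)

template <typename T, typename Allocator = std::allocator<T>>
class checked_vector : private std::vector<T, Allocator> {
  using Base = std::vector<T, Allocator>;

 public:
  using size_type = typename Base::size_type;
  using iterator = typename Base::iterator;
  using const_iterator = typename Base::const_iterator;

  checked_vector() = default;
  checked_vector(std::initializer_list<T> init) : Base(init) {}

  using Base::begin;
  using Base::end;
  using Base::cbegin;
  using Base::cend;
  using Base::size;
  using Base::empty;
  using Base::push_back;

  // Element access: the index must be in range.
  T& operator[](size_type n) {
    SKETCH_DCHECK(n < size());
    return Base::operator[](n);
  }
  const T& operator[](size_type n) const {
    SKETCH_DCHECK(n < size());
    return Base::operator[](n);
  }
  T& back() {
    SKETCH_DCHECK(!empty());
    return Base::back();
  }

  // Insertion positions may be anywhere in [cbegin(), cend()].
  iterator insert(const_iterator pos, const T& value) {
    SKETCH_DCHECK(cbegin() <= pos && pos <= cend());
    return Base::insert(pos, value);
  }
  template <typename... Args>
  iterator emplace(const_iterator pos, Args&&... args) {
    SKETCH_DCHECK(cbegin() <= pos && pos <= cend());
    return Base::emplace(pos, std::forward<Args>(args)...);
  }

  // Erasure positions must name an existing element: [cbegin(), cend()).
  iterator erase(const_iterator pos) {
    SKETCH_DCHECK(cbegin() <= pos && pos < cend());
    return Base::erase(pos);
  }
};
```

Because the allocator is a template parameter, an arena-backed alias could be built on such a template the same way ArenaVector<> is described above, with the checks compiled out in release builds.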
Change-Id: Ib8506bd30d223f68f52bd4476c76d9991acacadc
|
|
Don't request a register for the current method if we are going to call the
runtime.
Change-Id: I9760d15108bd95efb2a34e6eacd84b60841781d7
|
|
Change-Id: I0e299a81e560eb9cb0737ec46125dffc99333b54
|
|
The CL also changes the calling convention for 64-bit static field set
to use kArg2 instead of kArg1. This allows Optimizing to keep
the assumptions:
- arm pairs are always of form (even_reg, odd_reg)
- ecx_edx is not used as a register on x86.
This reverts commit e6f49b47b6a4dc9c7684e4483757872cfc7ff1a1.
Change-Id: I93159917565824084abc96775f31be1a4249f2f3
|
|
And completely remove the deprecated GrowableArray.
Replace GrowableArray with ArenaVector in code generators
and related classes and tag arena allocations.
Label arrays use direct allocations from ArenaAllocator
because Label is non-copyable and non-movable and as such
cannot really be held in a container. The GrowableArray
never actually constructed them, instead relying on the
zero-initialized storage from the arena allocator to be
correct. We now actually construct the labels.
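A sketch of constructing non-copyable, non-movable objects in raw arena storage with placement new. It assumes a hypothetical ToyArena that hands out zero-filled memory and a toy Label whose unbound state is -1, chosen here only so that zero-filled memory would be visibly wrong; ART's real Label and ArenaAllocator may differ. AllocArrayAndConstruct is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>
#include <vector>

// Toy label: neither copyable nor movable, so a growable std::vector cannot hold it.
class Label {
 public:
  Label() = default;
  Label(const Label&) = delete;
  Label& operator=(const Label&) = delete;
  Label(Label&&) = delete;
  Label& operator=(Label&&) = delete;

  bool IsBound() const { return position_ >= 0; }
  int Position() const { return position_; }
  void BindTo(int position) { position_ = position; }

 private:
  int position_ = -1;  // toy convention: -1 means "not bound yet"
};

static_assert(!std::is_copy_constructible<Label>::value &&
                  !std::is_move_constructible<Label>::value,
              "Label is neither copyable nor movable");

// Hypothetical arena: zero-filled blocks, all freed together without running
// destructors (acceptable here because Label is trivially destructible).
class ToyArena {
 public:
  ToyArena() = default;
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;
  ~ToyArena() {
    for (void* block : blocks_) ::operator delete(block);
  }

  void* Alloc(std::size_t bytes) {
    void* block = ::operator new(bytes);  // suitably aligned for fundamental types
    std::memset(block, 0, bytes);
    blocks_.push_back(block);
    return block;
  }

 private:
  std::vector<void*> blocks_;
};

// Allocates `count` objects and actually runs their constructors, instead of
// relying on the zero-filled storage happening to be a valid state.
template <typename T>
T* AllocArrayAndConstruct(ToyArena& arena, std::size_t count) {
  T* first = static_cast<T*>(arena.Alloc(sizeof(T) * count));
  for (std::size_t i = 0; i != count; ++i) {
    new (first + i) T();
  }
  return first;
}
```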
Also avoid passing null references to
StackMapStream::ComputeDexRegisterMapSize(), even though they are unused.
Change-Id: I26a46fdd406b23a3969300a67739d55528df8bf4
|
|
Breaks debuggable tests.
This reverts commit 23a8e35481face09183a24b9d11e505597c75ebb.
Change-Id: I8e60b5c8f48525975f25d19e5e8066c1c94bd2e5
|
|
Change-Id: I9941fa5fcb6ef0a7a253c7a0b479a44a0210aad4
|
|
Change-Id: If2da02b50d2fa668cd58f134a005f1752e7746b1
|
|