Age | Commit message (Collapse) | Author |
|
And some other cleanup after
https://android-review.googlesource.com/230742
Test: No new tests. ART test suite passed (tested on host).
Change-Id: I4743bf17544d0234c6ccb46dd0c1b9aae5c93e17
|
|
|
|
Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck as well as using the infrastructure for
HArrayGet, i.e. better handling of constant indexes than
the old intrinsic and using the HArm64IntermediateAddress.
Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
|
|
For classes in the boot image, use either direct pointers
or PC-relative addresses. For other classes, use PC-relative
access to the dex cache arrays for AOT and direct address of
the type's dex cache slot for JIT.
For aosp_flounder-userdebug:
- 32-bit boot.oat: -252KiB (-0.3%)
- 64-bit boot.oat: -412KiB (-0.4%)
- 32-bit dalvik cache total: -392KiB (-0.4%)
- 64-bit dalvik-cache total: -2312KiB (-1.0%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -124KiB (-0.2%)
- 64-bit boot.oat: -420KiB (-0.5%)
- 32-bit dalvik cache total: -136KiB (-0.1%)
- 64-bit dalvik-cache total: -1136KiB (-0.5%)
(contains more files than the 32-bit dalvik cache)
Bug: 27950288
Change-Id: I4da991a4b7e53c63c92558b97923d18092acf139
|
|
Introduce HInstruction::GetInputRecords(), a new virtual
function that returns an ArrayRef<> to all input records.
Implement all other functions dealing with input records as
wrappers around GetInputRecords(). Rewrite functions that
previously used multiple virtual calls to deal with input
records, especially in loops, to prefetch the ArrayRef<>
only once for each instruction. Besides avoiding all the
extra calls, this also allows the compiler (clang++) to
perform additional optimizations.
This speeds up the Nexus 5 boot image compilation by ~0.5s
(4% of "Compile Dex File", 2% of dex2oat time) on AOSP ToT.
Change-Id: Id8ebe0fb9405e38d918972a11bd724146e4ca578
|
|
information."
|
|
Also adds 16 bit literal information.
Rationale:
When "run-away" instructions are disassembled, the literal
addresses may go out of range, causing oatdump to crash.
This CL guards memory access against the full memory range
allocated to assembly instructions and data (it is possible
but not really necessary to refine this a bit). Out of range
arguments are now displayed as (?) to denote the issue, which
is a lot nicer than crashing.
BUG=28670871
Change-Id: I51e9b6a6a99162546fe31059f14278e8980451c2
|
|
GVN was implicitly extending the liveness of an instruction across
an irreducible loop.
Fix this problem by clearing the value set at loop entries that contain
an irreducible loop.
bug:28252896
(cherry picked from commit 77ce6430af2709432b22344ed656edd8ec80581b)
Change-Id: Ie0121e83b2dfe47bcd184b90a69c0194d13fce54
|
|
Use HArrayLength for String.length() in anticipation of
changing the String.charAt() to HBoundsCheck+HArrayGet to
allow the existing BCE to seamlessly work for strings.
Use HArrayLength+HEqual for String.isEmpty().
We previously relied on inlining but we now want to apply
the new intrinsics even when we do not inline, i.e. when
compiling debuggable (as is currently the case for boot
image) or when we hit inlining limits, i.e. depth, size,
or the number of accumulated dex registers.
Bug: 28330359
Change-Id: Iab9d2f6d2967bdd930a72eb461f27efe8f37c103
|
|
Create a new template class IntrusiveForwardList<> that
mimicks std::forward_list<> except that all allocations
are handled externally. This is essentially the same as
boost::intrusive::slist<> but since we're not using Boost
we have to reinvent the wheel.
Use the new container to replace the HUseList and use the
iterators to "before" use nodes in HUserRecord<> to avoid
the extra pointer to the previous node which was used
exclusively for removing nodes from the list. This reduces
the size of the HUseListNode by 25%, 32B to 24B in 64-bit
compiler, 16B to 12B in 32-bit compiler. This translates
directly to overall memory savings for the 64-bit compiler
but due to rounding up of the arena allocations to 8B, we
do not get any improvement in the 32-bit compiler.
Compiling the Nexus 5 boot image with the 64-bit dex2oat
on host this CL reduces the memory used for compiling the
most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB:
Before:
MEM: used: 47829200, allocated: 48769120, lost: 939920
Number of arenas allocated: 345,
Number of allocations: 815492, avg size: 58
...
UseListNode 13744640
...
After:
MEM: used: 44393040, allocated: 45361248, lost: 968208
Number of arenas allocated: 319,
Number of allocations: 815492, avg size: 54
...
UseListNode 10308480
...
Note that while we do not ship the 64-bit dex2oat to the
device, the JIT compilation for 64-bit processes is using
the 64-bit libart-compiler.
Bug: 28173563
Change-Id: I985eabd4816f845372d8aaa825a1489cf9569208
|
|
|
|
We have seen Checker tests timing out on debug-GC configurations after
having switched to Optimizing because its GraphVisualizer makes too
many syscalls which the configuration keeps track of.
This patch replaces std::endl with "\n" across GraphVisualizer so as
to not flush the stream after every line of output.
Bug: 27826765
Change-Id: I5e3f1e92f8a84f36d324d56945e2d420b2d36a5d
|
|
For strings in the boot image, use either direct pointers
or pc-relative addresses. For other strings, use PC-relative
access to the dex cache arrays for AOT and direct address of
the string's dex cache slot for JIT.
For aosp_flounder-userdebug:
- 32-bit boot.oat: -692KiB (-0.9%)
- 64-bit boot.oat: -948KiB (-1.1%)
- 32-bit dalvik cache total: -900KiB (-0.9%)
- 64-bit dalvik cache total: -3672KiB (-1.5%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -380KiB (-0.5%)
- 64-bit boot.oat: -928KiB (-1.0%)
- 32-bit dalvik cache total: -468KiB (-0.4%)
- 64-bit dalvik cache total: -1928KiB (-0.8%)
(contains more files than the 32-bit dalvik cache)
Bug: 26884697
Change-Id: Iec7266ce67e6fedc107be78fab2e742a8dab2696
|
|
GraphChecker tries to verify that Boolean inputs are properly typed.
This is non-trivial in the presence of simplifying optimizations
which capitalize on the fact that a Boolean value is internally
represented as an integer.
This patch removes the test from GraphChecker.
Bug: 27625564
Change-Id: Ic61ea2193765b4578550538e965ca4f80fa4b287
|
|
The art::PrepareForRegisterAllocation visitor can remove an
art::BoundType instruction as value input of an
art::ArraySet instruction, possibly replacing it with an
art::NullConstant. If this happens, remove the need for a
type check in this art::ArraySet.
Bug: 27638110
Change-Id: I6270f8a8e22822a24d8a5919df427ca9c64d121b
|
|
Share implementation between arm and arm64.
Change-Id: I0dd12e772cb23b4c181fd0b1e2a447470b1d8702
|
|
Use negated instructions on ARM64 to replace [bitwise operation + not]
patterns, that is:
a & ~b (BIC)
a | ~b (ORN)
a ^ ~b (EON)
The simplification only happens if the Not is only used by the bitwise
operation. It does not happen if both inputs are Not's (this should be
handled by a generic simplification applying De Morgan's laws).
Change-Id: I0e112b23fd8b8e10f09bfeff5994508a8ff96e9c
|
|
This reverts commit 6b5afdd144d2bb3bf994240797834b5666b2cf98.
Change-Id: Ic27a10f02e21109503edd64e6d73d1bb0c6a8ac6
|
|
Change-Id: I2837064b2ceea587bc171fc520507f13355292c6
|
|
First step towards merging the two passes, which will later result in
HGraphBuilder directly producing SSA form. This CL mostly just updates
tests broken by not being able to inspect the pre-SSA form.
Using HLocals outside the HGraphBuilder is now deprecated.
Bug: 27150508
Change-Id: I00fb6050580f409dcc5aa5b5aa3a536d6e8d759e
|
|
Use an art::x86_64::Label instead of an
art::x86_64::NearLabel as end label when emitting code for a
HCheckCast instruction, as the range of the latter may
sometimes be too short when Baker's read barriers are
enabled.
Bug: 12687968
Change-Id: Ia9742dce65be7d4fb104688f3c4717b65df1fb54
|
|
This reverts commit 2818dbcd75ea9beadcba9d18e2f68523108d0cf5.
Change-Id: I92b2b60b4f08f50cacfea4132f1c28cfbd628f1a
|
|
Background:
This is actually a resubmit of an earlier cl that was
reverted because was test was less robust against
inlining changes (it assumed a virtual call would
never be inlined).
original cl: If8ada79dfd70bea991c11d2b18661b951b6c4cd4
revert cl: I739aaaccd0509d02a62ef01e797a6d45bfe941df
Change-Id: I952680d60ff488874907f066bfdf156a45b409ba
|
|
This reverts commit 79e9f43951c3cfa9ab3b0fea93e5bfdfa7aa5950.
Change-Id: I0670618b4076e06bd3f6bf8c385abfd1b651393c
|
|
It was not intended to have it this way anyway. This also
required to fix GetSiblingAt, to take into account interval
holes, and ConnectSplitSibling to re-materialize a constant
or a method.
Change-Id: Ia5534a93a5413cd0458a251c022d0b655369502b
|
|
Fails 530-checker-loops on arm
This reverts commit bf03fcd10a3ffa15468d335f26697b0473e45b36.
Change-Id: I739aaaccd0509d02a62ef01e797a6d45bfe941df
|
|
Rationale: fell through the cracks of previous "intrinsics" CL.
Change-Id: If8ada79dfd70bea991c11d2b18661b951b6c4cd4
|
|
So we don't fallback to the interpreter in the presence of
irreducible loops.
Implications:
- A loop pre-header does not necessarily dominate a loop header.
- Non-constant redundant phis will be kept in loop headers, to
satisfy our linear scan register allocation algorithm.
- while-graph optimizations, such as gvn, licm, lse, and dce
need to know when they are dealing with irreducible loops.
Change-Id: I2cea8934ce0b40162d215353497c7f77d6c9137e
|
|
This reverts commit 68289a531484d26214e09f1eadd9833531a3bc3c.
Now uses Primitive::Is64BitType instead of Primitive::ComponentSize
because it was incorrectly optimized by GCC.
Bug: 26208284
Bug: 24252151
Bug: 24252100
Bug: 22538329
Bug: 25786318
Change-Id: Ib39f3da2b92bc5be5d76f4240a77567d82c6bebe
|
|
This reverts commit d9510dfc32349eeb4f2145c801f7ba1d5bccfb12.
Bug: 26208284
Bug: 24252151
Bug: 24252100
Bug: 22538329
Bug: 25786318
Change-Id: I5f491becdf076ff51d437d490405ec4e1586c010
|
|
This patch refactors the SsaBuilder to do the following:
1) All phis are constructed live and marked dead if not used or proved
to be conflicting.
2) Primitive type propagation, now not a separate pass, identifies
conflicting types and marks corresponding phis dead.
3) When compiling --debuggable, DeadPhiHandling used to revive phis
which had only environmental uses but did not attempt to resolve
conflicts. This pass was removed as obsolete and is now superseded
by primitive type propagation (identifying conflicting phis) and
SsaDeadPhiEliminiation (keeping phis live if debuggable + env use).
4) Resolving conflicts requires correct primitive type information
on all instructions. This was not the case for ArrayGet instructions
which can have ambiguous types in the bytecode. To this end,
SsaBuilder now runs reference type propagation and types ArrayGets
from the type of the input array.
5) With RTP being run inside the SsaBuilder, it is not necessary to
run it as a separate optimization pass. Optimizations can now assume
that all instructions of type kPrimNot have reference type info after
SsaBuilder (with the exception of NullConstant).
6) Graph now contains a reference type to be assigned to NullConstant.
All reference type instructions therefore have RTI, as now enforced
by the SsaChecker.
Bug: 24252151
Bug: 24252100
Bug: 22538329
Bug: 25786318
Change-Id: I7a3aee1ff66c82d64b4846611c547af17e91d260
|
|
|
|
This reverts commit c88ef3a10c474045a3476a02ae75d07ddd3230b7.
Change-Id: I0ed88a48b313a8d28bc39fae40631123aadb13ef
|
|
This is a follow-up to
https://android-review.googlesource.com/184116 .
Change-Id: Ib03c424fb673afc5ccce15d7d072b7572b47799a
|
|
Fails 425 in debuggable mode.
This reverts commit 4db0bf9c4db6a09716c3388b7d2f88d534470339.
Change-Id: I346df8f75674564fc4fb241c60f23e250fc7f0a7
|
|
The compiler driver makes assumptions that don't hold for
the optimizing compiler, and will for example always go to
slow path for an invoke-super when there's no verified method.
Also fix GenerateInvokeVirtual in the presence of intrinsics.
Next change will address some of the TODOs in sharpening.cc.
Change-Id: I2b0e543ee9b9bebcadb2d26de29e850c59ad58b9
|
|
This introduces architecture-specific instruction simplification.
On ARM64 we try to merge shifts and sign-extension operations into
arithmetic and logical instructions.
For example for the Java code
int res = a + (b << 5);
we would generate
lsl w3, w2, #5
add w0, w1, w3
and we now generate
add w0, w1, w2, lsl #5
Change-Id: Ic03bdff44a1c12e21ddff1b0513bd32a730742b7
|
|
Change-Id: I88dc313df520480f3fd16bbabda27f9435d25368
|
|
|
|
The NullConstant may be added to the graph during other passes that
happen between ReferenceTypePropagation and Inliner (e.g.
InstructionSimplifier). If the inliner doesn't run or doesn't inline
anything, the NullConstant remains untyped.
The infrastructure to properly type NullConstants everywhere is to
complex to add for the benefits
Bug: 25786318
Change-Id: I904a3e605b57f8cac9936e82f19a4994c7b1a82a
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|
Make sure we merge the ClinitCheck only with LoadClass and
HInvokeStaticOrDirect that is a part of the very same dex
instruction. This fixes incorrect stack traces from class
initializers (wrong dex pcs).
Rewrite the pruning to do all the ClinitCheck merging when
we see the ClinitCheck, instead of merging ClinitCheck into
LoadClass and then LoadClass into HInvokeStaticOrDirect.
When we later see an HInvokeStaticOrDirect with an explicit
check (i.e. not merged), we know that some other instruction
is doing the check and the invoke doesn't need to, so we
mark it as not requiring the check at all. (Previously it
would have been marked as having an implicit check.)
Remove the restriction on merging with inlined invoke static
as this is not necessary anymore. This was a workaround for
X.test():
invoke-static C.foo() [1]
C.foo():
invoke-static C.bar() [2]
After inlining and GVN we have
X.test():
LoadClass C (from [1])
ClinitCheck C (from [1], to be merged to LoadClass)
InvokeStaticOrDirect C.bar() (from [2])
and the LoadClass must not be merged into the invoke as this
would cause the resolution trampoline to see an inlined
frame from the not-yet-loaded class C during the stack walk
and try to load the class. However, we're not allowed to
load new classes at that point, so an attempt to do so leads
to an assertion failure. With this CL, LoadClass is not
merged when it comes from a different instruction, so we can
guarantee that all inlined frames seen by the stack walk in
the resolution trampoline belong to already loaded classes.
Change-Id: I2b8da8d4f295355dce17141f0fab2dace126684d
|
|
This reverts commit 271743601650308c7ac5c7a3ec35025d8130a298.
Change-Id: I173e27a0a4d7d54f90ca459eb48d280d1d40ab70
|
|
Add helper methods on HBasicBlock which return ArrayRef with the
suitable sub-array of the `successors_` list.
Change-Id: I66c83bb56f2984d7550bf77c48110af4087515a8
|
|
This reverts commit 4e5dd521063beae1706410419f19c7e224db50fe.
Change-Id: I0de261d14dd3f71abe05f9bc71744820cf23b937
|
|
Currently we run a type propagation pass unconditionally after the
inliner. This change looks at the returned value (if any) and runs a
minimal type propagation only if its type has changed.
Change-Id: I0dd72bd481219081e8a978d2632426afc980d73a
|
|
|
|
Change-Id: I89f72fef3be35a4dd9585d97d03a3150386e0891
|
|
Implement dchecked_vector<> template that DCHECK()s element
access and insert()/emplace()/erase() positions. Change the
ArenaVector<> and ScopedArenaVector<> aliases to use the new
template instead of std::vector<>. Remove DCHECK()s that
have now become unnecessary from the Optimizing compiler.
Change-Id: Ib8506bd30d223f68f52bd4476c76d9991acacadc
|
|
Change-Id: I0e299a81e560eb9cb0737ec46125dffc99333b54
|
|
The CL also changes the calling convetion for 64bit static field set
to use kArg2 instead of kArg1. This allows optimizing to keep
the asumptions:
- arm pairs are always of form (even_reg, odd_reg)
- ecx_edx is not used as a register on x86.
This reverts commit e6f49b47b6a4dc9c7684e4483757872cfc7ff1a1.
Change-Id: I93159917565824084abc96775f31be1a4249f2f3
|