Age | Commit message (Collapse) | Author |
|
Move away from size_t to dedicated enum (class).
Bug: 30373134
Bug: 30419309
Test: m test-art-host
Change-Id: Id453c330f1065012e7d4f9fc24ac477cc9bb9269
|
|
|
|
As entry points ReadBarrierMarkReg30 and
ReadBarrierMarkReg31 are undefined on all architectures
supporting the read barrier configuration (ARM, ARM64, x86
and x86-64), remove them from the entry point list.
Test: ART host and target (ARM, ARM64) tests.
Bug: 29506760
Bug: 12687968
Change-Id: I500626e54f00aebfc095b4ef5f81b49fa43f7768
|
|
* Boot image code size variation on Nexus 5X
(aosp_bullhead-userdebug build):
- total ARM64 framework Oat files size change:
115584120 bytes -> 109124728 bytes (-5.59%)
- total ARM framework Oat files size change:
97387728 bytes -> 92517584 (-5.00%)
Test: ART host and target (ARM, ARM64) tests.
Bug: 29506760
Bug: 12687968
Change-Id: I979d9fb2b4e09f4c0c7bf33af2cd91750a67f989
|
|
Instead of saving/restoring live caller-save registers
before/after the call to read barrier mark entry points
ReadBarrierMarkRegX, have these entry points save/restore
all the caller-save registers themselves (except register
rX, which contains the return value).
Also refactor the assembly code of these entry points
using macros.
* Boot image code size variation on Nexus 5X
(aosp_bullhead-userdebug build):
- total ARM64 framework Oat files size change:
119196792 bytes -> 115575920 bytes (-3.04%)
- total ARM framework Oat files size change:
100435212 bytes -> 97621188 bytes (-2.80%)
* Benchmarks (ARM64) score variations on Nexus 5X
(aosp_bullhead-userdebug build):
- RitzPerf (lower is better)
- average score difference: -2.71%
- CaffeineMark (higher is better)
- no real difference for most tests
(absolute variation lower than 1%)
- better score on the "Method" benchmark:
score variation 41253 -> 44891 (+8.82%)
Test: ART host and target (ARM, ARM64) tests.
Bug: 29506760
Bug: 12687968
Change-Id: I881bf73139a3f1c2bee9ffc6fc8c00f9a392afa6
|
|
Aligning the accesses allows generating better code.
Before:
add x16, sp, #0x44 (68)
stp x0, x1, [x16, #-16]
After:
stp x0, x1, [sp, #56]
Change-Id: I3e20ad3fa59d00aee4b4d14ea9d59c7cd546509e
|
|
Replace entry point ReadBarrierMark with 32
ReadBarrierMarkRegX entry points, using register
number X as input and output (instead of the standard
runtime calling convention) to save two moves in Baker's
read barrier mark slow-path code.
Test: ART host and target (ARM, ARM64) tests.
Bug: 29506760
Bug: 12687968
Change-Id: I73cfb82831cf040b8b018e984163c865cc44ed87
|
|
|
|
Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck as well as using the infrastructure for
HArrayGet, i.e. better handling of constant indexes than
the old intrinsic and using the HArm64IntermediateAddress.
Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
|
|
For classes in the boot image, use either direct pointers
or PC-relative addresses. For other classes, use PC-relative
access to the dex cache arrays for AOT and direct address of
the type's dex cache slot for JIT.
For aosp_flounder-userdebug:
- 32-bit boot.oat: -252KiB (-0.3%)
- 64-bit boot.oat: -412KiB (-0.4%)
- 32-bit dalvik cache total: -392KiB (-0.4%)
- 64-bit dalvik-cache total: -2312KiB (-1.0%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -124KiB (-0.2%)
- 64-bit boot.oat: -420KiB (-0.5%)
- 32-bit dalvik cache total: -136KiB (-0.1%)
- 64-bit dalvik-cache total: -1136KiB (-0.5%)
(contains more files than the 32-bit dalvik cache)
Bug: 27950288
Change-Id: I4da991a4b7e53c63c92558b97923d18092acf139
|
|
This reverts commit 0997d24e67d78f2146ebae2888eda0d7d254789a.
ART_HEAP_POISONING=true mode is fixed.
Change-Id: I83f6d5c101ea6a86802753f81b3e4348a263fb21
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
|
|
|
Fails heap poisoning configuration.
This reverts commit afdc97ebcb4e58afb7cf54d846d30314e6499d83.
Change-Id: I50e53756a2b85059b89cfb8950f8c9e2b032743c
|
|
|
|
Change-Id: I7a7ac9244847dd80d9fa4e4b5ebc5bf451c628ff
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
|
Use HArrayLength for String.length() in anticipation of
changing the String.charAt() to HBoundsCheck+HArrayGet to
allow the existing BCE to seamlessly work for strings.
Use HArrayLength+HEqual for String.isEmpty().
We previously relied on inlining but we now want to apply
the new intrinsics even when we do not inline, i.e. when
compiling debuggable (as is currently the case for boot
image) or when we hit inlining limits, i.e. depth, size,
or the number of accumulated dex registers.
Bug: 28330359
Change-Id: Iab9d2f6d2967bdd930a72eb461f27efe8f37c103
|
|
Bug: 27505766
Change-Id: I077465e3d308f4331e7a861902e05865f9d99835
|
|
Change-Id: If8cf0ee43711f6e13171443e3c057ff370ccfbaa
|
|
This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.
Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)
This CL fixed an issue with parsing quickened instructions.
Bug: 27894376
Bug: 27998571
Bug: 27995065
Change-Id: I20dbe1bf2d0fe296377478db98cb86cba695e694
|
|
Bug: 27995065
This reverts commit e3ff7b293be2a6791fe9d135d660c0cffe4bd73f.
Change-Id: I5363c7ce18f47fd422c15eed5423a345a57249d8
|
|
This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.
Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)
Bug: 27894376
Change-Id: Iefe28d40600c169c5d306fd2c77034ae19476d90
|
|
For strings in the boot image, use either direct pointers
or pc-relative addresses. For other strings, use PC-relative
access to the dex cache arrays for AOT and direct address of
the string's dex cache slot for JIT.
For aosp_flounder-userdebug:
- 32-bit boot.oat: -692KiB (-0.9%)
- 64-bit boot.oat: -948KiB (-1.1%)
- 32-bit dalvik cache total: -900KiB (-0.9%)
- 64-bit dalvik cache total: -3672KiB (-1.5%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -380KiB (-0.5%)
- 64-bit boot.oat: -928KiB (-1.0%)
- 32-bit dalvik cache total: -468KiB (-0.4%)
- 64-bit dalvik cache total: -1928KiB (-0.8%)
(contains more files than the 32-bit dalvik cache)
Bug: 26884697
Change-Id: Iec7266ce67e6fedc107be78fab2e742a8dab2696
|
|
|
|
The debugger looks up PC of the call instruction, so the runtime's
stackmap is not sufficient since it is at PC after the instruction.
Change-Id: I0dd06c0b52e8079ea5d064ea10beb12c93584092
|
|
This removes redundant code from the generators and allows for easier
stat recording.
Change-Id: Iccd4368f9e9d87a6fecb863dee4e2145c97851c4
|
|
Almost all slow paths already know the instruction they belong to,
this CL just moves the knowledge to the base class as well.
This is needed to be be able to get the corresponding dex pc for
slow path, which allows us generate better native line numbers,
which in turn fixes some native debugging stepping issues.
Change-Id: I568dbe78a7cea6a43a4a71a014b3ad135782c270
|
|
We do not require full environment at the start of basic block.
The dex pc contained in basic block is sufficient for line mapping.
Change-Id: I5ba9e5f5acbc4a783ad544769f9a73bb33e2bafa
|
|
Change-Id: I21b984224370a9ce7a4a13a9652503cfb03c5f03
|
|
This reverts commit bd89a5c556324062b7d841843b039392e84cfaf4.
Change-Id: I08d190431520baa7fcec8fbdb444519f25ac8d44
|
|
We don't need Baseline any more and it hasn't been maintained for
a while anyway. Let's remove it.
Change-Id: I442ed26855527be2df3c79935403a25b1ee55df6
|
|
Rationale:
Sharing identical slow path code reduces code size.
Background:
Currently, slow paths with the same dex-pc, same physical register
spilling code, and identical stack maps are shared (making this
only useful for deopt slow paths). The newly introduced mechanism
is sufficiently general to allow future improvements by e.g.
allowing different dex-pc (by passing this to runtime) or even
the kind of slow paths (by passing runtime addresses to the slowpath).
Change-Id: I819615c47b4fd98440a241f681f93e4fc22d12e0
|
|
Change-Id: I5740ec958a20d236634b66df0e675382ed5c16fc
|
|
Stack maps contain pc to dex mapping.
Reuse them instead of maintaining separate map.
Change-Id: Iaaec9a6bd2603eace1dfc8f4344087883d88cce3
|
|
This first implementation uses slow paths to instrument heap
reference loads and GC root loads for the concurrent copying
collector, respectively calling the artReadBarrierSlow and
artReadBarrierForRootSlow (new) runtime entry points.
Notes:
- This implementation does not instrument HInvokeVirtual
nor HInvokeInterface instructions (for class reference
loads), as the corresponding read barriers are not stricly
required with the current concurrent copying collector.
- Intrinsics which may eventually call (on slow path) are
disabled when read barriers are enabled, as the current
slow path infrastructure does not support this case.
- When read barriers are enabled, the code generated for a
HArraySet instruction always go into the array set slow
path for object arrays (delegating the operation to the
runtime), as we are lacking a mechanism to keep a
temporary register live accross a runtime call (needed for
the instrumentation of type checking code, which requires
two successive read barriers).
Bug: 12687968
Change-Id: I14cd6107233c326389120336f93955b28ffbb329
|
|
Add PC-relative dex cache array addressing for X86 and use
it for better invoke-static/-direct dispatch. Also delay
the initialization to the PC-relative base until needed.
Change-Id: Ib8634d5edce4920cd70172fd13211809cf6948d1
|
|
Avoids allocating a CompiledMethod.
Change-Id: I35b4aa0d7c74daba68e827a01e71c300fce3b3bf
|
|
Determine the dispatch type of invoke-static/-direct in a
special pass right after the type inference. This allows the
inliner to pass the "needs dex cache" check and inline more.
It also allows the code generator to avoid requesting a
register location for the ArtMethod* for kDexCachePcRelative
and direct methods.
The supported dispatch check handles also situations that
the CompilerDriver currently doesn't allow. The cleanup of
the CompilerDriver and required changes to Quick will come
in a separate change.
Change-Id: I3f8e903a119949e95871d8ab0a995f4731a13a07
|
|
Change-Id: I58ae1af5103e281fe59fbe022b718d6d8f293a5e
|
|
Change-Id: Id6ee1fe44c4c57d373db7a39530f29a5ca9aee18
|
|
Change-Id: I0e299a81e560eb9cb0737ec46125dffc99333b54
|
|
The CL also changes the calling convetion for 64bit static field set
to use kArg2 instead of kArg1. This allows optimizing to keep
the asumptions:
- arm pairs are always of form (even_reg, odd_reg)
- ecx_edx is not used as a register on x86.
This reverts commit e6f49b47b6a4dc9c7684e4483757872cfc7ff1a1.
Change-Id: I93159917565824084abc96775f31be1a4249f2f3
|
|
Tag previously "Misc" arena allocations with more specific
allocation types. Move some native heap allocations to the
arena in BCE.
Bug: 23736311
Change-Id: If8ef15a8b614dc3314bdfb35caa23862c9d4d25c
|
|
And completely remove the deprecated GrowableArray.
Replace GrowableArray with ArenaVector in code generators
and related classes and tag arena allocations.
Label arrays use direct allocations from ArenaAllocator
because Label is non-copyable and non-movable and as such
cannot be really held in a container. The GrowableArray
never actually constructed them, instead relying on the
zero-initialized storage from the arena allocator to be
correct. We now actually construct the labels.
Also avoid StackMapStream::ComputeDexRegisterMapSize() being
passed null references, even though unused.
Change-Id: I26a46fdd406b23a3969300a67739d55528df8bf4
|
|
Refactor slow paths so that there is a default implementation for
common cases (only arm64 with vixl is special). Write a generic
intrinsic slow-path that can be reused for the specific architectures.
Move helper functions into CodeGenerator so that they are accessible.
Change-Id: Ibd788dce432601c6a9f7e6f13eab31f28dcb8550
|
|
breaks debuggable tests.
This reverts commit 23a8e35481face09183a24b9d11e505597c75ebb.
Change-Id: I8e60b5c8f48525975f25d19e5e8066c1c94bd2e5
|
|
Change-Id: I9941fa5fcb6ef0a7a253c7a0b479a44a0210aad4
|
|
Change-Id: If2da02b50d2fa668cd58f134a005f1752e7746b1
|
|
|
|
Replace GrowableArray with ArenaVector in HGraph and related
classes HEnvironment, HLoopInformation, HInvoke and HPhi,
and tag allocations with new arena allocation types.
Change-Id: I3d79897af405b9a1a5b98bfc372e70fe0b3bc40d
|
|
The original CL triggered b/24084144 which has been fixed
by Ib72e12a018437c404e82f7ad414554c66a4c6f8c.
This reverts commit 659562aaf133c41b8d90ec9216c07646f0f14362.
Change-Id: Id8980436172457d0fcb276349c4405f7c4110a55
|