Age | Commit message (Collapse) | Author |
|
Also fix the side effects for <Static/Instance>Field<Get/Set>.
Test: test-art-target
Change-Id: Ia4284ccd9d0c88210eaa4458f74728c805e2e076
|
|
Test: booted MIPS32 in QEMU
Test: test-art-host-gtest
Test: test-art-target-gtest
Test: test-art-target-run-test-optimizing on CI20
Change-Id: I727e80753395ab99fff004cb5d2e0a06409150d7
|
|
Rationale: on MIPS32 64-bit loads and stores may be performed
as pairs of 32-bit loads/stores. Implicit null checks must be
associated with the first 32-bit load/store in a pair and not
the last. This change ensures proper association of said checks
(a few were done after the last 32-bit load/store in a pair)
and lays ground for further improvements in array/field get/set.
Test: booted MIPS32 in QEMU
Test: test-art-host-gtest
Test: test-art-target-run-test-optimizing in QEMU
Change-Id: I3674947c00bb17930790a7a47c9b7aadc0c030b8
|
|
Previously, the string dex cache was dex_file->NumStringIds() size, and
@ruhler found that only ~1% of that cache was ever getting filled. Since
many of these string dex caches were previously 100,000+ indices in
length, we're wasting a few hundred KB per app by storing null pointers.
The intent of this project was to reduce the space the string dex cache
is using, while not regressing on time that much. This is the first of a
few CLs, which implements the new fixed size array and disables the
compiled code so it always goes slow path. In four other CLs, I
implemented a "medium path" that regresses from the previous "fast path"
only a bit in assembly in the entrypoints. @vmarko will introduce new
compiled code in the future so that we ultimately won't be regressing on
time at all. Overall, space savings have been confirmed as on the order
of 100 KB per application.
A 4-5% slow down in art-opt on Golem, and no noticeable slow down in the
interpreter. The opt slow down should be diminished once the new
compiled code is introduced.
Test: m test-art-host
Bug: 20323084
Change-Id: Ic654a1fb9c1ae127dde59290bf36a23edb55ca8e
|
|
Note that neither clang-tidy nor cpplint.py complain about
these style "issues", precisely because of the NOLINT
comments.
Test: WITH_TIDY=1 WITH_TIDY_CHECKS='-*,misc-macro-parentheses' mmma art
Change-Id: Id692fd394ffbd4fe208cbbe4407b4d5e208462bb
|
|
|
|
We avoid the need to save/restore registers in slow paths
and get significant code size savings. On Nexus 9, AOSP:
- 32-bit boot.oat: -1.4MiB (-1.9%)
- 64-bit boot.oat: -2.0MiB (-2.3%)
- other 32-bit oat files in dalvik-cache: -200KiB (-1.7%)
- other 64-bit oat files in dalvik-cache: -2.3MiB (-2.1%)
Test: Run ART test suite on host and Nexus 9 with gc stress.
Bug: 30212852
Change-Id: I7015afc1e7d30341618c9200a3dc9ae277afd134
|
|
Move away from size_t to dedicated enum (class).
Bug: 30373134
Bug: 30419309
Test: m test-art-host
Change-Id: Id453c330f1065012e7d4f9fc24ac477cc9bb9269
|
|
|
|
Tested:
- MIPS32 Android boots in QEMU
- test-art-host-gtest
- test-art-target-run-test-optimizing in QEMU, on CI20
- test-art-target-gtest on CI20
Change-Id: I70fd5d5267f8594c3b29d5a4ccf66b8ca8b09df3
|
|
This fixes tests 454-get-vreg and 457-regs in debuggable.
Additional changes in LocationsBuilderMIPS::HandleFieldGet/Set
to prevent situations when running out of available fp registers.
Test: mma -j2 ART_TEST_RUN_TEST_DEBUGGABLE=true test-art-target-run-test
Change-Id: Iaad6a6e414ff747b39209780c21aeddc225a04c1
|
|
|
|
Ensure that return types guarantee full-width data as the compiled
code and mterp expect by using size_t and ssize_t.
This fixes Clang no longer sign-/zero-extending small return types.
Bug: 30232671
Test: m ART_TEST_RUN_TEST_NDEBUG=true ART_TEST_INTERPRETER=true test-art-host-run-test
Change-Id: Ic505befc6c94e2dccbc8abf2b13d4c2d662e68d1
|
|
Originally reverted in order to revert
https://android-review.googlesource.com/#/c/244190/
but can now be merged again.
This reverts commit d4ceecc85a5aab2ec23ea1bd010692ba8c8aaa0c.
Test: m test-art-host
Change-Id: Id9205f2b77a378fc0f06088e78c66e81a49f712d
|
|
Introduced by:
https://android-review.googlesource.com/#/c/244980/
test:566-polymorphic-inling for fixing x86 crash. Also
fixes a performance regression.
bug:29188168
Change-Id: Id90cb819c88e7ba3db1cb3c50c517a112ab7d784
|
|
|
|
This patch renames kCall to kCallOnMainOnly in preparation for
the next patch in this series which will be adding kCallOnMainAndSlowPath.
Note: With this patch there will be places where we use kCallOnMainOnly
even though we call on the slow path too. The next patch in this series
will fix that.
Test: ART host tests.
Change-Id: Iabfdb0901990d163be5d780f3bdd2fab6fa17b32
|
|
|
|
This reverts commit 88f288e3564d79d87c0cd8bb831ec5a791ba4861.
Change-Id: I49605d53692cbec1e2622e23ff2893fc51ed4115
|
|
Improvements include:
- CodeGeneratorMIPS::GenerateStaticOrDirectCall() supports:
- MethodLoadKind::kDirectAddressWithFixup (via literals)
- CodePtrLocation::kCallDirectWithFixup (via literals)
- MethodLoadKind::kDexCachePcRelative
- 32-bit literals to support the above (not ready for general-
purpose applications yet because RA is not saved in leaf
methods, but is clobbered on MIPS32R2 when simulating
PC-relative addressing (MIPS32R6 is OK because it has
PC-relative addressing with the lwpc instruction))
- shorter instruction sequences for recursive static/direct
calls
Tested:
- test-art-host-gtest
- test-art-target-gtest and test-art-target-run-test-optimizing on:
- MIPS32R2 QEMU
- CI20 board
- MIPS32R6 (2nd arch) QEMU
Change-Id: Id5b137ad32d5590487fd154c9a01d3b3e7e044ff
|
|
Bug: 29188168 (for initial CL)
Bug: 29778499 (reason for revert)
This reverts commit badee9820fcf5dca5f8c46c3215ae1779ee7736e.
Change-Id: I32b8463122c3521e233c34ca95c96a5078e88848
|
|
|
|
I need to revert this to get https://android-review.googlesource.com/#/c/244190/ to cleanly revert. Matthew, do you mind rewriting it?
This reverts commit 50706437d8216e41f0fea1e413cda7891324d397.
Change-Id: I5c1435f5dffb46dbb5b613b22adb88c7770304f2
|
|
Change-Id: I0e3f393d87538eb9e35b3012ea36e81c8b7a225e
|
|
|
|
Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck as well as using the infrastructure for
HArrayGet, i.e. better handling of constant indexes than
the old intrinsic and using the HArm64IntermediateAddress.
Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
|
|
For classes in the boot image, use either direct pointers
or PC-relative addresses. For other classes, use PC-relative
access to the dex cache arrays for AOT and direct address of
the type's dex cache slot for JIT.
For aosp_flounder-userdebug:
- 32-bit boot.oat: -252KiB (-0.3%)
- 64-bit boot.oat: -412KiB (-0.4%)
- 32-bit dalvik cache total: -392KiB (-0.4%)
- 64-bit dalvik-cache total: -2312KiB (-1.0%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -124KiB (-0.2%)
- 64-bit boot.oat: -420KiB (-0.5%)
- 32-bit dalvik cache total: -136KiB (-0.1%)
- 64-bit dalvik-cache total: -1136KiB (-0.5%)
(contains more files than the 32-bit dalvik cache)
Bug: 27950288
Change-Id: I4da991a4b7e53c63c92558b97923d18092acf139
|
|
This allows us to more easily maintain and experiment with
interface method table indexing and hashing.
Change-Id: I719920fae7490dcedcda7c1c36db225c2b8b16df
|
|
* Remove IMT for classes which do not implement interfaces
* Remove IMT for array classes
* Share same IMT
Saved memory (measured on hammerhead):
boot.art:
Total number of classes: 3854
Number of affected classes: 1637
Saved memory: 409kB
Chrome (excluding classes in boot.art):
Total number of classes: 2409
Number of affected classes: 1259
Saved memory: 314kB
Google Maps (excluding classes in boot.art):
Total number of classes: 6988
Number of affected classes: 2574
Saved memory: 643kB
Performance regression on benchmarks/InvokeInterface.java benchmark
(measured timeCall10Interface)
1st launch: 9.6%
2nd launch: 6.8%
Change-Id: If07e45390014a6ee8f3c1c4ca095b43046f0871f
|
|
|
|
Improvements:
- the stack frame is (de)allocated in one step instead of two
- callee-saved FPU registers are 8-byte aligned within the frame,
allowing a single ldc1/sdc1 instruction to load/store an FPU
register without causing exceptions due to misaligned accesses
- the return address register, RA, is restored early for better
instruction scheduling
Change-Id: I556b139c62839490a9fdbce8c5e6e3e2d1cc7bb7
|
|
Introduce HInstruction::GetInputRecords(), a new virtual
function that returns an ArrayRef<> to all input records.
Implement all other functions dealing with input records as
wrappers around GetInputRecords(). Rewrite functions that
previously used multiple virtual calls to deal with input
records, especially in loops, to prefetch the ArrayRef<>
only once for each instruction. Besides avoiding all the
extra calls, this also allows the compiler (clang++) to
perform additional optimizations.
This speeds up the Nexus 5 boot image compilation by ~0.5s
(4% of "Compile Dex File", 2% of dex2oat time) on AOSP ToT.
Change-Id: Id8ebe0fb9405e38d918972a11bd724146e4ca578
|
|
* Add parentheses to fix warnings.
* Use NOLINT to suppress wrong clang-tidy warnings.
Bug: 28705665
Change-Id: Icc8bc9b59583dee0ea17ab83e0ff0383b8599c3e
|
|
Use HArrayLength for String.length() in anticipation of
changing the String.charAt() to HBoundsCheck+HArrayGet to
allow the existing BCE to seamlessly work for strings.
Use HArrayLength+HEqual for String.isEmpty().
We previously relied on inlining but we now want to apply
the new intrinsics even when we do not inline, i.e. when
compiling debuggable (as is currently the case for boot
image) or when we hit inlining limits, i.e. depth, size,
or the number of accumulated dex registers.
Bug: 28330359
Change-Id: Iab9d2f6d2967bdd930a72eb461f27efe8f37c103
|
|
And clean up some APIs to return std::unique_ptr<> instead
of raw pointers that don't communicate ownership.
Change-Id: I3017302307a0253d661240750298802fb0d9585e
|
|
This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.
Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)
This CL fixed an issue with parsing quickened instructions.
Bug: 27894376
Bug: 27998571
Bug: 27995065
Change-Id: I20dbe1bf2d0fe296377478db98cb86cba695e694
|
|
Use the original method index instead of the target method
index because the target method may point to a different dex
file.
No regression test: this currently happens only if the
codegen uses the kDexCacheViaMethod as a fallback for
another load kind and we aim to avoid that fallback, so it
would be difficult to write a reliable regression test. We
could try and exploit current fallbacks for irreducible
loops on x86 and arm but those fallbacks will eventually
disappear anyway.
Bug: 28036230
Change-Id: I4cc9e046480d3d60a7fb521f0ca6a98914625cdc
|
|
Bug: 27995065
This reverts commit e3ff7b293be2a6791fe9d135d660c0cffe4bd73f.
Change-Id: I5363c7ce18f47fd422c15eed5423a345a57249d8
|
|
This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.
Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)
Bug: 27894376
Change-Id: Iefe28d40600c169c5d306fd2c77034ae19476d90
|
|
For strings in the boot image, use either direct pointers
or pc-relative addresses. For other strings, use PC-relative
access to the dex cache arrays for AOT and direct address of
the string's dex cache slot for JIT.
For aosp_flounder-userdebug:
- 32-bit boot.oat: -692KiB (-0.9%)
- 64-bit boot.oat: -948KiB (-1.1%)
- 32-bit dalvik cache total: -900KiB (-0.9%)
- 64-bit dalvik cache total: -3672KiB (-1.5%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -380KiB (-0.5%)
- 64-bit boot.oat: -928KiB (-1.0%)
- 32-bit dalvik cache total: -468KiB (-0.4%)
- 64-bit dalvik cache total: -1928KiB (-0.8%)
(contains more files than the 32-bit dalvik cache)
Bug: 26884697
Change-Id: Iec7266ce67e6fedc107be78fab2e742a8dab2696
|
|
|
|
- Define maximum int and long shift & rotate distances as
int32_t constants, as shift & rotate distances are 32-bit
integer values.
- Consider the (long, long) inputs case as invalid for
static evaluation of shift & rotate rotations.
- Add more checks in shift & rotate operations constructors
as well as in art::GraphChecker.
Change-Id: I754b326c3a341c9cc567d1720b327dad6fcbf9d6
|
|
- Make the difference between arithmetic zero and zero-bit
pattern non ambiguous.
- Introduce Boolean predicates in art::HIntConstant for when
they are used as Booleans.
- Introduce aritmetic positive and negative zero predicates
for floating-point constants.
Bug: 27639313
Change-Id: Ia04ecc6f6aa7450136028c5362ed429760c883bd
|
|
|
|
The debugger looks up PC of the call instruction, so the runtime's
stackmap is not sufficient since it is at PC after the instruction.
Change-Id: I0dd06c0b52e8079ea5d064ea10beb12c93584092
|
|
Also extend tests covering the IntegerSignum, LongSignum,
IntegerCompare and LongCompare intrinsics and their
translation into an art::HCompare instruction.
Bug: 27629913
Change-Id: I0afc75ee6e82602b01ec348bbb36a08e8abb8bb8
|
|
|
|
This removes redundant code from the generators and allows for easier
stat recording.
Change-Id: Iccd4368f9e9d87a6fecb863dee4e2145c97851c4
|
|
All our arithmetic operations accept it.
bug:27624718
Change-Id: I1f6bb95dc77ecb3fb2fcabb35a93b31c524bfa0a
|
|
Pack narrow fields and flags into a single 32-bit field.
Change-Id: Ib2f7abf987caee0339018d21f0d498f8db63542d
|