Age | Commit message (Collapse) | Author |
|
Create "Phi placeholders" for tracking heap values that can
merge from different values and try to match existing Phis
or create new Phis to replace loads. For Phi placeholders
from loop headers we do not know whether they are fed by
unknown values through back-edges when processing the loop
header, so we delay processing loads that depend on them
until we walked the entire graph. We then try to match them
with existing instructions (when the location is unchanged
in the loop) or Phis or create new Phis if needed. If we
find a loop Phi placeholder fed with unknown value from a
back-edge, we mark the Phi placeholder unreplaceable and
reprocess loads and stores to propagate the unknown value.
This can sometimes allow other loads to be replaced. At the
end we re-calculate the heap values to find stores that can
be eliminated because they write over the same value.
Golem results:
art-opt-cc arm arm64 x86 x86-64
CaffeineFloat +6.7% +3.0% +5.9% +3.8%
KotlinMicroWhen +33.7% +4.8% +1.8% +0.6%
art-opt (more noisy than art-opt-cc)
CaffeineFloat +4.1% +4.4% +7.8% +10.5%
KotlinMicroWhen +33.6% +2.0% +1.8% +1.8%
The MoveLiteralColumn benchmark seems to gain significantly
(up to 22% on art-opt-cc but under 10% on art-opt) but it is
very noisy and the results are therefore unreliable.
Insignificant code size changes for aosp_blueline-userdebug:
- before:
arm boot*.oat: 15303468
arm64 boot*.oat: 18184736
services.odex: 25195944
grep -c pAllocObject boot.arm64.oatdump.txt: 27213
grep -c pAllocArray boot.arm64.oatdump.txt: 3620
- after:
arm boot*.oat: 15299524 (-4KiB, -0.03%)
arm64 boot*.oat: 18176528 (-8KiB, -0.05%)
services.odex: 25191832 (-4KiB, -0.02%)
grep -c pAllocObject boot.arm64.oatdump.txt: 27206 (-7)
grep -c pAllocArray boot.arm64.oatdump.txt: 3615 (-5)
Test: New tests in 530-checker-lse.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: blueline-userdebug boots.
Bug: 77906240
Change-Id: Ia9fe0cd3530f9d3941650dfefc00a7f7fd821994
|
|
If a float or double argument needs to be passed in core
register to a @CriticalNative method due to soft-float
native ABI, insert a fake call to Float.floatToRawIntBits()
or Double.doubleToRawLongBits() to satisfy type checks in
the compiler.
We cannot do that for intrinsics that expect those inputs in
actual FP registers, so we still prevent such intrinsics
from using `kCallCriticalNative`. This should be irrelevant
if an actual intrinsic implementation is emitted. There are
currently two unimplemented intrinsics that are affected by
the carve-out, namely MathRoundDouble and FP16ToHalf, and
four intrinsics implemented only when ARMv8A is supported,
namely MathRint, MathRoundFloat, MathCeil and MathFloor.
Test: testrunner.py --target --32 -t 178-app-image-native-method
Bug: 112189621
Change-Id: Id14ef4f49f8a0e6489f97dc9588c0e6a5c122632
|
|
Test: m
Bug: 161896447
Bug: 161850439
Bug: 161336379
Change-Id: Iabc29fa43b4b5a403699d6bca95e9a2cb8945d77
|
|
A pattern seen in libcore and SPECjvm2008 workloads is a pair of HRem/HDiv
having the same dividend and divisor. The code generator processes
them separately and generates duplicated instructions calculating HDiv.
This CL adds detection of such a pattern to the instruction simplifier.
This optimization affects HInductionVarAnalysis and HLoopOptimization
preventing some loop optimizations. To avoid this the instruction simplifier
has the loop_friendly mode which means not to optimize HRems if they are in a loop.
A microbenchmark run on Pixel 3 shows the following improvements:
| little cores | big cores
arm32 Int32 | +21% | +40%
arm32 Int64 | +46% | +44%
arm64 Int32 | +27% | +14%
arm64 Int64 | +33% | +27%
Test: 411-checker-instruct-simplifier-hrem
Test: test.py --host --optimizing --jit --gtest --interpreter
Test: test.py --target --optimizing --jit --interpreter
Test: run-gtests.sh
Change-Id: I376a1bd299d7fe10acad46771236edd5f85dfe56
|
|
Make LSA a helper class, not an optimization pass. Move all
its allocations to ScopedArenaAllocator to reduce the peak
memory usage a little bit.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I7fc634abe732d22c99005921ffecac5207bcf05f
|
|
This avoids passing the `VariableSizedHandleScope*` argument
around and eliminates HGraph::inexact_object_rti_ and its
initialization. The latter shall allow running Optimizing
gtests that do not require type information without creating
a Runtime in future. (To be implemented in a separate CL.)
Test: m test-art-host-gtest
Test: testrunner.py --host --optmizing
Test: aosp_taimen-userdebug boots.
Change-Id: I36fe9bc556c6d610d644c8c14cc74c9985a14d64
|
|
ART vectorizer assumes that there is single size of SIMD
register used for the whole program. Make this assumption explicit
and refactor the code.
Note: This is a base for the future introduction of SIMD slots of
size other than 8 or 16 bytes.
Test: test-art-target, test-art-host.
Change-Id: Id699d5e3590ca8c655ecd9f9ed4e63f49e3c4f9c
|
|
Test: aosp_taimen-userdebug boots.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 147346243
Change-Id: I97fdc15e568ae3fe390efb1da690343025f84944
|
|
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf.
Reason for revert: Breaks ASAN tests (ODR violation).
Bug: 142365358
Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
|
|
Make symbols in compiler/optimizing hidden by a namespace
attribute. The unit intrinsic_objects.{h,cc} is excluded as
it is needed by dex2oat.
As the symbols are no longer exported, gtests are now linked
with the static version of the libartd-compiler library.
libart-compiler.so size:
- before:
arm: 2396152
arm64: 3345280
- after:
arm: 2016176 (-371KiB, -15.9%)
arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
|
|
Handles compiler.
Bug: 116054210
Test: WITH_TIDY=1 mmma art
Change-Id: I5cdfe73c31ac39144838a2736146b71de037425e
|
|
Treat verification results and image classes as mutable
only in CompilerDriver::PreCompile(), and treat them as
immutable during compilation, accessed through the
CompilerOptions. This severs the dependency of the inliner
on the CompilerDriver.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I594a0213ca6a5003c19b4bd488af98db4358d51d
|
|
This patch performs instruction simplification to
generate instructions andn, blsmsk and blsr on
cpus that have avx2.
Test: test.py --host --64, test-art-host-gtest
Change-Id: Ie41a1b99ac2980f1e9f6a831a7d639bc3e248f0f
Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
|
|
Instead just recognize the intrinsic when creating an invoke
instruction.
Also remove some old code related to compiler driver sharpening.
Test: test.py
Change-Id: Iecb668f30e95034970fcf57160ca12092c9c610d
|
|
Make the last sharpening helper (methods) like the other
helpers: being invoked by the instruction builder.
Test: test.py
Change-Id: Ic80a454f9b59b0b4ef7825590b24402500ba851c
|
|
|
|
This reverts commit 61908880e6565acfadbafe93fa64de000014f1a6.
Reason for revert: By failing to round multiply results, it does not follow Java rounding rules.
Change-Id: Ic0ef08691bef266c9f8d91973e596e09ff3307c6
|
|
|
|
This patch adds a new cpu vaiant named kabylake and performs
instruction simplification to generate VectorMulitplyAccumulate.
Test: ./test.py --host --64
Change-Id: Ie6cc882dadf1322dd4d3ae49bfdb600b0c447765
Signed-off-by: Gupta Kumar, Sanjiv <sanjiv.kumar.gupta@intel.com>
|
|
Check for non-PIC boot image as a testing config instead.
Honor the config for HInvokeStaticOrDirect sharpening.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I3645f4fefe322f1fd64ea88a2b41a35ceccea688
|
|
Removes CompilerDriver dependency from ImageWriter and
several other classes.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: Pixel 2 XL boots.
Test: m test-art-target-gtest
Test: testrunner.py --target --optimizing
Change-Id: I3c5b8ff73732128b9c4fad9405231a216ea72465
|
|
Rationale:
The change introduces actual conditional passes
(dependence on inliner). This ensures more
cases are optimized downstream without
needlessly introducing compile-time.
NOTE:
Some checker tests needed to be rewritten
due to subtle changes in the phase ordering.
No optimizations were harmed in the process,
though.
Bug: b/78171933, b/74026074
Test: test-art-host,target
Change-Id: I335260df780e14ba1f22499ad74d79060c7be44d
|
|
Change constructor to use a reference to a dex file.
Remove duplicated logic for GetCodeItemSize.
Bug: 63756964
Test: test-art-host
Change-Id: I69af8b93abdf6bdfa4454e16db8f4e75883bca46
|
|
Move all the DexFile related source to a common subdirectory dex/ of
runtime.
Bug: 71361973
Test: make -j 50 test-art-host
Change-Id: I59e984ed660b93e0776556308be3d653722f5223
|
|
Make code item fields private and use accessors. Added a hand full of
friend classes to reduce the size of the change.
Changed default to be nullable and removed CreateNullable.
CreateNullable was a bad API since it defaulted to the unsafe, may
add a CreateNonNullable if it's important for performance.
Motivation:
Have a different layout for code items in cdex.
Bug: 63756964
Test: test-art-host-gtest
Test: test/testrunner/testrunner.py --host
Test: art/tools/run-jdwp-tests.sh '--mode=host' '--variant=X32' --debug
Change-Id: I42bc7435e20358682075cb6de52713b595f95bf9
|
|
This helps save memory by avoiding the allocation of
HEnvironment and related objects for AOT references to
boot image strings and classes (kBootImage* load kinds)
and also for JIT references (kJitTableAddress).
Compiling aosp_taimen-userdebug boot image, the most memory
hungry method BatteryStats.dumpLocked() needs
- before:
Used 55105384 bytes of arena memory...
...
UseListNode 10009704
Environment 423248
EnvVRegs 20676560
...
- after:
Used 50559176 bytes of arena memory...
...
UseListNode 8568936
Environment 365680
EnvVRegs 17628704
...
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 34053922
Change-Id: I68e73a438e6ac8e8908e6fccf53bbeea8a64a077
|
|
Rationale:
Refactors the way we set up optimization passes
in the compiler into a more centralized approach.
The refactoring also found some "holes" in the
existing mechanism (missing string lookup in
the debugging mechanism, or inablity to set
alternative name for optimizations that may repeat).
Bug: 64538565
Test: test-art-host test-art-target
Change-Id: Ie5e0b70f67ac5acc706db91f64612dff0e561f83
|
|
Remove all copies of 'MaybeRecordStat', replacing them with a single
OptimizingCompilerStats::MaybeRecordStat helper.
Change-Id: I83b96b41439dccece3eee2e159b18c95336ea933
|
|
This patch refactors the way GraphChecker is invoked, utilizing the
same scoping mechanism as pass timing and graph visualizer. Therefore,
GraphChecker will now run not just after instances of HOptimization
but after the builders and reg alloc, too.
Change-Id: I8173b98b79afa95e1fcbf3ac9630a873d7f6c1d4
|
|
Adds a new pass which finds all unreachable blocks, typically due to
simplifying an if-condition to a constant, and removes them from the
graph. The patch also slightly generalizes the graph-transforming
operations.
Change-Id: Iff7c97f1d10b52886f3cd7401689ebe1bfdbf456
|
|
- propagate reference types between instructions
- remove checked casts when possible
- add StackHandleScopeCollection to manage an arbitrary number of stack
handles (see comments)
Change-Id: I31200067c5e7375a5ea8e2f873c4374ebdb5ee60
|
|
Dex code can lead to the creation of a phi with one
float input and one integer input. Since the SSA builder trusts
the verifier, it assumes that the integer input must be converted
to float. However, when the register is not used afterwards, the
verifier hasn't ensured that. Therefore, the compiler must remove
the phi prior to doing type propagation.
Change-Id: Idcd51c4dccce827c59d1f2b253bc1c919bc07df5
|
|
Move existing optimizations to it.
Change-Id: I3b43f9997faf4ed8875162e3a3abdf99375478dd
|
|
Move gVerboseMethods to CompilerOptions. Now "--verbose-methods=" option to
dex2oat rather than runtime argument "-verbose-methods:".
Move ToStr and Dumpable out of logging.h, move LogMessageData into logging.cc
except for a forward declaration.
Remove ConstDumpable as Dump methods are all const (and make this so if not
currently true).
Make LogSeverity an enum and improve compile time assertions and type checking.
Remove log_severity.h that's only used in logging.h.
With system headers gone from logging.h, go add to .cc files missing system
header includes.
Also, make operator new in ValueObject private for compile time instantiation
checking.
Change-Id: I3228f614500ccc9b14b49c72b9821c8b0db3d641
|
|
This reverts commit 1ddbf6d4b37979a9f11a203c12befd5ae8b65df4.
Change-Id: I110a14668d1564ee0604dc958b91394b40da89fc
|
|
This reverts commit bf9cd7ba2118a75f5aa9b56241c4d5fa00dedeb8.
Change-Id: I0a483446666c9c24c45925a5fc199debdefd8b3e
|
|
- Add art::HOptimization.
- Rename art::ConstantPropagation to art::HConstantFolding in
compiler/optimizing/constant_folding.h to avoid name
clashes with a class of the same name in
compiler/dex/post_opt_passes.h.
- Rename art::DeadCodeElimination to
art::HDeadCodeElimination for consistency reasons.
- Have art::HDeadCodeElimination and art::HConstantFolding
derive from art::HOptimization.
- Start to use these optimizations in
art:OptimizingCompiler::TryCompile.
Change-Id: Iaab350c122d87b2333b3760312b15c0592d7e010
|