Age | Commit message (Collapse) | Author |
|
Arm SVE register size is not fixed and can be a
multiple of 128 bits. To support that the patch
removes explicit assumptions on the SIMD register
size to be 128 bit from the vectorizer and code
generators and enables configurable SVE vector
length autovectorization, e.g. extends SIMD register
save/restore routines.
Test: art SIMD tests on VIXL simulator.
Test: art tests on FVP (steps in test/README.arm_fvp.md)
with FVP arg:
-C SVE.ScalableVectorExtension.veclen=[2,4]
(SVE vector [128,256] bits wide)
Change-Id: Icb46e7eb17f21d3bd38b16dd50f735c29b316427
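The relation between the FVP veclen argument and the register width
quoted above can be sketched as follows. This is only an illustration,
not ART code: both function names are hypothetical, and the 64-bit
unit per veclen step is inferred from the veclen=[2,4] to [128,256]
bits mapping given in the test notes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: SVE register width in bits for a given FVP veclen
// value, assuming veclen counts 64-bit units (2 -> 128, 4 -> 256).
uint32_t SveVectorBits(uint32_t veclen) {
  const uint32_t bits = veclen * 64u;
  // SVE register sizes are multiples of 128 bits.
  assert(bits != 0 && bits % 128u == 0);
  return bits;
}

// Hypothetical helper: number of lanes of the given element size (in bytes)
// that fit into one vector register of the given width.
uint32_t LanesPerVector(uint32_t vector_bits, uint32_t element_bytes) {
  assert(element_bytes != 0);
  return vector_bits / (8u * element_bytes);
}
```

With a configurable width, the lane count is computed per configuration
instead of being hard-coded for 128-bit registers.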
|
|
This CL brings support for predicated execution for
auto-vectorizer and implements arm64 SVE vector backend.
This version passes all the VIXL simulator-runnable tests in
SVE mode with checker off (as all VecOp CHECKs need to be
adjusted for an extra input) and all tests in NEON mode.
Test: art SIMD tests on VIXL simulator.
Test: art tests on FVP (steps in test/README.arm_fvp.md)
Change-Id: Ib78bde31a15e6713d875d6668ad4458f5519605f
|
|
Support Loop Versioning in SuperblockCloner as a tool to
enable further optimization (e.g. Dynamic Loop Unrolling).
The patch brings the feature in without enabling it.
Replace std::cout with LOG(INFO) for debug dumps.
Test: superblock_cloner_test.
Test: test-art-target.
Change-Id: I303cabfb752b8c3c8597abfc0ac261e8616e8cee
|
|
ART vectorizer assumes that there is a single size of SIMD
register used for the whole program. Make this assumption explicit
and refactor the code.
Note: This is a base for the future introduction of SIMD slots of
size other than 8 or 16 bytes.
Test: test-art-target, test-art-host.
Change-Id: Id699d5e3590ca8c655ecd9f9ed4e63f49e3c4f9c
|
|
IsAddConst2 function tried to extract addition chains
for the halving add idiom: (A + B) >> 1. The problem
was that regular shift right (x >> 1) was accepted for the
idiom (with {A: x, B: 0}) and not processed as a shift - which
broke the assumptions on shifts right and operand signedness.
This CL fixes that.
Test: 646-checker-simd-hadd.
Test: test-art-target.
Change-Id: Icf71e1a8e8c54e68114d7d5d6c4aa8a47ea5234d
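One way to see why signedness matters for this idiom is the scalar
sketch below. It is not the ART implementation: the function names are
hypothetical, and the helpers only model a truncating halving add
computed in widened precision.

```cpp
#include <cassert>
#include <cstdint>

// Truncating halving add on unsigned bytes: (a + b) >> 1 computed in
// widened precision so the intermediate sum cannot overflow.
uint8_t HalvingAddUnsigned(uint8_t a, uint8_t b) {
  const uint32_t sum = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  return static_cast<uint8_t>(sum >> 1);
}

// The same idiom on signed bytes: widen, then floor-divide by two. The
// division is written out to avoid relying on shifts of negative values.
int8_t HalvingAddSigned(int8_t a, int8_t b) {
  const int32_t sum = static_cast<int32_t>(a) + static_cast<int32_t>(b);
  const int32_t half = (sum >= 0) ? sum / 2 : -((-sum + 1) / 2);
  assert(half >= INT8_MIN && half <= INT8_MAX);
  return static_cast<int8_t>(half);
}
```

The bit pattern 0xFE halves to 0x7F when read as unsigned (254) but to
0xFF when read as signed (-2), so treating a plain x >> 1 as a halving
add with B = 0 is only safe if the shift kind and the operand signedness
agree.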
|
|
SAD vectorization idiom (and DotProduct which was based on it)
had a bug when some instruction in the loop was visited twice
causing a compiler crash. GenerateVecOp() was called for both
the idiom root instruction and its argument in the idiom
vectorization routine; however the argument could have been
already processed by that time. It happened when two vectorization
idioms' matched patterns had a common sub-expression.
Test: test-art-target.
Test: 623-checker-loop-regressions.
Change-Id: I8823c52f8ef62377c29310f0e335b9728d11068a
|
|
Test: aosp_taimen-userdebug boots.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 147346243
Change-Id: I97fdc15e568ae3fe390efb1da690343025f84944
|
|
This reverts commit 8e895008a3e2f2813bb46cb0c6bc76884e46e9ac.
Reason for revert: The test failure seems unrelated.
Bug: 144947842
Change-Id: I7b437f0443d71a5c762e1a8372564ed989971cc9
|
|
This reverts commit 7cf5607f472020711e36eedbbfebb25b40d3f90e.
Bug: 144947842
Reason for revert: Seems to have broken android.jvmti.cts.JvmtiHostTest1936#testJvmt
Change-Id: Ied6ff6ddf1cb2e3e76adcaa0fda5e36af254b7c5
|
|
This reverts commit b8c884e5f22390386b202459ab55ef3046631e42.
And fixes a codegen bug (the reason why the original CL was
reverted).
Test: 684-checker-simd-dotprod
Test: DEX2OAT_HOST_INSTRUCTION_SET_FEATURES="sse4.1" test.py --host
Test: test.py --host --jit --gcstress
Change-Id: Ibef925d1037abc9cb5f3d4dbd79f1d1eceae2f71
Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
|
|
This reverts commit 4b7caeee57767f6bce7bb138a1299c0ae84bebf9.
Reason for revert: Test failure in jit-gcstress mode.
+Exception in thread "main" java.lang.Error: Expected: 131072, found: 0
+ at other.TestCharShort.expectEquals(TestCharShort.java:474)
+ at other.TestCharShort.testDotProd(TestCharShort.java:486)
+ at other.TestCharShort.run(TestCharShort.java:525)
+ at Main.main(Main.java:28)
Change-Id: I251cf666e8335499d227910987b2d49629c3f53d
|
|
8% improvement in microbench for integral data types
Test: ./test.py --host --64
Change-Id: I26b584f29d677283195c69b68650651368c656d1
Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
|
|
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf.
Reason for revert: Breaks ASAN tests (ODR violation).
Bug: 142365358
Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
|
|
Make symbols in compiler/optimizing hidden by a namespace
attribute. The unit intrinsic_objects.{h,cc} is excluded as
it is needed by dex2oat.
As the symbols are no longer exported, gtests are now linked
with the static version of the libartd-compiler library.
libart-compiler.so size:
- before:
arm: 2396152
arm64: 3345280
- after:
arm: 2016176 (-371KiB, -15.9%)
arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
|
|
Instructions that are not used outside of the inner loop were
inadvertently being removed. Make sure this does not happen.
Original author: Georgia Kouveli <georgia.kouveli@linaro.org>
Committed by: David Horstmann <david.horstmann@linaro.org>
Test: 1961-checker-loop-vectorizer
Test: test-art-target
Change-Id: I3af9e861e75669457e5925dd1d655db784a55287
|
|
SuspendCheck environment is incorrectly initialized with
a stale version of the loop induction variable (a pre-loop one)
for vectorized loops. The value can be retrieved from the
corresponding stack maps only in case of asynchronous
deoptimization in debuggable mode. Thus this workaround forbids
loop optimizations on debuggable graphs so the bug is never
triggered.
Test: test-art-target, test-art-host.
Bug: 138601207
Change-Id: Ica9f61f471c024146b7823214ef952e1db2a4663
|
|
This is a follow up for the below patch:
https://android-review.googlesource.com/c/platform/build/+/830841
Test: ./test.py --host --64, test-art-host-gtest
Change-Id: Id2aa473035556ee230e66addeb69707df8530e75
Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
|
|
Test: ./test.py --host, test-art-host-gtest
Change-Id: I48d05e6f6befd54657d962119a543b27a8a51d71
Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
|
|
Because loop unrolling is part of a general loop optimization pass,
it needs to update induction ranges as it will invalidate its
instruction cache with new instructions.
Bug: 131174583
Test: 696-loop
Change-Id: Id3628efe316b58f69abbd9ebd43e891a8e42529f
|
|
Implement support for vectorization idiom which performs dot
product of two vectors and adds the result to wider precision
components in the accumulator.
viz. DOT_PRODUCT([ a1, .. , am], [ x1, .. , xn ], [ y1, .. , yn ]) =
[ a1 + sum(xi * yi), .. , am + sum(xj * yj) ],
for m <= n, non-overlapping sums,
for either both signed or both unsigned operands x, y.
The patch shows up to 7x performance improvement on a micro
benchmark on Cortex-A57.
Test: 684-checker-simd-dotprod.
Test: test-art-host, test-art-target.
Change-Id: Ibab0d51f537fdecd1d84033197be3ebf5ec4e455
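The DOT_PRODUCT definition above can be read as the scalar reference
below. This is a sketch under assumptions, not the vectorizer code: the
function name is hypothetical, the element types (int8_t inputs, int32_t
accumulator) are chosen for illustration, and it assumes that n is a
multiple of m and that lane j accumulates the j-th contiguous group of
n/m products. The commit message itself only states that the sums do
not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for a widening dot product: each accumulator lane
// receives the sum of one non-overlapping group of products x[i] * y[i].
std::vector<int32_t> DotProductReference(std::vector<int32_t> acc,
                                         const std::vector<int8_t>& x,
                                         const std::vector<int8_t>& y) {
  assert(x.size() == y.size());
  assert(!acc.empty() && x.size() % acc.size() == 0);
  const std::size_t group = x.size() / acc.size();
  for (std::size_t lane = 0; lane < acc.size(); ++lane) {
    for (std::size_t k = 0; k < group; ++k) {
      const std::size_t i = lane * group + k;
      // Widen both operands before multiplying so the product is exact.
      acc[lane] += static_cast<int32_t>(x[i]) * static_cast<int32_t>(y[i]);
    }
  }
  return acc;
}
```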
|
|
Performs whole loop unrolling for small loops with small
trip count to eliminate the loop check overhead and to create
more opportunities for inter-iteration optimizations.
caffeinemark/FloatAtom: 1.2x performance on arm64 Cortex-A57.
Test: 530-checker-peel-unroll.
Test: test-art-host, test-art-target.
Change-Id: Idf3fe3cb611376935d176c60db8c49907222e28a
|
|
Refactor scalar loop peeling and unrolling to eliminate repeated
checks and graph traversals, to make the code more readable and
to make it easier to add new scalar loop opts.
This is a prerequisite for full unrolling patch.
Test: 530-checker-peel-unroll.
Test: test-art-target, test-art-host.
Change-Id: If824a95f304033555085eefac7524e59ed540322
|
|
Removes CompilerDriver dependency from ImageWriter and
several other classes.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: Pixel 2 XL boots.
Test: m test-art-target-gtest
Test: testrunner.py --target --optimizing
Change-Id: I3c5b8ff73732128b9c4fad9405231a216ea72465
|
|
Turn on scalar loop peeling and unrolling by default.
Test: 482-checker-loop-back-edge-use, 530-checker-peel-unroll
Test: test-art-host, test-art-target, boot-to-gui
Change-Id: Ibfe1b54f790a97b281e85396da2985e0f22c2834
|
|
Rationale:
This will find blatant violations where a pass asserts that it made
no change even though the graph size changed nevertheless.
Bug: 78171933
Test: test-art-host,target
Change-Id: I07b38e71c75dd6f728246d096976c8333b363329
|
|
Test: test-art-host,target
Change-Id: I7f00315c61ed99723236283bc39a4c7fb279df47
|
|
Rationale:
The change adds a return value to Run() in preparation of
conditional pass execution. The value returned by Run() is
best effort, returning false means no optimizations were
applied or no useful information was obtained. I filled
in a few cases with more exact information, others
still just return true. In addition, it integrates inlining
as a regular pass, avoiding the ugly "break" into
optimizations1 and optimizations2.
Bug: b/78171933, b/74026074
Test: test-art-host,target
Change-Id: Ia39c5c83c01dcd79841e4b623917d61c754cf075
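How a boolean result from Run() could drive conditional pass execution
is sketched below. This is a hypothetical illustration only: PassSketch,
its fields and RunPassesSketch are invented for the example and are not
the ART pass interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an optimization pass. run() reports, on a
// best-effort basis, whether the pass changed anything.
struct PassSketch {
  std::string name;
  std::function<bool()> run;
  bool only_if_previous_changed;  // skip when the previous pass did nothing
};

// Runs the passes in order and returns the names of those that executed.
std::vector<std::string> RunPassesSketch(const std::vector<PassSketch>& passes) {
  std::vector<std::string> executed;
  bool previous_changed = true;
  for (const PassSketch& pass : passes) {
    assert(pass.run != nullptr);
    if (pass.only_if_previous_changed && !previous_changed) {
      continue;  // nothing new to work on
    }
    previous_changed = pass.run();
    executed.push_back(pass.name);
  }
  return executed;
}
```

A pass that always returns true stays correct under this scheme; it
merely gives up the chance to skip dependent work.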
|
|
Rationale:
Running GVN earlier allows for better subsequent
instruction simplification. For example, running GVN
before select generation also finds the MIN in:
if (x > a[i])
x = a[i];
Bug: b/74026074
Test: test-art-host,target
Change-Id: I633046375637c7809a3603fdf7c5cf77e8f21167
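The MIN pattern mentioned above can be illustrated with two equivalent
scalar loops. This is a sketch written in C++ for illustration, with
hypothetical function names; it shows only the source-level equivalence,
not how the compiler recognizes it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// The conditional form from the commit message: both a[i] reads denote the
// same value, which is what lets the branch be seen as a MIN.
int MinViaBranch(int x, const std::vector<int>& a) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (x > a[i])
      x = a[i];
  }
  return x;
}

// The same computation expressed with an explicit MIN operation.
int MinViaMinOp(int x, const std::vector<int>& a) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    x = std::min(x, a[i]);
  }
  return x;
}

// Small self-check that both forms agree on one input.
void CheckMinForms() {
  const std::vector<int> a = {7, 3, 9};
  assert(MinViaBranch(5, a) == MinViaMinOp(5, a));
}
```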
|
|
Implement scalar loop peeling for invariant exits elimination
(on arm64). If the loop exit condition is loop invariant then
loop peeling + GVN + DCE can eliminate this exit in the loop
body. Note: GVN and DCE aren't applied during loop optimizations.
Note: this functionality is turned off by default now.
Test: test-art-host, test-art-target, boot-to-gui.
Change-Id: I98d20054a431838b452dc06bd25c075eb445960c
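The effect described above can be sketched at source level as follows.
The second function is a hand-written approximation of what one would
expect after peeling followed by GVN and DCE; the function names and the
summing loop body are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original shape: the exit test on `stop` is loop invariant but is
// evaluated on every iteration.
int SumWithInvariantExit(const std::vector<int>& a, bool stop) {
  int sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (stop) break;
    sum += a[i];
  }
  return sum;
}

// Peeled shape: the first iteration is executed separately. If it did not
// take the invariant exit, no later iteration can, so the test disappears
// from the loop body.
int SumPeeled(const std::vector<int>& a, bool stop) {
  int sum = 0;
  if (!a.empty()) {
    if (stop) return sum;  // the only remaining test of the invariant exit
    sum += a[0];
    for (std::size_t i = 1; i < a.size(); ++i) {
      sum += a[i];
    }
  }
  return sum;
}

// Small self-check that both shapes agree for either value of `stop`.
void CheckPeelingEquivalence() {
  const std::vector<int> a = {1, 2, 3};
  assert(SumWithInvariantExit(a, false) == SumPeeled(a, false));
  assert(SumWithInvariantExit(a, true) == SumPeeled(a, true));
}
```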
|
|
Bug: b/74026074
Test: test-art-host,target
Change-Id: Ic6ee31be6192fb2b3bae3be8986da261a744be07
|
|
Bug: b/74026074
Looks like this CL is not the culprit :( Apologies Aart.
This reverts commit 7f31326f7956d6a1630e7e53473b0581705796ec.
Change-Id: I15830324bb276129bf44caf232af24d7c022ed9a
|
|
Bug: b/74026074
Fails 551-checker-shifter-operand on target.
This reverts commit 81a1f853925d88d19119e850e22b7f66bddef63b.
Change-Id: If3094f73744bbb5f9ab1df4509f757df24af0047
|
|
Rationale:
Slightly more general detection of + and - with
constants ensures fewer cases go undetected.
Bug: b/74026074
Test: test-art-host,target
Change-Id: Ie5bb2dd10294436a27487e5a1ddc77d9e2dd2303
|
|
|
|
Implement scalar loop unrolling for small loops
(on arm64) with known trip count to reduce loop check
and branch penalty and to provide more opportunities
for instruction scheduling.
Note: this functionality is turned off by default now.
Test: cloner_test.cc
Test: test-art-target, test-art-host
Change-Id: Ic27fd8fb0bc0d7b69251252da37b8b510bc30acc
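What unrolling a loop with a known trip count buys can be sketched at
source level as below. The unroll factor of two, the even trip count and
the function names are assumptions made for the example; the commit
message does not state the factor used by the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rolled loop: one loop check and one branch per element.
int SumRolled(const std::vector<int>& a) {
  int sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += a[i];
  }
  return sum;
}

// Unrolled by two: one loop check and one branch per two elements. The trip
// count must be known to be even, otherwise a remainder loop is needed.
int SumUnrolledBy2(const std::vector<int>& a) {
  assert(a.size() % 2 == 0);
  int sum = 0;
  for (std::size_t i = 0; i < a.size(); i += 2) {
    sum += a[i];
    sum += a[i + 1];
  }
  return sum;
}
```

The two adds in the unrolled body are also adjacent, which gives an
instruction scheduler more to work with than a single add per iteration.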
|
|
Rationale:
More saturation is better!
Bug: b/74026074
Test: test-art-host,target
Change-Id: Ib99e8965f26d96d956bcd3dbc7eb17b6c0050a24
|
|
Rationale:
Changes requested by Artem and Vladimir.
Bug: b/74026074
Test: test-art-host,target
Change-Id: If99dd2345369d4260e3b514bb212f81d433420a3
|
|
Rationale:
Because faster is better.
Bug: b/74026074
Test: test-art-host,target
Change-Id: Ifa970a62cef1c0b8bb1c593f629d8c724f1ffe0e
|
|
Rationale:
Having explicit MIN/MAX/ABS operations (in contrast
with intrinsics) simplifies recognition and optimization
of these common operations (e.g. constant folding, hoisting,
detection of saturation arithmetic). Furthermore, mapping
conditionals, selectors, intrinsics, etc. (some still TBD)
onto these operations generalizes the way they are optimized
downstream substantially.
Bug: b/65164101
Test: test-art-host,target
Change-Id: I69240683339356e5a012802f179298f0b04c6726
|
|
NOTE: includes a file that should have been there.
Bug: b/65164101
Test: test-art-host,target
Change-Id: Ic786b84b2635ea8f5909ad77196857f6de65bf26
|
|
Rationale:
Currently we have some remaining ugliness around signed and unsigned
SIMD operations due to lack of kUint32 and kUint64 in the HIR. By
"softly" introducing these types, ABS/MIN/MAX/HALVING_ADD/SAD_ACCUMULATE
operations can solely rely on the packed data types to distinguish
between signed and unsigned operations. Cleaner, and also allows for
some code removal in the current loop optimizer.
Bug: 72709770
Test: test-art-host test-art-target
Change-Id: I68e4cdfba325f622a7256adbe649735569cab2a3
|
|
Test: test-art-host test-art-target
Change-Id: I6f321446f54943e02f250732ec9da729f633c3a9
|
|
Rationale:
Refactors the way we set up optimization passes
in the compiler into a more centralized approach.
The refactoring also found some "holes" in the
existing mechanism (missing string lookup in
the debugging mechanism, or inability to set
alternative name for optimizations that may repeat).
Bug: 64538565
Test: test-art-host test-art-target
Change-Id: Ie5e0b70f67ac5acc706db91f64612dff0e561f83
|
|
Test: test-art-host test-art-target
Change-Id: I32a3e21f96cdcbab2e108d71746670408deb901a
|
|
Cleanup errors from upstream cpplint in preparation
for moving art's cpplint fork to upstream tip-of-tree cpplint.
Test: cd art && mm
Bug: 68951293
Change-Id: I15faed4594cbcb8399850f8bdee39d42c0c5b956
|
|
Adding InstructionSet::kLast shall make it easier to encode
the InstructionSet in fewer bits using BitField<>. However,
introducing `kLast` into the `art` namespace is not a good
idea, so we change the InstructionSet to an enum class.
This also uncovered a case of InstructionSet::kNone being
erroneously used instead of vixl32::Condition::None(), so
it's good to remove `kNone` from the `art` namespace.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
|
|
|
|
Rationale:
Since aligned data access is generally better (enables more efficient
aligned moves and prevents nasty cache line splits), computing and/or
enforcing alignment has been added to the vectorizer:
(1) If the initial alignment is known completely and suffices,
then a static peeling factor enforces proper alignment.
(2) If (1) fails, but the base alignment allows, dynamically peeling
until total offset is aligned forces proper aligned access patterns.
By using ART conventions only, any forced alignment is preserved
over suspend checks where data may move.
Note 1:
Current allocation convention is just 8 byte alignment on arrays/strings,
so only ARM32 benefits. However, all optimizations are implemented in
a general way, so moving to a 16 byte alignment will immediately
take advantage of any new convention!!
Note 2:
This CL also exposes how bad the choice of 12 byte offset of arrays
really is. Even though the new optimizations fix the misalignment, it
requires peeling for the most common case: 0 indexed loops. Therefore,
we may even consider moving to a 16 byte offset. Again the optimizations
in this CL will immediately take advantage of that new convention!!
Test: test-art-host test-art-target
Change-Id: Ib6cc0fb68c9433d3771bee573603e64a3a9423ee
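Case (1) above, a static peeling factor, can be sketched as the
computation below. This is an illustration and not the vectorizer code:
the function name, its parameters and the -1 sentinel are hypothetical.
The 8 byte base alignment and the 12 and 16 byte array data offsets are
taken from the notes above; the 8 byte vector alignment used in the
usage note is an assumption for the example.

```cpp
#include <cassert>
#include <cstdint>

// Returns the number of scalar iterations to peel so that the first vector
// access is aligned, or -1 if this cannot be decided statically.
//   base_alignment   - guaranteed alignment of the array object, in bytes
//   data_offset      - offset of element 0 inside the object, in bytes
//   element_size     - size of one element, in bytes
//   start_index      - index accessed by the first loop iteration
//   vector_alignment - desired alignment of vector accesses, in bytes
int32_t StaticPeelCount(uint32_t base_alignment, uint32_t data_offset,
                        uint32_t element_size, uint32_t start_index,
                        uint32_t vector_alignment) {
  assert(element_size != 0 && vector_alignment != 0);
  // The base must itself be aligned strongly enough to reason statically.
  if (base_alignment % vector_alignment != 0) return -1;
  const uint32_t misalignment =
      (data_offset + start_index * element_size) % vector_alignment;
  const uint32_t bytes_to_boundary =
      (vector_alignment - misalignment) % vector_alignment;
  // Peeling whole elements can only help if the gap is a multiple of them.
  if (bytes_to_boundary % element_size != 0) return -1;
  return static_cast<int32_t>(bytes_to_boundary / element_size);
}
```

For 4 byte elements and 8 byte vector alignment, a 0 indexed loop over
data at offset 12 needs one peeled iteration, while data at offset 16
needs none. With 16 byte vector alignment the 8 byte base alignment is
insufficient and the sketch returns -1, which corresponds to falling
back to case (2).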
|
|
Enables vectorization of x += .... for very basic (simple, same-type)
constructs for MIPS.
Note: Testing is done with checker parts of tests 661 and 665,
locally changed to cover MIPS32 cases. These changes can't
be included in this patch since MSA is not a default option.
Test: test-art-host test-art-target
Change-Id: Ia3b3646afecb76c2f00996a30923ca70302be57e
|
|
Rationale:
This was needed to fix the regression introduced by
a prior type based cl. With the new type system ramping
up, however, this is actually more simplification (remove
the And recognition for example) than new code!
Test: test-art-host test-art-target
Bug: 67935418
Change-Id: I4284f8f29f3d26e4033a3014d0c697677cc0d795
|