Age | Commit message | Author |
|
Use HVecReplicateScalar instead of HVecSetScalars when creating the
initial vector for min/max. This prevents zeroes in the initial vector
from being taken into account in min/max calculations. Otherwise,
min(MAX_INT, x[0], ..., x[n-1]) = 0 when each x[i] is positive, which
is incorrect.
Added regression test cases in 661-checker-simd-reduc.
Test: ./testrunner.py --target --optimizing in QEMU (arm64)
Change-Id: I1779eefc7f2ab9971dec561b2e1fbf262652410e
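The failure mode can be reproduced with a scalar model of a 4-lane min reduction. This is only a sketch in plain Java, not ART code: the class name, the lane count, and the helpers `setScalars` and `replicateScalar` are hypothetical stand-ins that mimic what the message implies about the two instructions (one filled lane plus zeroes versus the same value in every lane).

```java
import java.util.Arrays;

final class MinReductionInit {
    static final int LANES = 4;  // hypothetical vector width

    // Models an initial vector [value, 0, 0, 0]: only lane 0 is set.
    static int[] setScalars(int value) {
        int[] v = new int[LANES];
        v[0] = value;
        return v;
    }

    // Models an initial vector [value, value, value, value].
    static int[] replicateScalar(int value) {
        int[] v = new int[LANES];
        Arrays.fill(v, value);
        return v;
    }

    // Lane-wise min over x (length assumed to be a multiple of LANES),
    // followed by a horizontal min across the accumulator lanes.
    static int reduceMin(int[] init, int[] x) {
        int[] acc = init.clone();
        for (int i = 0; i < x.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] = Math.min(acc[l], x[i + l]);
            }
        }
        int result = acc[0];
        for (int l = 1; l < LANES; l++) {
            result = Math.min(result, acc[l]);
        }
        return result;
    }
}
```

With all-positive input, the zero lanes of the first initial vector win the horizontal min, while the replicated initial vector yields the true minimum.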
|
|
Support SIMD reduction (add, min, max) and SAD (for int->int only)
idioms for arm (32-bit) backend.
Test: test-art-target, test-art-host
Test: 661-checker-simd-reduc, 660-checker-simd-sad-int
Change-Id: Ic6121f5d781a9bcedc33041b6c4ecafad9b0420a
|
|
Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.
This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
|
Also improves a few comments and uses the new data
type method to test type consistency.
Test: test-art-host
Change-Id: I4a17f9d5bc458a091a259dd45ebcdc6531abbf84
|
|
This CL adds all the necessary codegen for the Uint8 type
but does not add code transformations that use that code.
Vectorization codegens are modified to use Uint8 as the
packed type when appropriate. The side effects are now
disconnected from the instruction's type after the graph has
been built to allow changing HArrayGet/H*FieldGet/HVecLoad
to use a type different from the underlying field or array.
Note: HArrayGet for String.charAt() is modified to have
no side effects whatsoever; Strings are immutable.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: testrunner.py --target --optimizing on Nexus 6P
Test: Nexus 6P boots.
Bug: 23964345
Change-Id: If2dfffedcfb1f50db24570a1e9bd517b3f17bfd0
|
|
Rationale:
The more, the better. Some of the analysis was
overly conservative (e.g. extension does not
need to happen from terminals only as long
as the vectorized code guarantees higher order bits
don't contribute). Also, added hidden-SUB for SAD.
Test: test-art-host test-art-target
Bug: 64091002
Change-Id: I66afd8fb4292ce5cf14f98f9c5ce2bf2b8c98488
|
|
Rationale:
The new example shows that scalar type of array reference does not
reflect signed-ness or unsigned-ness of vector operation. Instead
the vectorizer's analysis looks at zero or sign extension to determine
what operation is required and passes this as explicit or implicit
attribute to the code generator. So don't use the packed data type to
decide what operation to perform. This becomes relevant while switching
to explicit signed and unsigned data types, where we want to pass the
right type to make this decision in the future.
Test: test-art-host test-art-target
Bug: 64091002
Change-Id: I49a8827a13dd703910effcb5a5ebc4b9646cd1e8
|
|
Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
|
Rationale:
Currently just on ARM64 (x86 lacks proper support),
using the SAD idiom yields great speedup on loops
that compute the sum-of-abs-difference operation.
Also includes some refinements around type conversions.
Speedup ExoPlayerAudio (golem run):
1.3x on ARM64
1.1x on x86
Test: test-art-host test-art-target
Bug: 64091002
Change-Id: Ia2b711d2bc23609a2ed50493dfe6719eedfe0130
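For reference, a sum-of-abs-difference loop of the kind this message refers to might look like the following. The message does not spell out the exact loop shapes the vectorizer recognizes, so the class name, the int element type, and the loop form are assumptions made for illustration.

```java
final class SadExample {
    // Scalar sum-of-absolute-differences over two int arrays of equal length.
    static int sad(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}
```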
|
|
Rationale:
should yield 1, not 0
Test: test-art-host test-art-target
Change-Id: I0ca68b2a5a4dba1c3e41248376002d9635716840
|
|
Improve SIMD loop unrolling factor heuristic for ARM64 by
accounting for max desired loop size, trip_count, etc. The
following example shows 21% perf increase:
  for (int i = 0; i < LENGTH; i++) {
    bc[i] = ba[i];  // Byte arrays
  }
Test: test-art-host, test-art-target.
Change-Id: Ic587759c51aa4354df621ffb1c7ce4ebd798dfc1
|
|
Test: market scan.
Change-Id: I58b23b8d254883f30619ea3602d34bf93618d432
|
|
Rationale:
Provides a (somewhat crude) quantitative way to detect changes in
loop vectorization and idiom recognition (e.g. by means of market
scans, or just inspecting the same application before/after a change).
Test: market scan
Change-Id: Ic85938ba2b33c967de3159742c60301454a152a0
|
|
Rationale:
Enables vectorization of x += .... for very basic (simple, same-type)
constructs. Paves the way for more complex (narrower and/or mixed-type)
constructs, which will be handled by the next CL.
This is a revert of Icb5d6c805516db0a1d911c3ede9a246ccef89a22
and thus a revert^2 of I2454778dd0ef1da915c178c7274e1cf33e271d0f
and thus a revert^3 of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc
and thus a revert^4 of I7880c135aee3ed0a39da9ae5b468cbf80e613766
PS1-2 shows what needed to change
Test: test-art-host test-art-target
Bug: 64091002
Change-Id: I647889e0da0959ca405b70081b79c7d3c9bcb2e9
|
|
Fails 530-checker-lse on arm64.
Bug: 64091002, 65212948
This reverts commit cfa59b49cde265dc5329a7e6956445f9f7a75f15.
Change-Id: Icb5d6c805516db0a1d911c3ede9a246ccef89a22
|
|
Rationale:
Enables vectorization of x += .... for very basic (simple, same-type)
constructs. Paves the way for more complex (narrower and/or mixed-type)
constructs, which will be handled by the next CL.
This is a revert^2 of I7880c135aee3ed0a39da9ae5b468cbf80e613766
and thus a revert of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc
PS1-2 shows what needed to change, with regression tests
Test: test-art-host test-art-target
Bug: 64091002, 65212948
Change-Id: I2454778dd0ef1da915c178c7274e1cf33e271d0f
|
|
This reverts commit 9879d0eac8fe2aae19ca6a4a2a83222d6383afc2.
Getting these type check failures in some builds. Need time to look at this better, so reverting for now :-(
dex2oatd F 08-30 21:14:29 210122 226218
code_generator.cc:115] Check failed: CheckType(instruction->GetType(), locations->InAt(0)) PrimDouble C
Change-Id: I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc
|
|
Rationale:
Enables vectorization of x += .... for very basic (simple, same-type)
constructs. Paves the way for more complex (narrower and/or mixed-type)
constructs, which will be handled by the next CL.
Test: test-art-host test-art-target
Bug: 64091002
Change-Id: I7880c135aee3ed0a39da9ae5b468cbf80e613766
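A simple same-type reduction of the x += ... form, together with a scalar model of how it could be split across lanes, is sketched below. The class name, lane count, and accumulator layout are hypothetical; the sketch only illustrates why an int sum can be reordered safely (wrapping addition is associative and commutative) and is not taken from the CL.

```java
final class SumReduction {
    static final int LANES = 4;  // hypothetical vector width

    // The scalar reduction: sum += x[i].
    static int scalarSum(int[] x) {
        int sum = 0;
        for (int v : x) {
            sum += v;
        }
        return sum;
    }

    // Lane-wise model: one partial sum per lane, combined at the end.
    static int laneSum(int[] x) {
        int[] acc = new int[LANES];  // zero is the identity for addition
        int i = 0;
        for (; i + LANES <= x.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += x[i + l];
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l];           // horizontal add
        }
        for (; i < x.length; i++) {
            sum += x[i];             // scalar cleanup for leftover elements
        }
        return sum;
    }
}
```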
|
|
Rationale:
One "improvement" overlooked in the previous CL hoists
a try-test out of the optimization to make sure we don't
change HIR when not needed. However, the try-test may
affect the outcome of the test, so that was bad, bad!
Bug: 64091002
Test: test-art-host
Change-Id: Icf5f73e7cbeb209ee5fa5f6c1bef64fe127bb2fd
|
|
Rationale:
Recognize reductions in loops. Note that reductions are *not*
optimized yet (we would proceed with e.g. unrolling and vectorization).
This CL merely sets up the basic detection framework. Also does
a bit of cleanup on loop optimization code.
Bug: 64091002
Test: test-art-host
Change-Id: I0f52bd7ca69936315b03d02e83da743b8ad0ae72
|
|
After an instruction is removed during RemoveFromCycle its
environment isn't properly cleaned: it still has input instructions
present and registered (those instructions still hold records for
that).
Test: test-art-target, test-art-host.
Change-Id: Iea315bdf735d75fe477f43671f05b40dfecc63a8
|
|
Let clang-format reorder the header includes.
Derived with:
* .clang-format:
BasedOnStyle: Google
IncludeIsMainRegex: '(_test|-inl)?$'
* Steps:
find . -name '*.cc' -o -name '*.h' | xargs sed -i.bak -e 's/^#include/ #include/' ; git commit -a -m 'ART: Include cleanup'
git-clang-format -style=file HEAD^
manual inspection
git commit -a --amend
Test: mmma art
Change-Id: Ia963a8ce3ce5f96b5e78acd587e26908c7a70d02
|
|
MIPS32 implementation which uses MSA extension.
Note: Testing is done with checker parts of tests 640, 645, 646 and
651, locally changed to cover MIPS32 cases. These changes can't
be included in this patch since MSA is not a default option.
Test: ./testrunner.py --target --optimizing -j1 in QEMU (mips32r6)
Change-Id: Ieba28f94c48c943d5444017bede9a5d409149762
|
|
|
|
Basic vectorization support with 64-bit vector length on ARM 32-bit
platforms (128-bit vectors require massive changes in register
allocator).
Test: test-art-target, test-art-host
Change-Id: I1d740146c3f00170fc033ae5fd69d59321ddcbf4
|
|
Rationale:
We missed vectorizing a simple stencil operation
due to inaccurate unit stride analysis and failure
to detect single runtime data dependence test.
Test: test-art-host, test-art-target
Change-Id: I07ba03455bfb1c0aff371c1244a1328f885d0916
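As an illustration of why a stencil loop needs a runtime data dependence test, consider the sketch below. The specific three-point stencil and the class name are assumptions, since the message does not show the loop that was missed. When the output and input arrays are the same object, each iteration reads a value written by the previous one, so reading several elements ahead (as a vector loop would) is only safe when the two arrays are distinct.

```java
final class StencilAlias {
    // Three-point stencil: a[i] = b[i-1] + b[i] + b[i+1] for interior elements.
    static void stencil(int[] a, int[] b) {
        for (int i = 1; i < b.length - 1; i++) {
            a[i] = b[i - 1] + b[i] + b[i + 1];
        }
    }
}
```

Calling `stencil(a, b)` with distinct arrays and `stencil(c, c)` in place produce different interior values, which is what a single `a != b` check at runtime would guard against.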
|
|
|
|
bug: 33775412
Test: no scanner crash (torn on whether I should spend some time working on a smali test)
Change-Id: I8b94725ce57171b592bede4bf55cd0a9626a8a10
|
|
Rationale:
This CL introduces the basic framework for dynamically peeling
(to obtain aligned access) and unrolling the vector loop (to reduce
looping overhead and allow more target specific optimizations
on e.g. SIMD loads and stores).
NOTE:
The current heuristics are "bogus" and merely meant to exercise
the new framework. This CL focuses on introducing correct code for
the vectorizer. Heuristics and the memory computations for alignment
are to be implemented later.
Test: test-art-target, test-art-host
Change-Id: I010af1475f42f92fd1daa6a967d7a85922beace8
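The resulting loop structure can be pictured with a scalar model: a peeled prefix, an unrolled main loop that stands in for the vector loop, and a cleanup loop for the remainder. The names `VL`, `UNROLL`, and the `peel` parameter are hypothetical, and the peel count is simply passed in, since the message states that the real alignment computations are left for later.

```java
final class PeelUnroll {
    static final int VL = 4;      // hypothetical vector length in elements
    static final int UNROLL = 2;  // hypothetical unrolling factor

    // Adds 1 to every element of a, in three phases.
    static void addOne(int[] a, int peel) {
        int n = a.length;
        int i = 0;

        // Phase 1: scalar peeling, e.g. until the access would be aligned.
        int peelEnd = Math.min(peel, n);
        for (; i < peelEnd; i++) {
            a[i] += 1;
        }

        // Phase 2: main loop, UNROLL vector-sized chunks per iteration.
        int step = VL * UNROLL;
        for (; i + step <= n; i += step) {
            for (int u = 0; u < UNROLL; u++) {
                for (int l = 0; l < VL; l++) {
                    a[i + u * VL + l] += 1;
                }
            }
        }

        // Phase 3: scalar cleanup for the remaining elements.
        for (; i < n; i++) {
            a[i] += 1;
        }
    }
}
```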
|
|
We should not remove instructions that have deoptimize as
users, or that have environment uses in a debuggable setup.
bug: 62536525
bug: 33775412
Test: 656-loop-deopt
Change-Id: Iaec1a0b6e90c6a0169f18c6985f00fd8baf2dece
|
|
|
|
Add a strlcpy shim for the host, so we can use strlcpy instead of
strcpy everywhere.
Fixed warnings include unused-decls, (some) unreachable code, use
after std::move, string char append, leaks, (some) excessive padding.
Disable some warnings we cannot or do not want to avoid.
Bug: 32619234
Test: m
Test: m test-art-host
Change-Id: Ie191985eebb160d94b988b41735d4f0a1fa1b54e
|
|
Test: mma test-art-host-gtest
Test: ./testrunner.py --optimizing --target --64 in QEMU
Change-Id: I60dc9c97c2b6470414fa64750e7c9824e70bfb4e
|
|
Rationale:
Refinement requested by vmarko.
Test: test-art-host
Change-Id: I850466ebd5ad99bb617bc71c279159862e18e6ec
|
|
MIPS64 implementation which uses MSA extension. Also extended all
relevant checker tests to test MIPS64 implementation.
Test: booted MIPS64R6 in QEMU
Test: ./testrunner.py --target --optimizing -j1 in QEMU
Change-Id: I8b8a2f601076bca1925e21213db8ed1d41d79b52
|
|
This is a revert^2 of commit 636e870d55c1739e2318c2180fac349683dbfa97.
Rationale:
Under strict conditions, even operations that are sensitive
to higher order bits can vectorize by inspecting the operands
carefully. This enables more vectorization, as demonstrated
by the removal of quite a few TODOs.
Test: test-art-target, test-art-host
Change-Id: Ic2684f771d2e36df10432286198533284acaf472
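A small example of an operation that is sensitive to higher order bits: Java widens byte operands to int, so a shift right after an add sees the ninth bit of the sum, whereas an 8-bit lane would already have wrapped. The sketch below models both orders; the class and method names are hypothetical, and the particular expression is an illustration rather than a case taken from the CL.

```java
final class HighBits {
    // Java semantics: add in 32 bits, shift, then narrow to byte.
    static byte avgWide(byte a, byte b) {
        return (byte) ((a + b) >> 1);
    }

    // 8-bit lane model: the sum is narrowed to a byte before the shift.
    static byte avgNarrow(byte a, byte b) {
        byte s = (byte) (a + b);
        return (byte) (s >> 1);
    }
}
```

For operands whose sum fits in a byte the two agree, but for 100 and 100 the wide version gives 100 and the narrow version gives -28, so such an expression can only be vectorized in narrow lanes when the operands are known to stay in range.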
|
|
Fails on armv8 / speed-profile
This reverts commit 636e870d55c1739e2318c2180fac349683dbfa97.
Change-Id: Ib2a09b3adeba994c6b095672a1e08b32d3871872
|
|
Rationale:
Under strict conditions, even operations that are sensitive
to higher order bits can vectorize by inspecting the operands
carefully. This enables more vectorization, as demonstrated
by the removal of quite a few TODOs.
Test: test-art-target, test-art-host
Change-Id: I2b0fda6a182da9aed9ce1708a53eaf0b7e1c9146
|
|
Rationale:
Recognition is now more robust with respect to
operation order or even cancelling constants.
Test: test-art-target, test-art-host
Change-Id: I4e920150e20e1453bb081e3f0ddcda8f1c605672
|
|
Rationale:
The more vectorized, the better!
Test: test-art-target, test-art-host
Change-Id: I758becca5beaa5b97fab2ab70f2e00cb53458703
|
|
Rationale:
It is better to have a single place that simplifies shift
factors outside the 32-bit or 64-bit distance range, so
that other phases (induction variable analysis, loop optimizations,
etc.) do not have to know about that.
Test: test-art-target, test-art-host
Change-Id: Idfd90259cca085426cc3055eccb90f3c0976036b
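For context, Java itself only uses the low five bits of the shift distance for int shifts and the low six bits for long shifts, so a distance outside the range is equivalent to a masked one. The helpers below are hypothetical and only demonstrate that language rule; how the ART pass performs the simplification is not shown in the message.

```java
final class ShiftDistance {
    // Effective distance for an int shift: low five bits only.
    static int normalizeInt(int distance) {
        return distance & 31;
    }

    // Effective distance for a long shift: low six bits only.
    static int normalizeLong(int distance) {
        return distance & 63;
    }
}
```

For example, `5 << 33` and `5 << ShiftDistance.normalizeInt(33)` both evaluate to 10.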
|
|
Rationale:
ARM is a bit less forgiving on shifting more than
the lane width of the SIMD instruction (rejecting
such cases is no loss, since it yields 0 anyway
and should be optimized differently).
Bug: 37776122
Test: test-art-target, test-art-host
Change-Id: I22d04afbfce82b4593f17c2f48c1fd5a0805d305
|
|
Rationale:
Like its scalar counterpart, the SIMD implementation of array get from
a string needs to deal with compressed and uncompressed cases.
Microbenchmarks show 2x to 3x speedup for just copying data!
Test: test-art-target, test-art-host
Change-Id: I2fd714e50715b263123c215cd181f19194456d2b
|
|
Also adds is_string_char_at boolean in preparation of
[un]compressed string vectorization support.
Test: test-art-target, test-art-host
Change-Id: Ia99b28564727bf91b3d5cfc49f6d40a4dd1ffd3b
|
|
Rationale:
First of several idioms that map to very efficient SIMD instructions.
Note that the is-zero-ext and is-sign-ext are general-purpose utilities
that will be widely used in the vectorizer to detect low precision
idioms, so expect that code to be shared with many CLs to come.
Test: test-art-host, test-art-target
Change-Id: If7dc2926c72a2e4b5cea15c44ef68cf5503e9be9
|
|
regression test."
|
|
With regression test.
Test: test-art-host
Bug: 37247891
Change-Id: I55b06939d465d3ddb736d1ba659b1df179a5c390
|
|
* changes:
ARM64: Support vectorization for double and long.
ARM64: Support 128-bit registers for SIMD.
|
|
Rationale:
String array get instruction cannot be vectorized
in a straightforward way, since compression has
to be dealt with. So rejected for now.
Test: test-art-target, test-art-host
Bug: 37151445
Change-Id: I16112cb8b1be30babd8ec07af5976db0369f8c28
|
|
Rationale:
I forgot to add the check test part of this test,
and incidentally found an omission: intrinsic
information should be set in the scalar loop
(to get best code there too, not just a lib call).
Test: test-art-host, test-art-target
Change-Id: I94aa4cdf042f72690d10efee3a9dc7c476d5c5e0
|