path: root/compiler/optimizing/loop_optimization.cc
Age, Commit message, Author
2017-10-13Fix min/max SIMD reductionGoran Jakovljevic
Use HVecReplicateScalar instead of HVecSetScalars when creating an initial vector for min/max. This prevents zeroes in the initial vector from being taken into account in min/max calculations. Otherwise, min(MAX_INT, x[0], ..., x[n-1]) = 0 if each x[i] is positive, which is incorrect. Added regression test cases in 661-checker-simd-reduc. Test: ./testrunner.py --target --optimizing in QEMU (arm64) Change-Id: I1779eefc7f2ab9971dec561b2e1fbf262652410e
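The failure mode described in this commit can be reproduced with a small scalar model of a vector min reduction. This is only a sketch: `kLanes`, `SetScalar`, `ReplicateScalar`, and `MinReduce` are hypothetical names that imitate the two initialization styles, not the ART node classes themselves.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr int kLanes = 4;  // hypothetical vector length
constexpr int32_t kMaxInt = std::numeric_limits<int32_t>::max();
using Vec = std::array<int32_t, kLanes>;

// Models a "set scalars" initialization: lane 0 holds the value, other lanes are zero.
Vec SetScalar(int32_t v) {
  Vec r{};
  r[0] = v;
  return r;
}

// Models a "replicate scalar" initialization: every lane holds the value.
Vec ReplicateScalar(int32_t v) {
  Vec r;
  r.fill(v);
  return r;
}

// Lane-wise min over full chunks, then a horizontal min, then a scalar tail.
int32_t MinReduce(const std::vector<int32_t>& x, Vec acc) {
  std::size_t i = 0;
  for (; i + kLanes <= x.size(); i += kLanes) {
    for (int l = 0; l < kLanes; ++l) {
      acc[l] = std::min(acc[l], x[i + l]);
    }
  }
  int32_t result = *std::min_element(acc.begin(), acc.end());
  for (; i < x.size(); ++i) {
    result = std::min(result, x[i]);
  }
  return result;
}
```

With all-positive input, the zero lanes left behind by `SetScalar(kMaxInt)` win the horizontal min, while `ReplicateScalar(kMaxInt)` gives the true minimum.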
2017-10-12ARM: Support SIMD reduction for 32-bit backend.Artem Serov
Support SIMD reduction (add, min, max) and SAD (for int->int only) idioms for arm (32-bit) backend. Test: test-art-target, test-art-host Test: 661-checker-simd-reduc, 660-checker-simd-sad-int Change-Id: Ic6121f5d781a9bcedc33041b6c4ecafad9b0420a
2017-10-06ART: Use ScopedArenaAllocator for pass-local data.Vladimir Marko
Passes using local ArenaAllocator were hiding their memory usage from the allocation counting, making it difficult to track down where memory was used. Using ScopedArenaAllocator reveals the memory usage. This changes the HGraph constructor which requires a lot of changes in tests. Refactor these tests to limit the amount of work needed the next time we change that constructor. Test: m test-art-host-gtest Test: testrunner.py --host Test: Build with kArenaAllocatorCountAllocations = true. Bug: 64312607 Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
2017-10-05Try to preserve dex pc better in vector code.Aart Bik
Also improves a few comments and uses the new data type method to test type consistency. Test: test-art-host Change-Id: I4a17f9d5bc458a091a259dd45ebcdc6531abbf84
2017-10-03ART: Introduce Uint8 compiler data type.Vladimir Marko
This CL adds all the necessary codegen for the Uint8 type but does not add code transformations that use that code. Vectorization codegens are modified to use Uint8 as the packed type when appropriate. The side effects are now disconnected from the instruction's type after the graph has been built to allow changing HArrayGet/H*FieldGet/HVecLoad to use a type different from the underlying field or array. Note: HArrayGet for String.charAt() is modified to have no side effects whatsoever; Strings are immutable. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing --jit Test: testrunner.py --target --optimizing on Nexus 6P Test: Nexus 6P boots. Bug: 23964345 Change-Id: If2dfffedcfb1f50db24570a1e9bd517b3f17bfd0
2017-10-02Generalized zero/sign-ext analysis. Generalized SAD.Aart Bik
Rationale: The more, the better. Some of the analysis was overly conservative (e.g. extension does not need to happen from terminals only as long as vectorized guarantees higher order bits don't contribute). Also, added hidden-SUB for SAD. Test: test-art-host test-art-target Bug: 64091002 Change-Id: I66afd8fb4292ce5cf14f98f9c5ce2bf2b8c98488
2017-09-27Added SAD test. Generalized vector analysis of narrow type.Aart Bik
Rationale: The new example shows that the scalar type of an array reference does not reflect the signedness or unsignedness of the vector operation. Instead, the vectorizer's analysis looks at zero or sign extension to determine what operation is required and passes this as an explicit or implicit attribute to the code generator. So don't use the packed data type to decide what operation to perform. This becomes relevant while switching to explicit signed and unsigned data types, where we want to pass the right type to make this decision in the future. Test: test-art-host test-art-target Bug: 64091002 Change-Id: I49a8827a13dd703910effcb5a5ebc4b9646cd1e8
2017-09-25ART: Introduce compiler data type.Vladimir Marko
Replace most uses of the runtime's Primitive in compiler with a new class DataType. This prepares for introducing new types, such as Uint8, that the runtime does not need to know about. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 23964345 Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
2017-09-21Implement Sum-of-Abs-Differences idiom recognition.Aart Bik
Rationale: Currently just on ARM64 (x86 lacks proper support), using the SAD idiom yields great speedup on loops that compute the sum-of-abs-difference operation. Also includes some refinements around type conversions. Speedup ExoPlayerAudio (golem run): 1.3x on ARM64 1.1x on x86 Test: test-art-host test-art-target Bug: 64091002 Change-Id: Ia2b711d2bc23609a2ed50493dfe6719eedfe0130
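For reference, the scalar loop shape that a sum-of-abs-differences idiom corresponds to looks roughly like the following. This is a sketch in C++ rather than the Java source the vectorizer actually sees, and `SumAbsDiff` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Scalar sum-of-absolute-differences over two narrow (8-bit) arrays.
// Each element is widened to 32 bits before the subtraction so the
// difference cannot wrap in the narrow type.
int32_t SumAbsDiff(const std::vector<int8_t>& a, const std::vector<int8_t>& b) {
  int32_t sad = 0;
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    sad += std::abs(static_cast<int32_t>(a[i]) - static_cast<int32_t>(b[i]));
  }
  return sad;
}
```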
2017-09-11No unrolling for large loop bodies.Aart Bik
Rationale: should yield 1, not 0 Test: test-art-host test-art-target Change-Id: I0ca68b2a5a4dba1c3e41248376002d9635716840
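One reading of "should yield 1, not 0" is that a size-budget heuristic computed by integer division returns 0 once the loop body exceeds the budget, whereas an unroll factor of 1 is what expresses "no unrolling". The sketch below illustrates that reading only; `UnrollFactor` and its parameters are hypothetical and are not taken from the ART sources.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical budget-based heuristic: how many copies of the body fit in the budget.
// The result is clamped to [1, max_unroll], so large bodies get 1 (no unrolling), never 0.
uint32_t UnrollFactor(uint32_t body_size, uint32_t max_loop_size, uint32_t max_unroll) {
  if (body_size == 0) {
    return 1;  // nothing to measure; leave the loop alone
  }
  const uint32_t by_budget = max_loop_size / body_size;  // 0 when the body exceeds the budget
  return std::max<uint32_t>(1, std::min(by_budget, max_unroll));
}
```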
2017-09-07ARM64: Tune SIMD loop unrolling factor heuristic.Artem Serov
Improve SIMD loop unrolling factor heuristic for ARM64 by accounting for max desired loop size, trip_count, etc. The following example shows 21% perf increase:
  for (int i = 0; i < LENGTH; i++) {
    bc[i] = ba[i];  // Byte arrays
  }
Test: test-art-host, test-art-target. Change-Id: Ic587759c51aa4354df621ffb1c7ce4ebd798dfc1
2017-09-06Pass stats into the loop optimization phase.Aart Bik
Test: market scan. Change-Id: I58b23b8d254883f30619ea3602d34bf93618d432
2017-09-06Added vectorization stats.Aart Bik
Rationale: Provides a (somewhat crude) quantitative way to detect changes in loop vectorization and idiom recognition (e.g. by means of market scans, or just inspecting the same application before/after a change). Test: market scan Change-Id: Ic85938ba2b33c967de3159742c60301454a152a0
2017-09-05Basic SIMD reduction support.Aart Bik
Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. This is a revert of Icb5d6c805516db0a1d911c3ede9a246ccef89a22 and thus a revert^2 of I2454778dd0ef1da915c178c7274e1cf33e271d0f and thus a revert^3 of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc and thus a revert^4 of I7880c135aee3ed0a39da9ae5b468cbf80e613766 PS1-2 shows what needed to change Test: test-art-host test-art-target Bug: 64091002 Change-Id: I647889e0da0959ca405b70081b79c7d3c9bcb2e9
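A simple same-type `x += a[i]` reduction can be modeled with one partial accumulator per lane plus a final horizontal add. The sketch below is a scalar model under assumptions: `kSumLanes` and `SumReduce` are hypothetical names, and the inputs are assumed small enough that the 32-bit sum does not overflow.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kSumLanes = 4;  // hypothetical vector length

// Models a vectorized sum: lane 0 starts with the incoming value of x, the other
// lanes start at 0 (the additive identity), so no lane contributes a spurious term.
int32_t SumReduce(const std::vector<int32_t>& a, int32_t x) {
  std::array<int32_t, kSumLanes> acc{};
  acc[0] = x;
  std::size_t i = 0;
  for (; i + kSumLanes <= a.size(); i += kSumLanes) {
    for (int l = 0; l < kSumLanes; ++l) {
      acc[l] += a[i + l];  // lane-wise add
    }
  }
  int32_t result = 0;
  for (int32_t lane : acc) {
    result += lane;  // horizontal add across lanes
  }
  for (; i < a.size(); ++i) {
    result += a[i];  // scalar cleanup for the leftover elements
  }
  return result;
}
```

Zero-filled lanes are harmless here because 0 is the identity for addition; the same initialization is not safe for min or max.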
2017-09-02Revert "Basic SIMD reduction support."Nicolas Geoffray
Fails 530-checker-lse on arm64. Bug: 64091002, 65212948 This reverts commit cfa59b49cde265dc5329a7e6956445f9f7a75f15. Change-Id: Icb5d6c805516db0a1d911c3ede9a246ccef89a22
2017-09-01Basic SIMD reduction support.Aart Bik
Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. This is a revert^2 of I7880c135aee3ed0a39da9ae5b468cbf80e613766 and thus a revert of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc PS1-2 shows what needed to change, with regression tests Test: test-art-host test-art-target Bug: 64091002, 65212948 Change-Id: I2454778dd0ef1da915c178c7274e1cf33e271d0f
2017-08-30Revert "Basic SIMD reduction support."Aart Bik
This reverts commit 9879d0eac8fe2aae19ca6a4a2a83222d6383afc2. Getting these type check failures in some builds. Need time to look at this better, so reverting for now :-( dex2oatd F 08-30 21:14:29 210122 226218 code_generator.cc:115] Check failed: CheckType(instruction->GetType(), locations->InAt(0)) PrimDouble C Change-Id: I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc
2017-08-30Basic SIMD reduction support.Aart Bik
Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. Test: test-art-host test-art-target Bug: 64091002 Change-Id: I7880c135aee3ed0a39da9ae5b468cbf80e613766
2017-08-10Fix performance regression.Aart Bik
Rationale: One "improvement" overlooked in the previous CL hoists a try-test out of the optimization to make sure we don't change HIR when not needed. However, the try-test may affect the outcome of the test, so that was bad, bad! Bug: 64091002 Test: test-art-host Change-Id: Icf5f73e7cbeb209ee5fa5f6c1bef64fe127bb2fd
2017-08-08Set basic framework for detecting reductions.Aart Bik
Rationale: Recognize reductions in loops. Note that reductions are *not* optimized yet (we would proceed with e.g. unrolling and vectorization). This CL merely sets up the basic detection framework. Also does a bit of cleanup on loop optimization code. Bug: 64091002 Test: test-art-host Change-Id: I0f52bd7ca69936315b03d02e83da743b8ad0ae72
2017-08-01ART: Fix SimplifyInduction for an instruction with HEnvironment.Artem Serov
After an instruction is removed during RemoveFromCycle its environment isn't properly cleaned: it still has input instructions present and registered (those instructions still hold records for that). Test: test-art-target, test-art-host. Change-Id: Iea315bdf735d75fe477f43671f05b40dfecc63a8
2017-07-24ART: Include cleanupAndreas Gampe
Let clang-format reorder the header includes. Derived with:
* .clang-format:
    BasedOnStyle: Google
    IncludeIsMainRegex: '(_test|-inl)?$'
* Steps:
    find . -name '*.cc' -o -name '*.h' | xargs sed -i.bak -e 's/^#include/ #include/' ; git commit -a -m 'ART: Include cleanup'
    git-clang-format -style=file HEAD^
    manual inspection
    git commit -a --amend
Test: mmma art Change-Id: Ia963a8ce3ce5f96b5e78acd587e26908c7a70d02
2017-07-13MIPS32: ART VectorizerLena Djokic
MIPS32 implementation which uses MSA extension. Note: Testing is done with checker parts of tests 640, 645, 646 and 651, locally changed to cover MIPS32 cases. These changes can't be included in this patch since MSA is not a default option. Test: ./testrunner.py --target --optimizing -j1 in QEMU (mips32r6) Change-Id: Ieba28f94c48c943d5444017bede9a5d409149762
2017-06-30Merge "ARM: ART Vectorizer (64-bit vectors)."Treehugger Robot
2017-06-30ARM: ART Vectorizer (64-bit vectors).Artem Serov
Basic vectorization support with 64-bit vector length on ARM 32-bit platforms (128-bit vectors require massive changes in register allocator). Test: test-art-target, test-art-host Change-Id: I1d740146c3f00170fc033ae5fd69d59321ddcbf4
2017-06-29Improved subscript and data dependence analysis.Aart Bik
Rationale: We missed vectorizing a simple stencil operation due to inaccurate unit stride analysis and failure to detect single runtime data dependence test. Test: test-art-host, test-art-target Change-Id: I07ba03455bfb1c0aff371c1244a1328f885d0916
2017-06-28Merge "Prevent loop optimization in debuggable mode."Aart Bik
2017-06-28Prevent loop optimization in debuggable mode.Nicolas Geoffray
bug: 33775412 Test: no scanner crash (torn on whether I should spend some time working on a smali test) Change-Id: I8b94725ce57171b592bede4bf55cd0a9626a8a10
2017-06-27Unrolling and dynamic loop peeling framework in vectorizer.Aart Bik
Rationale: This CL introduces the basic framework for dynamically peeling (to obtain aligned access) and unrolling the vector loop (to reduce looping overhead and allow more target specific optimizations on e.g. SIMD loads and stores). NOTE: The current heuristics are "bogus" and merely meant to exercise the new framework. This CL focuses on introducing correct code for the vectorizer. Heuristics and the memory computations for alignment are to be implemented later. Test: test-art-target, test-art-host Change-Id: I010af1475f42f92fd1daa6a967d7a85922beace8
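The three-part loop structure this commit describes (peel until aligned, run an unrolled vector loop, finish with a scalar cleanup loop) can be sketched on a plain byte copy. Everything here is an assumption made for illustration: `kVecBytes`, `kUnroll`, `LoopCounts`, and `CopyBytes` are hypothetical, and `std::memcpy` stands in for a SIMD load/store pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kVecBytes = 16;  // hypothetical vector width in bytes
constexpr std::size_t kUnroll = 2;     // hypothetical unroll factor

struct LoopCounts {
  std::size_t peeled;        // scalar iterations run before dst became aligned
  std::size_t vector_iters;  // iterations of the unrolled vector loop
  std::size_t cleanup;       // scalar iterations run after the vector loop
};

LoopCounts CopyBytes(uint8_t* dst, const uint8_t* src, std::size_t n) {
  LoopCounts c{0, 0, 0};
  std::size_t i = 0;
  // Dynamic peeling: run scalar iterations until the store address is aligned.
  while (i < n && reinterpret_cast<std::uintptr_t>(dst + i) % kVecBytes != 0) {
    dst[i] = src[i];
    ++i;
    ++c.peeled;
  }
  // Unrolled vector loop: kUnroll vector-sized copies per iteration.
  for (; i + kUnroll * kVecBytes <= n; i += kUnroll * kVecBytes) {
    for (std::size_t u = 0; u < kUnroll; ++u) {
      std::memcpy(dst + i + u * kVecBytes, src + i + u * kVecBytes, kVecBytes);
    }
    ++c.vector_iters;
  }
  // Scalar cleanup loop for whatever is left.
  for (; i < n; ++i) {
    dst[i] = src[i];
    ++c.cleanup;
  }
  return c;
}
```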
2017-06-22Fix loop optimization in the presence of environment uses.Nicolas Geoffray
We should not remove instructions that have deoptimize as users, or that have environment uses in a debuggable setup. bug: 62536525 bug: 33775412 Test: 656-loop-deopt Change-Id: Iaec1a0b6e90c6a0169f18c6985f00fd8baf2dece
2017-06-09Merge "MIPS64: Min/max vectorization support"Aart Bik
2017-06-08ART: Fix or disable some tidy warnings.Andreas Gampe
Add a strlcpy shim for the host, so we can use strlcpy instead of strcpy everywhere. Fixed warnings include unused-decls, (some) unreachable code, use after std::move, string char append, leaks, (some) excessive padding. Disable some warnings we cannot or do not want to avoid. Bug: 32619234 Test: m Test: m test-art-host Change-Id: Ie191985eebb160d94b988b41735d4f0a1fa1b54e
2017-06-07MIPS64: Min/max vectorization supportGoran Jakovljevic
Test: mma test-art-host-gtest Test: ./testrunner.py --optimizing --target --64 in QEMU Change-Id: I60dc9c97c2b6470414fa64750e7c9824e70bfb4e
2017-06-05Pass through inputs beyond arguments in invoke.Aart Bik
Rationale: Refinement requested by vmarko. Test: test-art-host Change-Id: I850466ebd5ad99bb617bc71c279159862e18e6ec
2017-05-29MIPS64: ART VectorizerGoran Jakovljevic
MIPS64 implementation which uses MSA extension. Also extended all relevant checker tests to test MIPS64 implementation. Test: booted MIPS64R6 in QEMU Test: ./testrunner.py --target --optimizing -j1 in QEMU Change-Id: I8b8a2f601076bca1925e21213db8ed1d41d79b52
2017-05-24Support for narrow operands in "dangerous" operations.Aart Bik
This is a revert^2 of commit 636e870d55c1739e2318c2180fac349683dbfa97. Rationale: Under strict conditions, even operations that are sensitive to higher order bits can vectorize by inspecting the operands carefully. This enables more vectorization, as demonstrated by the removal of quite a few TODOs. Test: test-art-target, test-art-host Change-Id: Ic2684f771d2e36df10432286198533284acaf472
2017-05-23Revert "Support for narrow operands in "dangerous" operations."Nicolas Geoffray
Fails on armv8 / speed-profile This reverts commit 636e870d55c1739e2318c2180fac349683dbfa97. Change-Id: Ib2a09b3adeba994c6b095672a1e08b32d3871872
2017-05-18Support for narrow operands in "dangerous" operations.Aart Bik
Rationale: Under strict conditions, even operations that are sensitive to higher order bits can vectorize by inspecting the operands carefully. This enables more vectorization, as demonstrated by the removal of quite a few TODOs. Test: test-art-target, test-art-host Change-Id: I2b0fda6a182da9aed9ce1708a53eaf0b7e1c9146
2017-05-18Made idiom recognition more robust.Aart Bik
Rationale: Recognition is now more robust with respect to operation order or even cancelling constants. Test: test-art-target, test-art-host Change-Id: I4e920150e20e1453bb081e3f0ddcda8f1c605672
2017-05-15Min/max SIMDization support.Aart Bik
Rationale: The more vectorized, the better! Test: test-art-target, test-art-host Change-Id: I758becca5beaa5b97fab2ab70f2e00cb53458703
2017-05-09Moved knowledge on masking shift operands to simplifier.Aart Bik
Rationale: It is better to have a single place that simplifies shift factors outside the 32-bit or 64-bit distance range, so that other phases (induction variable analysis, loop optimizations, etc.) do not have to know about that. Test: test-art-target, test-art-host Change-Id: Idfd90259cca085426cc3055eccb90f3c0976036b
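The masking in question follows Java shift semantics: an int shift uses only the low 5 bits of the distance and a long shift only the low 6 bits. A minimal C++ model of that rule is sketched below; `ShlInt` and `ShlLong` are hypothetical helper names, and the shift is done on unsigned values to avoid undefined behavior in C++.

```cpp
#include <cstdint>

// Java-style 32-bit left shift: the distance is reduced modulo 32 (distance & 31).
int32_t ShlInt(int32_t x, int32_t distance) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) << (distance & 31));
}

// Java-style 64-bit left shift: the distance is reduced modulo 64 (distance & 63).
int64_t ShlLong(int64_t x, int32_t distance) {
  return static_cast<int64_t>(static_cast<uint64_t>(x) << (distance & 63));
}
```

Once a single phase normalizes the distance like this, later phases only ever see distances in range.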
2017-05-01Bug fix on shift that exceeds "lane width".Aart Bik
Rationale: ARM is a bit less forgiving on shifting more than the lane width of the SIMD instruction (rejecting such cases is no loss, since it yields 0 anyway and should be optimized differently). Bug: 37776122 Test: test-art-target, test-art-host Change-Id: I22d04afbfce82b4593f17c2f48c1fd5a0805d305
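The "yields 0 anyway" remark can be checked on an 8-bit lane: a left shift by 8 or more, followed by truncation back to the lane type, leaves no bits. The sketch below shows that together with a guard of the kind the commit implies; `TruncatedShl8` and `IsLaneShiftSupported` are hypothetical names, not ART functions.

```cpp
#include <cstdint>

// Left shift of an 8-bit lane, computed in 32 bits and truncated back to 8 bits.
// Requires distance < 32 so the 32-bit shift itself is well defined.
uint8_t TruncatedShl8(uint8_t x, unsigned distance) {
  return static_cast<uint8_t>(static_cast<uint32_t>(x) << distance);
}

// Hypothetical guard: only accept shift distances strictly inside the lane width.
bool IsLaneShiftSupported(int64_t distance, int lane_bits) {
  return 0 <= distance && distance < lane_bits;
}
```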
2017-04-28Enable string "array get" vectorization.Aart Bik
Rationale: Like its scalar counterpart, the SIMD implementation of array get from a string needs to deal with compressed and uncompressed cases. Microbenchmarks show 2x to 3x speedup for just copying data! Test: test-art-target, test-art-host Change-Id: I2fd714e50715b263123c215cd181f19194456d2b
2017-04-25Pack booleans in the already existing bit field.Aart Bik
Also adds is_string_char_at boolean in preparation of [un]compressed string vectorization support. Test: test-art-target, test-art-host Change-Id: Ia99b28564727bf91b3d5cfc49f6d40a4dd1ffd3b
2017-04-19Implement halving add idiom (with checker tests).Aart Bik
Rationale: First of several idioms that map to very efficient SIMD instructions. Note that the is-zero-ext and is-sign-ext are general-purpose utilities that will be widely used in the vectorizer to detect low precision idioms, so expect that code to be shared with many CLs to come. Test: test-art-host, test-art-target Change-Id: If7dc2926c72a2e4b5cea15c44ef68cf5503e9be9
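A halving add computes `(a + b) >> 1` (or `(a + b + 1) >> 1` when rounding) as if the sum were formed in a wider type. The sketch below, using unsigned 8-bit operands, shows why the widening matters and therefore why the vectorizer has to recognize zero or sign extension first. The function names are hypothetical.

```cpp
#include <cstdint>

// Truncating halving add: the sum is formed in 32 bits, so no carry is lost.
uint8_t HalvingAdd(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<uint32_t>(a) + b) >> 1);
}

// Rounding halving add: adds 1 before the shift.
uint8_t RoundingHalvingAdd(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<uint32_t>(a) + b + 1) >> 1);
}

// What a naive 8-bit add followed by a shift would give: the carry out of
// bit 7 is dropped before the shift, so large operands produce a wrong result.
uint8_t NarrowAddThenShift(uint8_t a, uint8_t b) {
  const uint8_t wrapped = static_cast<uint8_t>(a + b);
  return static_cast<uint8_t>(wrapped >> 1);
}
```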
2017-04-17Merge "Fixed bug on pending environment use of termination condition. With regression test."Aart Bik
2017-04-14Fixed bug on pending environment use of termination condition.Aart Bik
With regression test. Test: test-art-host Bug: 37247891 Change-Id: I55b06939d465d3ddb736d1ba659b1df179a5c390
2017-04-12Merge changes I1d4db176,Ifb931a99Aart Bik
* changes: ARM64: Support vectorization for double and long. ARM64: Support 128-bit registers for SIMD.
2017-04-11Fix bug in vectorization of charAt, with regression testAart Bik
Rationale: String array get instruction cannot be vectorized in a straightforward way, since compression has to be dealt with. So rejected for now. Test: test-art-target, test-art-host Bug: 37151445 Change-Id: I16112cb8b1be30babd8ec07af5976db0369f8c28
2017-04-10Add checker part of test, fix intrinsic copyingAart Bik
Rationale: I forgot to add the check test part of this test, and incidentally found an omission: intrinsic information should be set in the scalar loop (to get best code there too, not just a lib call). Test: test-art-host, test-art-target Change-Id: I94aa4cdf042f72690d10efee3a9dc7c476d5c5e0