path: root/compiler/optimizing/loop_optimization.cc
Age | Commit message | Author
2021-02-05 | ARM64: Support SVE VL other than 128-bit. | Artem Serov
Arm SVE register size is not fixed and can be a multiple of 128 bits. To support that, the patch removes the explicit assumption of a 128-bit SIMD register size from the vectorizer and code generators and enables autovectorization with a configurable SVE vector length; for example, it extends the SIMD register save/restore routines. Test: art SIMD tests on VIXL simulator. Test: art tests on FVP (steps in test/README.arm_fvp.md) with FVP arg: -C SVE.ScalableVectorExtension.veclen=[2,4] (SVE vector [128,256] bits wide) Change-Id: Icb46e7eb17f21d3bd38b16dd50f735c29b316427
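To illustrate why a hard-coded 128-bit register size stops working, the sketch below computes how many lanes of a given element size fit in one SIMD register. The function name `lanes_per_vector` is hypothetical and is not taken from ART; only the multiple-of-128 rule comes from the commit message above.

```python
def lanes_per_vector(vector_bits: int, element_bytes: int) -> int:
    """Number of elements of `element_bytes` that fit in one SIMD register."""
    if vector_bits <= 0 or vector_bits % 128 != 0:
        raise ValueError("SVE vector length must be a positive multiple of 128 bits")
    return vector_bits // (8 * element_bytes)


# Lane counts for the two vector widths named in the commit message.
for bits in (128, 256):
    counts = {size: lanes_per_vector(bits, size) for size in (1, 2, 4, 8)}
    print(bits, counts)
```

Every lane count doubles when the register widens from 128 to 256 bits, so any constant derived from a 16-byte register has to become a value that is queried.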
2021-02-04 | ART: Implement predicated SIMD vectorization. | Artem Serov
This CL brings support for predicated execution to the auto-vectorizer and implements the arm64 SVE vector backend. This version passes all the VIXL simulator-runnable tests in SVE mode with the checker off (as all VecOp CHECKs need to be adjusted for an extra input) and all tests in NEON mode. Test: art SIMD tests on VIXL simulator. Test: art tests on FVP (steps in test/README.arm_fvp.md) Change-Id: Ib78bde31a15e6713d875d6668ad4458f5519605f
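As a rough model of predicated execution, the sketch below processes an array in fixed-size chunks and passes a per-lane predicate to each vector step, so the final partial chunk needs no scalar cleanup loop. The "lane is active while its index is in bounds" predicate is an assumption chosen for illustration; the commit message only states that vector operations take an extra (predicate) input.

```python
def predicated_add(a, b, vector_length):
    """Element-wise a + b, computed chunk by chunk under a lane predicate."""
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        # Lane k is active only if index i + k is still inside the array.
        predicate = [i + k < n for k in range(vector_length)]
        for k, active in enumerate(predicate):
            if active:  # inactive lanes neither load nor store
                out[i + k] = a[i + k] + b[i + k]
        i += vector_length
    return out


print(predicated_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], vector_length=4))
```

The second chunk here has one active lane and three inactive ones, which is the case a non-predicated vectorizer handles with a separate scalar tail.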
2020-05-01 | ART: Introduce Loop Versioning in SuperblockCloner. | Artem Serov
Support Loop Versioning in SuperblockCloner as a tool to enable further optimization (e.g. Dynamic Loop Unrolling). The patch brings the feature in without enabling it. Replace std::cout with LOG(INFO) for debug dumps. Test: superblock_cloner_test. Test: test-art-target. Change-Id: I303cabfb752b8c3c8597abfc0ac261e8616e8cee
2020-04-17 | ART: Refactor SIMD slots and regs size processing. | Artem Serov
ART vectorizer assumes that there is a single size of SIMD register used for the whole program. Make this assumption explicit and refactor the code. Note: This is a base for the future introduction of SIMD slots of size other than 8 or 16 bytes. Test: test-art-target, test-art-host. Change-Id: Id699d5e3590ca8c655ecd9f9ed4e63f49e3c4f9c
2020-04-14 | ART: Fix vectorizer HalvingAdd idiom. | Artem Serov
IsAddConst2 function tried to extract addition chains for the halving add idiom: (A + B) >> 1. The problem was that regular shift right (x >> 1) was accepted for the idiom (with {A: x, B: 0}) and not processed as a shift - which broke the assumptions on shifts right and operand signedness. This CL fixes that. Test: 646-checker-simd-hadd. Test: test-art-target. Change-Id: Icf71e1a8e8c54e68114d7d5d6c4aa8a47ea5234d
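For readers unfamiliar with the idiom, the sketch below models the halving add `(A + B) >> 1` on narrow lanes: the sum is formed in wider precision, shifted, and only then narrowed. The function names are hypothetical and this is a reference model of the arithmetic, not ART's implementation.

```python
def halving_add(a, b, bits=8, signed=False):
    """Return (a + b) >> 1 with the sum taken in wider precision."""
    wide_sum = a + b  # Python ints are unbounded, so no carry is lost
    result = (wide_sum >> 1) & ((1 << bits) - 1)  # narrow back to `bits` bits
    if signed and result >= 1 << (bits - 1):
        result -= 1 << bits  # reinterpret the bit pattern as a signed value
    return result


def wrapped_add_then_shift(a, b, bits=8):
    """Unsigned counter-example: the sum wraps to `bits` bits before the shift."""
    return ((a + b) & ((1 << bits) - 1)) >> 1


print(halving_add(200, 100), wrapped_add_then_shift(200, 100))
```

The two functions disagree whenever the narrow sum overflows, which is why the idiom has to be recognized as a unit, with known operand signedness, rather than as an add followed by an ordinary shift.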
2020-03-04 | ART: Fix a compiler crash for VectorizeDef() idioms. | Artem Serov
The SAD vectorization idiom (and DotProduct, which was based on it) had a bug where an instruction in the loop could be visited twice, causing a compiler crash. GenerateVecOp() was called for both the idiom root instruction and its argument in the idiom vectorization routine; however, the argument could already have been processed by that time. This happened when the matched patterns of two vectorization idioms had a common sub-expression. Test: test-art-target. Test: 623-checker-loop-regressions. Change-Id: I8823c52f8ef62377c29310f0e335b9728d11068a
2020-02-13 | Remove MIPS support from Optimizing. | Vladimir Marko
Test: aosp_taimen-userdebug boots. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Bug: 147346243 Change-Id: I97fdc15e568ae3fe390efb1da690343025f84944
2019-12-05 | Revert^4 "Implement Dot Product Vectorization for x86" | Alex Light
This reverts commit 8e895008a3e2f2813bb46cb0c6bc76884e46e9ac. Reason for revert: The test failure seems unrelated. Bug: 144947842 Change-Id: I7b437f0443d71a5c762e1a8372564ed989971cc9
2019-11-26 | Revert "Revert^2 "Implement Dot Product Vectorization for x86"" | Nicolas Geoffray
This reverts commit 7cf5607f472020711e36eedbbfebb25b40d3f90e. Bug: 144947842 Reason for revert: Seems to have broken android.jvmti.cts.JvmtiHostTest1936#testJvmt Change-Id: Ied6ff6ddf1cb2e3e76adcaa0fda5e36af254b7c5
2019-10-31 | Revert^2 "Implement Dot Product Vectorization for x86" | Vladimir Marko
This reverts commit b8c884e5f22390386b202459ab55ef3046631e42. And fixes a codegen bug (the reason why the original CL was reverted). Test: 684-checker-simd-dotprod Test: DEX2OAT_HOST_INSTRUCTION_SET_FEATURES="sse4.1" test.py --host Test: test.py --host --jit --gcstress Change-Id: Ibef925d1037abc9cb5f3d4dbd79f1d1eceae2f71 Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
2019-10-23 | Revert "Implement Dot Product Vectorization for x86" | Vladimir Marko
This reverts commit 4b7caeee57767f6bce7bb138a1299c0ae84bebf9.
Reason for revert: Test failure in jit-gcstress mode.
+Exception in thread "main" java.lang.Error: Expected: 131072, found: 0
+    at other.TestCharShort.expectEquals(TestCharShort.java:474)
+    at other.TestCharShort.testDotProd(TestCharShort.java:486)
+    at other.TestCharShort.run(TestCharShort.java:525)
+    at Main.main(Main.java:28)
Change-Id: I251cf666e8335499d227910987b2d49629c3f53d
2019-10-23 | Implement Dot Product Vectorization for x86 | Shalini Salomi Bodapati
8% improvement in microbench for integral data types Test: ./test.py --host --64 Change-Id: I26b584f29d677283195c69b68650651368c656d1 Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
2019-10-14 | Revert "Make compiler/optimizing/ symbols hidden." | Vladimir Marko
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf. Reason for revert: Breaks ASAN tests (ODR violation). Bug: 142365358 Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
2019-10-14 | Make compiler/optimizing/ symbols hidden. | Vladimir Marko
Make symbols in compiler/optimizing hidden by a namespace attribute. The unit intrinsic_objects.{h,cc} is excluded as it is needed by dex2oat. As the symbols are no longer exported, gtests are now linked with the static version of the libartd-compiler library.
libart-compiler.so size:
  - before:
      arm: 2396152
      arm64: 3345280
  - after:
      arm: 2016176 (-371KiB, -15.9%)
      arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
2019-08-13 | Loop vectorizer should not remove instructions that can throw | Georgia Kouveli
Instructions that are not used outside of the inner loop were inadvertently being removed. Make sure this does not happen. Original author: Georgia Kouveli <georgia.kouveli@linaro.org> Committed by: David Horstmann <david.horstmann@linaro.org> Test: 1961-checker-loop-vectorizer Test: test-art-target Change-Id: I3af9e861e75669457e5925dd1d655db784a55287
2019-07-31 | ART: Disable vectorization for debuggable graphs. | Artem Serov
SuspendCheck environment is incorrectly initialized with a stale version of the loop induction variable (a pre-loop one) for vectorized loops. The value can be retrieved from the corresponding stack maps only in the case of asynchronous deoptimization in debuggable mode. Thus this workaround forbids loop optimizations on debuggable graphs so the bug is never triggered. Test: test-art-target, test-art-host. Bug: 138601207 Change-Id: Ica9f61f471c024146b7823214ef952e1db2a4663
2019-07-17 | Add AVX support for packed mul/div instructions. | Shalini Salomi Bodapati
This is a follow up for the below patch: https://android-review.googlesource.com/c/platform/build/+/830841 Test: ./test.py --host --64, test-art-host-gtest Change-Id: Id2aa473035556ee230e66addeb69707df8530e75 Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
2019-06-05 | Add AVX support for packed add/sub instructions on x86 | Shalini Salomi Bodapati
Test: ./test.py --host, test-art-host-gtest Change-Id: I48d05e6f6befd54657d962119a543b27a8a51d71 Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
2019-04-30 | Update induction ranges in superblock cloner. | Nicolas Geoffray
Because loop unrolling is part of a general loop optimization pass, it needs to update induction ranges as it will invalidate its instruction cache with new instructions. Bug: 131174583 Test: 696-loop Change-Id: Id3628efe316b58f69abbd9ebd43e891a8e42529f
2018-09-25 | ART: ARM64: Support DotProd SIMD idiom. | Artem Serov
Implement support for vectorization idiom which performs dot product of two vectors and adds the result to wider precision components in the accumulator. viz. DOT_PRODUCT([ a1, .. , am], [ x1, .. , xn ], [ y1, .. , yn ]) = [ a1 + sum(xi * yi), .. , am + sum(xj * yj) ], for m <= n, non-overlapping sums, for either both signed or both unsigned operands x, y. The patch shows up to 7x performance improvement on a micro benchmark on Cortex-A57. Test: 684-checker-simd-dotprod. Test: test-art-host, test-art-target. Change-Id: Ibab0d51f537fdecd1d84033197be3ebf5ec4e455
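The DOT_PRODUCT definition in the commit message above can be read as the scalar reference model below. The sketch assumes that `n` is a multiple of `m` and that each accumulator lane consumes one contiguous, equally sized group of products; the commit message only requires `m <= n` and non-overlapping sums, so that grouping is an illustrative choice. The function name is hypothetical.

```python
def dot_product_accumulate(acc, x, y):
    """Return [acc[k] + sum of x[i] * y[i] over the k-th group of lanes]."""
    m, n = len(acc), len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same number of lanes")
    if m == 0 or n % m != 0:
        raise ValueError("this sketch assumes n is a positive multiple of m")
    group = n // m  # narrow lanes folded into each wide accumulator lane
    return [
        acc[k] + sum(x[i] * y[i] for i in range(k * group, (k + 1) * group))
        for k in range(m)
    ]


print(dot_product_accumulate([10, 20], [1, 2, 3, 4], [5, 6, 7, 8]))
```

Because the accumulator lanes are wider than the operand lanes, the products can be summed without the overflow that a same-width multiply-add would risk.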
2018-07-04 | ART: Implement loop full unrolling. | Artem Serov
Performs whole loop unrolling for small loops with a small trip count to eliminate the loop check overhead and to expose more opportunities for inter-iteration optimizations. caffeinemark/FloatAtom: 1.2x performance on arm64 Cortex-A57. Test: 530-checker-peel-unroll. Test: test-art-host, test-art-target. Change-Id: Idf3fe3cb611376935d176c60db8c49907222e28a
2018-07-04 | ART: Refactor scalar loop optimizations. | Artem Serov
Refactor scalar loop peeling and unrolling to eliminate repeated checks and graph traversals, to make the code more readable and to make it easier to add new scalar loop opts. This is a prerequisite for full unrolling patch. Test: 530-checker-peel-unroll. Test: test-art-target, test-art-host. Change-Id: If824a95f304033555085eefac7524e59ed540322
2018-06-25 | Move instruction_set_ to CompilerOptions. | Vladimir Marko
Removes CompilerDriver dependency from ImageWriter and several other classes. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Test: Pixel 2 XL boots. Test: m test-art-target-gtest Test: testrunner.py --target --optimizing Change-Id: I3c5b8ff73732128b9c4fad9405231a216ea72465
2018-05-15 | ART: Enable scalar loop peeling and unrolling. | Artem Serov
Turn on scalar loop peeling and unrolling by default. Test: 482-checker-loop-back-edge-use, 530-checker-peel-unroll Test: test-art-host, test-art-target, boot-to-gui Change-Id: Ibfe1b54f790a97b281e85396da2985e0f22c2834
2018-05-03 | Perform rudimentary check on graph size for no-change assertions. | Aart Bik
Rationale: This will find blatant violations where a pass reports no change even though the graph size changed. Bug: 78171933 Test: test-art-host,target Change-Id: I07b38e71c75dd6f728246d096976c8333b363329
2018-05-01 | Remove some SIMD recognition code. | Aart Bik
Test: test-art-host,target Change-Id: I7f00315c61ed99723236283bc39a4c7fb279df47
2018-04-26 | Step 1 of 2: conditional passes. | Aart Bik
Rationale: The change adds a return value to Run() in preparation for conditional pass execution. The value returned by Run() is best effort; returning false means no optimizations were applied or no useful information was obtained. I filled in a few cases with more exact information, others still just return true. In addition, it integrates inlining as a regular pass, avoiding the ugly "break" into optimizations1 and optimizations2. Bug: b/78171933, b/74026074 Test: test-art-host,target Change-Id: Ia39c5c83c01dcd79841e4b623917d61c754cf075
2018-04-17 | Run GVN earlier. | Aart Bik
Rationale: Running GVN earlier allows for better subsequent instruction simplification. For example, running GVN before select generation also finds the MIN in: if (x > a[i]) x = a[i]; Bug: b/74026074 Test: test-art-host,target Change-Id: I633046375637c7809a3603fdf7c5cf77e8f21167
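The conditional update quoted in the commit message above is equivalent to a running minimum, which the sketch below shows with two hypothetical functions. A plausible reading, offered here as an assumption rather than a statement from the commit, is that the source form reads `a[i]` twice and that merging the two reads into one value is what lets later passes see the compare and the assignment as operating on the same operand.

```python
def running_min_branchy(a, x):
    """Source form: a compare and a conditional assignment, reading a[i] twice."""
    for i in range(len(a)):
        if x > a[i]:
            x = a[i]
    return x


def running_min_idiom(a, x):
    """Recognized form: one read per element feeding an explicit MIN."""
    for value in a:
        x = min(x, value)
    return x


data = [7, 3, 9, -2, 5]
print(running_min_branchy(data, 100), running_min_idiom(data, 100))
```

Both functions return the same result for any input, so the rewrite is purely a change of representation that exposes the operation to later optimizations.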
2018-04-17 | ART: Implement scalar loop peeling. | Artem Serov
Implement scalar loop peeling for invariant exits elimination (on arm64). If the loop exit condition is loop invariant then loop peeling + GVN + DCE can eliminate this exit in the loop body. Note: GVN and DCE aren't applied during loop optimizations. Note: this functionality is turned off by default now. Test: test-art-host, test-art-target, boot-to-gui. Change-Id: I98d20054a431838b452dc06bd25c075eb445960c
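The effect of peeling on a loop-invariant exit can be sketched by hand as below. Both functions are hypothetical; the second shows the shape of the code after the first iteration has been peeled and the now-redundant check has been removed from the remaining iterations, which in the compiler is left to later GVN and DCE passes.

```python
def copy_unless_stopped(src, stop):
    """Original loop: the invariant `stop` test runs on every iteration."""
    out = []
    for value in src:
        if stop:  # loop-invariant exit condition
            break
        out.append(value)
    return out


def copy_unless_stopped_peeled(src, stop):
    """Peeled form: the invariant test runs once, in the peeled iteration."""
    out = []
    if len(src) == 0:
        return out
    # Peeled first iteration, including the invariant exit check.
    if stop:
        return out
    out.append(src[0])
    # Remaining iterations: `stop` is known to be false here, so no check.
    for i in range(1, len(src)):
        out.append(src[i])
    return out


print(copy_unless_stopped_peeled([1, 2, 3], False))
```

The peeled iteration dominates the rest of the loop, which is what makes the repeated check provably redundant.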
2018-04-03 | Enabled nested min-max SIMDization for narrower operands. | Aart Bik
Bug: b/74026074 Test: test-art-host,target Change-Id: Ic6ee31be6192fb2b3bae3be8986da261a744be07
2018-03-28 | Revert "Revert "Refined add/sub analysis vis-a-vis SIMD idioms."" | Nicolas Geoffray
Bug: b/74026074 Looks like this CL is not the culprit :( Apologies Aart. This reverts commit 7f31326f7956d6a1630e7e53473b0581705796ec. Change-Id: I15830324bb276129bf44caf232af24d7c022ed9a
2018-03-28 | Revert "Refined add/sub analysis vis-a-vis SIMD idioms." | Nicolas Geoffray
Bug: b/74026074 Fails 551-checker-shifter-operand on target. This reverts commit 81a1f853925d88d19119e850e22b7f66bddef63b. Change-Id: If3094f73744bbb5f9ab1df4509f757df24af0047
2018-03-27 | Refined add/sub analysis vis-a-vis SIMD idioms. | Aart Bik
Rationale: Slightly more general detection of + and - with constants ensures fewer cases go undetected. Bug: b/74026074 Test: test-art-host,target Change-Id: Ie5bb2dd10294436a27487e5a1ddc77d9e2dd2303
2018-03-26 | Merge "ART: Implement scalar loop unrolling." | Aart Bik
2018-03-26 | ART: Implement scalar loop unrolling. | Artem Serov
Implement scalar loop unrolling for small loops (on arm64) with known trip count to reduce loop check and branch penalty and to provide more opportunities for instruction scheduling. Note: this functionality is turned off by default now. Test: cloner_test.cc Test: test-art-target, test-art-host Change-Id: Ic27fd8fb0bc0d7b69251252da37b8b510bc30acc
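A hand-written model of unrolling by a factor of two is sketched below. The function names, the unroll factor, and the explicit remainder step are illustrative assumptions; the commit message does not say which factor ART picks or how it handles odd trip counts.

```python
def scale_rolled(values, factor):
    """Original loop: one loop check and one branch per element."""
    out = [0] * len(values)
    i = 0
    while i < len(values):
        out[i] = values[i] * factor
        i += 1
    return out


def scale_unrolled_by_2(values, factor):
    """Unrolled loop: two copies of the body per loop check."""
    n = len(values)
    out = [0] * n
    i = 0
    while i + 1 < n:
        out[i] = values[i] * factor
        out[i + 1] = values[i + 1] * factor
        i += 2
    # Remainder: at most one leftover iteration when n is odd.
    if i < n:
        out[i] = values[i] * factor
    return out


print(scale_unrolled_by_2([1, 2, 3], 10))
```

With a known trip count the compiler can decide ahead of time whether a remainder iteration exists, and the two adjacent body copies give the instruction scheduler independent work to interleave.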
2018-03-19 | Recognize signed saturation in single clipping. | Aart Bik
Rationale: More saturation is better! Bug: b/74026074 Test: test-art-host,target Change-Id: Ib99e8965f26d96d956bcd3dbc7eb17b6c0050a24
2018-03-15 | Minor cleanup of saturation arithmetic code. | Aart Bik
Rationale: Changes requested by Artem and Vladimir. Bug: b/74026074 Test: test-art-host,target Change-Id: If99dd2345369d4260e3b514bb212f81d433420a3
2018-03-15 | Vectorization of saturation arithmetic. | Aart Bik
Rationale: Because faster is better. Bug: b/74026074 Test: test-art-host,target Change-Id: Ifa970a62cef1c0b8bb1c593f629d8c724f1ffe0e
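For context, saturation arithmetic clamps a result to the range of the lane type instead of letting it wrap. The sketch below is a scalar reference model with hypothetical function names; it is not derived from the patch itself, whose message gives no detail on the recognized patterns.

```python
def saturation_bounds(bits, signed):
    """Smallest and largest value representable in a `bits`-wide lane."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def saturating_add(a, b, bits=16, signed=True):
    """a + b clamped to the lane range (clipping from both sides)."""
    lo, hi = saturation_bounds(bits, signed)
    return max(lo, min(hi, a + b))


def wrapping_add(a, b, bits=16):
    """Signed two's-complement add that wraps around, for comparison."""
    total = (a + b) & ((1 << bits) - 1)
    return total - (1 << bits) if total >= 1 << (bits - 1) else total


print(saturating_add(30000, 10000), wrapping_add(30000, 10000))
```

The `max(lo, min(hi, ...))` shape is the kind of clipping expression a compiler can map onto a single saturating SIMD instruction once it proves the bounds match the lane type.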
2018-03-07 | Introduce MIN/MAX/ABS as HIR nodes. | Aart Bik
Rationale: Having explicit MIN/MAX/ABS operations (in contrast with intrinsics) simplifies recognition and optimization of these common operations (e.g. constant folding, hoisting, detection of saturation arithmetic). Furthermore, mapping conditionals, selectors, intrinsics, etc. (some still TBD) onto these operations generalizes the way they are optimized downstream substantially. Bug: b/65164101 Test: test-art-host,target Change-Id: I69240683339356e5a012802f179298f0b04c6726
2018-03-05 | Introduce ABS as HIR nodes (missing file). | Aart Bik
NOTE: includes a file that should have been there. Bug: b/65164101 Test: test-art-host,target Change-Id: Ic786b84b2635ea8f5909ad77196857f6de65bf26
2018-02-01 | Clean up signed/unsigned in vectorizer. | Aart Bik
Rationale: Currently we have some remaining ugliness around signed and unsigned SIMD operations due to lack of kUint32 and kUint64 in the HIR. By "softly" introducing these types, ABS/MIN/MAX/HALVING_ADD/SAD_ACCUMULATE operations can solely rely on the packed data types to distinguish between signed and unsigned operations. Cleaner, and also allows for some code removal in the current loop optimizer. Bug: 72709770 Test: test-art-host test-art-target Change-Id: I68e4cdfba325f622a7256adbe649735569cab2a3
2017-12-07 | Fixed spilling bug (visible on ARM64): missed SIMD type. | Aart Bik
Test: test-art-host test-art-target Change-Id: I6f321446f54943e02f250732ec9da729f633c3a9
2017-11-20 | Refactored optimization passes setup. | Aart Bik
Rationale: Refactors the way we set up optimization passes in the compiler into a more centralized approach. The refactoring also found some "holes" in the existing mechanism (missing string lookup in the debugging mechanism, or inability to set an alternative name for optimizations that may repeat). Bug: 64538565 Test: test-art-host test-art-target Change-Id: Ie5e0b70f67ac5acc706db91f64612dff0e561f83
2017-11-15 | MIPS: Implement Sum-of-Abs-Differences | Lena Djokic
Test: test-art-host test-art-target Change-Id: I32a3e21f96cdcbab2e108d71746670408deb901a
2017-11-08 | cpplint: Cleanup errors | Igor Murashkin
Cleanup errors from upstream cpplint in preparation for moving art's cpplint fork to upstream tip-of-tree cpplint. Test: cd art && mm Bug: 68951293 Change-Id: I15faed4594cbcb8399850f8bdee39d42c0c5b956
2017-11-02 | ART: Make InstructionSet an enum class and add kLast. | Vladimir Marko
Adding InstructionSet::kLast shall make it easier to encode the InstructionSet in fewer bits using BitField<>. However, introducing `kLast` into the `art` namespace is not a good idea, so we change the InstructionSet to an enum class. This also uncovered a case of InstructionSet::kNone being erroneously used instead of vixl32::Condition::None(), so it's good to remove `kNone` from the `art` namespace. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
2017-11-01 | Merge "Alignment optimizations in vectorizer." | Aart Bik
2017-10-27 | Alignment optimizations in vectorizer. | Aart Bik
Rationale: Since aligned data access is generally better (enables more efficient aligned moves and prevents nasty cache line splits), computing and/or enforcing alignment has been added to the vectorizer: (1) If the initial alignment is known completely and suffices, then a static peeling factor enforces proper alignment. (2) If (1) fails, but the base alignment allows, dynamically peeling until the total offset is aligned forces properly aligned access patterns. By using ART conventions only, any forced alignment is preserved over suspend checks where data may move. Note 1: Current allocation convention is just 8 byte alignment on arrays/strings, so only ARM32 benefits. However, all optimizations are implemented in a general way, so moving to a 16 byte alignment will immediately take advantage of any new convention!! Note 2: This CL also exposes how bad the choice of 12 byte offset of arrays really is. Even though the new optimizations fix the misalignment, this requires peeling for the most common case: 0 indexed loops. Therefore, we may even consider moving to a 16 byte offset. Again the optimizations in this CL will immediately take advantage of that new convention!! Test: test-art-host test-art-target Change-Id: Ib6cc0fb68c9433d3771bee573603e64a3a9423ee
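Case (1) in the commit message above, the static peeling factor, can be sketched as the small calculation below. The function name, its parameters, and the example values for required alignment are assumptions made for illustration; only the 8 byte object alignment and the 12 byte array data offset come from the commit message.

```python
def static_peel_count(base_alignment, data_offset, element_bytes,
                      start_index, required_alignment):
    """Scalar iterations to peel before the first aligned vector access.

    Returns None when the count cannot be decided statically, which
    corresponds to falling back to case (2), dynamic peeling.
    """
    if base_alignment % required_alignment != 0:
        return None  # the object base is not known to be aligned enough
    misalignment = (data_offset + start_index * element_bytes) % required_alignment
    if misalignment == 0:
        return 0
    remaining = required_alignment - misalignment
    if remaining % element_bytes != 0:
        return None  # no whole number of elements reaches an aligned address
    return remaining // element_bytes


# 8 byte aligned array, data at offset 12, 4 byte elements, loop from index 0.
print(static_peel_count(8, 12, 4, 0, required_alignment=8))
print(static_peel_count(8, 12, 4, 0, required_alignment=16))
```

With an 8 byte requirement a single peeled iteration fixes the alignment of a 0 indexed loop, which matches the observation in Note 2; with a 16 byte requirement the 8 byte base alignment is insufficient and the count is not statically known.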
2017-10-27 | MIPS: Basic SIMD reduction support. | Lena Djokic
Enables vectorization of x += .... for very basic (simple, same-type) constructs for MIPS. Note: Testing is done with checker parts of tests 661 and 665, locally changed to cover MIPS32 cases. These changes can't be included in this patch since MSA is not a default option. Test: test-art-host test-art-target Change-Id: Ia3b3646afecb76c2f00996a30923ca70302be57e
2017-10-20 | Improve sign and zero extension analysis. | Aart Bik
Rationale: This was needed to fix the regression introduced by a prior type-based CL. With the new type system ramping up, however, this is actually more simplification (removing the And recognition, for example) than new code! Test: test-art-host test-art-target Bug: 67935418 Change-Id: I4284f8f29f3d26e4033a3014d0c697677cc0d795