path: root/compiler/optimizing/optimizing_compiler.cc
Age  Commit message  Author
2017-10-17  Use ScopedArenaAllocator for code generation.  Vladimir Marko
Reuse the memory previously allocated on the ArenaStack by optimization passes. This CL handles only the architecture-independent codegen and slow paths; architecture-dependent codegen allocations shall be moved to the ScopedArenaAllocator in a follow-up.
Memory needed to compile the two most expensive methods for aosp_angler-userdebug boot image:
  BatteryStats.dumpCheckinLocked(): 19.6MiB -> 18.5MiB (-1189KiB)
  BatteryStats.dumpLocked(): 39.3MiB -> 37.0MiB (-2379KiB)
Also move definitions of functions that use bit_vector-inl.h from bit_vector.h to bit_vector-inl.h.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 64312607
Change-Id: I84688c3a5a95bf90f56bd3a150bc31fedc95f29c
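The reuse described above can be illustrated with a minimal sketch of a stack-shaped arena whose scopes release their memory on exit. This is not ART's C++ ArenaStack or ScopedArenaAllocator; the class `ArenaStackSketch`, its `Scope` type, and all method names are hypothetical and only model byte counts, assuming that a later pass starts allocating where the earlier pass's scope began.

```java
final class ArenaStackSketch {
    private int top = 0;   // bytes currently in use on the stack
    private int peak = 0;  // high-water mark over the stack's lifetime

    int peakBytes() { return peak; }

    // A scope remembers the stack top on entry and restores it on close,
    // so everything allocated inside the scope becomes reusable afterwards.
    final class Scope implements AutoCloseable {
        private final int mark = top;

        void alloc(int bytes) {
            top += bytes;
            peak = Math.max(peak, top);
        }

        @Override
        public void close() { top = mark; }
    }

    Scope openScope() { return new Scope(); }

    // Runs two passes back to back, each in its own scope, and reports the peak.
    static int peakForTwoPasses(int firstPassBytes, int secondPassBytes) {
        ArenaStackSketch stack = new ArenaStackSketch();
        try (Scope first = stack.openScope()) {
            first.alloc(firstPassBytes);
        }
        try (Scope second = stack.openScope()) {
            second.alloc(secondPassBytes);  // reuses the bytes released by the first scope
        }
        return stack.peakBytes();
    }
}
```

With scoped passes the peak is the larger of the two passes rather than their sum, which is the kind of saving the figures in the commit message report.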
2017-10-11  Use ScopedArenaAllocator for building HGraph.  Vladimir Marko
Memory needed to compile the two most expensive methods for aosp_angler-userdebug boot image:
  BatteryStats.dumpCheckinLocked(): 21.1MiB -> 20.2MiB
  BatteryStats.dumpLocked(): 42.0MiB -> 40.3MiB
This is because all the memory previously used by the graph builder is reused by later passes.
And finish the "arena"->"allocator" renaming; make renamed allocator pointers that are members of classes const when appropriate (and make a few more members around them const).
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Ia50aafc80c05941ae5b96984ba4f31ed4c78255e
2017-10-09  Use ScopedArenaAllocator for register allocation.  Vladimir Marko
Memory needed to compile the two most expensive methods for aosp_angler-userdebug boot image:
  BatteryStats.dumpCheckinLocked(): 25.1MiB -> 21.1MiB
  BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB
This is because all the memory previously used by Scheduler is reused by the register allocator; the register allocator has a higher peak usage of the ArenaStack.
And continue the "arena"->"allocator" renaming.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
2017-10-06  ART: Use ScopedArenaAllocator for pass-local data.  Vladimir Marko
Passes using local ArenaAllocator were hiding their memory usage from the allocation counting, making it difficult to track down where memory was used. Using ScopedArenaAllocator reveals the memory usage. This changes the HGraph constructor which requires a lot of changes in tests. Refactor these tests to limit the amount of work needed the next time we change that constructor. Test: m test-art-host-gtest Test: testrunner.py --host Test: Build with kArenaAllocatorCountAllocations = true. Bug: 64312607 Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
2017-10-05  MIPS32R2: Share address computation  Lena Djokic
For array accesses the element address has the following structure:
  Address = CONST_OFFSET + base_addr + (index << ELEM_SHIFT)
The address part (index << ELEM_SHIFT) can be shared across array accesses with the same data type and index. For example, in the following loop 5 accesses can share address computation:
  void foo(int[] a, int[] b, int[] c) {
    for (i...) {
      a[i] = a[i] + 5;
      b[i] = b[i] + c[i];
    }
  }
Test: test-art-host, test-art-target
Change-Id: Id09fa782934aad4ee47669275e7e1a4d7d23b0fa
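The address formula in this message can be worked through with a small sketch. It is only a model: Java has no raw addresses, so the bases are plain numbers, and the values chosen for `CONST_OFFSET` and `ELEM_SHIFT` (a 12-byte offset and 4-byte int elements) are assumptions for illustration, as are the class and method names.

```java
final class AddressSharingSketch {
    // Hypothetical values for illustration only.
    static final long CONST_OFFSET = 12;  // assumed offset of element 0 from the array base
    static final int ELEM_SHIFT = 2;      // log2 of the element size for int

    // Unshared form: every access recomputes (index << ELEM_SHIFT).
    static long[] unshared(long[] bases, int index) {
        long[] addresses = new long[bases.length];
        for (int k = 0; k < bases.length; k++) {
            addresses[k] = CONST_OFFSET + bases[k] + ((long) index << ELEM_SHIFT);
        }
        return addresses;
    }

    // Shared form: the scaled index is computed once and reused for every array
    // accessed with the same element type and the same index.
    static long[] shared(long[] bases, int index) {
        long scaledIndex = (long) index << ELEM_SHIFT;
        long[] addresses = new long[bases.length];
        for (int k = 0; k < bases.length; k++) {
            addresses[k] = CONST_OFFSET + bases[k] + scaledIndex;
        }
        return addresses;
    }
}
```

Both forms produce identical addresses; the shared form simply performs the shift once per index instead of once per access, which is the saving the loop over `a`, `b` and `c` benefits from.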
2017-09-27  Enables GVN for x86 and x86_64.  Aart Bik
Rationale: As decided after the MIPS change, this change unifies our six code generators again a bit (we cannot move it into the generic path, since arm likes to run the simplifier first). Generally the GVN does some last minute cleanup (such as finding CSE in the runtime tests generated by dynamic BCE). I started a golem run to find impact. Test: test-art-host test-art-target Change-Id: Ib4098c5bae2269e71fee95cc31e3662d3aa47f6a
2017-09-27  Merge "Enables GVN for MIPS32 and MIPS64."  Treehugger Robot
2017-09-25  ART: Introduce compiler data type.  Vladimir Marko
Replace most uses of the runtime's Primitive in compiler with a new class DataType. This prepares for introducing new types, such as Uint8, that the runtime does not need to know about. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 23964345 Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
2017-09-21  Enables GVN for MIPS32 and MIPS64.  Lena Djokic
Test: mma test-art-host-gtest Test: mma test-art-target-gtest in QEMU Test: ./testrunner.py --target --optimizing in QEMU Change-Id: Ie3c6b29b9125ff8aef888c3574bdb0ab96574bd4
2017-09-20  Refactor compiled_method.h .  Vladimir Marko
Move LinkerPatch to compiler/linker/linker_patch.h . Move SrcMapElem to compiler/debug/src_map_elem.h . Introduce compiled_method-inl.h to reduce the number of `#include`s in compiled_method.h . Test: m test-art-host-gtest Test: testrunner.py --host Change-Id: Id211cdf94a63ad265bf4709f1a5e06dffbe30f64
2017-09-20  Refactor linker files from compiler/ to dex2oat/.  Vladimir Marko
This shifts some code from the libart-compiler.so to dex2oat and reduces memory needed for JIT. We also avoid loading the libart-dexlayout.so for JIT but the memory savings are minimal (one shared clean page, two shared dirty pages and some per-app kernel mmap data) as the code has never been needed in memory by JIT.
aosp_angler-userdebug file sizes (stripped):
  lib64/libart-compiler.so: 2989112 -> 2671888 (-310KiB)
  lib/libart-compiler.so: 2160816 -> 1939276 (-216KiB)
  bin/dex2oat: 141868 -> 368808 (+222KiB)
LOAD/executable elf mapping sizes:
  lib64/libart-compiler.so: 2866308 -> 2555500 (-304KiB)
  lib/libart-compiler.so: 2050960 -> 1834836 (-211KiB)
  bin/dex2oat: 129316 -> 345916 (+212KiB)
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: cd art/; mma; cd -
Change-Id: If62f02847a6cbb208eaf7e1f3e91af4663fa4a5f
2017-09-18  ART: Remove old code  Andreas Gampe
Remove unused Quick compiler flag. Remove support for arm32 soft-float code (which is no longer supported by our compiler). Test: m Change-Id: I38b16291d90094dbf26776923a46afbf8de53f20
2017-09-18  Add debug info for link-time generated thunks.  Vladimir Marko
Add debug info for method call thunks (currently unused) and Baker read barrier thunks. Refactor debug info generation for trampolines and record their sizes; change their names to start with upper-case letters, so that they can be easily generated as `#fn_name`. This improved debug info must be generated by `dex2oat -g`, the debug info generated by `oatdump --symbolize` remains the same as before, except for the renamed trampolines and an adjustment for "code delta", i.e. the Thumb mode bit. Cortex-A53 erratum 843419 workaround thunks are not covered by this CL. Test: Manual; run-test --gdb -Xcompiler-option -g 160, pull symbols for gdbclient, break in the introspection entrypoint, check that gdb knows the new symbols (and disassembles them) and `backtrace` works when setting $pc to an address in the thunk. Bug: 36141117 Change-Id: Id224b72cfa7a0628799c7db65e66e24c8517aabf
2017-09-08  Merge "optimizing: add block-scoped constructor fence merging pass"  Treehugger Robot
2017-09-08  optimizing: add block-scoped constructor fence merging pass  Igor Murashkin
Introduce a new "Constructor Fence Redundancy Elimination" pass. The pass currently performs local optimization only, i.e. within instructions in the same basic block. All constructor fences preceding a publish (e.g. store, invoke) get merged into one instruction.
==============
OptStat#ConstructorFenceGeneratedNew: 43825
OptStat#ConstructorFenceGeneratedFinal: 17631 <+++
OptStat#ConstructorFenceRemovedLSE: 164
OptStat#ConstructorFenceRemovedPFRA: 9391
OptStat#ConstructorFenceRemovedCFRE: 16133 <---
Removes ~91.5% of the 'final' constructor fences in RitzBenchmark:
(We do not distinguish the exact reason that a fence was created, so it's possible some "new" fences were also removed.)
==============
Test: art/test/run-test --host --optimizing 476-checker-ctor-fence-redun-elim
Bug: 36656456
Change-Id: I8020217b448ad96ce9b7640aa312ae784690ad99
2017-09-06  Pass stats into the loop optimization phase.  Aart Bik
Test: market scan. Change-Id: I58b23b8d254883f30619ea3602d34bf93618d432
2017-08-14  Merge "RFC: Generate select instruction for conditional returns."  Nicolas Geoffray
2017-08-11  Merge changes Ic119441c,I83b96b41  Treehugger Robot
* changes:
  optimizing: Add statistics for # of constructor fences added/removed
  optimizing: Refactor statistics to use OptimizingCompilerStats helper
2017-08-11  optimizing: Add statistics for # of constructor fences added/removed  Igor Murashkin
Statistics are attributed as follows:
Added because:
* HNewInstance requires a HConstructorFence following it.
* HReturn requires a HConstructorFence (for final fields) preceding it.
Removed because:
* Optimized in Load-Store-Elimination.
* Optimized in Prepare-For-Register-Allocation.
Test: art/test.py
Bug: 36656456
Change-Id: Ic119441c5151a5a840fc6532b411340e2d68e5eb
2017-08-11  optimizing: Refactor statistics to use OptimizingCompilerStats helper  Igor Murashkin
Remove all copies of 'MaybeRecordStat', replacing them with a single OptimizingCompilerStats::MaybeRecordStat helper. Change-Id: I83b96b41439dccece3eee2e159b18c95336ea933
2017-08-10  Instrument ARM64 generated code to check the Marking Register.  Roland Levillain
Generate run-time code in the Optimizing compiler checking that the Marking Register's value matches `self.tls32_.is.gc_marking` in debug mode (on target; and on host with JIT, or with AOT when compiling the core image). If a check fails, abort. Test: m test-art-target Test: m test-art-target with tree built with ART_USE_READ_BARRIER=false Test: ARM64 device boot test with libartd. Bug: 37707231 Change-Id: Ie9b322b22b3d26654a06821e1db71dbda3c43061
2017-08-10  RFC: Generate select instruction for conditional returns.  Mads Ager
The select generator currently only inserts select instructions if there is a diamond shape with a phi. This change extends the select generator to also deal with the pattern:
  if (condition) {
    movable instruction 0
    return value0
  } else {
    movable instruction 1
    return value1
  }
which it turns into:
  movable instruction 0
  movable instruction 1
  return select (value0, value1, condition)
Test: 592-checker-regression-bool-input
Change-Id: Iac50fb181dc2c9b7619f28977298662bc09fc0e1
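The pattern in this message can be written out at the Java source level as a rough sketch. The compiler performs the rewrite on its intermediate representation, not on source; here a ternary expression merely stands in for the select instruction, and the class name, method names and arithmetic are made up for illustration.

```java
final class SelectReturnSketch {
    // Shape before the transformation: each branch does some side-effect-free
    // ("movable") work and then returns its own value.
    static int branching(boolean condition, int x) {
        if (condition) {
            int value0 = x + 1;  // movable instruction 0
            return value0;
        } else {
            int value1 = x * 2;  // movable instruction 1
            return value1;
        }
    }

    // Shape after the transformation: both movable instructions are hoisted
    // and a single return picks one result based on the condition.
    static int selecting(boolean condition, int x) {
        int value0 = x + 1;  // movable instruction 0
        int value1 = x * 2;  // movable instruction 1
        return condition ? value0 : value1;  // stands in for the select
    }
}
```

The rewrite is only valid because both instructions are movable: executing the one that belongs to the branch not taken must have no observable effect.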
2017-08-10  Revert recent JIT code cache changes  Orion Hodson
Flakiness observed on the bots.
Revert "Jit Code Cache instruction pipeline flushing"
This reverts commit 56fe32eecd4f25237e66811fd766355a07908d22.
Revert "ARM64: More JIT Code Cache maintenace"
This reverts commit 17272ab679c9b5f5dac8754ac070b78b15271c27.
Revert "ARM64: JIT Code Cache maintenance"
This reverts commit 3ecac070ad55d433bbcbe11e21f4b44ab178effe.
Revert "Change flush order in JIT code cache"
This reverts commit 43ce5f82dae4dc5eebcf40e54b81ccd96eb5fba3.
Revert "Separate rw from rx views of jit code cache"
This reverts commit d1dbb74e5946fe6c6098a541012932e1e9dd3115.
Test: art/test.py --target --64
Bug: 64527643
Bug: 62356545
Change-Id: Ifa10ac77a60ee96e8cb68881bade4d6b4f828714
2017-07-25  Merge "Jit Code Cache instruction pipeline flushing"  Treehugger Robot
2017-07-24  ART: Include cleanup  Andreas Gampe
Let clang-format reorder the header includes.
Derived with:
* .clang-format:
    BasedOnStyle: Google
    IncludeIsMainRegex: '(_test|-inl)?$'
* Steps:
    find . -name '*.cc' -o -name '*.h' | xargs sed -i.bak -e 's/^#include/ #include/' ; git commit -a -m 'ART: Include cleanup'
    git-clang-format -style=file HEAD^
    manual inspection
    git commit -a --amend
Test: mmma art
Change-Id: Ia963a8ce3ce5f96b5e78acd587e26908c7a70d02
2017-07-24  Jit Code Cache instruction pipeline flushing  Orion Hodson
Restores instruction pipeline flushing on all cores following crashes on ARMv7 with dual JIT code page mappings. We were inadvertently toggling permission on a non-executable page rather than an executable one.
Removes the data cache flush for roots data and replaces it with a sequentially consistent barrier.
Fix MemMap::RemapAtEnd() when all pages are given out. To meet invariants checked in the destructor, the base pointer needs to be assigned as nullptr when this happens.
Bug: 63833411
Bug: 62332932
Test: art/test.py --target
Change-Id: I705cf5a3c80e78c4e912ea3d2c3c4aa89dee26bb
2017-07-19  Pass the logger to the JIT compiler.  Nicolas Geoffray
To avoid effects of concurrent method entrypoints update, just pass the logger to the JIT compiler, which will invoke it directly with the pointer to the newly allocated code. Test: test.py --trace Change-Id: I5fbcd7cbc948b7d46c98c1545d6e530fb1190602
2017-06-07  Use ArtMethod* .bss entries for HInvokeStaticOrDirect.  Vladimir Marko
Test: m test-art-host-gtest Test: testrunner.py --host Test: testrunner.py --target Test: Nexus 6P boots. Test: Build aosp_mips64-userdebug. Bug: 30627598 Change-Id: I0e54fdd2e91e983d475b7a04d40815ba89ae3d4f
2017-05-23  Merge "Use PC-relative pointer to boot image methods."  Treehugger Robot
2017-05-22  Use PC-relative pointer to boot image methods.  Vladimir Marko
In preparation for adding ArtMethod entries to the .bss section, add direct PC-relative pointers to methods so that the number of needed .bss entries for boot image is small. Test: m test-art-host-gtest Test: testrunner.py --host Test: testrunner.py --target on Nexus 6P Test: Nexus 6P boots. Test: Build aosp_mips64-userdebug Bug: 30627598 Change-Id: Ia89f5f9975b741ddac2816e1570077ba4b4c020f
2017-05-19  Create load store analysis pass  xueliang.zhong
This CL separates load store analysis from LSE pass. The load and store analysis in LSE pass records information about heap memory accesses for arrays and fields. Such information can also be used in the other optimizations like instruction scheduling pass which can eliminate side-effect dependencies between memory accesses to different locations. Test: m test-art-host Test: m test-art-target Test: m test-art-host-gtest-load_store_analysis_test Test: 530-checker-lse Change-Id: I353a2b9a03b19bfa0e7ef07716d60bd4254c7ea7
2017-05-08  Instruction scheduling for ARM.  xueliang.zhong
Performance improvements on various benchmarks with this CL:
  benchmarks      improvements
  ---------------------------
  algorithm       1%
  benchmarksgame  2%
  caffeinemark    2%
  math            3%
  stanford        4%
Tested on ARM Cortex-A53 CPU. The code size impact is negligible.
Test: m test-art-host
Test: m test-art-target
Change-Id: I314c90c09ce27e3d224fc686ef73c7d94a6b5a2c
2017-04-24  ART: More header cleanup - method_verifier.h  Andreas Gampe
Move enumerations to own header. Move the compiler interface (of what the compiler can tolerate) into its own header. Replace or remove method_verifier.h where possible. Test: mmma art Change-Id: I075fcb10b02b6c1c760daad31cb18eaa42067b6d
2017-04-10  optimizing: do not illegally remove constructor barriers after inlining  Igor Murashkin
Remove the illegal optimization that destroyed constructor barriers after inlining invoke-super constructor calls.
---
According to JLS 17.5.1, "Note that if one constructor invokes another constructor, and the invoked constructor sets a final field, the freeze for the final field takes place at the end of the invoked constructor."
This means if an object is published (stored to a location potentially visible to another thread) inside of an outer constructor, all final field stores from any inner constructors must be visible to other threads.
Test: art/test.py
Bug: 37001605
Change-Id: I3b55f6c628ff1773dab88022a6475d50a1a6f906
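The situation the message describes has roughly the following shape. This is a single-threaded sketch, so it only shows the code pattern and cannot demonstrate the cross-thread visibility guarantee itself; the classes `Inner` and `Outer` and the field `published` are hypothetical names.

```java
final class FinalFieldFreezeSketch {
    static class Inner {
        final int value;

        Inner() {
            value = 42;
            // The freeze for 'value' takes place at the end of this constructor,
            // so the barrier belonging to it must survive inlining.
        }
    }

    static class Outer extends Inner {
        static Outer published;  // plain (non-volatile) field that other threads could read

        Outer() {
            super();           // invoke-super constructor call that the inliner may inline
            published = this;  // publication inside the outer constructor
        }
    }

    // Single-threaded check of the pattern only.
    static int readPublished() {
        new Outer();
        return Outer.published.value;
    }
}
```

If the barrier at the end of the inlined `Inner` constructor were dropped, nothing would order the store to `value` before the store to `published`, which is the case the commit says must not be optimized away.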
2017-04-06  Clean up after MIPS got read barriers support  Goran Jakovljevic
This enables checker tests, as well as compiler_driver_test and reflection_test for MIPS32 and MIPS64. Test: mma test-art-host-gtest Test: mma test-art-target-gtest in QEMU (MIPS64) Test: ./testrunner.py --optimizing --target in QEMU (MIPS64) Change-Id: Ic6fe5b17f7f2cd7e38e12fef25afccf9358b80e0
2017-03-28  MIPS: Implement read barriers.  Alexey Frunze
This is the core functionality. Further improvements will be done separately. This also adds/moves memory barriers where they belong and removes the UnsafeGetLongVolatile and UnsafePutLongVolatile MIPS32 intrinsics as they need to load/store a pair of registers atomically, which is not supported directly by the CPU. Test: booted MIPS32R2 in QEMU Test: test-art-target-run-test Test: booted MIPS64 (with 2nd arch MIPS32R6) in QEMU Test: "testrunner.py --target --optimizing -j1" Test: same MIPS64 boot/test with ART_READ_BARRIER_TYPE=TABLELOOKUP Test: "testrunner.py --target --optimizing --32 -j2" on CI20 Test: same CI20 test with ART_READ_BARRIER_TYPE=TABLELOOKUP Change-Id: I0ff91525fefba3ec1cc019f50316478a888acced
2017-03-24  Improvements in the Inliner.  Nicolas Geoffray
- Change from a depth limit to a limit on the total number of HInstructions inlined. Remove the dex2oat depth limit argument.
- Add more stats to diagnose reasons for not inlining.
- Clean up logging to easily parse output.
Individual Ritz benchmarks improve from 3 to 10%.
No change in other heuristics. There was already an instruction budget. Note that the instruction budget is rarely hit in the "apps" I've tried with.
Compile-times improve from 5 to 15%.
Code size goes from a 4% increase (Gms) to a 1% decrease (Docs).
bug:35724239
test: test-art-host test-art-target
Change-Id: I5a35c4bd826cf21fead77859709553c5b57608d6
2017-03-16  Delete SrcMap  Mathieu Chartier
No longer used. SrcMapElem is still used by elf_debug_line_writer.h. Address previous comments from aog/351387. Test: make Change-Id: Ib1525168b14889abbdc78ba20c64f3223b140a51
2017-03-16  Add method info to oat files  Mathieu Chartier
The method info data is stored separately from the code info to reduce oat size by improving deduplication of stack maps. To reduce code size, this moves the invoke info and inline info method indices to this table. Oat size for a large app (arm64): 77746816 -> 74023552 (-4.8%) Average oat size reduction for golem (arm64): 2% Repurposed unused SrcMapElem deduping to be for MethodInfo. TODO: Delete SrcMapElem in a follow up CL. Bug: 36124906 Test: clean-oat-host && test-art-host-run-test Change-Id: I2241362e728389030b959f42161ce817cf6e2009
2017-03-14  Revert^6 "Hash-based dex cache type array."  Vladimir Marko
Fixed ImageWriter to write class table also if it contains only boot class loader classes. Added a regression test and added extra checks for debug-build to verify that dex cache types from app image are also in the class table. Removed some unnecessary debug output. Test: 158-app-image-class-table Bug: 34839984 Bug: 30627598 Bug: 34659969 This reverts commit 0b66d6174bf1f6023f9d36dda8538490b79c2e9f. Change-Id: I6a747904940c6ebc297f4946feef99dc0adf930c
2017-03-13  Revert^5 "Hash-based dex cache type array."  Vladimir Marko
For app images, ImageWriter does not add boot image classes to the app image class table even though it keeps them in the dex caches. The reason for that is unknown, the code looks OK. Bug: 34839984 Bug: 30627598 Bug: 34659969 Also reverts "Improve debugging output for a crash." This reverts commits bfb80d25eaeb7a604d5dd25a370e3869e96a33ab, 8dd56fcb3196f466ecaffd445397cb11ef85f89f. Test: testrunner.py --host Change-Id: Ic8db128207c07588c7f11563208ae1e85c8b0e84
2017-03-08  Merge "Invoke typed arraycopy for primitive arrays."  Nicolas Geoffray
2017-03-07  Invoke typed arraycopy for primitive arrays.  Nicolas Geoffray
Apps will always call the Object version of arraycopy. When we can infer the types of the passed arrays, replace the method being called to be the typed System.arraycopy one. 10% improvement on ExoPlayerBench. Test: 641-checker-arraycopy bug: 7103825 Change-Id: I872d7a6e163a4614510ef04ae582eb90ec48b5fa
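A call site of the kind this change targets might look like the sketch below. The public Java API only exposes the Object version of `System.arraycopy`, so the typed variant the message refers to cannot be called from portable code; a hand-written `int[]` loop stands in for it here purely as an assumption, and the class and method names are hypothetical.

```java
final class TypedArrayCopySketch {
    // What application code writes: the Object-typed overload, even though both
    // arguments are statically known to be int[].
    static int[] copyViaObjectOverload(int[] source) {
        int[] destination = new int[source.length];
        System.arraycopy(source, 0, destination, 0, source.length);
        return destination;
    }

    // Stand-in for a typed copy: no run-time checks of the array types are
    // needed because both arrays are int[] by construction.
    static int[] copyViaTypedStandIn(int[] source) {
        int[] destination = new int[source.length];
        for (int i = 0; i < source.length; i++) {
            destination[i] = source[i];
        }
        return destination;
    }
}
```

Both methods yield the same result; the benefit described in the message comes from the compiler redirecting the first form to a typed implementation once it can infer the array types.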
2017-03-06  Pass driver to loop opt. Add new side_effects phase.  Aart Bik
Rationale: Break-out CL of ART Vectorizer: number 3. The purpose is making the original CL smaller and easier to review. Bug: 34083438 Test: test-art-host Change-Id: I7cece807ee4f5fcaeae41f1deed33ac263447b77
2017-02-27  Implement code sinking.  Nicolas Geoffray
Small example of what the optimization does:
  Object o = new Object();
  if (test) {
    throw new Error(o.toString());
  }
will be turned into (note that the first user of 'o' is the 'new Error' allocation which has 'o' in its environment):
  if (test) {
    Object o = new Object();
    throw new Error(o.toString());
  }
There are other examples in 639-checker-code-sinking.
Ritz individual benchmarks improve on art-jit-cc from 5% (EvaluateComplexFormulas) to 23% (MoveFunctionColumn) on all platforms.
Test: 639-checker-code-sinking
Test: test-art-host
Test: borg job run
Test: libcore + jdwp
bug:35634932
bug:30933338
Change-Id: Ib99c00c93fe76ffffb17afffb5a0e30a14310652
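The effect of the example in this message can be made observable with a small sketch that counts allocations. The counter in the `Tracked` constructor exists only so the difference can be measured; it is an illustration device, and a constructor with such a visible side effect is an assumption of this sketch rather than something the pass is stated to handle. All names are hypothetical.

```java
final class CodeSinkingSketch {
    static int allocations = 0;  // counts how many Tracked objects were created

    static final class Tracked {
        Tracked() { allocations++; }
    }

    // Shape before sinking: the object is allocated on every call,
    // even when the uncommon throwing branch is not taken.
    static void beforeSinking(boolean test) {
        Tracked o = new Tracked();
        if (test) {
            throw new Error(o.toString());
        }
    }

    // Shape after sinking: the allocation has moved into the only branch that uses it.
    static void afterSinking(boolean test) {
        if (test) {
            Tracked o = new Tracked();
            throw new Error(o.toString());
        }
    }
}
```

On the common path, where `test` is false, the sunk form performs no allocation at all, which is where the benchmark gains quoted in the message would come from.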
2017-02-20  Revert^4 "Hash-based dex cache type array."  Vladimir Marko
Added extra output to the abort message to collect more data when we hit the crash. Added extra check when loading an app image to verify that the class table isn't already broken. Test: testrunner.py --host Bug: 34839984 Bug: 30627598 Bug: 34659969 This reverts commit 5812e20ff7cbc8efa0b8d7486ada2f58840a6ad5. Change-Id: I9bb442a184c236dcb75b3e42a095f39cd6bee59d
2017-02-14  ART: Add operator == and != with nullptr to Handle  Andreas Gampe
Get it in line with ObjPtr and prettify our code. Test: m Change-Id: I1322e2a9bc7a85d7f2441034a19bf4d807b81a0e
2017-02-13  Revert^3 "Hash-based dex cache type array."  Mathieu Chartier
Assert failing for "earchbox:search": F zygote64: class_linker.cc:4612] Check failed: handle_scope_iface.Get() != nullptr Test: m test-art-host Bug: 34839984 Bug: 30627598 Bug: 34659969 This reverts commit 85c0f2ac03417f5125bc2ff1dab8109859c67d5c. Change-Id: I39846c20295af5875b0f945be7035c73ded23135
2017-02-10  Revert^2 "Hash-based dex cache type array."  Vladimir Marko
The reason for the revert was fixed by https://android-review.googlesource.com/332666 . We now enable clearing dex cache types in test 155 from that CL. Also avoid an unnecessary store in LookupResolvedTypes() and prevent verifier from messing up the dex cache types. Test: m test-art-host Bug: 34839984 Bug: 30627598 Bug: 34659969 This reverts commit d16363a93053de0f32252c7897d839a46aff14ae. Change-Id: Ie8603cfa772e78e648d005b0b6eae59062ae729d
2017-02-06  Merge "Revert "Revert "Inline across dex files for JIT."""  Nicolas Geoffray