path: root/compiler/optimizing/code_generator.cc
Age  Commit message  Author
2015-04-06  ART: Enable more Clang warnings  (Andreas Gampe)
Change-Id: Ie6aba02f4223b1de02530e1515c63505f37e184c
2015-04-01  [optimizing] Implement x86/x86_64 math intrinsics  (Mark Mendell)
Implement floor/ceil/round/RoundFloat on x86 and x86_64. Implement RoundDouble on x86_64. Add support for roundss and roundsd on both architectures, and support them in the disassembler as well. Add the instruction set features for x86, as the 'round' instruction is only supported if SSE4.1 is supported. Fix the tests to handle the addition of passing the instruction set features to x86 and x86_64. Add assembler tests for roundsd and roundss to the x86_64 assembler tests.
Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
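Since generating the intrinsic hinges on an instruction set feature check, a minimal sketch of that gate follows; the type and function names are illustrative, not ART's actual InstructionSetFeatures API:

    // Hypothetical feature gate: 'roundss'/'roundsd' only exist from
    // SSE4.1 on, so the intrinsic must not be emitted without it.
    struct X86InstructionSetFeatures {
      bool has_sse4_1;
    };

    bool CanGenerateRoundIntrinsic(const X86InstructionSetFeatures& features) {
      return features.has_sse4_1;  // otherwise fall back to the runtime routine
    }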
2015-03-24  Merge "ART: Boolean simplifier"  (David Brazdil)
2015-03-24  ART: Boolean simplifier  (David Brazdil)
The optimization recognizes the negation pattern generated by 'javac' and replaces it with a single condition. To this end, boolean values are now consistently assumed to be represented by an integer. This is a first optimization which deletes blocks from the HGraph and does so by replacing the corresponding entries with null. Hence, existing code can continue indexing the list of blocks with the block ID, but must check for null when iterating over the list.
Change-Id: I7779da69cfa925c6521938ad0bcc11bc52335583
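To make the pattern concrete, here is a rough C++ equivalent of what the pass removes; this illustrates the javac negation idiom, not ART's HGraph representation:

    // Before: javac materializes `return !(a < b);` as a branch diamond
    // that assigns 0 or 1 to a temporary.
    int BeforeSimplification(int a, int b) {
      int result;
      if (a < b) {
        result = 0;
      } else {
        result = 1;
      }
      return result;
    }

    // After the pass: a single condition, with the diamond's blocks
    // deleted from the graph.
    int AfterSimplification(int a, int b) {
      return (a >= b) ? 1 : 0;
    }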
2015-03-24  Unify ART's various implementations of bit_cast.  (Roland Levillain)
ART had several implementations of art::bit_cast:

1. one in runtime/base/casts.h, declared as:
   template <class Dest, class Source>
   inline Dest bit_cast(const Source& source);
2. another one in runtime/utils.h, declared as:
   template<typename U, typename V>
   static inline V bit_cast(U in);
3. and a third local version, in runtime/memory_region.h, similar to the previous one:
   template<typename Source, typename Destination>
   static Destination MemoryRegion::local_bit_cast(Source in);

This CL removes versions 2. and 3. and changes their callers to use 1. instead. That version was chosen over the others as:
- it was the oldest one in the code base; and
- its syntax was closer to the standard C++ cast operators, as it supports the following use:
    bit_cast<Destination>(source)
  since `Source' can be deduced from `source'.
Change-Id: I7334fd5d55bf0b8a0c52cb33cfbae6894ff83633
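For reference, a minimal sketch of the retained interface; this assumes a memcpy-based implementation, which may differ in detail from the actual casts.h version:

    #include <cstdint>
    #include <cstring>

    // memcpy reinterprets the bits without violating strict aliasing.
    template <class Dest, class Source>
    inline Dest bit_cast(const Source& source) {
      static_assert(sizeof(Dest) == sizeof(Source),
                    "bit_cast requires equally sized types");
      Dest dest;
      std::memcpy(&dest, &source, sizeof(dest));
      return dest;
    }

    int main() {
      // Only the destination type is spelled out; Source is deduced.
      uint32_t bits = bit_cast<uint32_t>(1.0f);  // yields 0x3f800000
      return bits == 0x3f800000u ? 0 : 1;
    }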
2015-03-16  Update locations of registers after slow-path spilling.  (Nicolas Geoffray)
Change-Id: Id9aafcc13c1a085c17ce65d704c67b73f9de695d
2015-03-13  Merge "[optimizing] Don't record None locations in the stack maps."  (Nicolas Geoffray)
2015-03-13  [optimizing] Don't record None locations in the stack maps.  (Nicolas Geoffray)
- Moved environment recording from code generator to stack map stream.
- Added creation/loading factory methods for the DexRegisterMap (hides internal details).
- Added new tests.
Change-Id: Ic8b6d044f0d8255c6759c19a41df332ef37876fe
2015-03-13  Refactor code in preparation for correct stack maps in slow paths.  (Nicolas Geoffray)
Move the logic of saving/restoring live registers in a slow path into the SlowPathCode method. Also add a RecordPcInfo helper to SlowPathCode, which will act as the placeholder for saving correct stack maps.
Change-Id: I25c2bc7a642ef854bbc8a3eb570e5c8c8d2d030c
2015-03-13  Fix build breakage.  (Nicolas Geoffray)
Change-Id: I86959eca5d8f5458ff75c78776b0af9db9c26800
2015-03-13  Merge "Tweak liveness when instructions are used in environments."  (Nicolas Geoffray)
2015-03-12  Tweak liveness when instructions are used in environments.  (Nicolas Geoffray)
Instructions remain live when debuggable, but only instructions with object types remain live when non-debuggable. Enable StackVisitor::GetThisObject for optimizing.
Change-Id: Id87b2cbf33a02450059acc9993995782e5f28987
2015-03-12  Compress the Dex register maps built by the optimizing compiler.  (Roland Levillain)
- Replace the current list-based (fixed-size) Dex register encoding in stack maps emitted by the optimizing compiler with another list-based variable-size Dex register encoding compressing short locations on 1 byte (3 bits for the location kind, 5 bits for the value); other (large) values remain encoded on 5 bytes.
- In addition, use slot offsets instead of byte offsets to encode the location of Dex registers placed in stack slots at small offsets, as it enables more values to use the short (1-byte wide) encoding instead of the large (5-byte wide) one.
- Rename art::DexRegisterMap::LocationKind as art::DexRegisterLocation::Kind, turn it into a strongly-typed enum based on a uint8_t, and extend it to support new kinds (kInStackLargeOffset and kConstantLargeValue).
- Move art::DexRegisterEntry from compiler/optimizing/stack_map_stream.h to runtime/stack_map.h and rename it as art::DexRegisterLocation.
- Adjust art::StackMapStream, art::CodeGenerator::RecordPcInfo, art::CheckReferenceMapVisitor::CheckOptimizedMethod, art::StackVisitor::GetVRegFromOptimizedCode, and art::StackVisitor::SetVRegFromOptimizedCode.
- Implement unaligned memory accesses in art::MemoryRegion.
- Use them to manipulate data in Dex register maps.
- Adjust oatdump to support the new Dex register encoding.
- Update compiler/optimizing/stack_map_test.cc.
Change-Id: Icefaa2e2b36b3c80bb1b882fe7ea2f77ba85c505
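As a worked illustration of the 1-byte short form (3 bits of kind, 5 bits of value), here is a hypothetical encoder; the field order and kind names are assumptions, not ART's actual layout:

    #include <cassert>
    #include <cstdint>

    enum class Kind : uint8_t { kNone, kInStack, kInRegister, kInFpuRegister, kConstant };

    constexpr unsigned kValueBits = 5;
    constexpr uint8_t kValueMask = (1u << kValueBits) - 1;  // values 0..31

    uint8_t EncodeShortLocation(Kind kind, uint8_t value) {
      assert(value <= kValueMask);  // anything larger takes the 5-byte form
      return static_cast<uint8_t>((static_cast<uint8_t>(kind) << kValueBits) | value);
    }
    // Example: register 3 encodes as one byte, while a stack slot at a
    // large offset (kInStackLargeOffset) falls back to the 5-byte form.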
2015-03-05  [optimizing] Use callee-save registers for x86  (Mark Mendell)
Add ESI, EDI, EBP to available registers for non-baseline mode. Ensure that they aren't used when byte-addressable registers are needed.
Change-Id: Ie7130d4084c2ae9cfcd1e47c26eb3e5dcac1ebd6
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2015-03-02  Opt Compiler: ARM64: Enable explicit memory barriers over acquire/release  (Serban Constantinescu)
Implement remaining explicit memory barrier code paths and temporarily enable the use of explicit memory barriers for testing. This CL also enables the use of instruction set features in the ARM64 backend. kUseAcquireRelease has been replaced with PreferAcquireRelease(), which for now is statically set to false (prefer explicit memory barriers). Please note that we still prefer acquire-release for the ARM64 Optimizing Compiler, but we would like to exercise the explicit memory barrier code path too.
Change-Id: I84e047ecd43b6fbefc5b82cf532e3f5c59076458
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
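The two lowering strategies can be illustrated with C++ atomics rather than ARM64 assembly; this is an analogy, not the backend's actual code:

    #include <atomic>

    // Acquire semantics attached to the access itself (ldar-style).
    int LoadAcquireStyle(const std::atomic<int>& value) {
      return value.load(std::memory_order_acquire);
    }

    // A plain access followed by an explicit barrier (ldr + dmb-style).
    int ExplicitBarrierStyle(const std::atomic<int>& value) {
      int result = value.load(std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_acquire);
      return result;
    }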
2015-02-20  Merge "Display optimizing compiler's CodeInfo objects in oatdump."  (Roland Levillain)
2015-02-19  Ensure the graph is correctly typed.  (Nicolas Geoffray)
We used to be forgiving because of HIntConstant(0) also being used for null. We now create a special HNullConstant for such uses. Also, we need to run the dead phi elimination twice during SSA building to ensure correctness.
Change-Id: If479efa3680d3358800aebb1cca692fa2d94f6e5
2015-02-19  Display optimizing compiler's CodeInfo objects in oatdump.  (Roland Levillain)
A few elements (stack mask, inline info) are not displayed yet, though.
Change-Id: I5e51a801c580169abc5d1ef43ad581aadc110754
2015-02-18  Avoid generating jmp +0.  (Nicolas Geoffray)
When a block branches to a non-following block, but blocks in-between do branch to it, we can avoid doing the branch.
Change-Id: I9b343f662a4efc718cd4b58168f93162a24e1219
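A minimal sketch of the fall-through test this implies in the code generator; the helper name and indices are hypothetical:

    // Emit a jump only when the target is not the block laid out
    // immediately after the current one in the chosen block order.
    bool NeedsJump(size_t current_index, size_t target_index) {
      return target_index != current_index + 1;
    }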
2015-02-06  Optimize leaf methods.  (Nicolas Geoffray)
Avoid suspend checks and stack changes when not needed.
Change-Id: I0fdb31e8c631e99091b818874a558c9aa04b1628
2015-02-04  Finally implement Location::kNoOutputOverlap.  (Nicolas Geoffray)
The [i, i + 1) interval scheme we chose for representing lifetime positions is not optimal for doing this optimization. It however doesn't prevent recognizing a non-split interval during the TryAllocateFreeReg phase and trying to re-use its inputs' registers.
Change-Id: I80a2823b0048d3310becfc5f5fb7b1230dfd8201
2015-02-03  Use a different block order when not compiling baseline.  (Nicolas Geoffray)
Use the linearized order instead, as it puts blocks logically next to each other in a better way. Also, it does not contain dead blocks.
Change-Id: Ie65b56041a093c8155e6c1e06351cb36a4053505
2015-01-24  Support callee-save registers on ARM.  (Nicolas Geoffray)
Change-Id: I7c519b7a828c9891b1141a8e51e12d6a8bc84118
2015-01-23  Support callee-save floating point registers on x64.  (Nicolas Geoffray)
- Share the computation of core_spill_mask and fpu_spill_mask between backends.
- Remove explicit stack overflow check support: we would need to adjust it, and since it is not tested, it will easily bitrot.
Change-Id: I0b619b8de4e1bdb169ea1ae7c6ede8df0d65837a
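A hypothetical sketch of the shared spill-mask computation; the names are illustrative, not ART's actual helpers:

    #include <cstddef>
    #include <cstdint>

    // One bit per callee-save register the method actually touches; the
    // frame then reserves one slot per set bit.
    uint32_t ComputeSpillMask(uint32_t callee_saves, uint32_t used_registers) {
      return callee_saves & used_registers;
    }

    size_t NumberOfSpillSlots(uint32_t spill_mask) {
      return static_cast<size_t>(__builtin_popcount(spill_mask));
    }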
2015-01-21  Enable core callee-save on x64.  (Nicolas Geoffray)
Other architectures and FP support will be worked on in other CLs.
Change-Id: I8cef0343eedc7202d206f5217fdf0349035f0e4d
2015-01-21  Merge "ART: Replace NULL with nullptr in the optimizing compiler"  (Roland Levillain)
2015-01-21  ART: Replace NULL with nullptr in the optimizing compiler  (Jean Christophe Beyler)
Replace the NULL macro with the C++ nullptr keyword.
Change-Id: Ib6e48dd4bb3c254343383011b67372622578ca76
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
2015-01-21  Revert "Revert "Fully support pairs in the register allocator.""  (Nicolas Geoffray)
This reverts commit c399fdc442db82dfda66e6c25518872ab0f1d24f.
Change-Id: I19f8215c4b98f2f0827e04bf7806c3ca439794e5
2015-01-21  Record implicit null checks at the actual invoke time.  (Calin Juravle)
ImplicitNullChecks are recorded only for instructions directly (see NB below) preceded by NullChecks in the graph. This way we avoid recording redundant safepoints and minimize the code size increase.
NB: ParallelMoves might be inserted by the register allocator between the NullChecks and their uses. These modify the environment and the correct action would be to reverse their modification. This will be addressed in a follow-up CL.
Change-Id: Ie50006e5a4bd22932dcf11348f5a655d253cd898
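A toy illustration of the recording rule; the types here are hypothetical stand-ins, not ART's HInstruction API:

    #include <string>

    struct Instruction {
      std::string kind;
      const Instruction* previous = nullptr;
    };

    // Record the implicit check only when the faulting instruction
    // directly follows the NullCheck it folds away.
    bool ShouldRecordImplicitNullCheck(const Instruction& instruction) {
      return instruction.previous != nullptr &&
             instruction.previous->kind == "NullCheck";
    }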
2015-01-21  Revert "Fully support pairs in the register allocator."  (Nicolas Geoffray)
Libcore tests fail. This reverts commit 41aedbb684ccef76ff8373f39aba606ce4cb3194.
Change-Id: I2572f120d4bbaeb7a4d4cbfd47ab00c9ea39ac6c
2015-01-21  Fully support pairs in the register allocator.  (Nicolas Geoffray)
Enabled on ARM for longs and doubles.
Change-Id: Id8792d08bd7ca9fb049c5db8a40ae694bafc2d8b
2015-01-20  Merge "Add implicit null checks for the optimizing compiler"  (Calin Juravle)
2015-01-16  Add implicit null checks for the optimizing compiler  (Calin Juravle)
- For backends: arm, arm64, x86, x86_64.
- Fixed parameter passing for CodeGenerator.
- 003-omnibus-opcodes test verifies that NullPointerExceptions work as expected.
Change-Id: I1b302acd353342504716c9169a80706cf3aba2c8
2015-01-16  Do not use register pair in a parallel move.  (Nicolas Geoffray)
The ParallelMoveResolver does not work with pairs. Instead, decompose the pair into two individual moves.
Change-Id: Ie9d3f0b078cef8dc20640c98b20bb20cc4971a7f
2015-01-15  [optimizing compiler] Compute live spill size  (Mark Mendell)
The current stack frame calculation assumes that each live register to be saved/restored has the word size of the machine. This fails for X86, where a double in an XMM register takes up 8 bytes. Change the calculation to keep track of the number of core registers and the number of FP registers to handle this distinction. This is slightly pessimal, as the registers may not be active at the same time, but the only way to handle this would be to allocate both classes of registers simultaneously, or to remember all the active intervals, matching them up and computing the size of each safepoint interval.
Change-Id: If7860aa319b625c214775347728cdf49a56946eb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
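The corrected accounting amounts to arithmetic along these lines; the helper is hypothetical:

    #include <cstddef>

    // Core registers spill at the machine word size; FP registers spill
    // at 8 bytes so an x86 double held in an XMM register fits.
    size_t ComputeSpillAreaSize(size_t core_spills, size_t fpu_spills,
                                size_t word_size) {
      const size_t kFpuSpillSize = 8;
      return core_spills * word_size + fpu_spills * kFpuSpillSize;
    }
    // On x86 (word_size == 4), two core spills plus one XMM spill need
    // 2 * 4 + 1 * 8 = 16 bytes, not the 12 a word-size-only count gives.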
2015-01-12  Merge "Move code around in OptimizingCompiler::Compile to reduce stack space."  (Nicolas Geoffray)
2015-01-12  Move code around in OptimizingCompiler::Compile to reduce stack space.  (Nicolas Geoffray)
Also fix an (intentional) memory leak by allocating the CodeGenerator on the heap instead of the arena: it constructs an Assembler object that requires destruction.
Bug: 18787334
Change-Id: I8cf0667cb70ce5b14d4ac334bd4487a562635f1b
2015-01-08  Implement double and float support for arm in register allocator.  (Nicolas Geoffray)
The basic approach is:
- An instruction that needs two registers gets two intervals.
- When allocating the low part, we also allocate the high part.
- When splitting a low (or high) interval, we also split the high (or low) equivalent.
- Allocation follows the (S/D register) requirement that low registers are always even and the high equivalent is low + 1.
Change-Id: I06a5148e05a2ffc7e7555d08e871ed007b4c2797
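The even/odd pairing constraint can be stated as a tiny predicate; the helper is hypothetical:

    // A D register overlaps two consecutive S registers, so the low half
    // must sit in an even-numbered S register with the high half at low + 1.
    bool IsValidSRegisterPair(int low_reg, int high_reg) {
      return (low_reg % 2 == 0) && (high_reg == low_reg + 1);
    }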
2015-01-05  Look at instruction set features when generating code for volatiles  (Calin Juravle)
Change-Id: Ia882405719fdd60b63e4102af7e085f7cbe0bb2a
2014-12-22  ART: Swap-space in the compiler  (Andreas Gampe)
Introduce a swap-space and corresponding allocator to transparently switch native allocations to memory backed by a file.
Bug: 18596910
(cherry picked from commit 62746d8d9c4400e4764f162b22bfb1a32be287a9)
Change-Id: I131448f3907115054a592af73db86d2b9257ea33
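A minimal sketch of the underlying idea, backing memory with a temporary file via mmap; this is an illustration, not ART's SwapSpace allocator:

    #include <cstdio>
    #include <sys/mman.h>
    #include <unistd.h>

    // File-backed pages can be written out to disk under memory
    // pressure, unlike anonymous allocations on a swapless device.
    void* AllocateFileBacked(size_t size) {
      FILE* file = std::tmpfile();  // unlinked temporary file
      if (file == nullptr) {
        return nullptr;
      }
      int fd = fileno(file);
      if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        return nullptr;
      }
      void* memory =
          mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      return memory == MAP_FAILED ? nullptr : memory;
    }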
2014-12-18  Merge "Revert "Don't block quick callee saved registers for optimizing.""  (Nicolas Geoffray)
2014-12-18  Revert "Don't block quick callee saved registers for optimizing."  (Nicolas Geoffray)
X64 has one libcore test failing, and codegen_test on arm is failing. This reverts commit 6004796d6c630696127df2494dcd4f30d1367a34.
Change-Id: I20e00431fa18e11ce4c0cb6fffa91977fa8e9b4f
2014-12-18  Merge "Don't block quick callee saved registers for optimizing."  (Nicolas Geoffray)
2014-12-18  Don't block quick callee saved registers for optimizing.  (Nicolas Geoffray)
This change builds on: https://android-review.googlesource.com/#/c/118983/
- Also fix an x86_64 assembler bug triggered by this change.
- Fix (and improve) x86's backend byte register usage.
- Fix a bug in the baseline register allocator: a fixed out register must prevent inputs from allocating it.
Change-Id: I4883862e29b4e4b6470f1823cf7eab7e7863d8ad
2014-12-15  Inlining support in optimizing.  (Nicolas Geoffray)
Currently only inlines simple things that don't require an environment, such as:
- Returning a constant.
- Returning a parameter.
- Returning an arithmetic operation.
Change-Id: Ie844950cb44f69e104774a3cf7a8dea66bc85661
2014-12-08  [optimizing compiler] Add REM_FLOAT and REM_DOUBLE  (Calin Juravle)
- For arm, x86, x86_64 backends.
- Reinstated fmod quick entry points for x86. This is a partial revert of bd3682eada753de52975ae2b4a712bd87dc139a6, which added inline assembly for floating point rem on x86. Note that Quick still uses the inline version.
- Fix rem tests for longs.
Change-Id: I73be19a9f2f2bcf3f718d9ca636e67bdd72b5440
2014-12-04  Add support for float-to-long in the optimizing compiler.  (Roland Levillain)
- Add support for the float-to-long Dex instruction in the optimizing compiler.
- Add a Dex PC field to art::HTypeConversion to allow the x86 and ARM code generators to produce runtime calls.
- Instruct art::CodeGenerator::RecordPcInfo not to record PC information for HTypeConversion instructions.
- Add S0 to the list of ARM FPU parameter registers.
- Have art::x86_64::X86_64Assembler::cvttss2si work with 64-bit operands.
- Generate x86, x86-64 and ARM (but not ARM64) code for float to long HTypeConversion nodes.
- Add related tests to test/422-type-conversion.
Change-Id: I954214f0d537187883f83f7a83a1bb2dd8a21fd4
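For context, the semantics the generated code (or the runtime call on x86/ARM) has to honor can be expressed directly; this is a hedged sketch of the Dex conversion rules, not ART's implementation:

    #include <cmath>
    #include <cstdint>
    #include <limits>

    // Per the Dex specification: NaN converts to 0 and out-of-range
    // values clamp to the representable extremes.
    int64_t FloatToLong(float value) {
      if (std::isnan(value)) {
        return 0;
      }
      if (value >= static_cast<float>(std::numeric_limits<int64_t>::max())) {
        return std::numeric_limits<int64_t>::max();
      }
      if (value <= static_cast<float>(std::numeric_limits<int64_t>::min())) {
        return std::numeric_limits<int64_t>::min();
      }
      return static_cast<int64_t>(value);  // in range: plain truncation
    }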
2014-11-28  Vixl: Update the VIXL interface to VIXL 1.7 and enable VIXL debug.  (Serban Constantinescu)
This patch updates the interface to VIXL 1.7 and enables the debug version of VIXL when ART is built in debug mode.
Change-Id: I443fb941bec3cffefba7038f93bb972e6b7d8db5
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
2014-11-27  Add support for long-to-double in the optimizing compiler.  (Roland Levillain)
- Add support for the long-to-double Dex instruction in the optimizing compiler.
- Enable requests of temporary FPU (double) registers during code generation.
- Fix art::x86::X86Assembler::LoadLongConstant and extend it to int64_t values.
- Have art::x86_64::X86_64Assembler::cvtsi2sd work with 64-bit operands.
- Generate x86, x86-64 and ARM (but not ARM64) code for long to double HTypeConversion nodes.
- Add related tests to test/422-type-conversion.
Change-Id: Ie73d9e5e25bd2e15f585c371e8fc2dcb83438ccd
2014-11-19  Fix safepoint bug when computing live registers.  (Nicolas Geoffray)
Change-Id: I8f28dd287c0e04223c49dea6a323058c1b210913