Age | Commit message | Author |
|
This patch enhances the existing ART compiler
to generate Intel(R) AVX/AVX2 MOV instructions for
SIMD operations on Intel(R) Architecture CPUs.
It also provides the framework for AVX/AVX2 instruction
encoding and disassembly.
BUG: 127881558
Test: run-test gtest
Change-Id: I9386aecc134941a2d907f9ec6b2d5522ec5ff8b5
|
|
Rationale:
Saturation arithmetic? It is coming!
Bug: b/74026074
Test: visual inspection
Change-Id: I056a2f785b01f9d56749a9fca611846f871e253c
|
|
Rationale:
A few instructions needed to implement SIMD reductions.
Test: assembler_x86_[64_]test
Bug: 64091002
Change-Id: I785acfc6c8c4ad4f290ddeab32da9b767f944e24
|
|
Rationale:
The more vectorized, the better!
Test: test-art-target, test-art-host
Change-Id: I758becca5beaa5b97fab2ab70f2e00cb53458703
|
|
Rationale:
Enables fast compare gt.
Test: assembler_x86[_64]_test
Change-Id: I0a069649480529f3fec2c2b100e2aaaa2cd79820
|
|
Rationale:
Break-out CL of ART Vectorizer.
Enables fast halving add with rounding
Bug: 34083438
Test: assembler_x86[_64]_test
Change-Id: I09173376b803d671a6b05a33e630f45f778cea52
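
For readers unfamiliar with the term, "halving add with rounding" is commonly defined as (a + b + 1) >> 1, computed without intermediate overflow. The sketch below models that definition for unsigned lanes in Python. It is an illustration only: the function name is hypothetical, and the message above does not say which instructions the change adds (the SSE2 PAVGB/PAVGW instructions compute this value for unsigned bytes and words).

```python
def rounding_halving_add(a, b, bits=8):
    """Unsigned rounding halving add on `bits`-wide lanes (hypothetical helper)."""
    mask = (1 << bits) - 1
    # Python integers do not overflow, so the widened sum is exact;
    # a vector instruction achieves the same with one extra internal bit.
    return ((a & mask) + (b & mask) + 1) >> 1
```

With 8-bit lanes, rounding_halving_add(1, 2) rounds 1.5 up to 2, and the result never exceeds the lane maximum.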
|
|
Rationale:
Break-out CL of ART Vectorizer.
Bug: 34083438
Test: test-art-host
Change-Id: I4027033cbe48a19c426326fc307fe4437b143d61
|
|
And fix disassembly of the now unused TESTL.
Test: testrunner.py --host with string compression enabled.
Test: Manual inspection of dump-oat output.
Bug: 35433135
Bug: 31040547
Change-Id: I36c955bc1f2243954ecc315266a2f3fce5d87693
|
|
Rationale:
ART vectorizer needs SIMD for integer operations too.
Test: assembler_x86[_64]_test
Bug: 34083438
Change-Id: Id6fec558c617d38cb643839eafcd10e59dcd6e0a
|
|
Some more intrusive changes than I would have liked, as long as
ART logging is different from libbase logging.
Fix up some includes.
Bug: 15436106
Bug: 31338270
Test: m test-art-host
Change-Id: I9fbe4b85b2d74e079a4981f3aec9af63b163a461
|
|
To prepare separation of disassembler from libart, add a function
hook to the disassembler options for thread offset name printing.
Bug: 15436106
Change-Id: I9e9b7e565ae923952c64026f675ac527b560f51b
|
|
|
|
Move away from size_t to dedicated enum (class).
Bug: 30373134
Bug: 30419309
Test: m test-art-host
Change-Id: Id453c330f1065012e7d4f9fc24ac477cc9bb9269
|
|
Rationale:
These instructions should be marked as load, so that, using
Intel syntax, destination (xmm0) appears at left hand side, as in
roundss xmm0, xmm1
and not the other way around. First I suspected a bug in the
encoding (hence the test) and even the register allocator, but
since the code behaved correctly, only disassembly was really wrong.
Test: disassembler_x86_test (but nothing for actual disassembly)
BUG=26327751
Change-Id: I060ef57f4d5a64cdc04b97ae8a799d1c0d22da05
|
|
|
|
Rationale:
Recognizing this important operation as an intrinsic has
various advantages:
(1) having the no-side-effects/no-throw allows for
much more GVN/LICM/BCE.
(2) Some architectures, like x86_64, provide direct
support for this operation.
Performance improvements on X86_64:
CheckersEvalBench (32-bit bitboard): 27,210KNS -> 36,798KNS = + 35%
ReversiEvalBench (64-bit bitboard): 52,562KNS -> 89,086KNS = + 69%
Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
|
|
There are some variations of NOPs which are possible on x86.
Change-Id: I6aab3bc98682e521532cc746f3a371d9c5d98ee8
|
|
These are for use in new intrinsics. Bsf (Bit Scan Forward) is used in
{Long,Integer}NumberOfTrailingZeros and the rotates are used in
{Long,Integer}Rotate{Left,Right}.
Change-Id: Icb599d7e1eec4e4ea9e5b4f0b1654c7b8d4de678
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
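
As a rough illustration of what these instructions compute for the intrinsics, the Python sketch below models the trailing-zero count and a rotate. The function names are hypothetical and this is not the ART implementation. One assumption worth stating: BSF leaves its destination undefined for a zero source, while Java defines numberOfTrailingZeros(0) as the bit width, so the zero case is handled separately.

```python
def number_of_trailing_zeros(value, bits=32):
    """Model of {Integer,Long}.numberOfTrailingZeros on a `bits`-wide value."""
    value &= (1 << bits) - 1
    if value == 0:
        # BSF gives no usable result for a zero source, so the
        # bit width has to be produced on a separate path.
        return bits
    # Isolate the lowest set bit; its index is what BSF would return.
    return (value & -value).bit_length() - 1


def rotate_left(value, distance, bits=32):
    """Model of {Integer,Long}.rotateLeft, which is what ROL computes."""
    mask = (1 << bits) - 1
    distance %= bits
    value &= mask
    return ((value << distance) | (value >> (bits - distance))) & mask


def rotate_right(value, distance, bits=32):
    """Model of {Integer,Long}.rotateRight, which is what ROR computes."""
    return rotate_left(value, -distance, bits)
```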
|
|
Add support for 'bsr' instruction. Add tests.
Change-Id: I1cd8b30d7f3f5ee7fbeef8124cc6a31bf8ce59d5
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|
Add 'REP MOVSW' as a supported instruction for x86 32 and 64 bit.
Added tests.
Change-Id: I1c615ac1e7fa46c48983c90f791b92be0375c8b8
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|
Also included support for repe_cmpsl instruction. This is a follow up to
commit 71311f868e2 which added support for repe_cmpsw in the x86 and
x86_64 assemblers.
Change-Id: I2beac05a57341539acf96cdf77062facd031a864
|
|
The instruction PEXTRW, encoded by the sequence 66 0F 3A 15,
was incorrectly encoded in the compiler table and incorrectly
parsed by the disassembler.
Change-Id: Ib4d4db923cb15a76e74f13f6b5514cb0d1cbe164
Signed-off-by: nikolay serdjuk <nikolay.y.serdjuk@intel.com>
|
|
The instruction PEXTRW encoded by the sequence 66 0F C5 has the form:
PEXTRW reg, xmm, imm8. Its reg is encoded in the REG part and
xmm is encoded in the R/M part of the ModR/M byte. Since the order
is opposite to that of PEXTRB and PEXTRD, we have to set 'load' to
true and leave 'store' as false.
Change-Id: I32c42ea005eec29f7bf969f275c36ffa0a95fa6d
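
The operand order described above can be sketched in Python as follows. This is a minimal model, not the disassembler's code: the helper names and the output format are hypothetical, and only the register-to-register form (mod == 3) of 66 0F C5 is handled.

```python
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def disassemble_pextrw_0fc5(modrm, imm8):
    """Format the operands that follow the 66 0F C5 opcode bytes."""
    mod, reg, rm = split_modrm(modrm)
    if mod != 0b11:
        raise ValueError("66 0F C5 only has a register-to-register form")
    # The destination GPR sits in REG and the source XMM in R/M, which
    # is the "load" direction: the REG operand is printed first.
    return f"pextrw {GPR32[reg]}, xmm{rm}, {imm8}"
```

For example, the ModR/M byte C1 has reg = 0 and rm = 1, so disassemble_pextrw_0fc5(0xC1, 3) yields "pextrw eax, xmm1, 3".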
|
|
Implement floor/ceil/round/RoundFloat on x86 and x86_64.
Implement RoundDouble on x86_64.
Add support for roundss and roundsd on both architectures. Support them
in the disassembler as well.
Add the instruction set features for x86, as the 'round' instruction is
only supported if SSE4.1 is supported.
Fix the tests to handle the addition of passing the instruction set
features to x86 and x86_64.
Add assembler tests for roundsd and roundss to x86_64 assembler tests.
Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|
Index 4 in SIB is valid when given Rex.x, where it denotes r12 and
not the invalid rsp.
Bug: 19149560
Change-Id: I1a74bcbb1ccf3686e45a3df5d852a86444f9d850
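
A minimal Python sketch of the rule, assuming 64-bit mode. The function name and return shape are hypothetical; only the SIB and REX bit layouts come from the x86-64 encoding.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

REX_X = 0b0010  # bit 1 of the REX prefix extends the SIB index field


def decode_sib_index(sib, rex=0):
    """Return (index register, scale) for a SIB byte, or None if no index."""
    scale = 1 << ((sib >> 6) & 0b11)
    index = (sib >> 3) & 0b111
    if rex & REX_X:
        index |= 0b1000
    if index == 4:
        # Only the unextended encoding 4 means "no index register".
        return None
    return REG64[index], scale
```

With the SIB byte 24, the index field is 4: decode_sib_index(0x24) returns None, while decode_sib_index(0x24, rex=0x42) returns ("r12", 1).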
|
|
Probably a typo from last refactoring.
Change-Id: I086a87120ca0f0dfddbe803573b0e0f79cc6d945
|
|
Using Clang, this pushes the frame size of the caller across our
limit. Thus forbid inlining. The function is only called once per
compile, so the impact is insignificant.
Bug: 18738594
Change-Id: I19c3f1168a5104ab508a8dbf9f2a8c035cb97e3c
|
|
The function leads to large stack frames with Clang. Break out
some parts and use four char* variables for opcode.
Bug: 18733806
Change-Id: I8bf6da6c763175d7081c4231fa5d3b6809316220
|
|
Change-Id: I2f0a2851a15f5a099a5bc0249e3ea0616cdcd94e
|
|
Change-Id: I7a79c1671a6ff8b2040887133b3e0925ef9a3cfe
|
|
Move DISALLOW_COPY_AND_ASSIGN to delete functions. By not having declarations
with no definitions this prompts better warning messages so deal with these
by correcting the code.
Add a DISALLOW_ALLOCATION and use for ValueObject and mirror::Object.
Make X86 assembly operand types ValueObjects to fix compilation errors.
Tidy the use of iostream and ostream.
Avoid making cutils a dependency via mutex-inl.h for tests that link against
libart. Push tracing dependencies into appropriate files and mutex.cc.
x86 32-bit host symbols size is increased for libarttest, avoid copying this
in run-test 115 by using symlinks and remove this test's higher than normal
ulimit.
Fix the RunningOnValgrind test in RosAllocSpace to not use GetHeap as it
returns NULL when the heap is under construction by Runtime.
Change-Id: Ia246f7ac0c11f73072b30d70566a196e9b78472b
|
|
Move gVerboseMethods to CompilerOptions. Now "--verbose-methods=" option to
dex2oat rather than runtime argument "-verbose-methods:".
Move ToStr and Dumpable out of logging.h, move LogMessageData into logging.cc
except for a forward declaration.
Remove ConstDumpable as Dump methods are all const (and make this so if not
currently true).
Make LogSeverity an enum and improve compile time assertions and type checking.
Remove log_severity.h that's only used in logging.h.
With system headers gone from logging.h, go add to .cc files missing system
header includes.
Also, make operator new in ValueObject private for compile time instantiation
checking.
Change-Id: I3228f614500ccc9b14b49c72b9821c8b0db3d641
|
|
Falling through switch cases on a clang build must now annotate the fallthrough
with the FALLTHROUGH_INTENDED macro.
Bug: 17731372
Change-Id: I836451cd5f96b01d1ababdbf9eef677fe8fa8324
|
|
Remove extra semicolons.
Dollar signs in C++ identifiers are an extension.
Named variadic macros are an extension.
Binary literals are a C++14 feature.
Enum re-declarations are not allowed.
Overflow.
Change-Id: I7d16b2217b2ef2959ca69de84eaecc754517714a
|
|
- Added printing of OatClass offsets.
- Added printing of OatMethod offsets.
- Added bounds checks for code size size, code size, mapping table, gc map, vmap table.
- Added sanity check of 100k for code size.
- Added partial disassembly of questionable code.
- Added --no-disassemble to disable disassembly.
- Added --no-dump:vmap to disable vmap dumping.
- Reordered OatMethod info to be in file order.
Bug: 15567083
(cherry picked from commit 34fa79ece5b3a1940d412cd94dbdcc4225aae72f)
Change-Id: I2c368f3b81af53b735149a866f3e491c9ac33fb8
|
|
This patch fixes the implementation of the x86 vectorization opcodes.
Change-Id: I0028d54a9fa6edce791b7e3a053002d076798748
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
Signed-off-by: Philbert Lin <philbert.lin@intel.com>
|
|
Added non-temporal store support as a hint from the ME.
Added the implementation of the memory barrier
extended instruction that supports non-temporal stores
by explicitly serializing all previous store-to-memory instructions.
Change-Id: I8205a92083f9725253d8ce893671a133a0b6849d
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
|
Added support for x86 inlined shift long for 32bit
Change-Id: I6caef60dd7d80227c3057fd6f64b0ecb11025afa
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
|
|
The patch fixes an issue with disassembling 'movsxd' and 'movabsq'
instructions altered with 64-bit immediates: REX.W is not the only
prefix that may be prepended to these instructions.
Change-Id: Ida7c7b368327a6b5cae1ff12ec00ceb0769c0a3d
Signed-off-by: Vladimir Kostyukov <vladimir.kostyukov@intel.com>
|
|
Registers that are encoded as part of the opcode may be 1 byte
or 2 bytes wide, depending on the instruction and the 66h prefix.
This patch makes the decoding of such instructions correct.
Examples:
- '664155' should be decoded as 'push r13w'
(66h + REX.B)
- '41B320' should be decoded as 'mov r11l, 0x20'
(byte-operand + REX.B)
Change-Id: I83913e3a5f2ef03c4019c0f5eea6b11fc51ee4cc
Signed-off-by: Vladimir Kostyukov <vladimir.kostyukov@intel.com>
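
The two examples above can be reproduced with the following Python sketch. It assumes 64-bit mode, covers only the push (50+r) and mov r8, imm8 (B0+r) opcode ranges, and uses hypothetical helper names. The register spellings (such as 'r11l') follow the examples in this message rather than any particular assembler syntax.

```python
LEGACY_BYTE = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
LOW_NAMES = {
    1: ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"],
    2: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    8: ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"],
}
HIGH_SUFFIX = {1: "l", 2: "w", 8: ""}


def reg_name(reg, size, has_rex):
    """Name register number `reg` (0-15) at the given operand size in bytes."""
    if reg >= 8:
        return f"r{reg}{HIGH_SUFFIX[size]}"
    if size == 1 and not has_rex:
        return LEGACY_BYTE[reg]
    return LOW_NAMES[size][reg]


def decode_opcode_register(code):
    """Decode an instruction whose register is in the low 3 opcode bits."""
    pos = 0
    has_66 = code[pos] == 0x66
    if has_66:
        pos += 1
    rex = code[pos] if 0x40 <= code[pos] <= 0x4F else 0
    if rex:
        pos += 1
    opcode = code[pos]
    # REX.B (bit 0) supplies the fourth bit of the register number.
    reg = (opcode & 0b111) | (0b1000 if rex & 0b0001 else 0)
    if 0x50 <= opcode <= 0x57:
        # push: 64-bit by default, 16-bit with the 66h prefix.
        return f"push {reg_name(reg, 2 if has_66 else 8, bool(rex))}"
    if 0xB0 <= opcode <= 0xB7:
        # mov r8, imm8: always a byte operand, regardless of 66h.
        return f"mov {reg_name(reg, 1, bool(rex))}, {hex(code[pos + 1])}"
    raise ValueError("opcode not covered by this sketch")
```

Calling decode_opcode_register(bytes.fromhex("664155")) gives "push r13w", and decode_opcode_register(bytes.fromhex("41B320")) gives "mov r11l, 0x20".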
|
|
|
|
Add support for reserving vector registers for the duration of the vector loop.
Add support for 16x16 multiplication, shifts, and add reduce.
Changed the vectorization implementation to be able to use the dataflow
elements for SSA recreation and fixed a few implementation details.
Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Olivier Come <olivier.come@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
|
The patch addresses the comments from the review done by Ian Rogers.
Clean-up of assembler.
Change-Id: I9dbb350dfc6645f8a63d624b2b785233529459a9
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
|
|
|
Some of the FF-opcodes' (i.e., push, call, jmp) register names
depend on the target (32-bit vs 64-bit). This patch makes
such opcodes target-specific.
Change-Id: I4fa0b7ee5310e14f4022850ac2160c21be5d1c99
Signed-off-by: Vladimir Kostyukov <vladimir.kostyukov@intel.com>
|
|
This patch loads a 64-bit constant into a register with a single movabsq
instruction on 64-bit targets instead of the previous mov, shift, add
instruction sequence.
Change-Id: I9d013c4f6c0b5c2e43bd125f91436263c7e6028c
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
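
For reference, the single-instruction form is 10 bytes long: a REX.W prefix, the opcode B8 plus the low three bits of the register number, and the 8-byte little-endian immediate. The Python sketch below builds that encoding. The function name is hypothetical and this is not the ART assembler's API.

```python
import struct


def encode_movabsq(reg, imm64):
    """Encode `movabsq $imm64, reg` for register number `reg` (0-15)."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be in 0..15")
    # REX.W (0x48) selects the 64-bit operand size; REX.B (bit 0)
    # carries the fourth bit of the register number for r8-r15.
    rex = 0x48 | (reg >> 3)
    opcode = 0xB8 | (reg & 0b111)
    return bytes([rex, opcode]) + struct.pack("<Q", imm64 & 0xFFFFFFFFFFFFFFFF)
```

For example, encode_movabsq(0, 0x1122334455667788).hex() is "48b88877665544332211", which loads the constant into rax.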
|
|
This patch extends the disassembler with new FPU instructions:
- fstsw
- fucompp
- fprem
Change-Id: I9458510bc17f2b3b286edec102552f64be05147e
Signed-off-by: Vladimir Kostyukov <vladimir.kostyukov@intel.com>
|
|
The patch adds the HADDPS, HADDPD, SHUFPS, and SHUFPD instruction generation
for X86.
Change-Id: Ida105d3e57be231a5331564c1a9bc298cf176ce6
Signed-off-by: Olivier Come <olivier.come@intel.com>
|
|
Yet another instruction not disassembled properly.
Add 'b', 'w', 'q' to opcodes to differentiate between various versions
and make it more understandable.
Change-Id: Ib794aac660bc8bc4900bfa49eab5aed682996adc
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|
I noticed another missing instruction.
Change-Id: I71170496b014ac2609116eff2aeb13a13e71e263
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|