|
Implemented as a runtime call.
Bug: 66890674
Test: art/test.py --target -r -t 979
Test: art/test.py --target --64 -r -t 979
Test: art/test.py --host -r -t 979
Change-Id: I67f461c819a7d528d7455afda8b4a59e9aed381c
|
|
Implemented as a runtime call.
Bug: 66890674
Test: art/test.py --target -r -t 979
Test: art/test.py --target --64 -r -t 979
Test: art/test.py --host -r -t 979
Change-Id: I4b3d3969d455d0198cfe122eea8abd54e0ea20ee
|
|
Disabled the build time flag. (No image version bump needed.)
Bug: 26687569
Bug: 64692057
Bug: 76420366
This reverts commit 3fbd3ad99fad077e5c760e7238bcd55b07d4c06e.
Change-Id: I5d83c4ce8a7331c435d5155ac6e0ce1c77d60004
|
|
This reverts commit 3f41323cc9da335e9aa4f3fbad90a86caa82ee4d.
Reason for revert: Fails sporadically.
Bug: 26687569
Bug: 64692057
Bug: 76420366
Change-Id: I84d1e9e46c58aeecf17591ff71fbac6a1e583909
|
|
Previously, the interpreter checked for dex pc 0 to see if
the method was just entered. If we deopt at dex pc 0, the
instrumentation would emit an erroneous MethodEnteredEvent
and the JIT would have received a MethodEntered() call. For
JIT-on-first-use, the method would be compiled the same way
as before, leading to the same deopt until stack overflow.
We fix this by using a new `from_deoptimize` flag passed
by the caller.
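A minimal sketch of the idea; the types and hook below are stand-ins for
the real interpreter entry point, not the actual ART code:

    #include <cstdint>

    struct ShadowFrame { uint32_t dex_pc; };   // stand-in for the real type

    void NotifyMethodEntered() { /* instrumentation + JIT MethodEntered() */ }

    // Before: "method just entered" was inferred from dex_pc == 0, so a
    // deoptimization landing at dex pc 0 looked like a fresh entry.
    // After: the caller states explicitly whether we arrived via deoptimization.
    void Execute(ShadowFrame& frame, bool from_deoptimize) {
      if (!from_deoptimize && frame.dex_pc == 0) {
        NotifyMethodEntered();  // fired only on a genuine method entry
      }
      // ... interpret bytecode starting at frame.dex_pc ...
    }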
Test: 680-checker-deopt-dex-pc-0
Test: testrunner.py --host \
--jit --runtime-option=-Xjitthreshold:0
Bug: 62611253
Change-Id: I50b88f15484aeae16e1375a1d80f6563fb9066e7
|
|
Add extra output for debugging failures and re-enable
the bitstring type checks.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: testrunner.py --host -t 670-bitstring-type-check
Test: Pixel 2 XL boots.
Test: testrunner.py --target --optimizing --jit
Test: testrunner.py --target -t 670-bitstring-type-check
Bug: 64692057
Bug: 26687569
This reverts commit bff7a52e2c6c9e988c3ed1f12a2da0fa5fd37cfb.
Change-Id: I090e241983f3ac6ed8394d842e17716087d169ac
|
|
The move of this code to libdexfile/dex/descriptors_names.cc apparently
did not remove the original code from runtime/utils.cc. Fix that
duplication and update the headers that reference it. Also, split the
test files to match the new locations of the code under test.
Bug: 22322814
Test: make -j 50 checkbuild
make -j 50 test-art-host-gtest
flash & boot marlin
Change-Id: Ie734672c4bca2c647d8016291f910b5608674545
|
|
With regression test!
Bug: 72874888
Test: test-art-host
Change-Id: Icb3ec5dbfa14a1f77da681ba7e100ec9a5ab9ba6
|
|
Rationale:
Currently we have some remaining ugliness around signed and unsigned
SIMD operations due to the lack of kUint32 and kUint64 in the HIR. By
"softly" introducing these types, the ABS/MIN/MAX/HALVING_ADD/SAD_ACCUMULATE
operations can rely solely on the packed data types to distinguish
between signed and unsigned operations. Cleaner, and also allows for
some code removal in the current loop optimizer.
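A sketch of what this buys, using MIN as an example; the enum and helper
below are illustrative stand-ins, not the actual HIR classes:

    #include <cstdint>

    enum class DataType { kInt32, kUint32, kInt64, kUint64 };  // illustrative subset

    // With unsigned types in the HIR, a vector MIN no longer needs a separate
    // "is_unsigned" flag; the packed data type alone selects the semantics.
    int64_t PackedMin(int64_t a, int64_t b, DataType packed_type) {
      bool is_unsigned =
          packed_type == DataType::kUint32 || packed_type == DataType::kUint64;
      if (is_unsigned) {
        return static_cast<uint64_t>(a) < static_cast<uint64_t>(b) ? a : b;
      }
      return a < b ? a : b;
    }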
Bug: 72709770
Test: test-art-host test-art-target
Change-Id: I68e4cdfba325f622a7256adbe649735569cab2a3
|
|
Bug: 64692057
Bug: 71853552
Bug: 26687569
This reverts commit eb0ebed72432b3c6b8c7b38f8937d7ba736f4567.
Change-Id: I7daeaa077960ba41b2ed42bc47f17501621be4be
|
|
We guard the use of this feature with a compile-time flag,
set to true in this CL.
Boot image size for aosp_taimen-userdebug in AOSP master:
- before:
arm boot*.oat: 63604740
arm64 boot*.oat: 74237864
- after:
arm boot*.oat: 63531172 (-72KiB, -0.1%)
arm64 boot*.oat: 74135008 (-100KiB, -0.1%)
The new TypeCheckBenchmark yields the following changes
using the little cores of taimen fixed at 1.4016GHz:
32-bit 64-bit
timeCheckCastLevel1ToLevel1 11.48->15.80 11.47->15.78
timeCheckCastLevel2ToLevel1 15.08->15.79 15.08->15.79
timeCheckCastLevel3ToLevel1 19.01->15.82 17.94->15.81
timeCheckCastLevel9ToLevel1 42.55->15.79 42.63->15.81
timeCheckCastLevel9ToLevel2 39.70->14.36 39.70->14.35
timeInstanceOfLevel1ToLevel1 13.74->17.93 13.76->17.95
timeInstanceOfLevel2ToLevel1 17.02->17.95 16.99->17.93
timeInstanceOfLevel3ToLevel1 24.03->17.95 24.45->17.95
timeInstanceOfLevel9ToLevel1 47.13->17.95 47.14->18.00
timeInstanceOfLevel9ToLevel2 44.19->16.52 44.27->16.51
This suggests that the bitstring typecheck should not be
used for exact type checks, which would be equivalent to the
"Level1ToLevel1" benchmark. Whether the implementation is
a beneficial replacement for the kClassHierarchyCheck and
kAbstractClassCheck on average depends on how many levels
a typical object's class is from the target class (or from
Object for a negative result).
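For context, a sketch of the bitstring subtype check that these numbers
measure; the field layout and names below are illustrative only:

    #include <cstdint>

    struct Class {
      // Hypothetical layout: a bitstring encoding the class's position in the
      // hierarchy, packed into the low bits of an existing status word.
      uint32_t status_and_bitstring;
    };

    // For an eligible, statically known target class, instanceof/checkcast
    // reduces to a load, an AND and a compare - no walk over superclasses,
    // so the cost is flat regardless of how many levels separate the classes.
    bool BitstringIsSubtypeOf(const Class* klass,
                              uint32_t target_bits, uint32_t target_mask) {
      return (klass->status_and_bitstring & target_mask) == target_bits;
    }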
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: testrunner.py --host -t 670-bitstring-type-check
Test: Pixel 2 XL boots.
Test: testrunner.py --target --optimizing --jit
Test: testrunner.py --target -t 670-bitstring-type-check
Bug: 64692057
Bug: 71853552
Bug: 26687569
Change-Id: I538d7e036b5a8ae2cc3fe77662a5903d74854562
|
|
Adding InstructionSet::kLast shall make it easier to encode
the InstructionSet in fewer bits using BitField<>. However,
introducing `kLast` into the `art` namespace is not a good
idea, so we change the InstructionSet to an enum class.
This also uncovered a case of InstructionSet::kNone being
erroneously used instead of vixl32::Condition::None(), so
it's good to remove `kNone` from the `art` namespace.
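A sketch of the shape of the change; the enumerator list and helper below
are illustrative, not the actual definitions:

    #include <cstddef>
    #include <cstdint>

    // The scoped enum keeps kNone/kLast out of the art namespace; kLast gives
    // a BitField-style encoder an upper bound to size the field.
    enum class InstructionSet : uint8_t {
      kNone, kArm, kArm64, kX86, kX86_64, kMips, kMips64,
      kLast = kMips64,
    };

    constexpr size_t MinimumBitsToStore(size_t value) {
      size_t bits = 0u;
      for (; value != 0u; value >>= 1) ++bits;
      return bits;
    }

    static_assert(
        MinimumBitsToStore(static_cast<size_t>(InstructionSet::kLast)) == 3u,
        "an InstructionSet fits in a 3-bit field");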
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
|
|
Rationale:
Since aligned data access is generally better (enables more efficient
aligned moves and prevents nasty cache line splits), computing and/or
enforcing alignment has been added to the vectorizer:
(1) If the initial alignment is known completely and suffices,
then a static peeling factor enforces proper alignment.
(2) If (1) fails, but the base alignment allows, dynamically peeling
until total offset is aligned forces proper aligned access patterns.
By using ART conventions only, any forced alignment is preserved
across suspend checks, where data may move.
Note 1:
Current allocation convention is just 8 byte alignment on arrays/strings,
so only ARM32 benefits. However, all optimizations are implemented in
a general way, so moving to a 16 byte alignment will immediately
take advantage of any new convention!!
Note 2:
This CL also exposes how bad the choice of 12 byte offset of arrays
really is. Even though the new optimizations fix the misalignment, they
require peeling for the most common case: 0-indexed loops. Therefore,
we may even consider moving to a 16 byte offset. Again the optimizations
in this CL will immediately take advantage of that new convention!!
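An illustrative computation of the static peeling factor described in (1);
the function and constants below are a sketch, not the loop optimizer's
actual code:

    #include <cstddef>

    // Static case: the base is known to be aligned to at least the vector
    // width, so the misalignment is determined by the data offset alone.
    // Returns how many scalar iterations to peel before the vector body.
    size_t StaticPeelingFactor(size_t data_offset,     // e.g. 12-byte array data offset
                               size_t element_size,    // e.g. 4 for int
                               size_t vector_bytes) {  // e.g. 16 for 128-bit SIMD
      size_t misalignment = data_offset % vector_bytes;
      return misalignment == 0u ? 0u : (vector_bytes - misalignment) / element_size;
    }

    // With a 12-byte offset, 4-byte ints and 16-byte vectors this yields 1,
    // i.e. one peeled iteration even for the common 0-indexed loop.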
Test: test-art-host test-art-target
Change-Id: Ib6cc0fb68c9433d3771bee573603e64a3a9423ee
|
|
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 21.1MiB -> 20.2MiB
BatteryStats.dumpLocked(): 42.0MiB -> 40.3MiB
This is because all the memory previously used by the graph
builder is reused by later passes.
And finish the "arena"->"allocator" renaming; make renamed
allocator pointers that are members of classes const when
appropriate (and make a few more members around them const).
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Ia50aafc80c05941ae5b96984ba4f31ed4c78255e
|
|
This CL adds all the necessary codegen for the Uint8 type
but does not add code transformations that use that code.
Vectorization codegens are modified to use Uint8 as the
packed type when appropriate. The side effects are now
disconnected from the instruction's type after the graph has
been built to allow changing HArrayGet/H*FieldGet/HVecLoad
to use a type different from the underlying field or array.
Note: HArrayGet for String.charAt() is modified to have
no side effects whatsoever; Strings are immutable.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: testrunner.py --target --optimizing on Nexus 6P
Test: Nexus 6P boots.
Bug: 23964345
Change-Id: If2dfffedcfb1f50db24570a1e9bd517b3f17bfd0
|
|
Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
|
Test: Rely on TreeHugger.
Change-Id: I3388a469a96c665abc51abe2cf7d2b2004db7d78
|
|
It's not the right dex file if the invokes come from inlined
methods.
Test: manual
Change-Id: I4e3fb35e2bddc67510c39e12075c9a5ca0498a3a
|
|
Test: m test-art-host-gtest
Test: testrunner.py --host
Change-Id: I2b720e2ed8f96303cf80e9daa6d5278bf0c3da2f
|
|
Rationale:
The more vectorized, the better!
Test: test-art-target, test-art-host
Change-Id: I758becca5beaa5b97fab2ab70f2e00cb53458703
|
|
Test: test-art-host, test-art-target.
Change-Id: I06af8415e15352d09d176cae828163cbe99ae7a7
|
|
Rationale:
First of several idioms that map to very efficient SIMD instructions.
Note that the is-zero-ext and is-sign-ext are general-purpose utilities
that will be widely used in the vectorizer to detect low precision
idioms, so expect that code to be shared with many CLs to come.
Test: test-art-host, test-art-target
Change-Id: If7dc2926c72a2e4b5cea15c44ef68cf5503e9be9
|
|
We use HDeoptimize in a few places, but when it comes to data
dependency we either:
- don't have any (BCE, CHA), in which case we should make sure no
code that the deoptimization guards moves before the HDeoptimize
- have one on the receiver (inline cache), in which case we can
update the dominated users with the HDeoptimize to get the data
dependency correct.
bug:35661819
bug:36371709
test: 644-checker-deopt
Change-Id: I4820c6710b06939e7f5a59606971693e995fb958
|
|
Rationale:
The last ART vectorizer break-out CL \O/
This ensures spilling on x86 and x86_64 is correct.
Also, it paves the way to wider SIMD on ARM and MIPS.
Test: test-art-host
Bug: 34083438
Change-Id: I5b27d18c2045f3ab70b64c335423b3ff2a507ac2
|
|
This commit mirrors the work that has already been done for ARM64.
Test: m test-art-target-run-test-551-checker-shifter-operand
Change-Id: Iec8c1563b035f40f0e18dcffde28d91dc21922f8
|
|
This reverts commit 0fb5af1c8287b1ec85c55c306a1c43820c38a337.
This takes us back to the original change and attempts to fix the
issues encountered:
- Adds transition record push/pop around artInvokePolymorphic.
- Changes X86/X64 relocations for MacSDK.
- Implements MIPS entrypoint for art_quick_invoke_polymorphic.
- Corrects size of returned reference in art_quick_invoke_polymorphic
on ARM.
Bug: 30550796,33191393
Test: art/test/run-test 953
Test: m test-art-run-test
Change-Id: Ib6b93e00b37b9d4ab743a3470ab3d77fe857cda8
|
|
This reverts commit 02e3092f8d98f339588e48691db77f227b48ac1e.
Reasons for revert:
- Breaks MIPS/MIPS64 build.
- Fails under GCStress test on x64.
- Different x64 build configuration doesn't like relocation.
Change-Id: I512555b38165d05f8a07e8aed528f00302061001
|
|
Adds basic support to invoke method handles in compiled code.
Enables method verification for methods containing invoke-polymorphic.
Adds k45cc/k45rc output to Instruction::DumpString() which
was found to be missing when enabling verification.
Includes stack traces for failures in test 957-methodhandle-transforms
so they can be easily identified.
Bug: 30550796,33191393
Test: art/test/run-test 953
Test: m test-art-run-test
Change-Id: Ic9a96ea24906087597d96ad8159a5bc349d06950
|
|
The latest chapter in the ongoing saga of attempting to dump a DEX
file without having to start a whole runtime instance. This episode
finds us removing references to ArtMethod/ArtField/mirror.
One aspect of this change that I would like to call out specifically
is that the utils versions of the "Pretty*" functions were all written
to accept nullptr as an argument. I have split these functions up as
follows:
1) an instance method, such as PrettyClass that obviously requires
this != nullptr.
2) a static method that behaves the same way as the utils method, but
calls the instance method only if p != nullptr.
This requires using a full class qualifier for the static methods,
which isn't exactly beautiful. I have tried to remove as many cases
as possible where it was clear p != nullptr.
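A sketch of that split; the class, namespace and demo return value below
are illustrative:

    #include <string>

    namespace mirror {
    class Class {
     public:
      // 1) Instance method: obviously requires this != nullptr.
      std::string PrettyClass() const { return "some.pkg.Demo"; }

      // 2) Static helper keeping the old utils behavior: tolerates nullptr.
      static std::string PrettyClass(const Class* p) {
        return p != nullptr ? p->PrettyClass() : "null";
      }
    };
    }  // namespace mirror

    // Call sites that cannot prove non-nullness use the fully qualified form:
    //   mirror::Class::PrettyClass(maybe_null);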
Bug: 22322814
Test: test-art-host
Change-Id: I21adee3614aa697aa580cd1b86b72d9206e1cb24
|
|
Remove dependency on compiler driver for sharpening
and dex2dex (the methods called on the compiler driver were
doing unnecessary work), and remove the now unused methods
in compiler driver.
Also remove a test that is now invalid, as sharpening always
succeeds.
test: m test-art-host m test-art-target
Change-Id: I54e91c6839bd5b0b86182f2f43ba5d2c112ef908
|
|
To prepare separation of disassembler from libart, add a function
hook to the disassembler options for thread offset name printing.
Bug: 15436106
Change-Id: I9e9b7e565ae923952c64026f675ac527b560f51b
|
|
Change-Id: I423fb378ee61fb53c3b328fc74f4e95cdef0992a
|
|
This will allow a cleaner commit in an upcoming
refactoring of register allocation.
Test: m test-art-host
Change-Id: If420c97b088b3c934411ff83373e024003120746
|
|
Currently, an HBoundsCheck is fed by an HArrayLength, causing a load of
the array length, followed by a register compare.
Avoid the load when we can by comparing directly with the array length
in memory. Implement this by marking the HArrayLength as 'emitted at
use site', and then generating the code in the HBoundsCheck.
Only do this replacement when the HBoundsCheck is the only user of the
HArrayLength and the length isn't visible to the environment.
Handle the special case where the array is 'null' and where an implicit
null check can't be eliminated.
This code moves the load of the length to the slow code for the failed
check, which is what we want.
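A sketch of the eligibility test implied above, with a stand-in structure
instead of the real HIR node classes:

    struct HirNode {                       // illustrative stand-in for HArrayLength
      int non_environment_uses = 0;
      int environment_uses = 0;
    };

    // The array-length load may be folded into the bounds check only when the
    // HBoundsCheck is its sole user and the length is invisible to the
    // environment; x86 can then emit `cmp index, [array + length_offset]`
    // instead of loading the length into a register first.
    bool CanEmitLengthAtUseSite(const HirNode& array_length) {
      return array_length.non_environment_uses == 1 &&
             array_length.environment_uses == 0;
    }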
Test: 609-checker-x86-bounds-check
Change-Id: I9cdb183301e048234bb0ffeda940eedcf4a655bd
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|
And some other cleanup after
https://android-review.googlesource.com/230742
Test: No new tests. ART test suite passed (tested on host).
Change-Id: I4743bf17544d0234c6ccb46dd0c1b9aae5c93e17
|
|
Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck as well as using the infrastructure for
HArrayGet, i.e. better handling of constant indexes than
the old intrinsic and using the HArm64IntermediateAddress.
Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
|
|
For classes in the boot image, use either direct pointers
or PC-relative addresses. For other classes, use PC-relative
access to the dex cache arrays for AOT and direct address of
the type's dex cache slot for JIT.
For aosp_flounder-userdebug:
- 32-bit boot.oat: -252KiB (-0.3%)
- 64-bit boot.oat: -412KiB (-0.4%)
- 32-bit dalvik cache total: -392KiB (-0.4%)
- 64-bit dalvik-cache total: -2312KiB (-1.0%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -124KiB (-0.2%)
- 64-bit boot.oat: -420KiB (-0.5%)
- 32-bit dalvik cache total: -136KiB (-0.1%)
- 64-bit dalvik-cache total: -1136KiB (-0.5%)
(contains more files than the 32-bit dalvik cache)
Bug: 27950288
Change-Id: I4da991a4b7e53c63c92558b97923d18092acf139
|
|
Introduce HInstruction::GetInputRecords(), a new virtual
function that returns an ArrayRef<> to all input records.
Implement all other functions dealing with input records as
wrappers around GetInputRecords(). Rewrite functions that
previously used multiple virtual calls to deal with input
records, especially in loops, to prefetch the ArrayRef<>
only once for each instruction. Besides avoiding all the
extra calls, this also allows the compiler (clang++) to
perform additional optimizations.
This speeds up the Nexus 5 boot image compilation by ~0.5s
(4% of "Compile Dex File", 2% of dex2oat time) on AOSP ToT.
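The shape of the new interface, heavily simplified; the ArrayRef and
HUserRecord definitions below are minimal stand-ins:

    #include <cstddef>

    template <typename T>
    struct ArrayRef {                      // minimal stand-in
      T* data = nullptr;
      size_t size = 0u;
      T& operator[](size_t i) const { return data[i]; }
    };

    struct HUserRecord {};                 // instruction pointer + use-list position

    struct HInstruction {
      virtual ~HInstruction() = default;
      // The single virtual function; everything else is a non-virtual wrapper.
      virtual ArrayRef<HUserRecord> GetInputRecords() = 0;

      size_t InputCount() { return GetInputRecords().size; }
      HUserRecord& InputRecordAt(size_t i) { return GetInputRecords()[i]; }
    };

    // Loops fetch the ArrayRef once instead of one virtual call per input:
    //   ArrayRef<HUserRecord> inputs = instruction->GetInputRecords();
    //   for (size_t i = 0u; i != inputs.size; ++i) { /* ... inputs[i] ... */ }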
Change-Id: Id8ebe0fb9405e38d918972a11bd724146e4ca578
|
|
information."
|
|
Also adds 16 bit literal information.
Rationale:
When "run-away" instructions are disassembled, the literal
addresses may go out of range, causing oatdump to crash.
This CL checks memory accesses against the full memory range
allocated to assembly instructions and data (it is possible,
but not really necessary, to refine this a bit). Out-of-range
arguments are now displayed as (?) to denote the issue, which
is a lot nicer than crashing.
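A sketch of the guarded access; the function and parameter names below are
made up for illustration:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Any literal address computed while disassembling is checked against the
    // whole range allocated to instructions and data before being read.
    void DumpLiteral(const uint8_t* address,
                     const uint8_t* range_begin, const uint8_t* range_end) {
      if (address < range_begin || address + sizeof(uint32_t) > range_end) {
        std::printf("(?)\n");  // out of range: flag it instead of crashing
        return;
      }
      uint32_t value;
      std::memcpy(&value, address, sizeof(value));
      std::printf("0x%08x\n", value);
    }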
BUG=28670871
Change-Id: I51e9b6a6a99162546fe31059f14278e8980451c2
|
|
GVN was implicitly extending the liveness of an instruction across
an irreducible loop.
Fix this problem by clearing the value set at loop entries that contain
an irreducible loop.
bug:28252896
(cherry picked from commit 77ce6430af2709432b22344ed656edd8ec80581b)
Change-Id: Ie0121e83b2dfe47bcd184b90a69c0194d13fce54
|
|
Use HArrayLength for String.length() in anticipation of
changing the String.charAt() to HBoundsCheck+HArrayGet to
allow the existing BCE to seamlessly work for strings.
Use HArrayLength+HEqual for String.isEmpty().
We previously relied on inlining but we now want to apply
the new intrinsics even when we do not inline, i.e. when
compiling debuggable (as is currently the case for boot
image) or when we hit inlining limits, i.e. depth, size,
or the number of accumulated dex registers.
Bug: 28330359
Change-Id: Iab9d2f6d2967bdd930a72eb461f27efe8f37c103
|
|
Create a new template class IntrusiveForwardList<> that
mimicks std::forward_list<> except that all allocations
are handled externally. This is essentially the same as
boost::intrusive::slist<> but since we're not using Boost
we have to reinvent the wheel.
Use the new container to replace the HUseList and store
iterators to the "before" use nodes in HUserRecord<> to avoid
the extra pointer to the previous node, which was used
exclusively for removing nodes from the list. This reduces
the size of the HUseListNode by 25%: from 32B to 24B in the
64-bit compiler and from 16B to 12B in the 32-bit compiler.
This translates
directly to overall memory savings for the 64-bit compiler
but due to rounding up of the arena allocations to 8B, we
do not get any improvement in the 32-bit compiler.
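A minimal sketch of the intrusive idea (the real IntrusiveForwardList<>
mirrors the std::forward_list<> interface; the names below are illustrative):

    #include <cstddef>

    struct IntrusiveForwardListHook {
      IntrusiveForwardListHook* next = nullptr;
    };

    // The element embeds the hook, so linking and unlinking never allocate.
    struct UseNode : IntrusiveForwardListHook {
      void* user = nullptr;     // the instruction using the value
      size_t input_index = 0u;
    };

    // Erasing needs the node *before* the one removed - which is exactly why a
    // HUserRecord<> can hold a "before" iterator instead of a back-pointer.
    void EraseAfter(IntrusiveForwardListHook* before) {
      before->next = before->next->next;
    }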
Compiling the Nexus 5 boot image with the 64-bit dex2oat
on host this CL reduces the memory used for compiling the
most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB:
Before:
MEM: used: 47829200, allocated: 48769120, lost: 939920
Number of arenas allocated: 345,
Number of allocations: 815492, avg size: 58
...
UseListNode 13744640
...
After:
MEM: used: 44393040, allocated: 45361248, lost: 968208
Number of arenas allocated: 319,
Number of allocations: 815492, avg size: 54
...
UseListNode 10308480
...
Note that while we do not ship the 64-bit dex2oat to the
device, the JIT compilation for 64-bit processes is using
the 64-bit libart-compiler.
Bug: 28173563
Change-Id: I985eabd4816f845372d8aaa825a1489cf9569208
|
|
We have seen Checker tests timing out on debug-GC configurations after
having switched to Optimizing because its GraphVisualizer makes too
many syscalls which the configuration keeps track of.
This patch replaces std::endl with "\n" across GraphVisualizer so as
to not flush the stream after every line of output.
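Why that matters, in two lines (a sketch, not the GraphVisualizer code itself):

    #include <ostream>

    void WriteLine(std::ostream& output) {
      output << "begin_block" << "\n";          // buffered; no forced flush
      // output << "begin_block" << std::endl;  // '\n' plus a flush => a syscall per line
    }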
Bug: 27826765
Change-Id: I5e3f1e92f8a84f36d324d56945e2d420b2d36a5d
|
|
For strings in the boot image, use either direct pointers
or pc-relative addresses. For other strings, use PC-relative
access to the dex cache arrays for AOT and direct address of
the string's dex cache slot for JIT.
For aosp_flounder-userdebug:
- 32-bit boot.oat: -692KiB (-0.9%)
- 64-bit boot.oat: -948KiB (-1.1%)
- 32-bit dalvik cache total: -900KiB (-0.9%)
- 64-bit dalvik cache total: -3672KiB (-1.5%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -380KiB (-0.5%)
- 64-bit boot.oat: -928KiB (-1.0%)
- 32-bit dalvik cache total: -468KiB (-0.4%)
- 64-bit dalvik cache total: -1928KiB (-0.8%)
(contains more files than the 32-bit dalvik cache)
Bug: 26884697
Change-Id: Iec7266ce67e6fedc107be78fab2e742a8dab2696
|
|
GraphChecker tries to verify that Boolean inputs are properly typed.
This is non-trivial in the presence of simplifying optimizations
which capitalize on the fact that a Boolean value is internally
represented as an integer.
This patch removes the test from GraphChecker.
Bug: 27625564
Change-Id: Ic61ea2193765b4578550538e965ca4f80fa4b287
|