Age | Commit message | Author |
|
This is in preparation for removing it from OatQuickMethodHeader.
Bug: 123510633
Test: m test-art-host-gtest
Test: ./art/test.py -b -r --host
Change-Id: I5c5adb4c040e329b81c1393aa1b80ee017729c8a
|
|
- Add a new hotness count in the ProfilingInfo to not conflict with
interpreter hotness which may use it for OSR.
- Add a baseline flag in the OatQuickMethodHeader to identify baseline
compiled methods.
- Add a -Xusetieredjit flag to experiment and test.
Bug: 119800099
Test: test.py with Xusetieredjit to true
Change-Id: I8512853f869f1312e3edc60bf64413dee9143c52
|
|
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf.
Reason for revert: Breaks ASAN tests (ODR violation).
Bug: 142365358
Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
|
|
Make symbols in compiler/optimizing hidden by a namespace
attribute. The unit intrinsic_objects.{h,cc} is excluded as
it is needed by dex2oat.
As the symbols are no longer exported, gtests are now linked
with the static version of the libartd-compiler library.
libart-compiler.so size:
- before:
arm: 2396152
arm64: 3345280
- after:
arm: 2016176 (-371KiB, -15.9%)
arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
|
|
Some safepoints don't need to have DexRegisterMap info;
omitting it decreases the stackmap size.
.oat file size reduction:
- boot.oat: -233 kb (-5.4%)
- boot-framework.oat: -704 kb (-4.9%)
Test: 461-get-reference-vreg, 466-get-live-vreg.
Test: 543-env-long-ref, 616-cha*.
Test: test-art-target, +gc-stress.
Change-Id: Idbad355770e30a30dcf14127642e03ee666878b8
|
|
Keep the BitTable decoder simple (1+NumColumns varints).
Move special case handling up to CodeInfo (empty/dedup).
This speeds up CodeInfo by 5%, and maps startup by 0.05%.
Change in size is negligible (the bits mostly just move).
Test: test.py -b --host --64 --optimizing
Change-Id: Ib6abe52f04384de9ffd7cfba04a3124b62f713ff
|
|
Consumers of CodeInfo can skip significant chunks of work
if they can quickly determine that a method has no inlining.
Store this fact as a flag bit at the start of code info.
This changes binary format and adds <0.1% to oat size.
I added the extra flag field as the simplest solution for now,
although I would like to use it for more things in the future.
(e.g. store the special cases of empty/deduped tables in it)
This improves app startup by 0.4% (maps,speed).
PMD on golem seems to get around 15% faster.
Bug: 133257467
Test: ./art/test.py -b --host --64
Change-Id: Ia498a31bafc74b51cc95b8c70cf1da4b0e3d894e
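To illustrate the idea of a leading flag field, here is a minimal sketch, not ART's actual encoding. The flag name `HAS_INLINE_INFO`, the one-byte flag field, and both function names are assumptions made for this example.

```python
# Hypothetical flag bit; the real CodeInfo layout is not reproduced here.
HAS_INLINE_INFO = 1 << 0

def encode_code_info(payload: bytes, has_inlining: bool) -> bytes:
    """Prefix the encoded tables with a flag byte describing their content."""
    flags = HAS_INLINE_INFO if has_inlining else 0
    return bytes([flags]) + payload

def has_inline_info(code_info: bytes) -> bool:
    """Answer the question from the first byte alone, without decoding tables."""
    return bool(code_info[0] & HAS_INLINE_INFO)
```

A consumer can call `has_inline_info()` first and skip all inline-info handling when it returns false.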
|
|
This reverts commit e1412dacbf1d2a809bd1fca658cc8cb8f61f8ee6.
Bug: 123510633
Bug: 127305289
Reason for revert: b/127305289
Change-Id: I54557b05a44777f1fa2c15bde4fa648980f42eed
|
|
This reverts commit e35ac04a1a9a22b1c4386b27f3a30cd840aa17b1.
Bug: 123510633
Bug: 127305289
Reason for revert: b/127305289
Change-Id: I18c2d9291411b31641333c14c47da8c4fdf317f7
|
|
The unpacking is tricky for host tooling as we need to propagate ISA.
This adds 0.05% to oat file size.
Bug: 123510633
Change-Id: I5618db5e5dbe83d8a2bb89aef61cb0b10e336f40
|
|
This temporarily adds 0.25% to oat file size.
The space will be reclaimed in a follow-up CL.
This reverts commit 8f20a23a35fa6fbe4dcb4ff70268a24dc7fb2a24.
Reason for revert: Reland as-is after CL/903819
Bug: 123510633
Test: DCHECK compare the two stored code sizes.
Change-Id: Ia3ab31c208948f4996188764fcdcba13d9977d19
|
|
This reverts commit 68efa7b1128486e08ae60cd27181645b27bbd2e4.
Reason for revert: Breaks tests
Change-Id: I28fb143990f58e0d5f0b106bea9d9a159f19297e
|
|
This temporarily adds 0.25% to oat file size.
The space will be reclaimed in a follow-up CL.
Bug: 123510633
Test: DCHECK compare the two stored code sizes.
Change-Id: I15340824ca637fd075a4cef87771b06cb96bb9f4
|
|
Avoid the repetitive code patterns and simplify code.
Test: test-art-host-gtest-stack_map_test
Test: checked output of oatdump
Change-Id: I2354bc652837eb34efeecf4de56a027384544034
|
|
Test: test-art-host-gtest-stack_map_test
Change-Id: Ife021d03e4e486043ec609f9af8673ace7bde497
|
|
|
|
This reduces the peak memory used for large methods with
multiple blocks to schedule.
Compiling the aosp_taimen-userdebug boot image, the most
memory hungry method BatteryStats.dumpLocked has the
Scheduler memory allocations in ArenaStack hidden by the
register allocator:
- before:
MEM: used: 8300224, allocated: 9175040, lost: 197360
Scheduler 8300224
- after:
MEM: used: 5914296, allocated: 7864320, lost: 78200
SsaLiveness 5532840
RegAllocator 144968
RegAllocVldt 236488
The total arena memory used, including the ArenaAllocator
not listed above, goes from 44333648 to 41950324 (-5.4%).
(Measured with kArenaAllocatorCountAllocations=true,
kArenaAllocatorPreciseTracking=false.)
Also remove one unnecessary -Wframe-larger-than= workaround
and add one workaround for large frame with the above arena
alloc tracking flags.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 34053922
Change-Id: I7fd8d90dcc13b184b1e5bd0bcac072388710a129
|
|
This saves 0.3% of oat file size.
Test: test-art-host-gtest-stack_map_test
Change-Id: I85003946a9579f03cb1ed2b5e9b2c62b3efe6734
|
|
There is no need to treat it specially any more,
because of the de-duplication at BitTable level.
This saves 0.6% of oat file size.
Test: test-art-host-gtest
Change-Id: Ife7927d736243879a41d6f325d49ebf6930a63f6
|
|
Test: test-art-host-gtest
Change-Id: I5ce28973042f9241e72ceb52fc5db472ca571563
|
|
The stored information will be used in follow-up CLs.
This temporarily increases .oat file size by 0.7%.
Test: test-art-host-gtest
Change-Id: Ie7d898b06398ae44287bb1e8153861ab112a216c
|
|
Test: test-art-host-gtest-stack_map_test
Test: test-art-host-gtest-bit_table_test
Change-Id: I96c04e21864009b64cb3177a0e9f0f8782a9b10b
|
|
|
|
Instead of declaring the classes explicitly and then casting,
create generic BitTableBuilder::Entry class for that purpose.
This removes the need to keep the POD helper classes in sync.
Test: test-art-host-gtest-stack_map_test
Test: test-art-host-gtest-bit_table_test
Change-Id: I4c632313bafd3a4bc823648436a5310b6f2a1d13
|
|
Test: test-art-host-gtest-stack_map_test
Change-Id: I0abab008159db023d531df69214cd3bb8c0639bd
|
|
Add 'Kind' column to stack maps which marks special stack map types,
and use it at run-time to add extra sanity checks.
It will also allow us to binary search the stack maps.
The column increases .oat file by 0.2%.
Test: test-art-host-gtest-stack_map_test
Change-Id: I2a9143afa0e32bb06174604ca81a64c41fed232f
|
|
The register maps tend to be similar from stack map to stack map,
so instead of encoding them again, store only the modified ones.
The dex register bitmap stores the delta now - if register has
been modified since the previous stack map, the bit will be set.
The decoding logic scans backwards through stack maps until it
eventually finds the most recent value of each register.
This CL saves ~2.5% of .oat file size (~10% of stackmap size).
Due to the scan, this makes dex register decoding slower by a factor
of 2.5, but that still beats the old algorithm before refactoring.
Test: test-art-host-gtest-stack_map_test
Change-Id: Id5217a329eb757954e0c9447f38b05ec34118f84
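The delta scheme described above can be sketched roughly as follows. This is a simplified model, not the ART implementation: locations are plain strings, every stack map is assumed to cover the same number of registers, and the function names are invented for the example.

```python
def encode_deltas(maps):
    """maps[i][r] is the location of register r at stack map i.
    Returns, per stack map, only the registers modified since the previous one."""
    deltas, previous = [], None
    for current in maps:
        deltas.append({r: loc for r, loc in enumerate(current)
                       if previous is None or previous[r] != loc})
        previous = current
    return deltas

def decode(deltas, index, num_registers):
    """Scan backwards from stack map `index` until every register is resolved."""
    result = [None] * num_registers
    resolved = set()
    while len(resolved) < num_registers and index >= 0:
        for r, loc in deltas[index].items():
            if r not in resolved:  # keep only the most recent value
                result[r] = loc
                resolved.add(r)
        index -= 1
    return result
```

The first stack map stores every register, so the backward scan always terminates. The cost of the scan is what the commit message trades against the size saving.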
|
|
This saves 0.3% of .oat file size.
Test: test-art-host-gtest-stack_map_test
Change-Id: Ic7d5addf04fb9b7a2f29a7d1d99ea93b39388fd2
|
|
The new version is more complicated but it gives much higher
confidence about the correctness of the stackmap encoding.
The old version was comparing the internal builder entries to the
decoded information, which verified the bit-level manipulations,
but it did not verify that we created the internal state correctly.
The new version directly compares the parameters passed to the
StackMapStream and the decoded values. This way, it really tests
the whole system. It uses lambda captures to record the parameters.
Test: test-art-host-gtest-stack_map_test
Change-Id: Ib92819cc35ce0d790128392d303f6feabd7d9c74
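A rough sketch of the verification approach, assuming a toy text encoding in place of the real stackmap format. `RecordingStream`, `add_entry`, and the other names are hypothetical. The point is that each check captures the caller's parameters rather than the builder's internal entries.

```python
class RecordingStream:
    """Toy stand-in for a stream builder; all names are hypothetical."""

    def __init__(self):
        self.entries = []  # internal builder state
        self.checks = []   # deferred checks capturing the original parameters

    def add_entry(self, dex_pc, native_pc):
        index = len(self.entries)
        self.entries.append((dex_pc, native_pc))
        # The lambda captures the parameters as passed in, not the builder state.
        self.checks.append(lambda decoded, i=index, d=dex_pc, n=native_pc:
                           decoded[i] == (d, n))

    def encode(self):
        return ";".join("%d,%d" % entry for entry in self.entries)

def decode(data):
    """Parse the toy encoding back into (dex_pc, native_pc) tuples."""
    if not data:
        return []
    return [tuple(int(x) for x in item.split(",")) for item in data.split(";")]

def verify(stream, data):
    """Run every recorded check against the decoded output."""
    decoded = decode(data)
    return all(check(decoded) for check in stream.checks)
```

Because the checks never read `entries`, a bug in how the builder records its internal state would also be caught.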
|
|
Simplify code by encoding dex register maps using BitTables.
The overall design is unchanged (bitmask+indices+catalogue).
This CL saves ~0.4% of .oat file size.
The dex register map decoding is a factor of 3 faster now
(based on the time to verify the register maps on Arm).
This is not too surprising as the old version was O(n^2).
It also reduces compiler arena memory usage by 11% since the
BitTableBuilder is more memory efficient, we store less
intermediate data, and we deduplicate most data on the fly.
Test: test-art-host-gtest-stack_map_test
Change-Id: Ib703a5ddf7f581280522d589e4a2bfebe53c26a9
|
|
Store some of the needed decoding state explicitly to avoid passing it
around all the time. The DexRegisterMap class is rewritten in the next CL.
Test: test-art-host-gtest-stack_map_test
Change-Id: Ie268dff2a1c1da2e08f0e6799ae51c30e11f350b
|
|
I need to reduce the StackMapEntry to a POD type so that it
can be used in BitTableBuilder.
Test: test-art-host-gtest-stack_map_test
Change-Id: I5f9ad7fdc9c9405f22669a11aea14f925ef06ef7
|
|
This reverts commit 8b20b5c1f5b454b2f8b8bff492c88724b5002600.
Reason for revert: Retry submit unmodified after fixing the test.
Use BitTable to store the masks as well and move the
deduplication responsibility to the BitTable builders.
Don't generate entries for masks which are all zeros.
This saves 0.2% of .oat file size on both Arm64 and Arm.
Encode registers as (value+shift) due to trailing zeros.
This saves 1.0% of .oat file size on Arm64 and 0.2% on Arm.
Test: test-art-target-gtest-exception_test
Test: test-art-host-gtest-bit_table_test
Test: test-art-host-gtest-stack_map_test
Change-Id: Ib643776dbec3f051cc29cd13ff39e453fab5fae9
|
|
This reverts commit ffaf87a429766ed80e6afee5bebea93db539620b.
Reason for revert: Breaks exception_test32 on target
for CMS and heap poisoning configs.
Change-Id: I127c17f693e28211a799f73a50e73105edee7e4c
|
|
Use BitTable to store the masks as well and move the
deduplication responsibility to the BitTable builders.
Don't generate entries for masks which are all zeros.
This saves 0.2% of .oat file size on both Arm64 and Arm.
Encode registers as (value+shift) due to trailing zeros.
This saves 1.0% of .oat file size on Arm64 and 0.2% on Arm.
Test: test-art-host-gtest
Change-Id: I636b7edd49e10e8afc9f2aa385b5980f7ee0e1f1
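As an illustration of the (value+shift) idea, here is a minimal sketch. It assumes the encoded quantity is a register bit mask whose low bits are often zero. The function names are invented for the example and the real bit layout is not shown.

```python
def encode_mask(mask):
    """Split a register mask into (value, shift) by removing trailing zero bits."""
    if mask == 0:
        return 0, 0
    shift = (mask & -mask).bit_length() - 1  # number of trailing zero bits
    return mask >> shift, shift

def decode_mask(value, shift):
    """Rebuild the original mask."""
    return value << shift
```

For example, the mask `0xB00` becomes the pair `(11, 8)`, which needs fewer bits than the 12-bit original.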
|
|
Remove most of the code related to handling of bit encodings.
The design is still the same; the encodings are just more implicit.
Most of the complexity is replaced with a single BitTable class,
which is a general-purpose table of tightly bit-packed integers.
It has its own header which stores the bit-encoding of columns,
and that removes the need to handle the encodings explicitly.
Other classes, like StackMap, are accessors into the BitTable,
with named getter methods for the individual columns.
This CL saves ~1% of .oat file size (~4% of stackmap size).
Test: test-art-host-gtest
Change-Id: I7e92683753b0cc376300e3b23d892feac3670890
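The following is a simplified model of a bit-packed table with a self-describing header, offered as a sketch rather than the actual BitTable format. The dictionary layout and the names `encode_table` and `get` are assumptions, and a Python integer stands in for the bit stream.

```python
def encode_table(rows):
    """Pack rows of non-negative ints; each column uses just enough bits."""
    num_columns = len(rows[0]) if rows else 0
    # "Header": the bit width of each column, derived from its largest value.
    widths = [max((row[c].bit_length() for row in rows), default=0)
              for c in range(num_columns)]
    bits, position = 0, 0
    for row in rows:
        for value, width in zip(row, widths):
            bits |= value << position
            position += width
    return {"widths": widths, "num_rows": len(rows), "bits": bits}

def get(table, row, column):
    """Read one cell using only the widths stored in the header."""
    widths = table["widths"]
    offset = row * sum(widths) + sum(widths[:column])
    return (table["bits"] >> offset) & ((1 << widths[column]) - 1)
```

An accessor class such as StackMap would then wrap `get()` with named getters for the individual columns. A column that is zero in every row takes no bits at all.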
|
|
Move the remainder of the Arena stuff, plus dumpable and
runtime/*memory_region* to libartbase. More preparation to build
profiling library.
Bug: 22322814
Test: make -j 50 checkbuild
Change-Id: Iaf26d310c89bc58846553281576c18102f5e4122
|
|
When compiling an intrinsic method, generate a graph that
invokes the same method and try to compile it. If the call
is actually intrinsified (or simplified to other HIR) and
yields a leaf method, use the result of this compilation
attempt, otherwise compile the actual code or JNI stub.
Note that CodeGenerator::CreateThrowingSlowPathLocations()
actually marks the locations as kNoCall if the throw is not
in a catch block, thus considering some throwing methods
(for example, String.charAt()) as leaf methods.
We would ideally want to use the intrinsic codegen for all
intrinsics that do not generate a slow-path call to the
default implementation. Relying on the leaf method is
suboptimal as we're missing out on methods that do other
types of calls, for example runtime calls. This shall be
fixed in a subsequent CL.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 67717501
Change-Id: I640fda7c22d4ff494b5ff77ebec3b7f5f75af652
|
|
Reuse the memory previously allocated on the ArenaStack by
optimization passes.
This CL handles only the architecture-independent codegen
and slow paths, architecture-dependent codegen allocations
shall be moved to the ScopedArenaAllocator in a follow-up.
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 19.6MiB -> 18.5MiB (-1189KiB)
BatteryStats.dumpLocked(): 39.3MiB -> 37.0MiB (-2379KiB)
Also move definitions of functions that use bit_vector-inl.h
from bit_vector.h to bit_vector-inl.h.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 64312607
Change-Id: I84688c3a5a95bf90f56bd3a150bc31fedc95f29c
|
|
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 21.1MiB -> 20.2MiB
BatteryStats.dumpLocked(): 42.0MiB -> 40.3MiB
This is because all the memory previously used by the graph
builder is reused by later passes.
And finish the "arena"->"allocator" renaming; make renamed
allocator pointers that are members of classes const when
appropriate (and make a few more members around them const).
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Ia50aafc80c05941ae5b96984ba4f31ed4c78255e
|
|
Define the constant with the types to allow lowering the dependency
on DexFile.
Test: m
Change-Id: I3c61421db45be96d2057e01b1a7825883d8bd178
|
|
The method info data is stored separately from the code info to
reduce oat size by improving deduplication of stack maps.
To reduce code size, this moves the invoke info and inline info
method indices to this table.
Oat size for a large app (arm64): 77746816 -> 74023552 (-4.8%)
Average oat size reduction for golem (arm64): 2%
Repurposed unused SrcMapElem deduping to be for MethodInfo.
TODO: Delete SrcMapElem in a follow up CL.
Bug: 36124906
Test: clean-oat-host && test-art-host-run-test
Change-Id: I2241362e728389030b959f42161ce817cf6e2009
|
|
Invoke info records the invoke type and dex method index for invokes
that may reach artQuickResolutionTrampoline. Having this information
recorded allows the runtime to avoid reading the dex code and pulling
in extra pages.
Code size increase for a large app:
93886360 -> 95811480 (2.05% increase)
Half of the code size increase comes from fewer stack maps being deduped.
I suspect there is less deduping because of the invoke info method
index.
Merged disabled until we measure the RAM savings.
Test: test-art-host, N6P boots
Bug: 34109702
Change-Id: I6c5e4a60675a1d7c76dee0561a12909e4ab6d5d9
|
|
Before it only deduplicated the normal stack map dex register maps.
Code size for a large app: 93341616 -> 92678040 (-0.7%)
Added test.
Bug: 34621054
Test: test-art-host
Change-Id: I4fab4e40915bfa12cb978edbb3cbc19e2cf00954
|
|
Previously:
Table layout was computed in multiple places, such as stack_map_stream
and the getters. This made it difficult to add new stack map tables and
made the code hard to understand.
This change specifies the table layout entirely inside the code
info. Updating the layout only requires changing ComputeTableOffsets.
Changed the stack map inline info offset to be an index, so that the
inline infos are not required to be directly after the dex register table.
Oat file size for a large app: 94459576 -> 93882040 (-0.61%)
Updated oatdump and fixed a bug that was incorrectly computing the
register mask bytes.
Bug: 34621054
Test: test-art-host
Change-Id: I3a7f141e09d5a18bce2bc6c9439835244a22016e
|
|
Data is commonly shared between different stack maps. The register
masks are stored after the stack masks.
Oat size for a large app:
96722288 -> 94485872 (-2.31%)
Average oat size reduction according to golem -3.193%.
Bug: 34621054
Test: test-art-host
Change-Id: I5eacf668992e866d11ddba0c01675038a16cdfb4
|
|
The stack masks repeat often enough so that it is worth deduplicating
them.
Oat size for a large app:
98143600 -> 96722288 (-1.44%)
Bug: 34621054
Test: test-art-host
Change-Id: If73d51e46066357049d5be2e406ae9a32b7ff1f4
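Deduplication of this kind can be sketched as below, assuming masks are represented as plain integers. The function name and return shape are illustrative only. Each stack map keeps a small index into a table of unique masks.

```python
def dedup_masks(masks):
    """Return (unique_masks, indices) so that unique_masks[indices[i]] == masks[i]."""
    unique, index_of, indices = [], {}, []
    for mask in masks:
        if mask not in index_of:
            index_of[mask] = len(unique)  # first time this mask is seen
            unique.append(mask)
        indices.append(index_of[mask])
    return unique, indices
```

The saving comes from storing each distinct mask once and replacing repeats with an index that needs fewer bits than the mask itself.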
|
|
Compress native PC based on instruction alignment. This reduces the
size of stack maps, boot.oat is 0.4% smaller for arm64.
Test: test-art-host, test-art-target, N6P booting
Change-Id: I2b70eecabda88b06fa80a85688fd992070d54278
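A minimal sketch of alignment-based compression, under the assumption that native PC offsets are always multiples of the instruction alignment. The alignment table and function names are illustrative and not taken from the ART sources.

```python
# Hypothetical alignments in bytes, for illustration only.
INSTRUCTION_ALIGNMENT = {"arm64": 4, "thumb2": 2, "x86": 1}

def pack_native_pc(native_pc, isa):
    """Drop the always-zero low bits implied by the instruction alignment."""
    alignment = INSTRUCTION_ALIGNMENT[isa]
    if native_pc % alignment != 0:
        raise ValueError("native PC is not aligned for " + isa)
    return native_pc // alignment

def unpack_native_pc(packed, isa):
    """Restore the original native PC offset."""
    return packed * INSTRUCTION_ALIGNMENT[isa]
```

Note that unpacking needs to know the ISA, so any tool that reads the packed value has to be told which instruction set produced it.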
|
|
Currently done for JIT. Can be extended for AOT and inlined boot
image methods.
Also refactor the lookup of an inlined method at runtime to not
rely on the dex cache, but look at the class loader tables.
bug: 30933338
test: test-art-host, test-art-target
Change-Id: I58bd4d763b82ab8ca3023742835ac388671d1794
|
|
Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck as well as using the infrastructure for
HArrayGet, i.e. better handling of constant indexes than
the old intrinsic and using the HArm64IntermediateAddress.
Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
|