author | Elliott Hughes <enh@google.com> | 2019-10-08 12:04:09 -0700 |
---|---|---|
committer | Elliott Hughes <enh@google.com> | 2019-10-08 12:04:09 -0700 |
commit | a4959aa6f819df0efdab6995e5de530655734389 (patch) | |
tree | 7f2ceae2b6c167bbfd20d1e9bbc9c1d52ff6fa7e /libc/malloc_hooks/malloc_hooks.cpp | |
parent | 29ec2881a0a87ccda75c77fd967934d863d53cc1 (diff) |
Reimplement the <ctype.h> is* functions.

Following on from the towlower()/towupper() changes, add benchmarks for
most of <ctype.h>, rewrite the tests to cover the entire defined range
for all of these functions, and then reimplement most of the functions.

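To illustrate what covering the entire defined range involves: the <ctype.h> functions are only defined for EOF and for values representable as unsigned char, so an exhaustive test is small. The following is a minimal sketch, not the test from this change. `check_isdigit_full_range` is a hypothetical helper, and the expected values assume an ASCII-compatible character set.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>  // EOF

// Hypothetical helper, not the test from this change: checks isdigit()
// against its definition for every input the standard defines, and returns
// how many inputs were checked.
static int check_isdigit_full_range(void) {
  int checked = 0;
  // EOF is the one defined input outside the unsigned char range.
  assert(isdigit(EOF) == 0);
  ++checked;
  for (int c = 0; c <= UCHAR_MAX; ++c) {
    int expected = (c >= '0' && c <= '9');
    // The is* functions only promise non-zero for "true", so normalize.
    assert((isdigit(c) != 0) == expected);
    ++checked;
  }
  return checked;
}
```

The same loop shape would apply to the other is* functions, with only the `expected` expression changing.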
The old table-based implementation is mostly a bad idea on modern
hardware, with only ispunct() showing a significant benefit compared to
any other way I could think of writing it, and isalnum() a marginal but
still convincingly genuine benefit.

My new benchmarks make an effort to test an example from each relevant
range of characters to avoid, say, accidentally optimizing the behavior
of `isalnum('0')` at the expense of `isalnum('z')`.

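As a rough sketch of the input selection described above (not the benchmark code itself, and with no timing): isalnum() has three distinct "yes" ranges, so one representative from each is needed alongside a "no" case. The case names echo the `_n`/`_y1`/`_y2`/`_y3` suffixes in the results, but which character each suffix actually uses is an assumption here, as are the `ctype_case` struct and `check_isalnum_cases` helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical case table: one input per range that isalnum() must handle.
struct ctype_case {
  const char* name;
  int input;
  int expected;
};

static const struct ctype_case kIsalnumCases[] = {
  {"isalnum_n",  '_', 0},  // none of the three ranges
  {"isalnum_y1", '0', 1},  // digit range (assumed mapping)
  {"isalnum_y2", 'A', 1},  // uppercase range (assumed mapping)
  {"isalnum_y3", 'z', 1},  // lowercase range (assumed mapping)
};

// Checks every case and returns the number of cases checked.
static size_t check_isalnum_cases(void) {
  size_t n = sizeof(kIsalnumCases) / sizeof(kIsalnumCases[0]);
  for (size_t i = 0; i < n; ++i) {
    const struct ctype_case* tc = &kIsalnumCases[i];
    assert((isalnum(tc->input) != 0) == tc->expected);
  }
  return n;
}
```

Benchmarking each case separately is what makes a regression in one range visible even when another range gets faster.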
Interestingly, clang is able to generate what I believe to be the
optimal implementations from the most readable code, which is
impressive. It certainly matched or beat all my attempts to be clever!

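For a sense of what "the most readable code" might look like, here is a minimal sketch using plain range comparisons. These are not the functions from this change: the `sketch_` names are hypothetical, and the ranges assume ASCII.

```c
#include <assert.h>

// Hypothetical sketches, not the implementations from this change.
// Each is a plain range comparison with no table lookup and no EOF check;
// EOF is negative, so it falls outside every range.
static int sketch_islower(int c) { return c >= 'a' && c <= 'z'; }
static int sketch_isupper(int c) { return c >= 'A' && c <= 'Z'; }
static int sketch_isdigit(int c) { return c >= '0' && c <= '9'; }
static int sketch_isalpha(int c) { return sketch_islower(c) || sketch_isupper(c); }
static int sketch_isgraph(int c) { return c >= '!' && c <= '~'; }
static int sketch_isprint(int c) { return c >= ' ' && c <= '~'; }
static int sketch_iscntrl(int c) { return (c >= 0 && c < ' ') || c == 0x7f; }

// Spot checks at the range boundaries.
static void sketch_spot_checks(void) {
  assert(sketch_isalpha('a') && sketch_isalpha('Z') && !sketch_isalpha('0'));
  assert(sketch_isdigit('9') && !sketch_isdigit('/'));
  assert(sketch_isgraph('!') && !sketch_isgraph(' '));
  assert(sketch_isprint(' ') && !sketch_isprint(0x7f));
  assert(sketch_iscntrl('\n') && sketch_iscntrl(0x7f) && !sketch_iscntrl('a'));
}
```

Whether a compiler turns each comparison pair into a single subtract-and-compare depends on the compiler and target; the message above reports that clang did well with code in this style.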
The BSD table-based implementations made a special case of EOF despite
having a `_ctype_` table that's offset by 1 to include EOF at index 0.
I'm not sure why they didn't take advantage of that, but removing the
explicit check for EOF measurably improves the generated code on arm and
arm64, so even the two functions that still use the table benefit from
this rewrite.

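The offset-by-one layout can be sketched as follows. This is an illustration under stated assumptions, not the `_ctype_` table itself: `sketch_table`, `SKETCH_DIGIT`, and the `sketch_` functions are hypothetical, only the digit bit is populated, and EOF is assumed to be -1 (checked at compile time).

```c
#include <assert.h>
#include <stdio.h>  // EOF

// This sketch assumes EOF is -1, so that EOF + 1 is index 0.
_Static_assert(EOF == -1, "sketch assumes EOF == -1");

enum { SKETCH_DIGIT = 0x01 };  // hypothetical flag bit

// 257 slots: slot 0 is for EOF, slots 1..256 are for characters 0..255.
// Slots not listed are zero, including the EOF slot.
static const unsigned char sketch_table[257] = {
  ['0' + 1] = SKETCH_DIGIT, ['1' + 1] = SKETCH_DIGIT, ['2' + 1] = SKETCH_DIGIT,
  ['3' + 1] = SKETCH_DIGIT, ['4' + 1] = SKETCH_DIGIT, ['5' + 1] = SKETCH_DIGIT,
  ['6' + 1] = SKETCH_DIGIT, ['7' + 1] = SKETCH_DIGIT, ['8' + 1] = SKETCH_DIGIT,
  ['9' + 1] = SKETCH_DIGIT,
};

// Variant with an explicit EOF branch, which the offset makes redundant.
static int sketch_isdigit_checked(int c) {
  return c == EOF ? 0 : (sketch_table[c + 1] & SKETCH_DIGIT);
}

// Variant that relies on the offset: EOF simply reads the zero in slot 0.
static int sketch_isdigit_table(int c) {
  return sketch_table[c + 1] & SKETCH_DIGIT;
}

// Both variants agree on the whole defined domain (EOF plus 0..255).
static void sketch_check_variants_agree(void) {
  for (int c = EOF; c <= 255; ++c) {
    assert((sketch_isdigit_checked(c) != 0) == (sketch_isdigit_table(c) != 0));
  }
}
```

Because the two variants return the same result for every defined input, dropping the branch changes only the generated code, not the behavior.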
Here are the benchmark results (columns: benchmark name, wall-clock time, CPU time, iterations):
arm64 before:
BM_ctype_isalnum_n 3.73 ns 3.73 ns 183727137
BM_ctype_isalnum_y1 3.82 ns 3.81 ns 186383058
BM_ctype_isalnum_y2 3.73 ns 3.72 ns 187809830
BM_ctype_isalnum_y3 3.78 ns 3.77 ns 181383055
BM_ctype_isalpha_n 3.75 ns 3.75 ns 189453927
BM_ctype_isalpha_y1 3.76 ns 3.75 ns 184854043
BM_ctype_isalpha_y2 4.32 ns 3.78 ns 186326931
BM_ctype_isascii_n 2.49 ns 2.48 ns 275583822
BM_ctype_isascii_y 2.51 ns 2.51 ns 282123915
BM_ctype_isblank_n 3.11 ns 3.10 ns 220472044
BM_ctype_isblank_y1 3.20 ns 3.19 ns 226088868
BM_ctype_isblank_y2 3.11 ns 3.11 ns 220809122
BM_ctype_iscntrl_n 3.79 ns 3.78 ns 188719938
BM_ctype_iscntrl_y1 3.72 ns 3.71 ns 186209237
BM_ctype_iscntrl_y2 3.80 ns 3.80 ns 184315749
BM_ctype_isdigit_n 3.76 ns 3.74 ns 188334682
BM_ctype_isdigit_y 3.78 ns 3.77 ns 186249335
BM_ctype_isgraph_n 3.99 ns 3.98 ns 177814143
BM_ctype_isgraph_y1 3.98 ns 3.95 ns 175140090
BM_ctype_isgraph_y2 4.01 ns 4.00 ns 178320453
BM_ctype_isgraph_y3 3.96 ns 3.95 ns 175412814
BM_ctype_isgraph_y4 4.01 ns 4.00 ns 175711174
BM_ctype_islower_n 3.75 ns 3.74 ns 188604818
BM_ctype_islower_y 3.79 ns 3.78 ns 154738238
BM_ctype_isprint_n 3.96 ns 3.95 ns 177607734
BM_ctype_isprint_y1 3.94 ns 3.93 ns 174877244
BM_ctype_isprint_y2 4.02 ns 4.01 ns 178206135
BM_ctype_isprint_y3 3.94 ns 3.93 ns 175959069
BM_ctype_isprint_y4 4.03 ns 4.02 ns 176158314
BM_ctype_isprint_y5 3.95 ns 3.94 ns 178745462
BM_ctype_ispunct_n 3.78 ns 3.77 ns 184727184
BM_ctype_ispunct_y 3.76 ns 3.75 ns 187947503
BM_ctype_isspace_n 3.74 ns 3.74 ns 185300285
BM_ctype_isspace_y1 3.77 ns 3.76 ns 187202066
BM_ctype_isspace_y2 3.73 ns 3.73 ns 184105959
BM_ctype_isupper_n 3.81 ns 3.80 ns 185038761
BM_ctype_isupper_y 3.71 ns 3.71 ns 185885793
BM_ctype_isxdigit_n 3.79 ns 3.79 ns 184965673
BM_ctype_isxdigit_y1 3.76 ns 3.75 ns 188251672
BM_ctype_isxdigit_y2 3.79 ns 3.78 ns 184187481
BM_ctype_isxdigit_y3 3.77 ns 3.76 ns 187635540
arm64 after:
BM_ctype_isalnum_n 3.37 ns 3.37 ns 205613810
BM_ctype_isalnum_y1 3.40 ns 3.39 ns 204806361
BM_ctype_isalnum_y2 3.43 ns 3.43 ns 205066077
BM_ctype_isalnum_y3 3.50 ns 3.50 ns 200057128
BM_ctype_isalpha_n 2.97 ns 2.97 ns 236084076
BM_ctype_isalpha_y1 2.97 ns 2.97 ns 236083626
BM_ctype_isalpha_y2 2.97 ns 2.97 ns 236084246
BM_ctype_isascii_n 2.55 ns 2.55 ns 272879994
BM_ctype_isascii_y 2.46 ns 2.45 ns 286522323
BM_ctype_isblank_n 3.18 ns 3.18 ns 220431175
BM_ctype_isblank_y1 3.18 ns 3.18 ns 220345602
BM_ctype_isblank_y2 3.18 ns 3.18 ns 220308509
BM_ctype_iscntrl_n 3.10 ns 3.10 ns 220344270
BM_ctype_iscntrl_y1 3.10 ns 3.07 ns 228973615
BM_ctype_iscntrl_y2 3.07 ns 3.07 ns 229192626
BM_ctype_isdigit_n 3.07 ns 3.07 ns 228925676
BM_ctype_isdigit_y 3.07 ns 3.07 ns 229182934
BM_ctype_isgraph_n 2.66 ns 2.66 ns 264268737
BM_ctype_isgraph_y1 2.66 ns 2.66 ns 264445277
BM_ctype_isgraph_y2 2.66 ns 2.66 ns 264327427
BM_ctype_isgraph_y3 2.66 ns 2.66 ns 264427480
BM_ctype_isgraph_y4 2.66 ns 2.66 ns 264155250
BM_ctype_islower_n 2.66 ns 2.66 ns 264421600
BM_ctype_islower_y 2.66 ns 2.66 ns 264341148
BM_ctype_isprint_n 2.66 ns 2.66 ns 264415198
BM_ctype_isprint_y1 2.66 ns 2.66 ns 264268793
BM_ctype_isprint_y2 2.66 ns 2.66 ns 264419205
BM_ctype_isprint_y3 2.66 ns 2.66 ns 264205886
BM_ctype_isprint_y4 2.66 ns 2.66 ns 264440797
BM_ctype_isprint_y5 2.72 ns 2.72 ns 264333293
BM_ctype_ispunct_n 3.52 ns 3.51 ns 198956572
BM_ctype_ispunct_y 3.38 ns 3.38 ns 201661792
BM_ctype_isspace_n 3.39 ns 3.39 ns 206896620
BM_ctype_isspace_y1 3.39 ns 3.39 ns 206569020
BM_ctype_isspace_y2 3.39 ns 3.39 ns 206564415
BM_ctype_isupper_n 2.76 ns 2.75 ns 254227134
BM_ctype_isupper_y 2.76 ns 2.75 ns 254235314
BM_ctype_isxdigit_n 3.60 ns 3.60 ns 194418653
BM_ctype_isxdigit_y1 2.97 ns 2.97 ns 236082424
BM_ctype_isxdigit_y2 3.48 ns 3.48 ns 200390011
BM_ctype_isxdigit_y3 3.48 ns 3.48 ns 202255815
arm32 before:
BM_ctype_isalnum_n 4.77 ns 4.76 ns 129230464
BM_ctype_isalnum_y1 4.88 ns 4.87 ns 147939321
BM_ctype_isalnum_y2 4.74 ns 4.73 ns 145508054
BM_ctype_isalnum_y3 4.81 ns 4.80 ns 144968914
BM_ctype_isalpha_n 4.80 ns 4.79 ns 148262579
BM_ctype_isalpha_y1 4.74 ns 4.73 ns 145061326
BM_ctype_isalpha_y2 4.83 ns 4.82 ns 147642546
BM_ctype_isascii_n 3.74 ns 3.72 ns 186711139
BM_ctype_isascii_y 3.79 ns 3.78 ns 183654780
BM_ctype_isblank_n 4.20 ns 4.19 ns 169733252
BM_ctype_isblank_y1 4.19 ns 4.18 ns 165713363
BM_ctype_isblank_y2 4.22 ns 4.21 ns 168776265
BM_ctype_iscntrl_n 4.75 ns 4.74 ns 145417484
BM_ctype_iscntrl_y1 4.82 ns 4.81 ns 146283250
BM_ctype_iscntrl_y2 4.79 ns 4.78 ns 148662453
BM_ctype_isdigit_n 4.77 ns 4.76 ns 145789210
BM_ctype_isdigit_y 4.84 ns 4.84 ns 146909458
BM_ctype_isgraph_n 4.72 ns 4.71 ns 145874663
BM_ctype_isgraph_y1 4.86 ns 4.85 ns 142037606
BM_ctype_isgraph_y2 4.79 ns 4.78 ns 145109612
BM_ctype_isgraph_y3 4.75 ns 4.75 ns 144829039
BM_ctype_isgraph_y4 4.86 ns 4.85 ns 146769899
BM_ctype_islower_n 4.76 ns 4.75 ns 147537637
BM_ctype_islower_y 4.79 ns 4.78 ns 145648017
BM_ctype_isprint_n 4.82 ns 4.81 ns 147154780
BM_ctype_isprint_y1 4.76 ns 4.76 ns 145117604
BM_ctype_isprint_y2 4.87 ns 4.86 ns 145801406
BM_ctype_isprint_y3 4.79 ns 4.78 ns 148043446
BM_ctype_isprint_y4 4.77 ns 4.76 ns 145157619
BM_ctype_isprint_y5 4.91 ns 4.90 ns 147810800
BM_ctype_ispunct_n 4.74 ns 4.73 ns 145588611
BM_ctype_ispunct_y 4.82 ns 4.81 ns 144065436
BM_ctype_isspace_n 4.78 ns 4.77 ns 147153712
BM_ctype_isspace_y1 4.73 ns 4.72 ns 145252863
BM_ctype_isspace_y2 4.84 ns 4.83 ns 148615797
BM_ctype_isupper_n 4.75 ns 4.74 ns 148276631
BM_ctype_isupper_y 4.80 ns 4.79 ns 145529893
BM_ctype_isxdigit_n 4.78 ns 4.77 ns 147271646
BM_ctype_isxdigit_y1 4.74 ns 4.74 ns 145142209
BM_ctype_isxdigit_y2 4.83 ns 4.82 ns 146398497
BM_ctype_isxdigit_y3 4.78 ns 4.77 ns 147617686
arm32 after:
BM_ctype_isalnum_n 4.35 ns 4.35 ns 161086146
BM_ctype_isalnum_y1 4.36 ns 4.35 ns 160961111
BM_ctype_isalnum_y2 4.36 ns 4.36 ns 160733210
BM_ctype_isalnum_y3 4.35 ns 4.35 ns 160897524
BM_ctype_isalpha_n 3.67 ns 3.67 ns 189377208
BM_ctype_isalpha_y1 3.68 ns 3.67 ns 189438146
BM_ctype_isalpha_y2 3.75 ns 3.69 ns 190971186
BM_ctype_isascii_n 3.69 ns 3.68 ns 191029191
BM_ctype_isascii_y 3.68 ns 3.68 ns 191011817
BM_ctype_isblank_n 4.09 ns 4.09 ns 171887541
BM_ctype_isblank_y1 4.09 ns 4.09 ns 171829345
BM_ctype_isblank_y2 4.08 ns 4.07 ns 170585590
BM_ctype_iscntrl_n 4.08 ns 4.07 ns 170614383
BM_ctype_iscntrl_y1 4.13 ns 4.11 ns 171495899
BM_ctype_iscntrl_y2 4.19 ns 4.18 ns 165255578
BM_ctype_isdigit_n 4.25 ns 4.24 ns 165237008
BM_ctype_isdigit_y 4.24 ns 4.24 ns 165256149
BM_ctype_isgraph_n 3.82 ns 3.81 ns 183610114
BM_ctype_isgraph_y1 3.82 ns 3.81 ns 183614131
BM_ctype_isgraph_y2 3.82 ns 3.81 ns 183616840
BM_ctype_isgraph_y3 3.79 ns 3.79 ns 183620182
BM_ctype_isgraph_y4 3.82 ns 3.81 ns 185740009
BM_ctype_islower_n 3.75 ns 3.74 ns 183619502
BM_ctype_islower_y 3.68 ns 3.68 ns 190999901
BM_ctype_isprint_n 3.69 ns 3.68 ns 190899544
BM_ctype_isprint_y1 3.68 ns 3.67 ns 190192384
BM_ctype_isprint_y2 3.67 ns 3.67 ns 189351466
BM_ctype_isprint_y3 3.67 ns 3.67 ns 189430348
BM_ctype_isprint_y4 3.68 ns 3.68 ns 189430161
BM_ctype_isprint_y5 3.69 ns 3.68 ns 190962419
BM_ctype_ispunct_n 4.14 ns 4.14 ns 171034861
BM_ctype_ispunct_y 4.19 ns 4.19 ns 168308152
BM_ctype_isspace_n 4.50 ns 4.50 ns 156250887
BM_ctype_isspace_y1 4.48 ns 4.48 ns 155124476
BM_ctype_isspace_y2 4.50 ns 4.50 ns 155077504
BM_ctype_isupper_n 3.68 ns 3.68 ns 191020583
BM_ctype_isupper_y 3.68 ns 3.68 ns 191015669
BM_ctype_isxdigit_n 4.50 ns 4.50 ns 156276745
BM_ctype_isxdigit_y1 3.28 ns 3.27 ns 214729725
BM_ctype_isxdigit_y2 4.48 ns 4.48 ns 155265129
BM_ctype_isxdigit_y3 4.48 ns 4.48 ns 155216846

I've also corrected a small mistake in the documentation for isxdigit().

Test: tests and benchmarks
Change-Id: I4a77859f826c3fc8f0e327e847886882f29ec4a3