ci/gcc.git - [no description]

diff options

author	Richard Sandiford <richard.sandiford@arm.com>	2020-04-14 21:04:03 +0100
committer	Richard Sandiford <richard.sandiford@arm.com>	2020-04-17 16:09:38 +0100
commit	8b50d7a47624030d87645237c60bd8f7ac78b2ec (patch)
tree	00f4b13286baea1366f15755050f0a7f8e6f7913 /libstdc++-v3/testsuite/21_strings/basic_string
parent	2e3897490e0f99b22a2813cfb34d59a1ea71ff68 (diff)

aarch64: Tweak SVE load/store costs

We were seeing performance regressions on 256-bit SVE with code like: for (int i = 0; i < count; ++i) #pragma GCC unroll 128 for (int j = 0; j < 128; ++j) *dst++ = 1; (derived from lmbench). For 128-bit SVE, it's clearly better to use Advanced SIMD STPs here, since they can store 256 bits at a time. We already do this for -msve-vector-bits=128 because in that case Advanced SIMD comes first in autovectorize_vector_modes. If we handled full-loop predication well for this kind of loop, the choice between Advanced SIMD and 256-bit SVE would be mostly a wash, since both of them could store 256 bits at a time. However, SVE would still have the extra prologue overhead of setting up the predicate, so Advanced SIMD would still be the natural choice. As things stand though, we don't handle full-loop predication well for this kind of loop, so the 256-bit SVE code is significantly worse. Something to fix for GCC 11 (hopefully). However, even though we account for the overhead of predication in the cost model, the SVE version (wrongly) appeared to need half the number of stores. That was enough to drown out the predication overhead and meant that we'd pick the SVE code over the Advanced SIMD code. 512-bit SVE has a clear advantage over Advanced SIMD, so we should continue using SVE there. This patch tries to account for this in the cost model. It's a bit of a compromise; see the comment in the patch for more details. 2020-04-17 Richard Sandiford <richard.sandiford@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_advsimd_ldp_stp_p): New function. (aarch64_sve_adjust_stmt_cost): Add a vectype parameter. Double the cost of load and store insns if one loop iteration has enough scalar elements to use an Advanced SIMD LDP or STP. (aarch64_add_stmt_cost): Update call accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/cost_model_2.c: New test. * gcc.target/aarch64/sve/cost_model_3.c: Likewise. * gcc.target/aarch64/sve/cost_model_4.c: Likewise. * gcc.target/aarch64/sve/cost_model_5.c: Likewise. * gcc.target/aarch64/sve/cost_model_6.c: Likewise. * gcc.target/aarch64/sve/cost_model_7.c: Likewise.

Diffstat (limited to 'libstdc++-v3/testsuite/21_strings/basic_string')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: