ci/gcc.git - [no description]

Age	Commit message (Collapse)	Author
2022-08-06	Daily bump.	GCC Administrator

2022-08-05	New warning: -Wanalyzer-jump-through-null [PR105947]	David Malcolm
	This patch adds a new warning to -fanalyzer for jumps through NULL function pointers. gcc/analyzer/ChangeLog: PR analyzer/105947 * analyzer.opt (Wanalyzer-jump-through-null): New option. * engine.cc (class jump_through_null): New. (exploded_graph::process_node): Complain about jumps through NULL function pointers. gcc/ChangeLog: PR analyzer/105947 * doc/invoke.texi: Add -Wanalyzer-jump-through-null. gcc/testsuite/ChangeLog: PR analyzer/105947 * gcc.dg/analyzer/function-ptr-5.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
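As a rough illustration (not taken from the patch or its testcase), the C sketch below shows the guarded form of a call through a function pointer. Calling `fn (x)` without the NULL check, on a path where `fn` can be NULL, is the kind of jump the new warning is meant to report. The names `handler_fn`, `twice` and `dispatch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn) (int);

static int twice(int x) { return 2 * x; }

/* Calling FN without the NULL check, when some caller can pass NULL,
   is the pattern -Wanalyzer-jump-through-null targets.  Guarding the
   call keeps every path well defined. */
static int dispatch(handler_fn fn, int x)
{
  if (fn == NULL)
    return -1;
  return fn(x);
}
```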
2022-08-05	middle-end: Allow backend to expand/split double word compare to 0/-1.	Roger Sayle
	This patch to the middle-end's RTL expansion reorders the code in emit_store_flag_1 so that the backend has more control over how best to expand/split double word equality/inequality comparisons against zero or minus one. With the current implementation, the middle-end always decides to lower this idiom during RTL expansion using SUBREGs and word mode instructions, without ever consulting the backend's machine description. Hence on x86_64, a TImode comparison against zero is always expanded as: (parallel [ (set (reg:DI 91) (ior:DI (subreg:DI (reg:TI 88) 0) (subreg:DI (reg:TI 88) 8))) (clobber (reg:CC 17 flags))]) (set (reg:CCZ 17 flags) (compare:CCZ (reg:DI 91) (const_int 0 [0]))) This patch, which makes no changes to the code itself, simply reorders the clauses in emit_store_flag_1 so that the middle-end first attempts expansion using the target's doubleword mode cstore optab/expander, and only if this fails, falls back to lowering to word mode operations. On x86_64, this allows the expander to produce: (set (reg:CCZ 17 flags) (compare:CCZ (reg:TI 88) (const_int 0 [0]))) which is a candidate for scalar-to-vector transformations (and combine simplifications etc.). On targets that don't define a cstore pattern for doubleword integer modes, there should be no change in behaviour. For those that do, the current behaviour can be restored (if desired) by restricting the expander/insn to not apply when the comparison is EQ or NE, and operand[2] is either const0_rtx or constm1_rtx. This change just keeps RTL expansion more consistent (in philosophy). For other doubleword comparisons, such as with operators LT and GT, or with constants other than zero or -1, the wishes of the backend are respected, and only if the optab expansion fails are the default fall-back implementations using narrower integer mode operations (and conditional jumps) used. 
2022-08-05 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * expmed.cc (emit_store_flag_1): Move code to expand double word equality and inequality against zero or -1, using word operations, to after trying to use the backend's cstore<mode>4 optab/expander.
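To make the two strategies concrete, here is a minimal C sketch. It assumes a 64-bit GCC or Clang target where `unsigned __int128` is available, as on x86_64. The `*_words` helpers mirror the word-mode lowering, which does one IOR or AND of the two halves. The `*_direct` helpers are the plain doubleword comparisons that a target's `cstore<mode>4` pattern could now expand. All four names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Word-mode lowering: split X into 64-bit halves and combine them once. */
static bool eq_zero_words(unsigned __int128 x)
{
  uint64_t lo = (uint64_t) x;
  uint64_t hi = (uint64_t) (x >> 64);
  return (lo | hi) == 0;            /* x == 0  <=>  (lo | hi) == 0 */
}

static bool eq_minus_one_words(unsigned __int128 x)
{
  uint64_t lo = (uint64_t) x;
  uint64_t hi = (uint64_t) (x >> 64);
  return (lo & hi) == UINT64_MAX;   /* x == -1 <=>  (lo & hi) == ~0 */
}

/* Direct doubleword comparisons, as a backend expander might emit them. */
static bool eq_zero_direct(unsigned __int128 x) { return x == 0; }

static bool eq_minus_one_direct(unsigned __int128 x)
{
  return x == ~(unsigned __int128) 0;
}
```

Both forms compute the same truth value; the patch only changes which one the middle-end tries first.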
2022-08-05	libstdc++: Add feature test macro for <experimental/scope>	Jonathan Wakely
	libstdc++-v3/ChangeLog: * include/experimental/scope (__cpp_lib_experimental_scope): Define. * testsuite/experimental/scopeguard/uniqueres.cc: Check macro.
2022-08-05	libstdc++: Implement <experimental/scope> from LFTSv3	Jonathan Wakely
	libstdc++-v3/ChangeLog: * include/Makefile.am: Add new header. * include/Makefile.in: Regenerate. * include/experimental/scope: New file. * testsuite/experimental/scopeguard/uniqueres.cc: New test. * testsuite/experimental/scopeguard/exit.cc: New test.
2022-08-05	middle-end: Guard value_replacement and store_elim from seeing diamonds.	Tamar Christina
	This excludes value_replacement and store_elim from diamonds as they don't handle the form properly. gcc/ChangeLog: PR middle-end/106534 * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Guard the value_replacement and store_elim from diamonds.
2022-08-05	backthreader dump fix	Richard Biener
	This fixes odd SUCCEEDED dumps from the backthreader registry that can happen even though register_jump_thread cancelled the thread as invalid. * tree-ssa-threadbackward.cc (back_threader::maybe_register_path): Check whether the registry register_path rejected the path. (back_threader_registry::register_path): Return whether register_jump_thread succeeded.
2022-08-05	Inline unsupported_range constructor.	Aldy Hernandez
	An unsupported_range temporary is instantiated in every Value_Range for completeness' sake and should be mostly a NOP. However, it's showing up in the callgrind stats because it's not inline. This fixes the oversight. PR tree-optimization/106514 gcc/ChangeLog: * value-range.cc (unsupported_range::unsupported_range): Move... * value-range.h (unsupported_range::unsupported_range): ...here. (unsupported_range::set_undefined): New.
2022-08-05	tree-optimization/106533 - loop distribution of inner loop of nest	Richard Biener
	Loop distribution currently gives up if the outer loop of a loop nest it analyzes contains a stmt with side-effects instead of continuing to analyze the innermost loop. The following fixes that by continuing anyway. PR tree-optimization/106533 * tree-loop-distribution.cc (loop_distribution::execute): Continue analyzing the inner loops when find_seed_stmts_for_distribution fails. * gcc.dg/tree-ssa/ldist-39.c: New testcase.
2022-08-05	rs6000: Correct return value of check_p9modulo_hw_available.	Haochen Gui
	Set the return value to 0 when modulo is supported, and to 1 when not supported. gcc/testsuite/ * lib/target-supports.exp (check_p9modulo_hw_available): Correct return value.
2022-08-04	[RISC-V] Fix 32-bit riscv with zbs extension enabled	Andrew Pinski
	The problem here was a disconnect between the splittable_const_int_operand predicate and the function riscv_build_integer_1 for 32-bit with zbs enabled. The splittable_const_int_operand predicate had a check for TARGET_64BIT which was not needed, so this patch removes it. Committed as obvious after a build for riscv32-elf configured with --with-arch=rv32imac_zba_zbb_zbc_zbs. Thanks, Andrew Pinski gcc/ChangeLog: * config/riscv/predicates.md (splittable_const_int_operand): Remove the check for TARGET_64BIT for single bit const values.
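Zbs can materialize a constant with exactly one bit set using a single `bseti`, so the underlying question is whether a constant is a single-bit value at the target's word size. The sketch below is a hypothetical helper, not GCC's predicate. It assumes constants are stored sign-extended in a 64-bit host integer, which is why it masks to XLEN bits before testing. With that masking, bit 31 counts as a single bit on RV32 even though the sign-extended 64-bit value does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does VALUE, viewed as an XLEN-bit register value,
   have exactly one bit set?  VALUE is assumed to be sign-extended to
   64 bits, so mask to XLEN bits first. */
static bool single_bit_const_p(int64_t value, unsigned xlen)
{
  assert(xlen == 32 || xlen == 64);
  uint64_t mask = xlen == 64 ? UINT64_MAX : UINT32_MAX;
  uint64_t u = (uint64_t) value & mask;
  /* A power of two shares no bits with itself minus one. */
  return u != 0 && (u & (u - 1)) == 0;
}
```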
2022-08-05	Daily bump.	GCC Administrator

2022-08-04	Add myself as AutoFDO maintainer	Eugene Rozenfeld
	ChangeLog: * MAINTAINERS: Add myself as AutoFDO maintainer.
2022-08-04	libstdc++: Make std::string_view(Range&&) constructor explicit	Jonathan Wakely
	The P2499R0 paper was recently approved for C++23. libstdc++-v3/ChangeLog: * include/std/string_view (basic_string_view(Range&&)): Add explicit as per P2499R0. * testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc: Adjust implicit conversions. Check implicit conversions fail. * testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc: Likewise.
2022-08-04	libstdc++: Add comparisons to std::default_sentinel_t (LWG 3719)	Jonathan Wakely
	This library defect was recently approved for C++23. libstdc++-v3/ChangeLog: * include/bits/fs_dir.h (directory_iterator): Add comparison with std::default_sentinel_t. Remove redundant operator!= for C++20. * (recursive_directory_iterator): Likewise. * include/bits/iterator_concepts.h [!__cpp_lib_concepts] (default_sentinel_t, default_sentinel): Define even if concepts are not supported. * include/bits/regex.h (regex_iterator): Add comparison with std::default_sentinel_t. Remove redundant operator!= for C++20. (regex_token_iterator): Likewise. (regex_token_iterator::_M_end_of_seq()): Add noexcept. * testsuite/27_io/filesystem/iterators/lwg3719.cc: New test. * testsuite/28_regex/iterators/regex_iterator/lwg3719.cc: New test. * testsuite/28_regex/iterators/regex_token_iterator/lwg3719.cc: New test.
2022-08-04	Loop over intersected bitmaps.	Andrew MacLeod
	compute_ranges_in_block loops over the import list and then checks the same bit in exports. It is more efficient to loop over the intersection of the two bitmaps. PR tree-optimization/106514 * gimple-range-path.cc (path_range_query::compute_ranges_in_block): Use EXECUTE_IF_AND_IN_BITMAP to loop over 2 bitmaps.
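The change is easiest to see with flat bitmaps. The sketch below is only an analogy, since GCC's `bitmap` is a sparse linked structure and `EXECUTE_IF_AND_IN_BITMAP` is a macro over it. It visits the indices set in both bitmaps by walking `a[w] & b[w]` word by word, rather than walking one bitmap and testing each bit in the other. The helper name is hypothetical, and `__builtin_ctzll` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Visit every index set in both A and B.  Stores up to CAP indices in
   OUT and returns the total number of indices visited. */
static size_t for_each_and_bit(const uint64_t *a, const uint64_t *b,
                               size_t nwords, unsigned *out, size_t cap)
{
  size_t n = 0;
  for (size_t w = 0; w < nwords; w++)
    {
      uint64_t word = a[w] & b[w];   /* intersect a whole word at once */
      while (word != 0)
        {
          unsigned bit = (unsigned) __builtin_ctzll(word);
          if (n < cap)
            out[n] = (unsigned) (w * 64 + bit);
          n++;
          word &= word - 1;          /* clear the lowest set bit */
        }
    }
  return n;
}
```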
2022-08-04	middle-end: Simplify subtract where both arguments are being bitwise inverted.	Tamar Christina
	This adds a match.pd rule that drops the bitwise nots when both arguments to a subtract are inverted, i.e. for: float g(float a, float b) { return ~(int)a - ~(int)b; } we instead generate float g(float a, float b) { return (int)b - (int)a; } We already do a limited version of this in the fold_binary fold functions, but this makes a more general version in match.pd that applies more often. gcc/ChangeLog: * match.pd: New bit_not rule. gcc/testsuite/ChangeLog: * gcc.dg/subnot.c: New test.
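The identity behind the new rule follows from `~x == -x - 1` in two's complement: `~a - ~b == (-a - 1) - (-b - 1) == b - a`. Below is a minimal C check. It uses unsigned arithmetic so every step is defined modulo 2^32, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The form the new match.pd rule matches ... */
static uint32_t sub_of_nots(uint32_t a, uint32_t b) { return ~a - ~b; }

/* ... and the form it produces: the nots are dropped, operands swapped. */
static uint32_t swapped_sub(uint32_t a, uint32_t b) { return b - a; }
```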
2022-08-04	middle-end: Fix phi-ssa assertion triggers. [PR106519]	Tamar Christina
	For the diamond PHI form in tree_ssa_phiopt_worker we need to extract edge e2 sooner. This changes it so we extract it at the same time we determine we have a diamond shape. gcc/ChangeLog: PR middle-end/106519 * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Check final phi edge for diamond shapes. gcc/testsuite/ChangeLog: PR middle-end/106519 * gcc.dg/pr106519.c: New test.
2022-08-04	match.pd: Add bitwise and pattern [PR106243]	Sam Feifer
	This patch adds a new optimization to match.pd. The pattern, -x & 1, now gets simplified to x & 1, reducing the number of instructions produced. This patch also adds tests for the optimization rule. Bootstrapped/regtested on x86_64-pc-linux-gnu. PR tree-optimization/106243 gcc/ChangeLog: * match.pd (-x & 1): New simplification. gcc/testsuite/ChangeLog: * gcc.dg/pr106243-1.c: New test. * gcc.dg/pr106243.c: New test.
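The rule relies on negation preserving parity. `-x` is odd exactly when `x` is odd, so the low bit of `-x` equals the low bit of `x`. Below is a small C check using unsigned negation, which is well defined for every input. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Before: -x & 1.  Unsigned negation wraps modulo 2^32 and keeps parity. */
static uint32_t neg_and_one(uint32_t x) { return -x & 1u; }

/* After: x & 1. */
static uint32_t and_one(uint32_t x) { return x & 1u; }
```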
2022-08-04	tree-optimization/106521 - unroll-and-jam LC SSA rewrite	Richard Biener
	The LC SSA rewrite performs SSA verification at start but the VN run performed on the unrolled-and-jammed body can leave us with invalid SSA form until CFG cleanup is run. So make sure we do that before rewriting into LC SSA. PR tree-optimization/106521 * gimple-loop-jam.cc (tree_loop_unroll_and_jam): Perform CFG cleanup manually before rewriting into LC SSA. * gcc.dg/torture/pr106521.c: New testcase.
2022-08-04	Backwards threader greedy search TLC	Richard Biener
	I've tried to understand how the greedy search works seeing the bitmap dances and the split into resolve_phi. I've summarized the intent of the algorithm as // For further greedy searching we want to remove interesting // names defined in BB but add ones on the PHI edges for the // respective edges. but the implementation differs in detail. In particular when there is more than one interesting PHI in BB it seems to only consider the first for translating defs across edges. It also only applies the loop crossing restriction when there is an interesting PHI. The following preserves the loop crossing restriction to the case of interesting PHIs but merges resolve_phi back, changing interesting as outlined with the intent above. It should get more threading cases when there are multiple interesting PHI defs in a block. It might be a bit faster due to less bitmap operations but in the end the main intent was to make what happens more obvious. * tree-ssa-threadbackward.cc (populate_worklist): Remove. (back_threader::resolve_phi): Likewise. (back_threader::find_paths_to_names): Rewrite greedy search.
2022-08-04	libstdc++: Rename data members of std::unexpected and std::bad_expected_access	Jonathan Wakely
	The P2549R1 paper was accepted for C++23. I already implemented it for our <expected>, but I didn't rename the private data members, only the public member functions. This renames the data members for consistency with the working draft. libstdc++-v3/ChangeLog: * include/std/expected (unexpected::_M_val): Rename to _M_unex. (bad_expected_access::_M_val): Likewise.
2022-08-04	libstdc++: Update value of __cpp_lib_ios_noreplace macro	Jonathan Wakely
	My P2467R1 proposal was accepted for C++23 so there's an official value for this macro now. libstdc++-v3/ChangeLog: * include/bits/ios_base.h (__cpp_lib_ios_noreplace): Update value to 202207L. * include/std/version (__cpp_lib_ios_noreplace): Likewise. * testsuite/27_io/basic_ofstream/open/char/noreplace.cc: Check for new value. * testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc: Likewise.
2022-08-04	libstdc++: Unblock atomic wait on non-futex platforms [PR106183]	Jonathan Wakely
	When using a mutex and condition variable, the notifying thread needs to increment _M_ver while holding the mutex lock, and the waiting thread needs to re-check after locking the mutex. This avoids a missed notification as described in the PR. By moving the increment of _M_ver to the base _M_notify we can make the use of the mutex local to the use of the condition variable, and simplify the code a little. We can use a relaxed store because the mutex already provides sequential consistency. Also we don't need to check whether __addr == &_M_ver because we know that's always true for platforms that use a condition variable, and so we also know that we always need to use notify_all() not notify_one(). Reviewed-by: Thomas Rodgers <trodgers@redhat.com> libstdc++-v3/ChangeLog: PR libstdc++/106183 * include/bits/atomic_wait.h (__waiter_pool_base::_M_notify): Move increment of _M_ver here. [!_GLIBCXX_HAVE_PLATFORM_WAIT]: Lock mutex around increment. Use relaxed memory order and always notify all waiters. (__waiter_base::_M_do_wait) [!_GLIBCXX_HAVE_PLATFORM_WAIT]: Check value again after locking mutex. (__waiter_base::_M_notify): Remove increment of _M_ver.
2022-08-04	Adjust index number of tuple pretty printer	Ulrich Drepper
	The tuple pretty printer uses 1-based indices, which is quite confusing considering that accessing the same values with the std::get functions uses 0-based indices. This patch changes the pretty printer since this is not a guaranteed API. libstdc++-v3/ChangeLog: * python/libstdcxx/v6/printers.py (class StdTuplePrinter): Use zero-based indices just like std::get takes.
2022-08-04	PR106342 - IBM zSystems: Provide vsel for all vector modes	Ilya Leoshkevich
	dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3 produces an insn that vsel<mode> is supposed to recognize, but can't, because it's not defined for V2SF. Fix by defining it for all vector modes supported by copysign<mode>3. gcc/ChangeLog: * config/s390/vector.md (V_HW_FT): New iterator. * config/s390/vx-builtins.md (vsel<mode>): Use V_HW_FT instead of V_HW.
2022-08-04	Daily bump.	GCC Administrator

2022-08-03	Do not enable -mblock-ops-vector-pair.	Michael Meissner
	Testing has shown that using the load vector pair and store vector pair instructions for block moves has some performance issues on power10. A patch on June 11th modified the code so that GCC would not set -mblock-ops-vector-pair by default if we are tuning for power10, but it would set the option if we were tuning for a different machine and have load and store vector pair instructions enabled. This patch eliminates the code setting -mblock-ops-vector-pair. If you want to generate load vector pair and store vector pair instructions for block moves, you must use -mblock-ops-vector-pair. 2022-08-03 Michael Meissner <meissner@linux.ibm.com> gcc/ * config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove code setting -mblock-ops-vector-pair.
2022-08-03	Do not walk equivalence set in path_oracle::killing_def.	Andrew MacLeod
	When killing a def in the path ranger, there is no need to walk the set of existing equivalences clearing bits. An equivalence match requires that both ssa-names be in each other's set. As killing_def creates a new empty set containing only the current def, it already ensures false equivalences won't happen. PR tree-optimization/106514 * value-relation.cc (path_oracle::killing_def): Do not walk the equivalence set clearing bits.
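Below is a toy model of the invariant the message relies on. Names are members of equivalence sets represented as bitmasks, which is an assumption for illustration; GCC's path oracle uses bitmaps and richer data structures, and every name below is hypothetical. Because `equiv_p` requires each name to appear in the other's set, giving a killed def a fresh set containing only itself is enough. The stale bits left in other sets never produce a false match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NNAMES 8

/* set_of[n] is the equivalence set name N belongs to, as a bitmask. */
static uint64_t set_of[NNAMES];

static uint64_t bit(unsigned n) { return UINT64_C(1) << n; }

/* Merge the sets of A and B; every member shares the merged set. */
static void make_equiv(unsigned a, unsigned b)
{
  uint64_t s = set_of[a] | set_of[b] | bit(a) | bit(b);
  for (unsigned n = 0; n < NNAMES; n++)
    if (s & bit(n))
      set_of[n] = s;
}

/* A match needs each name to appear in the other's set. */
static bool equiv_p(unsigned a, unsigned b)
{
  return (set_of[a] & bit(b)) != 0 && (set_of[b] & bit(a)) != 0;
}

/* Give A a fresh set holding only itself.  A's bit in other sets is not
   cleared: equiv_p (A, X) already fails since set_of[A] lacks X. */
static void killing_def(unsigned a)
{
  set_of[a] = bit(a);
}
```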
2022-08-03	testsuite: btf: fix regexps in btf-int-1.c	Jose E. Marchesi
	The regexps in the test btf-int-1.c were not working properly with the commenting style of at least one target: powerpc64le-linux-gnu. This patch changes the test to use better regexps. Tested in bpf-unknown-none, x86_64-linux-gnu and powerpc64le-linux-gnu. Pushed to master as obvious. gcc/testsuite/ChangeLog: PR testsuite/106515 * gcc.dg/debug/btf/btf-int-1.c: Fix regexps in scan-assembler-times.
2022-08-03	middle-end: Support recognition of three-way max/min.	Tamar Christina
	This patch adds support for three-way min/max recognition in phi-opts. Concretely for e.g. #include <stdint.h> uint8_t three_min (uint8_t xc, uint8_t xm, uint8_t xy) { uint8_t xk; if (xc < xm) { xk = (uint8_t) (xc < xy ? xc : xy); } else { xk = (uint8_t) (xm < xy ? xm : xy); } return xk; } we generate: <bb 2> [local count: 1073741824]: _5 = MIN_EXPR <xc_1(D), xy_3(D)>; _7 = MIN_EXPR <xm_2(D), _5>; return _7; instead of <bb 2>: if (xc_2(D) < xm_3(D)) goto <bb 3>; else goto <bb 4>; <bb 3>: xk_5 = MIN_EXPR <xc_2(D), xy_4(D)>; goto <bb 5>; <bb 4>: xk_6 = MIN_EXPR <xm_3(D), xy_4(D)>; <bb 5>: # xk_1 = PHI <xk_5(3), xk_6(4)> return xk_1; The same function also immediately deals with turning a minimization problem into a maximization one if the results are inverted. We do this here since doing it in match.pd would end up changing the shape of the BBs and adding additional instructions which would prevent various optimizations from working. gcc/ChangeLog: * tree-ssa-phiopt.cc (minmax_replacement): Optionally search for the phi sequence of a three-way conditional. (replace_phi_edge_with_variable): Support diamonds. (tree_ssa_phiopt_worker): Detect diamond phi structure for three-way min/max. (strip_bit_not, invert_minmax_code): New. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/split-path-1.c: Disable phi-opts so we don't optimize code away. * gcc.dg/tree-ssa/minmax-10.c: New test. * gcc.dg/tree-ssa/minmax-11.c: New test. * gcc.dg/tree-ssa/minmax-12.c: New test. * gcc.dg/tree-ssa/minmax-13.c: New test. * gcc.dg/tree-ssa/minmax-14.c: New test. * gcc.dg/tree-ssa/minmax-15.c: New test. * gcc.dg/tree-ssa/minmax-16.c: New test. * gcc.dg/tree-ssa/minmax-3.c: New test. * gcc.dg/tree-ssa/minmax-4.c: New test. * gcc.dg/tree-ssa/minmax-5.c: New test. * gcc.dg/tree-ssa/minmax-6.c: New test. * gcc.dg/tree-ssa/minmax-7.c: New test. * gcc.dg/tree-ssa/minmax-8.c: New test. * gcc.dg/tree-ssa/minmax-9.c: New test.
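Here is a quick way to check that the rewrite preserves values. Whichever branch is taken, the result is the minimum of all three inputs, and that is exactly what the two chained `MIN_EXPR`s compute. The sketch below compares the commit's `three_min` with a straight-line version. `three_min_flat` and the `MIN` macro are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Source form from the commit message. */
static uint8_t three_min(uint8_t xc, uint8_t xm, uint8_t xy)
{
  uint8_t xk;
  if (xc < xm)
    xk = (uint8_t) (xc < xy ? xc : xy);
  else
    xk = (uint8_t) (xm < xy ? xm : xy);
  return xk;
}

/* Straight-line form: _5 = MIN_EXPR <xc, xy>; _7 = MIN_EXPR <xm, _5>. */
static uint8_t three_min_flat(uint8_t xc, uint8_t xm, uint8_t xy)
{
  uint8_t t = (uint8_t) MIN(xc, xy);
  return (uint8_t) MIN(xm, t);
}
```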
2022-08-03	d: Merge upstream dmd d7772a2369, phobos 5748ca43f.	Iain Buclaw
	In upstream dmd, the compiler front-end and run-time have been merged together into one repository. Both dmd and libdruntime now track that. D front-end changes: - Deprecated `scope(failure)' blocks that contain `return' statements. - Deprecated using integers for `version' or `debug' conditions. - Deprecated returning a discarded void value from a function. - `new' can now allocate an associative array. D runtime changes: - Added avx512f detection to core.cpuid module. Phobos changes: - Changed std.experimental.logger.core.sharedLog to return shared(Logger). gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd d7772a2369. * dmd/VERSION: Bump version to v2.100.1. * d-codegen.cc (get_frameinfo): Check whether decision to generate closure changed since semantic finished. * d-lang.cc (d_handle_option): Remove handling of -fdebug=level and -fversion=level. * decl.cc (DeclVisitor::visit (VarDeclaration )): Generate evaluation of noreturn variable initializers before throw. expr.cc (ExprVisitor::visit (AssignExp )): Don't generate assignment for noreturn types, only evaluate for side effects. lang.opt (fdebug=): Undocument -fdebug=level. (fversion=): Undocument -fversion=level. libphobos/ChangeLog: * configure: Regenerate. * configure.ac (libtool_VERSION): Update to 4:0:0. * libdruntime/MERGE: Merge upstream druntime d7772a2369. * libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add core/internal/array/duplication.d. * libdruntime/Makefile.in: Regenerate. * src/MERGE: Merge upstream phobos 5748ca43f. * testsuite/libphobos.gc/nocollect.d:
2022-08-03	cselib: add function to check if SET is redundant [PR106187]	Richard Earnshaw
	A SET operation that writes memory may have the same value as an earlier store but if the alias sets of the new and earlier store do not conflict then the set is not truly redundant. This can happen, for example, if objects of different types share a stack slot. To fix this we define a new function in cselib that first checks for equality and if that is successful then finds the earlier store in the value history and checks the alias sets. The routine is used in two places elsewhere in the compiler: cfgcleanup and postreload. gcc/ChangeLog: PR rtl-optimization/106187 * alias.h (mems_same_for_tbaa_p): Declare. * alias.cc (mems_same_for_tbaa_p): New function. * dse.cc (record_store): Use it instead of open-coding alias check. * cselib.h (cselib_redundant_set_p): Declare. * cselib.cc: Include alias.h (cselib_redundant_set_p): New function. * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead of rtx_equal_for_cselib_p. * postreload.cc (reload_cse_simplify): Use cselib_redundant_set_p. (reload_cse_noop_set_p): Delete.
2022-08-03	gcov-dump: add --stable option	Martin Liska
	The option prints TOP N counters in a stable format suitable for comparison (diff). gcc/ChangeLog: * doc/gcov-dump.texi: Document the new option. * gcov-dump.cc (main): Parse the new option. (print_usage): Show the option. (tag_counters): Sort key:value pairs of TOP N counter.
2022-08-03	profile: do not collect stats unless TDF_DETAILS	Martin Liska
	gcc/ChangeLog: * profile.cc (compute_branch_probabilities): Do not collect stats unless TDF_DETAILS.
2022-08-03	PR target/47949: Use xchg to move from/to AX_REG with -Oz on x86.	Roger Sayle
	This patch adds a peephole2 to i386.md to implement the suggestion in PR target/47949, of using xchg instead of mov for moving values to/from the %rax/%eax register, controlled by -Oz, as the xchg instruction is one byte shorter than the move it is replacing. The new test case is taken from the PR: int foo(int x) { return x; } where previously we'd generate: foo: mov %edi,%eax // 2 bytes ret but with this patch, using -Oz, we generate: foo: xchg %eax,%edi // 1 byte ret On the CSiBE benchmark, this saves a total of 10238 bytes (reducing the -Oz total from 3661796 bytes to 3651558 bytes, a 0.28% saving). Interestingly, some modern architectures (such as Zen 3) implement xchg using zero latency register renaming (just like mov), so in theory this transformation could be enabled when optimizing for speed, if benchmarking shows the improved code density produces consistently better performance. However, this is architecture dependent, and there may be interactions using xchg (instead of a single_set) in the late RTL passes (such as cprop_hardreg), so for now I've restricted this to -Oz. 2022-08-03 Roger Sayle <roger@nextmovesoftware.com> Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog PR target/47949 * config/i386/i386.md (peephole2): New peephole2 to convert SWI48 moves to/from %rax/%eax where the src is dead to xchg, when optimizing for minimal size with -Oz. gcc/testsuite/ChangeLog PR target/47949 * gcc.target/i386/pr47949.c: New test case.
2022-08-03	Improved pre-reload split of double word comparison against -1 on x86.	Roger Sayle
	This patch adds an extra optimization to cmp<dwi>_doubleword to improve the code generated for comparisons against -1. Hypothetically, if a comparison against -1 reached this splitter we'd currently generate code that looks like: notq %rdx ; 3 bytes notq %rax ; 3 bytes orq %rdx, %rax ; 3 bytes setne %al With this patch we would instead generate the superior: andq %rdx, %rax ; 3 bytes cmpq $-1, %rax ; 4 bytes setne %al which is both faster and smaller, and also what's currently generated thanks to the middle-end splitting double word comparisons against zero and minus one during RTL expansion. Should that change, this would become a missed-optimization regression, but this patch also (potentially) helps suitable comparisons created by CSE and combine. 2022-08-03 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog config/i386/i386.md (*cmp<dwi>_doubleword): Add a special case to split comparisons against -1 using AND and CMP -1 instructions.
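The two sequences compute the same predicate by De Morgan's law. `~lo | ~hi == ~(lo & hi)`, and that is zero exactly when `lo & hi` is all ones, i.e. when the doubleword equals -1. Below is a C sketch of both forms, where `lo` and `hi` stand for the two 64-bit halves of the doubleword operand. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old split: notq; notq; orq; setne  ->  (~lo | ~hi) != 0 */
static bool ne_minus_one_not_or(uint64_t lo, uint64_t hi)
{
  return (~lo | ~hi) != 0;
}

/* New split: andq; cmpq $-1; setne  ->  (lo & hi) != ~0 */
static bool ne_minus_one_and_cmp(uint64_t lo, uint64_t hi)
{
  return (lo & hi) != UINT64_MAX;
}
```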
2022-08-03	Support logical shifts by (some) integer constants in TImode STV on x86_64.	Roger Sayle
	This patch improves TImode STV by adding support for logical shifts by integer constants that are multiples of 8. For the test case: unsigned __int128 a, b; void foo() { a = b << 16; } on x86_64, gcc -O2 currently generates: movq b(%rip), %rax movq b+8(%rip), %rdx shldq $16, %rax, %rdx salq $16, %rax movq %rax, a(%rip) movq %rdx, a+8(%rip) ret with this patch we now generate: movdqa b(%rip), %xmm0 pslldq $2, %xmm0 movaps %xmm0, a(%rip) ret 2022-08-03 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-features.cc (compute_convert_gain): Add gain for converting suitable TImode shift to a V1TImode shift. (timode_scalar_chain::convert_insn): Add support for converting suitable ASHIFT and LSHIFTRT. (timode_scalar_to_vector_candidate_p): Consider logical shifts by integer constants that are multiples of 8 to be candidates. gcc/testsuite/ChangeLog * gcc.target/i386/sse4_1-stv-7.c: New test case.
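Shifts by multiples of 8 are the convenient case. On a little-endian target, shifting a 128-bit value left by `8 * k` bits simply moves each byte `k` positions up, and a whole-register byte shift such as `pslldq $k` performs exactly that move. The sketch below models this with `memcpy`. It assumes a little-endian target with `unsigned __int128`, such as x86_64, and `shl_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Shift V left by K whole bytes by moving its bytes in memory.
   Little-endian layout: byte i holds bits 8*i .. 8*i+7. */
static unsigned __int128 shl_bytes(unsigned __int128 v, unsigned k)
{
  assert(k < 16);
  unsigned char in[16], out[16] = { 0 };
  memcpy(in, &v, sizeof in);
  memcpy(out + k, in, 16 - k);   /* byte i moves to byte i + k */
  unsigned __int128 r;
  memcpy(&r, out, sizeof r);
  return r;
}
```

The same reasoning applies to logical right shifts, with bytes moving down instead of up.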
2022-08-03	Some additional zero-extension related optimizations in simplify-rtx.	Roger Sayle
	This patch implements some additional zero-extension and sign-extension related optimizations in simplify-rtx.cc. The original motivation comes from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees: Failed to match this instruction: (set (reg:DI 88 [ _1 ]) (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0))) On many platforms the result of DImode CTZ is constrained to be a small unsigned integer (between 0 and 64), hence the truncation to 32-bits (using a SUBREG) and the following sign extension back to 64-bits are effectively a no-op, so the above should ideally (often) be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))". To implement this, and some closely related transformations, we build upon the existing val_signbit_known_clear_p predicate. In the first chunk, nonzero_bits knows that FFS and ABS can't leave the sign bit set, so the simplification of ABS (ABS (x)) and ABS (FFS (x)) can itself be simplified. The second transformation is that we can canonicalize SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above) when the operand's sign bit is known to be clear. The final two chunks are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating SUBREG respectively. The nonzero_bits of a truncating SUBREG pessimistically thinks that the upper bits may have an arbitrary value (by taking the SUBREG), so we need to look deeper at the SUBREG's operand to confirm that the high bits are known to be zero. Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with default architecture options is undefined at zero, so we can't be sure the upper bits of reg:DI 88 will be sign extended (all zeros or all ones). nonzero_bits knows this, so the above transformations don't trigger, but the transformations themselves are perfectly valid for other operations such as FFS, POPCOUNT and PARITY, and on other targets/-march settings where CTZ is defined at zero.
2022-08-03 Roger Sayle <roger@nextmovesoftware.com> Segher Boessenkool <segher@kernel.crashing.org> Richard Sandiford <richard.sandiford@arm.com> gcc/ChangeLog * simplify-rtx.cc (simplify_unary_operation_1) <ABS>: Add optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS and LSHIFTRT that are all positive to complement the existing FFS and idempotent ABS simplifications. <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when val_signbit_known_clear_p is true of the operand. Simplify sign extensions of SUBREG truncations of operands that are already suitably (zero) extended. <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations of operands that are already suitably zero extended.
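Below is a C-level analogue of the SIGN_EXTEND rules, assuming a GCC or Clang target with `__builtin_popcountll` and `__builtin_parityll`. When the 64-bit value being truncated is known to fit in 31 bits, as POPCOUNT and PARITY results always do, truncating to 32 bits and sign-extending back returns the original value and agrees with the zero extension. The helper names are hypothetical. `sext_trunc` asserts the bit-31-clear precondition because converting a larger value to `int32_t` is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* (sign_extend:DI (subreg:SI V 0)): truncate to 32 bits, sign-extend.
   Only exercised here when bit 31 and above of V are known clear. */
static int64_t sext_trunc(uint64_t v)
{
  assert((v >> 31) == 0);
  return (int64_t) (int32_t) (uint32_t) v;
}

/* (zero_extend:DI (subreg:SI V 0)): truncate to 32 bits, zero-extend. */
static int64_t zext_trunc(uint64_t v)
{
  return (int64_t) (uint32_t) v;
}
```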
2022-08-03	Daily bump.	GCC Administrator

2022-08-02	Do not register edges for statements not understood.	Andrew MacLeod
	Previously, all gimple_cond types were understood; with float values, this is no longer true. We should gracefully do nothing if the gcond type is not supported. PR tree-optimization/106510 gcc/ * gimple-range-fold.cc (fur_source::register_outgoing_edges): Check for unsupported statements early. gcc/testsuite * gcc.dg/pr106510.c: New.
2022-08-02	Adjust testsuite/gcc.dg/tree-ssa/vrp-float-1.c	Aldy Hernandez
	I missed the -details dump flag, plus I wasn't checking the actual folding. As a bonus I had flipped the dump file name and the count, so the test was coming out as unresolved, which I missed because I was only checking for failures and passes. Whooops. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vrp-float-1.c: Adjust test so it passes.
2022-08-02	Check equivalencies when calculating range on entry.	Andrew MacLeod
	When propagating on-entry values in the cache, checking if any equivalence has a known value can improve results. No new calculations are made. Only queries via dominators which do not populate the cache are checked. PR tree-optimization/106474 gcc/ * gimple-range-cache.cc (ranger_cache::fill_block_cache): Query range of equivalences that may contribute to the range. gcc/testsuite/ * g++.dg/pr106474.C: New.
2022-08-02	btf: do not use the CHAR `encoding' bit for BTF	Jose E. Marchesi
	Contrary to CTF and our previous expectations, as per [1], it turns out that in BTF: 1) The `encoding' field in integer types shall not be treated as a bitmap, but as an enumerated value, i.e. these bits are mutually exclusive. 2) The CHAR bit in `encoding' shall _not_ be set when emitting types for either char or `unsigned char'. Consequently this patch clears the CHAR bit before emitting the variable part of BTF integral types. It also updates the testsuite accordingly, expanding it to check for BOOL bits. [1] https://lore.kernel.org/bpf/a73586ad-f2dc-0401-1eba-2004357b7edf@fb.com/T/#t gcc/ChangeLog: * btfout.cc (output_asm_btf_vlen_bytes): Do not use the CHAR encoding bit in BTF. gcc/testsuite/ChangeLog: * gcc.dg/debug/btf/btf-int-1.c: Do not check for char bits in bti_encoding and check for bool bits.
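In practice this means CTF-style flags must be reduced to one BTF value before output. The constants below use the values from the Linux UAPI header `<linux/btf.h>`. `btf_int_encoding` is a hypothetical helper that illustrates the rule rather than GCC's `btfout.cc` code. Clearing CHAR maps `signed char` to plain SIGNED and `unsigned char` to no flags, while BOOL passes through.

```c
#include <assert.h>
#include <stdint.h>

/* BTF_KIND_INT encoding values, as in <linux/btf.h>. */
#define BTF_INT_SIGNED (1 << 0)
#define BTF_INT_CHAR   (1 << 1)
#define BTF_INT_BOOL   (1 << 2)

/* Hypothetical helper: drop the CHAR bit from a CTF-style encoding so
   that at most one enumerated BTF value remains. */
static uint32_t btf_int_encoding(uint32_t ctf_encoding)
{
  uint32_t enc = ctf_encoding & ~(uint32_t) BTF_INT_CHAR;
  /* After clearing CHAR, only SIGNED or BOOL (or nothing) may remain. */
  assert(enc == 0 || enc == BTF_INT_SIGNED || enc == BTF_INT_BOOL);
  return enc;
}
```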
2022-08-02	analyzer: support for creat, dup, dup2 and dup3 [PR106298]	Immad Mir
	This patch extends the state machine in sm-fd.cc to support creat, dup, dup2 and dup3 functions. Lightly tested on x86_64 Linux. gcc/analyzer/ChangeLog: PR analyzer/106298 * sm-fd.cc (fd_state_machine::on_open): Add creat, dup, dup2 and dup3 functions. (enum dup): New. (fd_state_machine::valid_to_unchecked_state): New. (fd_state_machine::on_creat): New. (fd_state_machine::on_dup): New. gcc/testsuite/ChangeLog: PR analyzer/106298 * gcc.dg/analyzer/fd-1.c: Add tests for 'creat'. * gcc.dg/analyzer/fd-2.c: Likewise. * gcc.dg/analyzer/fd-4.c: Likewise. * gcc.dg/analyzer/fd-dup-1.c: New tests. Signed-off-by: Immad Mir <mirimmad@outlook.com>
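For reference, here is a minimal POSIX sketch of the pattern the extended state machine is presumably meant to accept. The descriptor returned by `dup` is checked against -1 before use and closed afterwards. This is an illustrative assumption about clean code, not one of the new testcases, and `write_via_dup` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Duplicate FD and write MSG through the copy.  Checking dup's result
   against -1 before use moves the new descriptor out of the unchecked
   state; closing it avoids a descriptor leak. */
static int write_via_dup(int fd, const char *msg, size_t len)
{
  int copy = dup(fd);
  if (copy == -1)
    return -1;
  ssize_t n = write(copy, msg, len);
  close(copy);
  return n == (ssize_t) len ? 0 : -1;
}
```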
2022-08-02	Make range_of_ssa_name_with_loop_info type agnostic.	Aldy Hernandez
	gcc/ChangeLog: * gimple-range-fold.cc (fold_using_range::range_of_phi): Remove irange check. (tree_lower_bound): New. (tree_upper_bound): New. (fold_using_range::range_of_ssa_name_with_loop_info): Convert to vrange. * gimple-range-fold.h (range_of_ssa_name_with_loop_info): Change argument to vrange.
2022-08-02	Properly honor param_max_fsm_thread_path_insns in backwards threader	Richard Biener
	I am trying to make sense of back_threader_profitability::profitable_path_p and the first thing I notice is that we do /* Threading is profitable if the path duplicated is hot but also in a case we separate cold path from hot path and permit optimization of the hot path later. Be on the agressive side here. In some testcases, as in PR 78407 this leads to noticeable improvements. */ if (m_speed_p && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) || contains_hot_bb)) { if (n_insns >= param_max_fsm_thread_path_insns) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "the number of instructions on the path " "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); return false; } ... } else if (!m_speed_p && n_insns > 1) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "duplication of %i insns is needed and optimizing for size.\n", n_insns); return false; } ... return true; thus we apply the n_insns >= param_max_fsm_thread_path_insns check only to "hot paths". The comment above this doesn't make it entirely clear whether this is by design ("Be on the aggressive side here ...") but I think this is a mistake. In fact the "hot path" check seems entirely useless since if the path is not hot we simply continue threading it. This was caused by r12-324-g69e5544210e3c0 and the following simply reverts the offending change. tree-ssa-threadbackward.cc (back_threader_profitability::profitable_path_p): Apply size constraints to all paths again.
2022-08-02	Implement basic range operators to enable floating point VRP.	Aldy Hernandez
	Without further ado, here is the implementation for floating point range operators, plus the switch to enable all ranger clients to handle floats. These are bare-bones implementations good enough for relation operators to work, while keeping the NAN bits up to date in the frange. There is also minimal support for keeping track of +-INF when it is obvious. Tested on x86-64 Linux. gcc/ChangeLog: * range-op-float.cc (finite_operands_p): New. (frelop_early_resolve): New. (default_frelop_fold_range): New. (class foperator_equal): New. (class foperator_not_equal): New. (class foperator_lt): New. (class foperator_le): New. (class foperator_gt): New. (class foperator_ge): New. (class foperator_unordered): New. (class foperator_ordered): New. (class foperator_relop_unknown): New. (floating_op_table::floating_op_table): Add above classes to floating op table. * value-range.h (frange::supports_p): Enable. gcc/testsuite/ChangeLog: * g++.dg/opt/pr94589-2.C: XFAIL. * gcc.dg/tree-ssa/vrp-float-1.c: New test. * gcc.dg/tree-ssa/vrp-float-11.c: New test. * gcc.dg/tree-ssa/vrp-float-3.c: New test. * gcc.dg/tree-ssa/vrp-float-4.c: New test. * gcc.dg/tree-ssa/vrp-float-6.c: New test. * gcc.dg/tree-ssa/vrp-float-7.c: New test. * gcc.dg/tree-ssa/vrp-float-8.c: New test.
2022-08-02	Implement streamer for frange.	Aldy Hernandez
	This patch allows us to export floating point ranges into the SSA name (SSA_NAME_RANGE_INFO). [Richi, in PR24021 you suggested that match.pd could use global float ranges, because it would generally not invoke ranger. This patch implements the boilerplate to save the frange globally.] [Jeff, we've also been talking in parallel of using NAN knowledge during expansion to RTL. This patch will provide the NAN bits in the SSA name.] Since frange's current implementation is just a shell with no actual endpoints, frange_storage_slot only contains frange_props, which fits inside a byte. When we have endpoints, y'all can decide if it's worth saving them, or if the NAN/etc bits are good enough. gcc/ChangeLog: * tree-core.h (struct tree_ssa_name): Add frange_info and reshuffle the rest. * value-range-storage.cc (vrange_storage::alloc_slot): Add case for frange. (vrange_storage::set_vrange): Same. (vrange_storage::get_vrange): Same. (vrange_storage::fits_p): Same. (frange_storage_slot::alloc_slot): New. (frange_storage_slot::set_frange): New. (frange_storage_slot::get_frange): New. (frange_storage_slot::fits_p): New. * value-range-storage.h (class frange_storage_slot): New.
2022-08-02	Limit ranger query in ipa-prop.cc to integrals.	Aldy Hernandez
	ipa-* still works on legacy value_range's, which only support integrals. This patch limits the query to integrals, so as not to get a floating point range that can't exist in an irange. gcc/ChangeLog: * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Limit ranger query to integrals.