gcc.git - Linaro gcc mirror (git://gcc.gnu.org/git/gcc.git) + linaro-local branches

Age	Commit message	Author
2023-07-27	RISC-V: splitter to generate high bit set for -0.0 [devel/vineetg/optim-double-const-m0]	Vineet Gupta
	Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-07-27	RISC-V: Allow later passes to recog() (set mem const_double -0.0)	Vineet Gupta
	Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-07-27	RISC-V: Allow Expand to generate (set mem const_double -0.0)	Vineet Gupta
	| (insn 6 3 0 2 (set (mem:DF (reg/v/f:DI 134 [ d ]) [1 *d_2(D)+0 S8 A64])
	| (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":3:5 -1
	The first change is to adjust the rtx cost of -0.0 to prevent generic expand code from forcing the const to the literal pool:
	  emit_move_insn
	  compress_float_constant
	  targetm.legitimate_constant_p
	  riscv_legitimate_constant_p
	  riscv_const_insns
	The second change ensures riscv_legitimize_move () doesn't force_reg () the const early. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-07-27	RISC-V: add test case for store -0.0	Vineet Gupta
	Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-07-27	RISC-V: add comment for TARGET_CANNOT_FORCE_CONST_MEM	Vineet Gupta
	Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-07-27	RISC-V: rename constraint for DF +0.0 G to G0p	Vineet Gupta
	Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-07-27	RISC-V: optim const DF +0.0 store to mem [PR/110748]	Vineet Gupta
	DF +0.0 is bitwise all zeros, so an integer x0 store to mem can be used to optimize it.
	  void zd(double *d) { *d = 0.0; }
	currently:
	| fmv.d.x fa5,zero
	| fsd fa5,0(a0)
	| ret
	With patch:
	| sd zero,0(a0)
	| ret
	Apparently this is a regression in gcc-13, introduced by commit ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix thus is a partial revert of that change. Ran through the full multilib testsuite with no regressions. gcc/ChangeLog: * config/riscv/predicates.md (const_0_operand): Add back const_double. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr110748-1.c: New Test. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
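	The bit-pattern claim behind this commit (and behind the -0.0 commits listed above it) can be checked with a small sketch. It assumes IEEE 754 binary64 doubles, which is what Python's float uses on mainstream platforms; the helper name double_bits is made up for illustration and is not part of GCC.

```python
import struct


def double_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned 64-bit integer."""
    # Pack as a little-endian double, then reinterpret the same 8 bytes as a u64.
    return struct.unpack("<Q", struct.pack("<d", x))[0]


if __name__ == "__main__":
    # +0.0 is all zero bits, so an integer store of x0 writes the same bytes.
    print(f"{double_bits(0.0):#018x}")
    # -0.0 differs only in the sign bit, i.e. the high bit of the 64-bit word.
    print(f"{double_bits(-0.0):#018x}")
```

	This is why a plain "sd zero" suffices for +0.0, while -0.0 needs a constant with only the high bit set.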
2023-07-27	Implement new RTL optimizations pass: fold-mem-offsets.	Manolis Tsamis
	This is a new RTL pass that tries to optimize memory offset calculations by moving them from add immediate instructions to the memory loads/stores. For example it can transform this:
	  addi t4,sp,16
	  add t2,a6,t4
	  shl t3,t2,1
	  ld a2,0(t3)
	  addi a2,1
	  sd a2,8(t2)
	into the following (one instruction less):
	  add t2,a6,sp
	  shl t3,t2,1
	  ld a2,32(t3)
	  addi a2,1
	  sd a2,24(t2)
	Although there are places where this is done already, this pass is more powerful and can handle the more difficult cases that are currently not optimized. Also, it runs late enough and can optimize away unnecessary stack pointer calculations. gcc/ChangeLog: * Makefile.in: Add fold-mem-offsets.o. * passes.def: Schedule a new pass. * tree-pass.h (make_pass_fold_mem_offsets): Declare. * common.opt: New options. * doc/invoke.texi: Document new option. * fold-mem-offsets.cc: New file. gcc/testsuite/ChangeLog: * gcc.target/riscv/fold-mem-offsets-1.c: New test. * gcc.target/riscv/fold-mem-offsets-2.c: New test. * gcc.target/riscv/fold-mem-offsets-3.c: New test. Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
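	The offsets in the example above can be checked by hand: the folded 16 is doubled by the shift before the load (0 + 2*16 = 32) and added once before the store (8 + 16 = 24). A minimal sketch of that arithmetic follows; it models registers as unbounded Python integers (no 64-bit wraparound), and the function names are hypothetical, not part of the pass.

```python
def addresses_before(sp: int, a6: int):
    """Effective load/store addresses of the original sequence."""
    t4 = sp + 16          # addi t4,sp,16
    t2 = a6 + t4          # add  t2,a6,t4
    t3 = t2 << 1          # shl  t3,t2,1
    return t3 + 0, t2 + 8  # ld a2,0(t3) ; sd a2,8(t2)


def addresses_after(sp: int, a6: int):
    """Effective load/store addresses after folding the +16 into the offsets."""
    t2 = a6 + sp          # add  t2,a6,sp
    t3 = t2 << 1          # shl  t3,t2,1
    return t3 + 32, t2 + 24  # ld a2,32(t3) ; sd a2,24(t2)


if __name__ == "__main__":
    for sp, a6 in [(0, 0), (4096, 40), (1 << 20, 123456)]:
        print(sp, a6, addresses_before(sp, a6) == addresses_after(sp, a6))
```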
2023-07-07	Simplify force_edge_cold.	Jan Hubicka
	gcc/ChangeLog: * predict.cc (force_edge_cold): Use set_edge_probability_and_rescale_others; improve dumps.
2023-07-07	Fix some profile consistency testcases	Jan Hubicka
	Information about profile mismatches has been printed only with -details-blocks for some time. I think it should be printed even by default, to make it easier to spot when someone introduces a new transform that breaks the profile, but I will send a separate RFC for that. This patch enables details in all testcases that grep for "Invalid sum". There are 4 testcases which fail:
	gcc.dg/tree-ssa/loop-ch-profile-1.c: here the problem is that loop header duplication introduces a loop invariant conditional that is later updated by tree-ssa-dom, but dom does not take care of updating the profile. Since loop-ch knows when it duplicates a loop invariant, we may be able to get this right. The test is still useful since it tests that the profile is consistent right after ch.
	gcc.dg/tree-prof/update-cunroll-2.c: this is about the profile updating code in duplicate_loop_body_to_header_edge being wrong when the optimized out exit is not last in the loop. In that case the probability of later exits needs to be accounted for. I will think about making this better; in general this does not seem to have an easy solution, but for the special case of chained tests we can definitely account for the later exits.
	gcc.dg/tree-ssa/update-unroll-1.c: this fails after aprefetch invoked unrolling. I did not look into details yet.
	gcc.dg/tree-prof/update-unroll-2.c: this one seems similar to the previous one.
	I decided to xfail these tests and deal with them incrementally, and filed PR110590. gcc/testsuite/ChangeLog: * g++.dg/tree-prof/indir-call-prof.C: Add block-details to dump flags. * gcc.dg/pr43864-2.c: Likewise. * gcc.dg/pr43864-3.c: Likewise. * gcc.dg/pr43864-4.c: Likewise. * gcc.dg/pr43864.c: Likewise. * gcc.dg/tree-prof/cold_partition_label.c: Likewise. * gcc.dg/tree-prof/indir-call-prof.c: Likewise. * gcc.dg/tree-prof/update-cunroll-2.c: Likewise. * gcc.dg/tree-prof/update-tailcall.c: Likewise. * gcc.dg/tree-prof/val-prof-1.c: Likewise. * gcc.dg/tree-prof/val-prof-2.c: Likewise. * gcc.dg/tree-prof/val-prof-3.c: Likewise.
* gcc.dg/tree-prof/val-prof-4.c: Likewise. * gcc.dg/tree-prof/val-prof-5.c: Likewise. * gcc.dg/tree-ssa/fnsplit-1.c: Likewise. * gcc.dg/tree-ssa/loop-ch-profile-2.c: Likewise. * gcc.dg/tree-ssa/update-threading.c: Likewise. * gcc.dg/tree-ssa/update-unswitch-1.c: Likewise. * gcc.dg/unroll-7.c: Likewise. * gcc.dg/unroll-8.c: Likewise. * gfortran.dg/pr25623-2.f90: Likewise. * gfortran.dg/pr25623.f90: Likewise. * gcc.dg/tree-ssa/loop-ch-profile-1.c: Likewise; xfail. * gcc.dg/tree-ssa/update-cunroll.c: Likewise; xfail. * gcc.dg/tree-ssa/update-unroll-1.c: Likewise; xfail.
2023-07-07	Fix epilogue loop profile	Jan Hubicka
	Fix two bugs in scale_loop_profile which crept in during my cleanups and curiously enough did not show on the testcases we have so far. The patch also adds the missing call to cap the iteration count of the vectorized loop epilogues. Vectorizer profile needs more work, but I am trying to chase out obvious bugs first so the profile quality statistics become meaningful and we can try to improve on them. Now we get:
	Pass dump id and name | static mismatch | dynamic mismatch
	                      | in count        | in count
	107t cunrolli         |   3   +3 |     17251     +17251
	116t vrp              |   5   +2 |     30908     +16532
	118t dce              |   3   -2 |     17251     -13657
	127t ch               |  13  +10 |     17251
	131t dom              |  39  +26 |     17251
	133t isolate-paths    |  47   +8 |     17251
	134t reassoc          |  49   +2 |     17251
	136t forwprop         |  53   +4 |    202501    +185250
	159t cddce            |  61   +8 |    216211     +13710
	161t ldist            |  62   +1 |    216211
	172t ifcvt            |  66   +4 |    373711    +157500
	173t vect             | 143  +77 |   9801947   +9428236
	176t cunroll          | 149   +6 |  12006408   +2204461
	183t loopdone         | 146   -3 |  11944469     -61939
	195t fre              | 142   -4 |  11944469
	197t dom              | 141   -1 |  13038435   +1093966
	199t threadfull       | 143   +2 |  13246410    +207975
	200t vrp              | 145   +2 |  13444579    +198169
	204t dce              | 143   -2 |  13371315     -73264
	206t sink             | 141   -2 |  13371315
	211t cddce            | 147   +6 |  13372755      +1440
	255t optimized        | 145   -2 |  13372755
	256r expand           | 141   -4 |  13371197      -1558
	258r into_cfglayout   | 139   -2 |  13371197
	275r loop2_unroll     | 143   +4 |  16792056   +3420859
	291r ce2              | 141   -2 |  16811462
	312r pro_and_epilogue | 161  +20 |  16873400     +61938
	315r jump2            | 167   +6 |  20910158   +4036758
	323r bbro             | 160   -7 |  16559844   -4350314
	Vect still introduces 77 profile mismatches (same as without this patch), however the subsequent cunroll works much better with 6 new mismatches compared to 78. Overall it reduces 229 mismatches to 160. Also the overall runtime estimate is now reduced by 6.9%. Previously the overall runtime estimate grew by 11%, which was a result of the fact that the epilogue profile was pretty much the same as the profile of the original loop.
	Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * cfgloopmanip.cc (scale_loop_profile): Fix computation of count_in and scaling blocks after exit. * tree-vect-loop-manip.cc (vect_do_peeling): Scale loop profile of the epilogue if bound is known. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vect-profile-upate.c: New test.
2023-07-07	IBM Z: Fix vec_init default expander	Juergen Christ
	Do not reinitialize vector lanes to zero since they are already initialized to zero. gcc/ChangeLog: * config/s390/s390.cc (vec_init): Fix default case gcc/testsuite/ChangeLog: * gcc.target/s390/vector/vec-init-3.c: New test.
2023-07-07	LRA: Refine reload pseudo class	Vladimir N. Makarov
	For the given testcase a reload pseudo happened to occur only in reload insns created on one constraint sub-pass. Therefore its initial class (ALL_REGS) was not refined and the reload insns were not processed on the next constraint sub-passes. This resulted in a wrong insn. PR rtl-optimization/110372 gcc/ChangeLog: * lra-assigns.cc (assign_by_spills): Add reload insns involving reload pseudos with non-refined class to be processed on the next sub-pass. * lra-constraints.cc (enough_allocatable_hard_regs_p): New func. (in_class_p): Use it. (print_curr_insn_alt): New func. (process_alt_operands): Use it. Improve debug info. (curr_insn_transform): Use print_curr_insn_alt. Refine reload pseudo class if it is not refined yet. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110372.c: New.
2023-07-07	A singleton irange has all known bits.	Aldy Hernandez
	gcc/ChangeLog: * value-range.cc (irange::get_bitmask_from_range): Return all the known bits for a singleton. (irange::set_range_from_bitmask): Set a range of a singleton when all bits are known.
2023-07-07	The caller to irange::intersect (wide_int, wide_int) must normalize the range.	Aldy Hernandez
	Per the function comment, the caller to intersect(wide_int, wide_int) must handle the mask. This means it must also normalize the range if anything changed. gcc/ChangeLog: * value-range.cc (irange::intersect): Leave normalization to caller.
2023-07-07	Implement value/mask tracking for irange.	Aldy Hernandez
	Integer ranges (irange) currently track known 0 bits. We've wanted to track known 1 bits for some time, and instead of tracking known 0 and known 1's separately, it has been suggested we track a value/mask pair similarly to what we do for CCP and RTL. This patch implements such a thing. With this we now track a VALUE integer which are the known values, and a MASK which tells us which bits contain meaningful information. This allows us to fix a handful of enhancement requests, such as PR107043 and PR107053. There is a 4.48% performance penalty for VRP and 0.42% in overall compilation for this entire patchset. It is expected and in line with the loss incurred when we started tracking known 0 bits. This patch just provides the value/mask tracking support. All the nonzero users (range-op, IPA, CCP, etc), are still using the nonzero nomenclature. For that matter, this patch reimplements the nonzero accessors with the value/mask functionality. In follow-up patches I will enhance these passes to use the value/mask information, and fix the aforementioned PRs. gcc/ChangeLog: * data-streamer-in.cc (streamer_read_value_range): Adjust for value/mask. * data-streamer-out.cc (streamer_write_vrange): Same. * range-op.cc (operator_cast::fold_range): Same. * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): Same. * value-range-storage.cc (irange_storage::write_lengths_address): Same. (irange_storage::set_irange): Same. (irange_storage::get_irange): Same. (irange_storage::size): Same. (irange_storage::dump): Same. * value-range-storage.h: Same. * value-range.cc (debug): New. (irange_bitmask::dump): New. (add_vrange): Adjust for value/mask. (irange::operator=): Same. (irange::set): Same. (irange::verify_range): Same. (irange::operator==): Same. (irange::contains_p): Same. (irange::irange_single_pair_union): Same. (irange::union_): Same. (irange::intersect): Same. (irange::invert): Same. (irange::get_nonzero_bits_from_range): Rename to... 
(irange::get_bitmask_from_range): ...this. (irange::set_range_from_nonzero_bits): Rename to... (irange::set_range_from_bitmask): ...this. (irange::set_nonzero_bits): Rename to... (irange::update_bitmask): ...this. (irange::get_nonzero_bits): Rename to... (irange::get_bitmask): ...this. (irange::intersect_nonzero_bits): Rename to... (irange::intersect_bitmask): ...this. (irange::union_nonzero_bits): Rename to... (irange::union_bitmask): ...this. (irange_bitmask::verify_mask): New. * value-range.h (class irange_bitmask): New. (irange_bitmask::set_unknown): New. (irange_bitmask::unknown_p): New. (irange_bitmask::irange_bitmask): New. (irange_bitmask::get_precision): New. (irange_bitmask::get_nonzero_bits): New. (irange_bitmask::set_nonzero_bits): New. (irange_bitmask::operator==): New. (irange_bitmask::union_): New. (irange_bitmask::intersect): New. (class irange): Friend vrange_printer. (irange::varying_compatible_p): Adjust for bitmask. (irange::set_varying): Same. (irange::set_nonzero): Same. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr107009.c: Adjust irange dumping for value/mask changes. * gcc.dg/tree-ssa/vrp-unreachable.c: Same. * gcc.dg/tree-ssa/vrp122.c: Same.
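	A minimal sketch of value/mask tracking as described above. It assumes the CCP-style convention in which a set MASK bit means "this bit is unknown" and VALUE holds the bits that are known; the class ValueMask and its method names are hypothetical stand-ins and are not the actual irange_bitmask implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueMask:
    value: int          # known bit values (meaningful only where mask bit is 0)
    mask: int           # assumption: a 1 bit means "unknown"
    precision: int = 8  # bit width of the tracked integer

    @classmethod
    def from_constant(cls, c: int, precision: int = 8) -> "ValueMask":
        """A singleton has every bit known, so the mask is empty."""
        return cls(c & ((1 << precision) - 1), 0, precision)

    def nonzero_bits(self) -> int:
        """Bits that may be 1: known ones plus everything unknown."""
        return (self.value | self.mask) & ((1 << self.precision) - 1)

    def known_ones(self) -> int:
        """Bits that are certainly 1."""
        return self.value & ~self.mask & ((1 << self.precision) - 1)

    def union(self, other: "ValueMask") -> "ValueMask":
        """Join: a bit stays known only if both sides know it and agree."""
        full = (1 << self.precision) - 1
        mask = (self.mask | other.mask | (self.value ^ other.value)) & full
        value = self.value & other.value & ~mask & full
        return ValueMask(value, mask, self.precision)


if __name__ == "__main__":
    u = ValueMask.from_constant(0b1010).union(ValueMask.from_constant(0b1000))
    print(bin(u.known_ones()), bin(u.nonzero_bits()), bin(u.mask))
```

	With a single pair the old "nonzero bits" view is still derivable (value | mask), while known 1 bits become available as well.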
2023-07-07	x86: slightly correct / simplify *vec_extractv2ti	Jan Beulich
	V2TImode values cannot appear in the upper 16 YMM registers without AVX512VL being enabled. Therefore forcing 512-bit mode (also not reflected in the "mode" attribute) is pointless. gcc/ * config/i386/sse.md (*vec_extractv2ti): Drop g modifiers.
2023-07-07	x86: correct / simplify @vec_extract_hi_<mode> and vec_extract_hi_v32qi	Jan Beulich
	The middle alternative each was unusable without enabling AVX512DQ (in addition to AVX512VL), which is entirely unrelated here. The last alternative is usable with AVX512VL only (due to type restrictions on what may be put in the upper 16 YMM registers), and hence is pointlessly forcing 512-bit mode (without actually reflecting that in the "mode" attribute). gcc/ * config/i386/sse.md (@vec_extract_hi_<mode>): Drop last alternative. Switch new last alternative's "isa" attribute to "avx512vl". (vec_extract_hi_v32qi): Likewise.
2023-07-07	Closing the GCC 10 branch	Richard Biener
	contrib/ * gcc-changelog/git_update_version.py: Remove GCC 10 from active_refs. maintainer-scripts/ * crontab: Remove entry for GCC 10.
2023-07-07	RISC-V: Fix one bug for floating-point static frm	Pan Li
	This patch fixes one bug to align with the following items of the spec: RVV floating-point instructions always (implicitly) use the dynamic rounding mode. This implies that rounding is performed according to the rounding mode set in the FRM register. The FRM register itself only holds proper rounding modes and never the dynamic rounding mode. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Robin Dapp <rdapp@ventanamicro.com> gcc/ChangeLog: * config/riscv/riscv.cc (riscv_emit_mode_set): Avoid emit insn when FRM_MODE_DYN. (riscv_mode_entry): Take FRM_MODE_DYN as entry mode. (riscv_mode_exit): Likewise for exit mode. (riscv_mode_needed): Likewise for needed mode. (riscv_mode_after): Likewise for after mode. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/float-point-frm-insert-6.c: New test.
2023-07-07	RISC-V: Fix one typo of FRM dynamic definition	Pan Li
	This patch fixes a typo that used rdn instead of dyn by mistake. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/vector.md: Fix typo.
2023-07-07	Daily bump.	GCC Administrator

2023-07-06	libstdc++: Fix fwrite error parameter	Tianqiang Shuai
	The first argument of fwrite should be the buffer const char* __s that is to be written to FILE *__file, rather than __file itself. libstdc++-v3/ChangeLog: * config/io/basic_file_stdio.cc (xwrite) [USE_STDIO_PURE]: Fix first argument.
2023-07-06	Improve profile updates after loop-ch and cunroll	Jan Hubicka
	Extend loop-ch and loop unrolling to fix the profile in case the loop is known to not iterate at all (or to iterate only a few times) while the profile claims it iterates more. While this is a kind of symptomatic fix, it is the best we can do in case the profile was originally estimated incorrectly. In the testcase the problematic loop is produced by the vectorizer, and I think the vectorizer should know and account in its costs that the vectorized loop and/or epilogue is not going to loop after the transformation. So it would be nice to fix it on that side, too. The patch avoids about half of the profile mismatches caused by cunroll.
	Pass dump id and name | static mismatch | dynamic mismatch
	                      | in count        | in count
	107t cunrolli         |   3   +3 |     17251     +17251
	115t threadfull       |   3      |     14376      -2875
	116t vrp              |   5   +2 |     30908     +16532
	117t dse              |   5      |     30908
	118t dce              |   3   -2 |     17251     -13657
	127t ch               |  13  +10 |     17251
	131t dom              |  39  +26 |     17251
	133t isolate-paths    |  47   +8 |     17251
	134t reassoc          |  49   +2 |     17251
	136t forwprop         |  53   +4 |    202501    +185250
	159t cddce            |  61   +8 |    216211     +13710
	161t ldist            |  62   +1 |    216211
	172t ifcvt            |  66   +4 |    373711    +157500
	173t vect             | 143  +77 |   9802097   +9428386
	176t cunroll          | 221  +78 |  15639591   +5837494
	183t loopdone         | 218   -3 |  15577640     -61951
	195t fre              | 214   -4 |  15577640
	197t dom              | 213   -1 |  16671606   +1093966
	199t threadfull       | 215   +2 |  16879581    +207975
	200t vrp              | 217   +2 |  17077750    +198169
	204t dce              | 215   -2 |  17004486     -73264
	206t sink             | 213   -2 |  17004486
	211t cddce            | 219   +6 |  17005926      +1440
	255t optimized        | 217   -2 |  17005926
	256r expand           | 210   -7 |  19571573   +2565647
	258r into_cfglayout   | 208   -2 |  19571573
	275r loop2_unroll     | 212   +4 |  22992432   +3420859
	291r ce2              | 210   -2 |  23011838
	312r pro_and_epilogue | 230  +20 |  23073776     +61938
	315r jump2            | 236   +6 |  27110534   +4036758
	323r bbro             | 229   -7 |  21826835   -5283699
	W/o the patch cunroll does:
	176t cunroll          | 294 +151 | 126548439 +116746342
	and we end up with 291 mismatches at bbro. Bootstrapped/regtested x86_64-linux.
	Plan to commit it after the scale_loop_frequency patch. gcc/ChangeLog: PR middle-end/25623 * tree-ssa-loop-ch.cc (ch_base::copy_headers): Scale loop frequency to maximal number of iterations determined. * tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Likewise. gcc/testsuite/ChangeLog: PR middle-end/25623 * gfortran.dg/pr25623-2.f90: New test.
2023-07-06	Improve scale_loop_profile	Jan Hubicka
	The original scale_loop_profile was implemented to handle only the very simple loops produced by the vectorizer at that time (basically loops with only one exit and no subloops). It also had not been updated to the new profile-count API very carefully. The function does two things:
	1) scales down the loop profile by a given probability. This is useful, for example, to scale down the profile after peeling, when the loop body is executed less often than before;
	2) updates the profile to cap the iteration count by the ITERATION_BOUND parameter.
	I changed ITERATION_BOUND to be the actual bound on the number of iterations as used elsewhere (i.e. the number of executions of the latch edge) rather than the number of iterations + 1 as it was before. To do 2) one needs to do the following:
	a) scale the loop's own profile so the frequency of the header is at most the sum of in-edge counts * (iteration_bound + 1);
	b) update loop exit probabilities so their count is the same as before scaling;
	c) reduce frequencies of basic blocks after the loop exit.
	The old code did b) by setting the probability to 1 / iteration_bound, which is correct only if the basic block containing the exit executes precisely once per iteration (it is not inside another conditional or inner loop). This is fixed now by using set_edge_probability_and_rescale_others. Also, c) was implemented only for the special case when the exit was just before the latch basic block. I now use dominance info to get some additional cases right. I still did not try to do anything for multiple exit loops, though the implementation could be generalized. Bootstrapped/regtested x86_64-linux. Plan to commit it tonight if there are no complaints. gcc/ChangeLog: * cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge probability update to be safe on loops with subloops. Make bound parameter to be iteration bound. * tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call of scale_loop_profile. * tree-vect-loop-manip.cc (vect_do_peeling): Likewise.
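	Step a) above can be sketched as follows. This is only an illustration of the capping arithmetic under simplifying assumptions (plain per-block counts held in a dictionary, exact rational scaling); the function cap_loop_profile and its parameters are hypothetical and do not mirror GCC's profile_count API, and steps b) and c) are not modeled.

```python
from fractions import Fraction


def cap_loop_profile(block_counts, header, entry_count, iteration_bound):
    """Scale all loop block counts so the header count is at most
    entry_count * (iteration_bound + 1).

    block_counts    -- dict: block name -> execution count (hypothetical layout)
    header          -- name of the loop header block
    entry_count     -- sum of the counts of the edges entering the loop
    iteration_bound -- maximum number of latch edge executions
    """
    limit = entry_count * (iteration_bound + 1)
    header_count = block_counts[header]
    if header_count <= limit:
        return dict(block_counts)          # profile already respects the bound
    scale = Fraction(limit, header_count)  # exact, avoids rounding noise
    return {bb: count * scale for bb, count in block_counts.items()}


if __name__ == "__main__":
    counts = {"header": 1000, "body": 900, "latch": 900}
    # Loop entered 10 times and known to iterate at most 3 times per entry.
    print(cap_loop_profile(counts, "header", entry_count=10, iteration_bound=3))
```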
2023-07-06	Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)	Hao Liu OS
	If a loop is unrolled by n times during vectorization, two steps are used to calculate the induction variable:
	- The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
	- The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
	This patch calculates an extra vec_n to replace vec_loop: vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop, so that we can save the large step register and related operations. gcc/ChangeLog: PR tree-optimization/110449 * tree-vect-loop.cc (vectorizable_induction): use vec_n to replace vec_loop for the unrolled loop. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr110449.c: New testcase.
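	The identity used by this commit can be illustrated with scalars standing in for the vector induction values. The sketch assumes n divides VF evenly; the function unrolled_iv_copies is a hypothetical name, not code from the vectorizer.

```python
def unrolled_iv_copies(vec_iv: int, vf: int, n: int, step: int):
    """Return the n unrolled copies of the induction value and the value
    for the next loop iteration, using only the small step VF/n * step."""
    small_step = (vf // n) * step
    copies = [vec_iv]
    for _ in range(n - 1):
        copies.append(copies[-1] + small_step)  # vec_1, vec_2, ...
    vec_n = copies[-1] + small_step             # vec_prev + small step
    return copies, vec_n


if __name__ == "__main__":
    copies, vec_n = unrolled_iv_copies(vec_iv=0, vf=8, n=2, step=3)
    # vec_n should match the large-step result vec_iv + VF * step.
    print(copies, vec_n, vec_n == 0 + 8 * 3)
```

	Because vec_n equals vec_loop, the large step constant never has to be materialized in a register.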
2023-07-06	libstdc++: Document --enable-cstdio=stdio_pure [PR110574]	Jonathan Wakely
	libstdc++-v3/ChangeLog: PR libstdc++/110574 * doc/xml/manual/configure.xml: Describe stdio_pure argument to --enable-cstdio. * doc/html/manual/configure.html: Regenerate.
2023-07-06	update_bb_profile_for_threading TLC	Jan Hubicka
	Apply some TLC to update_bb_profile_for_threading. The function rescales probabilities by: FOR_EACH_EDGE (c, ei, bb->succs) c->probability /= prob; which is correct, but in case prob is 0 (all execution counts were taken to the newly constructed path), this leads to undefined results which do not sum to 100%. In several other places we need to change the probability of one edge and rescale the remaining ones to sum to 100%, so I decided to break this off into the helper function set_edge_probability_and_rescale_others. For jump threading the probability of the edge is always reduced, so division is the right update; however, in the general case we also may want to increase the probability of the edge, which needs different scaling. This is a bit hard to do while staying with probabilities in the range 0...1 for all temporaries. For this reason I decided to add profile_probability::apply_scale, which is symmetric to what we already have in profile_count::apply_scale and does the right thing in both directions. Finally I added a few early exits so we do not produce confused dumps when the profile is missing, and special-cased the common situation where the edges out of BB are precisely two. In this case we can set the other edge to the inverted probability. Scaling drops probability quality from PRECISE to ADJUSTED. Bootstrapped/regtested x86_64-linux. The patch has no effect on in count mismatches in tramp3d build and improves out-count. Will commit it shortly. gcc/ChangeLog: * cfg.cc (set_edge_probability_and_rescale_others): New function. (update_bb_profile_for_threading): Use it; simplify the rest. * cfg.h (set_edge_probability_and_rescale_others): Declare. * profile-count.h (profile_probability::apply_scale): New.
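	A minimal sketch of the "set one edge, rescale the rest" idea described above. It is not GCC's set_edge_probability_and_rescale_others: the function rescale_others, the dictionary layout and the even split used when the other edges had zero probability are all assumptions made for illustration.

```python
from fractions import Fraction


def rescale_others(probs, edge, new_prob):
    """Set probs[edge] to new_prob and rescale the remaining outgoing edges
    so that all probabilities sum to 1 again.

    probs -- dict: edge name -> Fraction probability (hypothetical layout)
    """
    others = [e for e in probs if e != edge]
    result = dict(probs)
    result[edge] = new_prob
    remaining_new = 1 - new_prob
    if len(others) == 1:
        # Common two-edge case: the other edge gets the inverted probability.
        result[others[0]] = remaining_new
        return result
    remaining_old = sum(probs[e] for e in others)
    for e in others:
        if remaining_old == 0:
            # Assumption: no information to scale from, so split evenly.
            result[e] = remaining_new / len(others)
        else:
            result[e] = probs[e] * remaining_new / remaining_old
    return result


if __name__ == "__main__":
    p = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
    print(rescale_others(p, "a", Fraction(1, 5)))
```

	Scaling by the ratio of remaining probability works in both directions (raising or lowering the chosen edge), and it avoids the division by a zero prob that the plain "c->probability /= prob" loop runs into.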
2023-07-06	arc: Update builtin documentation	Claudiu Zissulescu
	gcc/ChangeLog: * doc/extend.texi (ARC Built-in Functions): Update documentation with missing builtins.
2023-07-06	tree-optimization/110556 - tail merging still pre-tuples	Richard Biener
	The stmt comparison function for GIMPLE_ASSIGNs for tail merging still looks like it deals with pre-tuples IL. The following attempts to fix this, not only comparing the first operand (sic!) of stmts but all of them plus also compare the operation code. PR tree-optimization/110556 * tree-ssa-tail-merge.cc (gimple_equal_p): Check assign code and all operands of non-stores. * gcc.dg/torture/pr110556.c: New testcase.
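	The difference between comparing only the first operand and comparing the operation code plus all operands can be shown with a toy model. The Assign tuple and both comparison functions below are hypothetical simplifications; they do not reproduce gimple_equal_p or its value-numbering based operand comparison.

```python
from typing import NamedTuple, Tuple


class Assign(NamedTuple):
    code: str                  # operation code, e.g. "PLUS_EXPR"
    operands: Tuple[str, ...]  # right-hand side operands


def equal_first_operand_only(a: Assign, b: Assign) -> bool:
    """Too weak: ignores the operation and every operand after the first."""
    return a.operands[0] == b.operands[0]


def equal_code_and_operands(a: Assign, b: Assign) -> bool:
    """Stricter check: same operation code and identical operand lists."""
    return a.code == b.code and a.operands == b.operands


if __name__ == "__main__":
    s1 = Assign("PLUS_EXPR", ("p", "q"))
    s2 = Assign("MINUS_EXPR", ("p", "q"))
    # The weak check would wrongly treat these as mergeable tails.
    print(equal_first_operand_only(s1, s2), equal_code_and_operands(s1, s2))
```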
2023-07-06	ada: Add specification source files of runtime units	Claire Dross
	gcc/ada/ * gcc-interface/Make-lang.in: Add object files of specification files.
2023-07-06	ada: Refactor the proof of the Value and Image runtime units	Claire Dross
	The aim of this refactoring is to avoid unnecessary dependencies between Image and Value units even though they share the same specification functions. These functions are grouped inside ghost packages which are then withed by Image and Value units. gcc/ada/ * libgnat/s-vs_int.ads: Instance of Value_I_Spec for Integer. * libgnat/s-vs_lli.ads: Instance of Value_I_Spec for Long_Long_Integer. * libgnat/s-vsllli.ads: Instance of Value_I_Spec for Long_Long_Long_Integer. * libgnat/s-vs_uns.ads: Instance of Value_U_Spec for Unsigned. * libgnat/s-vs_llu.ads: Instance of Value_U_Spec for Long_Long_Unsigned. * libgnat/s-vslllu.ads: Instance of Value_U_Spec for Long_Long_Long_Unsigned. * libgnat/s-imagei.ads: Take instances of Value_*_Spec as parameters. * libgnat/s-imagei.adb: Idem. * libgnat/s-imageu.ads: Idem. * libgnat/s-imageu.adb: Idem. * libgnat/s-valuei.ads: Idem. * libgnat/s-valuei.adb: Idem. * libgnat/s-valueu.ads: Idem. * libgnat/s-valueu.adb: Idem. * libgnat/s-imgint.ads: Adapt instance to new ghost parameters. * libgnat/s-imglli.ads: Adapt instance to new ghost parameters. * libgnat/s-imgllli.ads: Adapt instance to new ghost parameters. * libgnat/s-imglllu.ads: Adapt instance to new ghost parameters. * libgnat/s-imgllu.ads: Adapt instance to new ghost parameters. * libgnat/s-imguns.ads: Adapt instance to new ghost parameters. * libgnat/s-valint.ads: Adapt instance to new ghost parameters. * libgnat/s-vallli.ads: Adapt instance to new ghost parameters. * libgnat/s-valllli.ads: Adapt instance to new ghost parameters. * libgnat/s-vallllu.ads: Adapt instance to new ghost parameters. * libgnat/s-valllu.ads: Adapt instance to new ghost parameters. * libgnat/s-valuns.ads: Adapt instance to new ghost parameters. * libgnat/s-vaispe.ads: Take instance of Value_U_Spec as parameter and remove unused declaration. * libgnat/s-vaispe.adb: Idem. * libgnat/s-vauspe.ads: Remove unused declaration. * libgnat/s-valspe.ads: Factor out the specification part of Val_Util.
* libgnat/s-valspe.adb: Idem. * libgnat/s-valuti.ads: Move specification to Val_Spec. * libgnat/s-valuti.adb: Idem. * libgnat/s-valboo.ads: Use Val_Spec. * libgnat/s-valboo.adb: Idem. * libgnat/s-imgboo.adb: Idem. * libgnat/s-imagef.adb: Adapt instances to new ghost parameters. * Makefile.rtl: List new files.
2023-07-06	ada: Evaluate static expressions in Range attributes	Viljar Indus
	Gigi assumes that the value of range expressions is an integer literal. Force evaluation of such expressions since static non-literal expressions are not always evaluated to a literal form by gnat. gcc/ada/ * sem_attr.adb (analyze_attribute.check_array_type): Replace valid indexes with their statically evaluated values.
2023-07-06	ada: Refer to non-Ada binding limitations in user guide	Viljar Indus
	The limitation of resetting the FPU mode for non 80-bit precision was not referenced from "Creating a Stand-alone Library to be used in a non-Ada context". Reference it the same way it is already referenced from "Interfacing to C". gcc/ada/ * doc/gnat_ugn/the_gnat_compilation_model.rst: Reference "Binding with Non-Ada Main Programs" from "Creating a Stand-alone Library to be used in a non-Ada context". * gnat_ugn.texi: Regenerate.
2023-07-06	ada: Reuse code in Is_Fully_Initialized_Type	Viljar Indus
	gcc/ada/ * sem_util.adb (Is_Fully_Initialized_Type): Avoid recalculating the underlying type twice.
2023-07-06	ada: Avoid crash in Find_Optional_Prim_Op	Viljar Indus
	Find_Optional_Prim_Op can crash when the Underlying_Type is Empty. This can happen when you are dealing with a structure type with a private part that does not have its Full_View set yet. gcc/ada/ * exp_util.adb (Find_Optional_Prim_Op): Stop deriving primitive operation if there is no underlying type to derive it from.
2023-07-06	ada: Improve error message on violation of SPARK_Mode rules	Yannick Moy
	SPARK_Mode On can only be used on library-level entities. Improve the error message here. gcc/ada/ * errout.ads: Add explain code. * sem_prag.adb (Check_Library_Level_Entity): Refine error message and add explain code.
2023-07-06	ada: Finalization not performed for component of protected type	Steve Baird
	In some cases involving a discriminated protected type with an array component that is subject to a discriminant-dependent index constraint, where the element type of the array requires finalization and the array type has not yet been frozen at the point of the declaration of the protected type, finalization of an object of the protected type may incorrectly omit finalization of the array component. One case where this scenario can arise is an instantiation of Ada.Containers.Bounded_Synchronized_Queues, passing in an Element type that requires finalization. gcc/ada/ * exp_ch7.adb (Make_Final_Call): Add assertion that if no finalization call is generated, then the type of the object being finalized does not require finalization. * freeze.adb (Freeze_Entity): If freezing an already-frozen subtype, do not assume that nothing needs to be done. In the case of a frozen subtype of a non-frozen type or subtype (which is possible), freeze the non-frozen entity.
2023-07-06	tree-optimization/110563 - simplify epilogue VF checks	Richard Biener
	The following consolidates an assert that now hits for ppc64le with an earlier check we already do, simplifying vect_determine_partial_vectors_and_peeling and getting rid of its now redundant argument. PR tree-optimization/110563 * tree-vectorizer.h (vect_determine_partial_vectors_and_peeling): Remove second argument. * tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling): Remove for_epilogue_p argument. Merge assert ... (vect_analyze_loop_2): ... with check done before determining partial vectors by moving it after. * tree-vect-loop-manip.cc (vect_do_peeling): Adjust.
2023-07-06	GGC, GTY: Tighten up a few things re 'reorder' option and strings	Thomas Schwinge
	..., which doesn't make sense in combination. This, again, is primarily preparational for another change. gcc/ * ggc-common.cc (gt_pch_note_reorder, gt_pch_save): Tighten up a few things re 'reorder' option and strings. * stringpool.cc (gt_pch_p_S): This is now 'gcc_unreachable'.
2023-07-06	GTY: Clean up obsolete parametrized structs remnants	Thomas Schwinge
	Support removed in 2014 with commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558) "remove gengtype support for param_is use_param, if_marked and splay tree allocators". gcc/ * gengtype-parse.cc: Clean up obsolete parametrized structs remnants. * gengtype.cc: Likewise. * gengtype.h: Likewise.
2023-07-06	GTY: Clean up obsolete 'bool needs_cast_p' field of 'gcc/gengtype.cc:struct walk_type_data'	Thomas Schwinge
	Last use disappeared in 2014 with commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558) "remove gengtype support for param_is use_param, if_marked and splay tree allocators". gcc/ * gengtype.cc (struct walk_type_data): Remove 'needs_cast_p'. Adjust all users.
2023-07-06	GTY: Repair 'enum gty_token', 'token_names' desynchronization	Thomas Schwinge
	For example, for the following (made-up) changes: --- gcc/ggc-tests.cc +++ gcc/ggc-tests.cc @@ -258 +258 @@ class GTY((tag("1"))) some_subclass : public example_base -class GTY((tag("2"))) some_other_subclass : public example_base +class GTY((tag(user))) some_other_subclass : public example_base @@ -384 +384 @@ test_chain_next () -struct GTY((user)) user_struct +struct GTY((user user)) user_struct ..., we get unexpected "have a param<N>_is option" diagnostics: [...] build/gengtype \ -S [...]/source-gcc/gcc -I gtyp-input.list -w tmp-gtype.state [...]/source-gcc/gcc/ggc-tests.cc:258: parse error: expected a string constant, have a param<N>_is option [...]/source-gcc/gcc/ggc-tests.cc:384: parse error: expected ')', have a param<N>_is option make[2]: *** [Makefile:2888: s-gtype] Error 1 [...] This traces back to 2012 "Support garbage-collected C++ templates", which got incorporated in commit 0823efedd0fb8669b7e840954bc54c3b2cf08d67 (Subversion r190402), which did add 'USER_GTY' to what nowadays is known as 'enum gty_token', but didn't accordingly update 'gcc/gengtype-parse.c:token_names', leaving those out of sync. Updating 'gcc/gengtype-parse.c:token_value_format' wasn't necessary, as: /* print_token assumes that any token >= FIRST_TOKEN_WITH_VALUE may have a meaningful value to be printed. / FIRST_TOKEN_WITH_VALUE = PARAM_IS This, in turn, got further confused -- or "fixed" -- by later changes: 2014 commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558) "remove gengtype support for param_is use_param, if_marked and splay tree allocators", which reciprocally missed corresponding clean-up. 
With that addressed via adding the missing '"user"' to 'token_names', and, until that is properly fixed, a temporary 'UNUSED_PARAM_IS' (re-)added for use with 'FIRST_TOKEN_WITH_VALUE', we then get the expected: [...]/source-gcc/gcc/ggc-tests.cc:258: parse error: expected a string constant, have 'user' [...]/source-gcc/gcc/ggc-tests.cc:384: parse error: expected ')', have 'user' gcc/ * gengtype-parse.cc (token_names): Add '"user"'. * gengtype.h (gty_token): Add 'UNUSED_PARAM_IS' for use with 'FIRST_TOKEN_WITH_VALUE'.
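The enum/name-table desynchronization described in this message is the kind of mistake a compile-time size check can catch. The following is a minimal sketch in C under stated assumptions: the token set is made up, and `demo_token`, `demo_token_names` and `demo_token_name` are hypothetical names, not the gengtype sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token set; values below DEMO_FIRST stand for plain
   characters.  These are NOT the real gengtype tokens.  */
enum demo_token
{
  DEMO_FIRST = 256,
  DEMO_GTY = DEMO_FIRST,
  DEMO_STRUCT,
  DEMO_USER,            /* the kind of entry that is easy to forget below */
  DEMO_TOKEN_END        /* one past the last named token */
};

/* Printable names, indexed by (token - DEMO_FIRST).  */
static const char *const demo_token_names[] = {
  "GTY",
  "struct",
  "user",
};

/* Fail the build if the enum and the name table drift apart.  */
_Static_assert (sizeof (demo_token_names) / sizeof (demo_token_names[0])
                == (size_t) (DEMO_TOKEN_END - DEMO_FIRST),
                "demo_token_names out of sync with enum demo_token");

/* Return a printable name for TOKEN, or a placeholder for values
   outside the named range.  */
const char *
demo_token_name (int token)
{
  if (token < DEMO_FIRST || token >= DEMO_TOKEN_END)
    return "<character or unknown>";
  return demo_token_names[token - DEMO_FIRST];
}
```

With this layout, adding an enumerator without a matching string changes the expected count and the static assertion stops the build, instead of a diagnostic silently printing the wrong name.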
2023-07-06	GTY: Enhance 'string_length' option documentation	Thomas Schwinge
	We're (currently) not aware of any actual use of 'ht_identifier's with NUL characters embedded; its 'len' field appears to exist for optimization purposes, since "forever". Before 'struct ht_identifier' was added in commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we had in 'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or earlier 'length', earlier in 'gcc/cpphash.h:struct hashnode': 'unsigned short length', earlier 'size_t length' with comment: "length of token, for quick comparison", earlier 'int length', ever since the 'gcc/cpp' files were added in commit 7f2935c734c36f84ab62b20a04de465e19061333 (Subversion r9191). This amends commit f3b957ea8b9dadfb1ed30f24f463529684b7a36a "pch: Fix streaming of strings with embedded null bytes". gcc/ * doc/gty.texi (GTY Options) <string_length>: Enhance. libcpp/ * include/symtab.h (struct ht_identifier): Document different rationale.
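The "length of token, for quick comparison" rationale quoted in this message can be illustrated with a small sketch. It assumes a made-up counted-string type; `demo_ident` and `demo_ident_equal` are hypothetical and only loosely modelled on the idea of a string stored together with its length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted identifier: a byte pointer plus an explicit
   length.  Not the real 'struct ht_identifier'.  */
struct demo_ident
{
  const unsigned char *str;
  size_t len;
};

/* Return nonzero if A and B hold the same bytes.  The length test
   rejects most mismatches without reading any string data, and
   memcmp compares exactly LEN bytes, so embedded NUL bytes are
   compared like any other byte.  */
int
demo_ident_equal (const struct demo_ident *a, const struct demo_ident *b)
{
  if (a->len != b->len)
    return 0;
  return memcmp (a->str, b->str, a->len) == 0;
}
```

A stored length makes the common "different identifiers" case a single integer comparison, which matches the optimization purpose the message describes.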
2023-07-06	GTY: Explicitly reject 'string_length' option for (fields in) global variables	Thomas Schwinge
	This is preparational for another thing that I'm working on. No change in behavior -- other than a more explicit error message. The 'string_length' option currently is not supported for (fields in) global variables. For example, if we apply the following (made-up) changes: --- gcc/c-family/c-cppbuiltin.cc +++ gcc/c-family/c-cppbuiltin.cc @@ -1777 +1777 @@ struct GTY(()) lazy_hex_fp_value_struct - const char *hex_str; + const char *GTY((string_length("strlen(%h.hex_str) + 1"))) hex_str; --- gcc/varasm.cc +++ gcc/varasm.cc @@ -66 +66 @@ along with GCC; see the file COPYING3. If not see -extern GTY(()) const char *first_global_object_name; +extern GTY((string_length("strlen(%h.first_global_object_name) + 1"))) const char *first_global_object_name; ..., we get: [...] build/gengtype \ -S [...]/source-gcc/gcc -I gtyp-input.list -w tmp-gtype.state /bin/sh [...]/source-gcc/gcc/../move-if-change tmp-gtype.state gtype.state build/gengtype \ -r gtype.state [...]/source-gcc/gcc/varasm.cc:66: global `first_global_object_name' has unknown option `string_length' [...]/source-gcc/gcc/c-family/c-cppbuiltin.cc:1789: field `hex_str' of global `lazy_hex_fp_values[0]' has unknown option `string_length' make[2]: *** [Makefile:2890: s-gtype] Error 1 [...] These errors occur when writing "GC roots", where -- per my understanding -- 'string_length' isn't relevant for actual GC purposes. However, like elsewhere, it is for PCH purposes, and simply accepting 'string_length' here isn't sufficient: we'll still get '(gt_pointer_walker) &gt_pch_n_S' used in the 'struct ggc_root_tab' instances, and there's no easy way to change that to instead use 'gt_pch_n_S2' with explicit 'size_t string_len' argument. (At least not sufficiently easy to justify spending any further time on, given that I don't have an actual use for that feature.)
So, until an actual need arises, and/or to avoid the next person looking into this having to figure out the same thing again, let's just document this limitation: [...]/source-gcc/gcc/varasm.cc:66: option `string_length' not supported for global `first_global_object_name' [...]/source-gcc/gcc/c-family/c-cppbuiltin.cc:1789: option `string_length' not supported for field `hex_str' of global `lazy_hex_fp_values[0]' This amends commit f3b957ea8b9dadfb1ed30f24f463529684b7a36a "pch: Fix streaming of strings with embedded null bytes". gcc/ * gengtype.cc (write_root, write_roots): Explicitly reject 'string_length' option. * doc/gty.texi (GTY Options) <string_length>: Document.
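Why an explicit length matters when saving strings can be shown with a short sketch. It is an illustration only: `demo_size_from_terminator` and `demo_copy_counted` are hypothetical helpers and do not reproduce 'gt_pch_n_S' or 'gt_pch_n_S2'; they only contrast "length inferred from the terminator" with "length passed by the caller".

```c
#include <stddef.h>
#include <string.h>

/* Size that would be saved if the length is inferred from the first
   NUL byte: everything after an embedded NUL is lost.  */
size_t
demo_size_from_terminator (const char *s)
{
  return strlen (s) + 1;
}

/* Copy exactly SRC_LEN bytes, as given by the caller, into DST.
   Returns the number of bytes copied, or 0 if DST is too small.  */
size_t
demo_copy_counted (char *dst, size_t dst_size,
                   const char *src, size_t src_len)
{
  if (src_len > dst_size)
    return 0;
  memcpy (dst, src, src_len);
  return src_len;
}
```

For a buffer such as `"ab\0cd"`, the terminator-based size covers only the first three bytes, while a caller that knows the real size can pass all six to the counted copy.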
2023-07-06	GGC: Remove unused 'bool is_string' arguments to 'ggc_pch_{count,alloc,write}_object'	Thomas Schwinge
	They're unused since the removal of 'gcc/ggc-zone.c' in 2013 Subversion r195426 (Git commit cd030c079e5e42fe3f49261fe01f384e6b7f0111) "Remove zone allocator". Should any future 'gcc/ggc-[...].cc' ever need this again, it'll be a conscious decision at that time. gcc/ * ggc-internal.h (ggc_pch_count_object, ggc_pch_alloc_object) (ggc_pch_write_object): Remove 'bool is_string' argument. * ggc-common.cc: Adjust. * ggc-page.cc: Likewise.
2023-07-06	[Committed] Handle COPYSIGN in dwarf2out.cc's mem_loc_descriptor.	Roger Sayle
	Many thanks to Hans-Peter Nilsson for reminding me that new RTX codes need to be added to dwarf2out.cc's mem_loc_descriptor, and for doing this for BITREVERSE. This patch does the same for the recently added COPYSIGN. I'd been testing these on a target that doesn't use DWARF (nvptx-none) and so didn't exhibit the issue, and my additional testing on x86_64-pc-linux-gnu to double check that changes were safe, doesn't (yet) trigger the problematic assert in dwarf2out.cc's mem_loc_descriptor. 2023-07-06 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * dwarf2out.cc (mem_loc_descriptor): Handle COPYSIGN.
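The general hazard mentioned here, a dispatcher whose default case asserts so that every newly added code needs an explicit case, can be sketched as follows. This is a simplified model under assumptions, not the code in dwarf2out.cc: `demo_code` and `demo_describe` are hypothetical, and treating the new codes as "known but without a representation" is a choice made for the illustration.

```c
#include <assert.h>

/* Hypothetical operation codes standing in for RTX codes.  */
enum demo_code
{
  DEMO_PLUS,
  DEMO_NEG,
  DEMO_BITREVERSE,
  DEMO_COPYSIGN,
  DEMO_NUM_CODES
};

/* Return 1 if a description can be built for CODE, 0 if CODE is known
   but has no representation.  Any code not listed reaches the
   assertion, which is how a forgotten case shows up at run time.  */
int
demo_describe (enum demo_code code)
{
  switch (code)
    {
    case DEMO_PLUS:
    case DEMO_NEG:
      return 1;

    case DEMO_BITREVERSE:
    case DEMO_COPYSIGN:
      return 0;

    default:
      assert (!"unhandled code in demo_describe");
      return 0;
    }
}
```

Such a failure only appears when an input actually reaches the new code on a target that exercises this path, which is consistent with the message's note that testing on a non-DWARF target did not reveal it.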
2023-07-06	i386: Update document for inlining rules	Hongyu Wang
	gcc/ChangeLog: * doc/extend.texi: Move x86 inlining rule to a new subsubsection and add description for inlining of functions with arch and tune attributes.
2023-07-06	tree-optimization/110515 - wrong code with LIM + PRE	Richard Biener
	In this PR we face the issue that LIM speculates a load when hoisting it out of the loop (since it knows it cannot trap). Unfortunately this exposes undefined behavior when the load accesses memory with the wrong dynamic type. This later makes PRE use that representation instead of the original which accesses the same memory location but using a different dynamic type leading to a wrong disambiguation of that original access against another and thus a wrong-code transform. Fortunately there already is code in PRE dealing with a similar situation for code hoisting but that left a small gap which when fixed also fixes the wrong-code transform in this bug even if it doesn't address the underlying issue of LIM speculating that load. The upside is this fix is trivially safe to backport and chances of code generation regressions are very low. PR tree-optimization/110515 * tree-ssa-pre.cc (compute_avail): Make code dealing with hoisting loads with different alias-sets more robust. * g++.dg/opt/pr110515.C: New testcase.
2023-07-06	VECT: Fix ICE of variable stride on strided load/store with SELECT_VL loop control.	Ju-Zhe Zhong
	Hi, Richi. Sorry for making a mistake on LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with SELECT_VL loop control. Consider the following case: void __attribute__ ((noinline, noclone)) \ f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \ INDEX##BITS stride, INDEX##BITS n) \ { \ for (INDEX##BITS i = 0; i < n; ++i) \ dest[i] += src[i * stride]; \ } When "stride" is a constant, the current flow works fine. However, when "stride" is a variable, it causes an ICE: ... _96 = .SELECT_VL (ivtmp_94, 4); ... ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4; vect__11.69_87 = .LEN_MASK_GATHER_LOAD (vectp_src.67_85, _84, 4, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, _96, 0); ... vectp_src.67_86 = vectp_src.67_85 + ivtmp_78; This is because of the IR: ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4; Instead, I split the IR into: step_stride = _39 step = step_stride * 4 ivtmp_78 = step * _96 Thanks. gcc/ChangeLog: * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Fix ICE.
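The pointer-bump arithmetic in this message can be sketched in plain C. This is a scalar model under assumptions, not vectorizer code: `demo_strided_add` is one concrete instance of the macro-generated loop (32-bit data, 64-bit index), and `demo_src_pointer_bump` is a hypothetical helper that mirrors the three-step split, with `vl` standing for the element count returned by SELECT_VL.

```c
#include <stddef.h>
#include <stdint.h>

/* Concrete instance of the strided loop from the message:
   each dest element accumulates every STRIDE-th src element.  */
void
demo_strided_add (int32_t *restrict dest, int32_t *restrict src,
                  int64_t stride, int64_t n)
{
  for (int64_t i = 0; i < n; ++i)
    dest[i] += src[i * stride];
}

/* Byte distance the source pointer advances after one iteration that
   processed VL elements, computed in the same three steps as the
   split IR: step_stride, then step, then step * vl.  */
size_t
demo_src_pointer_bump (int64_t stride, size_t vl)
{
  size_t step_stride = (size_t) stride;           /* step_stride = _39 */
  size_t step = step_stride * sizeof (int32_t);   /* step = step_stride * 4 */
  return step * vl;                               /* ivtmp_78 = step * _96 */
}
```

With a stride of 2 and four elements per iteration, the source pointer moves by 2 * 4 * 4 = 32 bytes, so the next iteration starts at the element the scalar loop would read for `i = 4`.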