author     mkuvyrkov <mkuvyrkov@138bc75d-0d04-0410-961f-82ee72b054a4>  2015-03-24 14:46:03 +0000
committer  mkuvyrkov <mkuvyrkov@138bc75d-0d04-0410-961f-82ee72b054a4>  2015-03-24 14:46:03 +0000
commit     878923a26c3d37e2f9ebecb630965ba97ed70df0 (patch)
tree       3b6ee34367f463b25293432d36f49aa5ce2e00f4
parent     62fd83c70da56dd9a270cb4f026c24281b8c7ef5 (diff)
Backport Maxim's scheduling improvements
	Backport from trunk r220808.
	2015-02-19  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	* haifa-sched.c (enum rfs_decision, rfs_str): Remove RFS_DEBUG.
	(rank_for_schedule_debug): Update.
	(ready_sort): Make static.  Move sorting logic to ...
	(ready_sort_debug, ready_sort_real): New static functions.
	(schedule_block): Sort both debug insns and real insns in preparation
	for ready list trimming.  Improve debug output.
	* sched-int.h (ready_sort): Remove global declaration.

	Backport from trunk r220316.
	2015-02-01  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	* haifa-sched.c (INSN_RFS_DEBUG_ORIG_ORDER): New access macro.
	(rank_for_schedule_debug): Split from ...
	(rank_for_schedule): ... this.
	(ready_sort): Sort DEBUG_INSNs separately from normal INSNs.
	* sched-int.h (struct _haifa_insn_data): New field rfs_debug_orig_order.

	Backport from trunk r219893.
	2015-01-20  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	* config/arm/arm-protos.h (enum arm_sched_autopref): New constants.
	(struct tune_params): Use the enum.
	* arm.c (arm_*_tune): Update.
	(arm_option_override): Update.

	Backport from trunk r219789.
	* config/arm/arm-protos.h (struct tune_params): New field
	sched_autopref_queue_depth.
	* config/arm/arm.c (sched-int.h): Include header.
	(arm_first_cycle_multipass_dfa_lookahead_guard,)
	(TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD): Define hook.
	(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,)
	(arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,)
	(arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,)
	(arm_cortex_a53_tune, arm_cortex_a57_tune, arm_xgene1_tune,)
	(arm_cortex_a5_tune, arm_cortex_a9_tune, arm_cortex_a12_tune,)
	(arm_v7m_tune, arm_cortex_m7_tune, arm_v6m_tune, arm_fa726te_tune):
	Specify sched_autopref_queue_depth value.  Enabled for A15 and A57.
	* config/arm/t-arm (arm.o): Update.
	* haifa-sched.c (update_insn_after_change): Update.
	(rank_for_schedule): Use auto-prefetcher model, if requested.
	(autopref_multipass_init): New static function.
	(autopref_rank_for_schedule): New rank_for_schedule heuristic.
	(autopref_multipass_dfa_lookahead_guard_started_dump_p): New static
	variable for debug dumps.
	(autopref_multipass_dfa_lookahead_guard_1): New static helper function.
	(autopref_multipass_dfa_lookahead_guard): New global function that
	implements TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD hook.
	(init_h_i_d): Update.
	* params.def (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH): New tuning knob.
	* sched-int.h (enum autopref_multipass_data_status): New const enum.
	(autopref_multipass_data_): Structure for auto-prefetcher data.
	(autopref_multipass_data_def, autopref_multipass_data_t): New typedefs.
	(struct _haifa_insn_data:autopref_multipass_data): New field.
	(INSN_AUTOPREF_MULTIPASS_DATA): New access macro.
	(autopref_multipass_dfa_lookahead_guard): Declare.

	2015-01-17  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	Backport from trunk r219787.
	2015-01-17  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	* config/aarch64/aarch64.c
	(aarch64_sched_first_cycle_multipass_dfa_lookahead): Implement hook.
	(TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD): Define.
	* config/arm/arm.c
	(arm_first_cycle_multipass_dfa_lookahead): Implement hook.
	(TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD): Define.

	Backport from trunk r216624.
	* rtlanal.c (get_base_term): Handle SCRATCH.

	2014-10-24  Maxim Kuvyrkov  <maxim.kuvyrkov@gmail.com>

	Backport from trunk r216623.
	* haifa-sched.c (sched_init): Disable max_issue when scheduling for
	register pressure.

	2014-10-24  Maxim Kuvyrkov  <maxim.kuvyrkov@gmail.com>

	Backport from trunk r216622.
	* haifa-sched.c (cached_first_cycle_multipass_dfa_lookahead,)
	(cached_issue_rate): Remove.  Use dfa_lookahead and issue_rate instead.
	(max_issue, choose_ready, sched_init): Update.

	2014-10-24  Maxim Kuvyrkov  <maxim.kuvyrkov@gmail.com>

	Backport from trunk r216621.
	* sched-int.h (struct _haifa_insn_data:last_rfs_win): New field.
	* haifa-sched.c (INSN_LAST_RFS_WIN): New access macro.
	(rfs_result): Set INSN_LAST_RFS_WIN.  Update signature.
	(rank_for_schedule): Update calls to rfs_result to pass new parameters.
	(print_rank_for_schedule_stats): Print out elements of ready list that
	ended up on their respective places due to each of the sorting
	heuristics.
	(ready_sort): Update.
	(debug_ready_list_1): Improve printout for SCHED_PRESSURE_MODEL.
	(schedule_block): Update.

	2014-10-24  Maxim Kuvyrkov  <maxim.kuvyrkov@gmail.com>

	Backport from trunk r216620.
	2014-10-24  Maxim Kuvyrkov  <maxim.kuvyrkov@gmail.com>

	* haifa-sched.c (sched_class_regs_num, call_used_regs_num): New static
	arrays.  Use sched_class_regs_num instead of ira_class_hard_regs_num.
	(print_curr_reg_pressure, setup_insn_reg_pressure_info,)
	(model_update_pressure, model_spill_cost): Use sched_class_regs_num.
	(model_start_schedule): Update.
	(sched_pressure_start_bb): New static function.  Calculate
	sched_class_regs_num.
	(schedule_block): Use it.
	(alloc_global_sched_pressure_data): Calculate call_used_regs_num.

	Backport from trunk r213709.
	* haifa-sched.c (SCHED_SORT): Delete.  Macro used exactly once.
	(enum rfs_decition:RFS_*): New constants wrapped in an enum.
	(rfs_str): String corresponding to RFS_* constants.
	(rank_for_schedule_stats_t): New typedef.
	(rank_for_schedule_stats): New static variable.
	(rfs_result): New static function.
	(rank_for_schedule): Track statistics for deciding heuristics.
	(rank_for_schedule_stats_diff, print_rank_for_schedule_stats): New
	static functions.
	(ready_sort): Use them for debug printouts.
	(schedule_block): Init statistics state.  Print statistics on
	rank_for_schedule decisions.

	2014-08-07  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	Backport from trunk r213708.
	2014-08-07  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	* haifa-sched.c (rank_for_schedule): Fix INSN_TICK-based heuristics.

	Backport from trunk r210845.
	2014-05-23  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	Fix bootstrap error on ia64
	* config/ia64/ia64.c (ia64_first_cycle_multipass_dfa_lookahead_guard):
	Return default value.

	Backport from trunk r210747.
	Cleanup and improve multipass_dfa_lookahead_guard
	* config/i386/i386.c (core2i7_first_cycle_multipass_filter_ready_try,)
	(core2i7_first_cycle_multipass_begin,)
	(core2i7_first_cycle_multipass_issue,)
	(core2i7_first_cycle_multipass_backtrack): Update signature.
	* config/ia64/ia64.c
	(ia64_first_cycle_multipass_dfa_lookahead_guard_spec): Remove.
	(ia64_first_cycle_multipass_dfa_lookahead_guard): Update signature.
	(TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD_SPEC): Remove
	hook definition.
	(ia64_first_cycle_multipass_dfa_lookahead_guard): Merge logic from
	ia64_first_cycle_multipass_dfa_lookahead_guard_spec.  Update return
	values.
	* config/rs6000/rs6000.c (rs6000_use_sched_lookahead_guard): Update
	return values.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in
	(TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD_SPEC): Remove.
	* haifa-sched.c (ready_try): Make signed to allow negative values.
	(rebug_ready_list_1): Update.
	(choose_ready): Simplify.
	(sched_extend_ready_list): Update.

	2014-05-22  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	Backport from trunk r210746.
	Remove IA64 speculation tweaking flags
	* config/ia64/ia64.c (ia64_set_sched_flags): Delete handling of
	speculation tuning flags.
	(msched-prefer-non-data-spec-insns,)
	(msched-prefer-non-control-spec-insns): Obsolete options.
	* haifa-sched.c (choose_ready): Remove handling of
	PREFER_NON_CONTROL_SPEC and PREFER_NON_DATA_SPEC.
	* sched-int.h (enum SPEC_SCHED_FLAGS): Remove PREFER_NON_CONTROL_SPEC
	and PREFER_NON_DATA_SPEC.
	* sel-sched.c (process_spec_exprs): Remove handling of
	PREFER_NON_CONTROL_SPEC and PREFER_NON_DATA_SPEC.

	Backport from trunk r210744.
	2014-05-22  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	Improve scheduling debug output
	* haifa-sched.c (debug_ready_list): Remove unnecessary prototype.
	(advance_one_cycle): Update.
	(schedule_insn, queue_to_ready): Add debug printouts.
	(debug_ready_list_1): New static function.
	(debug_ready_list): Update.
	(max_issue): Add debug printouts.
	(dump_insn_stream): New static function.
	(schedule_block): Use it.  Also better indent printouts.

	2014-05-22  Maxim Kuvyrkov  <maxim.kuvyrkov@linaro.org>

	Fix sched_insn debug counter
	* haifa-sched.c (schedule_insn): Update.
	(struct haifa_saved_data): Add nonscheduled_insns_begin.
	(save_backtrack_point, restore_backtrack_point): Update.
	(first_nonscheduled_insn): New static function.
	(queue_to_ready, choose_ready): Use it.
	(schedule_block): Init nonscheduled_insns_begin.
	(sched_emit_insn): Update.

	Backport from trunk r220808.
	* gcc.dg/pr64935-1.c, gcc.dg/pr64935-2.c: New tests.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/linaro/gcc-4_9-branch@221634 138bc75d-0d04-0410-961f-82ee72b054a4
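The multipass DFA lookahead hooks this commit adds for ARM and AArch64 both follow the same convention (visible in the hunks below): return the core's issue rate when it is greater than 1, and 0 on single-issue cores, where multipass lookahead cannot pay off. A standalone sketch of that convention — the function name here is illustrative, not GCC's actual code:

```c
/* Illustrative model of arm_first_cycle_multipass_dfa_lookahead and its
   AArch64 counterpart: multipass lookahead is only worthwhile when more
   than one insn can issue per cycle, so a lookahead of 0 (disabled) is
   returned for single-issue cores.  */
int
sched_lookahead_from_issue_rate (int issue_rate)
{
  return issue_rate > 1 ? issue_rate : 0;
}
```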
-rw-r--r--  gcc/ChangeLog.linaro                210
-rw-r--r--  gcc/config/aarch64/aarch64.c         13
-rw-r--r--  gcc/config/arm/arm-protos.h           9
-rw-r--r--  gcc/config/arm/arm.c                101
-rw-r--r--  gcc/config/arm/t-arm                  3
-rw-r--r--  gcc/config/i386/i386.c               10
-rw-r--r--  gcc/config/ia64/ia64.c               57
-rw-r--r--  gcc/config/ia64/ia64.opt              6
-rw-r--r--  gcc/config/rs6000/rs6000.c           17
-rw-r--r--  gcc/doc/tm.texi                      31
-rw-r--r--  gcc/doc/tm.texi.in                    2
-rw-r--r--  gcc/haifa-sched.c                   890
-rw-r--r--  gcc/params.def                        5
-rw-r--r--  gcc/rtlanal.c                         3
-rw-r--r--  gcc/sched-int.h                      47
-rw-r--r--  gcc/sel-sched.c                      32
-rw-r--r--  gcc/target.def                       38
-rw-r--r--  gcc/testsuite/ChangeLog.linaro        5
-rw-r--r--  gcc/testsuite/gcc.dg/pr64935-1.c     54
-rw-r--r--  gcc/testsuite/gcc.dg/pr64935-2.c     14
20 files changed, 1189 insertions, 358 deletions
diff --git a/gcc/ChangeLog.linaro b/gcc/ChangeLog.linaro
index c41618dba05..e3e5ee9774c 100644
--- a/gcc/ChangeLog.linaro
+++ b/gcc/ChangeLog.linaro
@@ -1,3 +1,213 @@
+2015-03-24 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ Backport from trunk r220808.
+ 2015-02-19 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ * haifa-sched.c (enum rfs_decision, rfs_str): Remove RFS_DEBUG.
+ (rank_for_schedule_debug): Update.
+ (ready_sort): Make static. Move sorting logic to ...
+ (ready_sort_debug, ready_sort_real): New static functions.
+ (schedule_block): Sort both debug insns and real insns in preparation
+ for ready list trimming. Improve debug output.
+ * sched-int.h (ready_sort): Remove global declaration.
+
+ Backport from trunk r220316.
+ 2015-02-01 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ * haifa-sched.c (INSN_RFS_DEBUG_ORIG_ORDER): New access macro.
+ (rank_for_schedule_debug): Split from ...
+ (rank_for_schedule): ... this.
+ (ready_sort): Sort DEBUG_INSNs separately from normal INSNs.
+ * sched-int.h (struct _haifa_insn_data): New field rfs_debug_orig_order.
+
+ Backport from trunk r219893.
+ 2015-01-20 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ * config/arm/arm-protos.h (enum arm_sched_autopref): New constants.
+ (struct tune_params): Use the enum.
+ * arm.c (arm_*_tune): Update.
+ (arm_option_override): Update.
+
+ Backport from trunk r219789.
+ * config/arm/arm-protos.h (struct tune_params): New field
+ sched_autopref_queue_depth.
+ * config/arm/arm.c (sched-int.h): Include header.
+ (arm_first_cycle_multipass_dfa_lookahead_guard,)
+ (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD): Define hook.
+ (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,)
+ (arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,)
+ (arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,)
+ (arm_cortex_a53_tune, arm_cortex_a57_tune, arm_xgene1_tune,)
+ (arm_cortex_a5_tune, arm_cortex_a9_tune, arm_cortex_a12_tune,)
+ (arm_v7m_tune, arm_cortex_m7_tune, arm_v6m_tune, arm_fa726te_tune):
+ Specify sched_autopref_queue_depth value. Enabled for A15 and A57.
+ * config/arm/t-arm (arm.o): Update.
+ * haifa-sched.c (update_insn_after_change): Update.
+ (rank_for_schedule): Use auto-prefetcher model, if requested.
+ (autopref_multipass_init): New static function.
+ (autopref_rank_for_schedule): New rank_for_schedule heuristic.
+ (autopref_multipass_dfa_lookahead_guard_started_dump_p): New static
+ variable for debug dumps.
+ (autopref_multipass_dfa_lookahead_guard_1): New static helper function.
+ (autopref_multipass_dfa_lookahead_guard): New global function that
+ implements TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD hook.
+ (init_h_i_d): Update.
+ * params.def (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH): New tuning knob.
+ * sched-int.h (enum autopref_multipass_data_status): New const enum.
+ (autopref_multipass_data_): Structure for auto-prefetcher data.
+ (autopref_multipass_data_def, autopref_multipass_data_t): New typedefs.
+ (struct _haifa_insn_data:autopref_multipass_data): New field.
+ (INSN_AUTOPREF_MULTIPASS_DATA): New access macro.
+ (autopref_multipass_dfa_lookahead_guard): Declare.
+
+ 2015-01-17 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ Backport from trunk r219787.
+ 2015-01-17 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ * config/aarch64/aarch64.c
+ (aarch64_sched_first_cycle_multipass_dfa_lookahead): Implement hook.
+ (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD): Define.
+ * config/arm/arm.c
+ (arm_first_cycle_multipass_dfa_lookahead): Implement hook.
+ (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD): Define.
+
+ Backport from trunk r216624.
+ * rtlanal.c (get_base_term): Handle SCRATCH.
+
+ 2014-10-24 Maxim Kuvyrkov <maxim.kuvyrkov@gmail.com>
+
+ Backport from trunk r216623.
+ * haifa-sched.c (sched_init): Disable max_issue when scheduling for
+ register pressure.
+
+ 2014-10-24 Maxim Kuvyrkov <maxim.kuvyrkov@gmail.com>
+
+ Backport from trunk r216622.
+ * haifa-sched.c (cached_first_cycle_multipass_dfa_lookahead,)
+ (cached_issue_rate): Remove. Use dfa_lookahead and issue_rate instead.
+ (max_issue, choose_ready, sched_init): Update.
+
+ 2014-10-24 Maxim Kuvyrkov <maxim.kuvyrkov@gmail.com>
+
+ Backport from trunk r216621.
+ * sched-int.h (struct _haifa_insn_data:last_rfs_win): New field.
+ * haifa-sched.c (INSN_LAST_RFS_WIN): New access macro.
+ (rfs_result): Set INSN_LAST_RFS_WIN. Update signature.
+ (rank_for_schedule): Update calls to rfs_result to pass new parameters.
+ (print_rank_for_schedule_stats): Print out elements of ready list that
+ ended up on their respective places due to each of the sorting
+ heuristics.
+ (ready_sort): Update.
+ (debug_ready_list_1): Improve printout for SCHED_PRESSURE_MODEL.
+ (schedule_block): Update.
+
+ 2014-10-24 Maxim Kuvyrkov <maxim.kuvyrkov@gmail.com>
+
+ Backport from trunk r216620.
+ 2014-10-24 Maxim Kuvyrkov <maxim.kuvyrkov@gmail.com>
+
+ * haifa-sched.c (sched_class_regs_num, call_used_regs_num): New static
+ arrays. Use sched_class_regs_num instead of ira_class_hard_regs_num.
+ (print_curr_reg_pressure, setup_insn_reg_pressure_info,)
+ (model_update_pressure, model_spill_cost): Use sched_class_regs_num.
+ (model_start_schedule): Update.
+ (sched_pressure_start_bb): New static function. Calculate
+ sched_class_regs_num.
+ (schedule_block): Use it.
+ (alloc_global_sched_pressure_data): Calculate call_used_regs_num.
+
+ Backport from trunk r213709.
+ * haifa-sched.c (SCHED_SORT): Delete. Macro used exactly once.
+ (enum rfs_decition:RFS_*): New constants wrapped in an enum.
+ (rfs_str): String corresponding to RFS_* constants.
+ (rank_for_schedule_stats_t): New typedef.
+ (rank_for_schedule_stats): New static variable.
+ (rfs_result): New static function.
+ (rank_for_schedule): Track statistics for deciding heuristics.
+ (rank_for_schedule_stats_diff, print_rank_for_schedule_stats): New
+ static functions.
+ (ready_sort): Use them for debug printouts.
+ (schedule_block): Init statistics state. Print statistics on
+ rank_for_schedule decisions.
+
+ 2014-08-07 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ Backport from trunk r213708.
+ 2014-08-07 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ * haifa-sched.c (rank_for_schedule): Fix INSN_TICK-based heuristics.
+
+ Backport from trunk r210845.
+ 2014-05-23 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ Fix bootstrap error on ia64
+ * config/ia64/ia64.c (ia64_first_cycle_multipass_dfa_lookahead_guard):
+ Return default value.
+
+ Backport from trunk r210747.
+ Cleanup and improve multipass_dfa_lookahead_guard
+ * config/i386/i386.c (core2i7_first_cycle_multipass_filter_ready_try,)
+ (core2i7_first_cycle_multipass_begin,)
+ (core2i7_first_cycle_multipass_issue,)
+ (core2i7_first_cycle_multipass_backtrack): Update signature.
+ * config/ia64/ia64.c
+ (ia64_first_cycle_multipass_dfa_lookahead_guard_spec): Remove.
+ (ia64_first_cycle_multipass_dfa_lookahead_guard): Update signature.
+ (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD_SPEC): Remove
+ hook definition.
+ (ia64_first_cycle_multipass_dfa_lookahead_guard): Merge logic from
+ ia64_first_cycle_multipass_dfa_lookahead_guard_spec. Update return
+ values.
+ * config/rs6000/rs6000.c (rs6000_use_sched_lookahead_guard): Update
+ return values.
+ * doc/tm.texi: Regenerate.
+ * doc/tm.texi.in
+ (TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD_SPEC): Remove.
+ * haifa-sched.c (ready_try): Make signed to allow negative values.
+ (rebug_ready_list_1): Update.
+ (choose_ready): Simplify.
+ (sched_extend_ready_list): Update.
+
+ 2014-05-22 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ Backport from trunk r210746.
+ Remove IA64 speculation tweaking flags
+ * config/ia64/ia64.c (ia64_set_sched_flags): Delete handling of
+ speculation tuning flags.
+ (msched-prefer-non-data-spec-insns,)
+ (msched-prefer-non-control-spec-insns): Obsolete options.
+ * haifa-sched.c (choose_ready): Remove handling of
+ PREFER_NON_CONTROL_SPEC and PREFER_NON_DATA_SPEC.
+ * sched-int.h (enum SPEC_SCHED_FLAGS): Remove PREFER_NON_CONTROL_SPEC
+ and PREFER_NON_DATA_SPEC.
+ * sel-sched.c (process_spec_exprs): Remove handling of
+ PREFER_NON_CONTROL_SPEC and PREFER_NON_DATA_SPEC.
+
+ Backport from trunk r210744.
+ 2014-05-22 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ Improve scheduling debug output
+ * haifa-sched.c (debug_ready_list): Remove unnecessary prototype.
+ (advance_one_cycle): Update.
+ (schedule_insn, queue_to_ready): Add debug printouts.
+ (debug_ready_list_1): New static function.
+ (debug_ready_list): Update.
+ (max_issue): Add debug printouts.
+ (dump_insn_stream): New static function.
+ (schedule_block): Use it. Also better indent printouts.
+
+ 2014-05-22 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ Fix sched_insn debug counter
+ * haifa-sched.c (schedule_insn): Update.
+ (struct haifa_saved_data): Add nonscheduled_insns_begin.
+ (save_backtrack_point, restore_backtrack_point): Update.
+ (first_nonscheduled_insn): New static function.
+ (queue_to_ready, choose_ready): Use it.
+ (schedule_block): Init nonscheduled_insns_begin.
+ (sched_emit_insn): Update.
+
2015-03-18 Michael Collison <michael.collison@linaro.org>
Backport from trunk r218525.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8e9d2d4274a..1cfabdbc404 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -65,6 +65,7 @@
#include "aarch64-cost-tables.h"
#include "dumpfile.h"
#include "tm-constrs.h"
+#include "sched-int.h"
/* Defined for convenience. */
#define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
@@ -6156,6 +6157,14 @@ aarch64_sched_issue_rate (void)
return aarch64_tune_params->issue_rate;
}
+static int
+aarch64_sched_first_cycle_multipass_dfa_lookahead (void)
+{
+ int issue_rate = aarch64_sched_issue_rate ();
+
+ return issue_rate > 1 ? issue_rate : 0;
+}
+
/* Vectorizer cost model target hooks. */
/* Implement targetm.vectorize.builtin_vectorization_cost. */
@@ -10395,6 +10404,10 @@ aarch_macro_fusion_pair_p (rtx prev, rtx curr)
#undef TARGET_SCHED_ISSUE_RATE
#define TARGET_SCHED_ISSUE_RATE aarch64_sched_issue_rate
+#undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
+#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
+ aarch64_sched_first_cycle_multipass_dfa_lookahead
+
#undef TARGET_TRAMPOLINE_INIT
#define TARGET_TRAMPOLINE_INIT aarch64_trampoline_init
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 76352f8357c..0e165ae9c0f 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -249,6 +249,13 @@ struct cpu_vec_costs {
struct cpu_cost_table;
+enum arm_sched_autopref
+ {
+ ARM_SCHED_AUTOPREF_OFF,
+ ARM_SCHED_AUTOPREF_RANK,
+ ARM_SCHED_AUTOPREF_FULL
+ };
+
struct tune_params
{
bool (*rtx_costs) (rtx, RTX_CODE, RTX_CODE, int *, bool);
@@ -277,6 +284,8 @@ struct tune_params
/* Prefer 32-bit encoding instead of 16-bit encoding where subset of flags
would be set. */
bool disparage_partial_flag_setting_t16_encodings;
+ /* Depth of scheduling queue to check for L2 autoprefetcher. */
+ enum arm_sched_autopref sched_autopref;
};
extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0fcabee4ab9..6658a4fb201 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -61,6 +61,7 @@
#include "opts.h"
#include "dumpfile.h"
#include "gimple-expr.h"
+#include "sched-int.h"
/* Forward definitions of types. */
typedef struct minipool_node Mnode;
@@ -239,6 +240,8 @@ static void arm_option_override (void);
static unsigned HOST_WIDE_INT arm_shift_truncation_mask (enum machine_mode);
static bool arm_cannot_copy_insn_p (rtx);
static int arm_issue_rate (void);
+static int arm_first_cycle_multipass_dfa_lookahead (void);
+static int arm_first_cycle_multipass_dfa_lookahead_guard (rtx, int);
static void arm_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
static bool arm_output_addr_const_extra (FILE *, rtx);
static bool arm_allocate_stack_slots_for_args (void);
@@ -584,6 +587,14 @@ static const struct attribute_spec arm_attribute_table[] =
#undef TARGET_SCHED_ISSUE_RATE
#define TARGET_SCHED_ISSUE_RATE arm_issue_rate
+#undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
+#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
+ arm_first_cycle_multipass_dfa_lookahead
+
+#undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD
+#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD \
+ arm_first_cycle_multipass_dfa_lookahead_guard
+
#undef TARGET_MANGLE_TYPE
#define TARGET_MANGLE_TYPE arm_mangle_type
@@ -1703,7 +1714,8 @@ const struct tune_params arm_slowmul_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_fastmul_tune =
@@ -1720,7 +1732,8 @@ const struct tune_params arm_fastmul_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
/* StrongARM has early execution of branches, so a sequence that is worth
@@ -1740,7 +1753,8 @@ const struct tune_params arm_strongarm_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_xscale_tune =
@@ -1757,7 +1771,8 @@ const struct tune_params arm_xscale_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_9e_tune =
@@ -1774,7 +1789,8 @@ const struct tune_params arm_9e_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_v6t2_tune =
@@ -1791,7 +1807,8 @@ const struct tune_params arm_v6t2_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
/* Generic Cortex tuning. Use more specific tunings if appropriate. */
@@ -1809,7 +1826,8 @@ const struct tune_params arm_cortex_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_cortex_a8_tune =
@@ -1826,7 +1844,8 @@ const struct tune_params arm_cortex_a8_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_cortex_a7_tune =
@@ -1843,7 +1862,8 @@ const struct tune_params arm_cortex_a7_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_cortex_a15_tune =
@@ -1860,7 +1880,8 @@ const struct tune_params arm_cortex_a15_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- true, true /* Prefer 32-bit encodings. */
+ true, true, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_FULL /* Sched L2 autopref. */
};
const struct tune_params arm_cortex_a53_tune =
@@ -1877,7 +1898,8 @@ const struct tune_params arm_cortex_a53_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_cortex_a57_tune =
@@ -1894,7 +1916,8 @@ const struct tune_params arm_cortex_a57_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- true, true /* Prefer 32-bit encodings. */
+ true, true, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_FULL /* Sched L2 autopref. */
};
/* Branches can be dual-issued on Cortex-A5, so conditional execution is
@@ -1914,7 +1937,8 @@ const struct tune_params arm_cortex_a5_tune =
{false, false}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_cortex_a9_tune =
@@ -1931,7 +1955,8 @@ const struct tune_params arm_cortex_a9_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_cortex_a12_tune =
@@ -1948,7 +1973,8 @@ const struct tune_params arm_cortex_a12_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
/* armv7m tuning. On Cortex-M4 cores for example, MOVW/MOVT take a single
@@ -1972,7 +1998,8 @@ const struct tune_params arm_v7m_tune =
{false, false}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
/* Cortex-M7 tuning. */
@@ -1991,7 +2018,8 @@ const struct tune_params arm_cortex_m7_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
/* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
@@ -2010,7 +2038,8 @@ const struct tune_params arm_v6m_tune =
{false, false}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
const struct tune_params arm_fa726te_tune =
@@ -2027,7 +2056,8 @@ const struct tune_params arm_fa726te_tune =
{true, true}, /* Prefer non short circuit. */
&arm_default_vec_cost, /* Vectorizer costs. */
false, /* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ false, false, /* Prefer 32-bit encodings. */
+ ARM_SCHED_AUTOPREF_OFF /* Sched L2 autopref. */
};
@@ -3083,6 +3113,22 @@ arm_option_override (void)
global_options.x_param_values,
global_options_set.x_param_values);
+ /* Look through ready list and all of queue for instructions
+ relevant for L2 auto-prefetcher. */
+ int param_sched_autopref_queue_depth;
+ if (current_tune->sched_autopref == ARM_SCHED_AUTOPREF_OFF)
+ param_sched_autopref_queue_depth = -1;
+ else if (current_tune->sched_autopref == ARM_SCHED_AUTOPREF_RANK)
+ param_sched_autopref_queue_depth = 0;
+ else if (current_tune->sched_autopref == ARM_SCHED_AUTOPREF_FULL)
+ param_sched_autopref_queue_depth = max_insn_queue_index + 1;
+ else
+ gcc_unreachable ();
+ maybe_set_param_value (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH,
+ param_sched_autopref_queue_depth,
+ global_options.x_param_values,
+ global_options_set.x_param_values);
+
/* Disable shrink-wrap when optimizing function for size, since it tends to
generate additional returns. */
if (optimize_function_for_size_p (cfun) && TARGET_THUMB2)
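The hunk above maps the three-valued ARM tuning enum onto PARAM_SCHED_AUTOPREF_QUEUE_DEPTH: -1 disables the auto-prefetcher heuristic, 0 applies it to the ready list only, and a positive depth also scans the scheduler's pending queue. A sketch of that mapping — the enum and function names are illustrative, and `max_insn_queue_index` (scheduler state in GCC) is passed in as a plain argument:

```c
/* Illustrative mapping from the ARM_SCHED_AUTOPREF_* tuning enum to the
   sched-autopref-queue-depth param value chosen in arm_option_override.  */
enum sched_autopref_mode { AUTOPREF_OFF, AUTOPREF_RANK, AUTOPREF_FULL };

int
autopref_queue_depth (enum sched_autopref_mode mode, int max_insn_queue_index)
{
  switch (mode)
    {
    case AUTOPREF_OFF:   /* Heuristic disabled.  */
      return -1;
    case AUTOPREF_RANK:  /* Rank insns in the ready list only.  */
      return 0;
    case AUTOPREF_FULL:  /* Also look through the whole pending queue.  */
      return max_insn_queue_index + 1;
    }
  return -1;  /* Not reached for valid modes.  */
}
```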
@@ -29876,6 +29922,23 @@ arm_issue_rate (void)
}
}
+/* Return how many instructions should scheduler lookahead to choose the
+ best one. */
+static int
+arm_first_cycle_multipass_dfa_lookahead (void)
+{
+ int issue_rate = arm_issue_rate ();
+
+ return issue_rate > 1 ? issue_rate : 0;
+}
+
+/* Enable modeling of L2 auto-prefetcher. */
+static int
+arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
+{
+ return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
+}
+
/* A table and a function to perform ARM-specific name mangling for
NEON vector types in order to conform to the AAPCS (see "Procedure
Call Standard for the ARM Architecture", Appendix A). To qualify
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 99bd696e411..2ad7bf3ec17 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -90,7 +90,8 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(EXPR_H) $(OPTABS_H) $(RECOG_H) $(CGRAPH_H) \
$(GGC_H) except.h $(C_PRAGMA_H) $(TM_P_H) \
$(TARGET_H) $(TARGET_DEF_H) debug.h langhooks.h $(DF_H) \
- intl.h libfuncs.h $(PARAMS_H) $(OPTS_H) $(srcdir)/config/arm/arm-cores.def \
+ intl.h libfuncs.h $(PARAMS_H) $(OPTS_H) sched-int.h \
+ $(srcdir)/config/arm/arm-cores.def \
$(srcdir)/config/arm/arm-arches.def $(srcdir)/config/arm/arm-fpus.def \
$(srcdir)/config/arm/arm_neon_builtins.def
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d6201164dd4..69f4aafa07b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -26377,7 +26377,7 @@ static int min_insn_size (rtx);
static void
core2i7_first_cycle_multipass_filter_ready_try
(const_ix86_first_cycle_multipass_data_t data,
- char *ready_try, int n_ready, bool first_cycle_insn_p)
+ signed char *ready_try, int n_ready, bool first_cycle_insn_p)
{
while (n_ready--)
{
@@ -26409,7 +26409,8 @@ core2i7_first_cycle_multipass_filter_ready_try
/* Prepare for a new round of multipass lookahead scheduling. */
static void
-core2i7_first_cycle_multipass_begin (void *_data, char *ready_try, int n_ready,
+core2i7_first_cycle_multipass_begin (void *_data,
+ signed char *ready_try, int n_ready,
bool first_cycle_insn_p)
{
ix86_first_cycle_multipass_data_t data
@@ -26430,7 +26431,8 @@ core2i7_first_cycle_multipass_begin (void *_data, char *ready_try, int n_ready,
/* INSN is being issued in current solution. Account for its impact on
the decoder model. */
static void
-core2i7_first_cycle_multipass_issue (void *_data, char *ready_try, int n_ready,
+core2i7_first_cycle_multipass_issue (void *_data,
+ signed char *ready_try, int n_ready,
rtx insn, const void *_prev_data)
{
ix86_first_cycle_multipass_data_t data
@@ -26468,7 +26470,7 @@ core2i7_first_cycle_multipass_issue (void *_data, char *ready_try, int n_ready,
/* Revert the effect on ready_try. */
static void
core2i7_first_cycle_multipass_backtrack (const void *_data,
- char *ready_try,
+ signed char *ready_try,
int n_ready ATTRIBUTE_UNUSED)
{
const_ix86_first_cycle_multipass_data_t data
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 229a0f386b4..7e6ee17be6a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -169,8 +169,7 @@ static int ia64_first_cycle_multipass_dfa_lookahead (void);
static void ia64_dependencies_evaluation_hook (rtx, rtx);
static void ia64_init_dfa_pre_cycle_insn (void);
static rtx ia64_dfa_pre_cycle_insn (void);
-static int ia64_first_cycle_multipass_dfa_lookahead_guard (rtx);
-static bool ia64_first_cycle_multipass_dfa_lookahead_guard_spec (const_rtx);
+static int ia64_first_cycle_multipass_dfa_lookahead_guard (rtx, int);
static int ia64_dfa_new_cycle (FILE *, int, rtx, int, int, int *);
static void ia64_h_i_d_extended (void);
static void * ia64_alloc_sched_context (void);
@@ -496,10 +495,6 @@ static const struct attribute_spec ia64_attribute_table[] =
#undef TARGET_SCHED_GEN_SPEC_CHECK
#define TARGET_SCHED_GEN_SPEC_CHECK ia64_gen_spec_check
-#undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD_SPEC
-#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD_SPEC\
- ia64_first_cycle_multipass_dfa_lookahead_guard_spec
-
#undef TARGET_SCHED_SKIP_RTX_P
#define TARGET_SCHED_SKIP_RTX_P ia64_skip_rtx_p
@@ -7531,32 +7526,30 @@ ia64_variable_issue (FILE *dump ATTRIBUTE_UNUSED,
return 1;
}
-/* We are choosing insn from the ready queue. Return nonzero if INSN
+/* We are choosing insn from the ready queue. Return zero if INSN
can be chosen. */
static int
-ia64_first_cycle_multipass_dfa_lookahead_guard (rtx insn)
+ia64_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
{
gcc_assert (insn && INSN_P (insn));
- return ((!reload_completed
- || !safe_group_barrier_needed (insn))
- && ia64_first_cycle_multipass_dfa_lookahead_guard_spec (insn)
- && (!mflag_sched_mem_insns_hard_limit
- || !is_load_p (insn)
- || mem_ops_in_group[current_cycle % 4] < ia64_max_memory_insns));
-}
-/* We are choosing insn from the ready queue. Return nonzero if INSN
- can be chosen. */
+ /* Size of ALAT is 32. As far as we perform conservative
+ data speculation, we keep ALAT half-empty. */
+ if ((TODO_SPEC (insn) & BEGIN_DATA) && pending_data_specs >= 16)
+ return ready_index == 0 ? -1 : 1;
-static bool
-ia64_first_cycle_multipass_dfa_lookahead_guard_spec (const_rtx insn)
-{
- gcc_assert (insn && INSN_P (insn));
- /* Size of ALAT is 32. As far as we perform conservative data speculation,
- we keep ALAT half-empty. */
- return (pending_data_specs < 16
- || !(TODO_SPEC (insn) & BEGIN_DATA));
+ if (ready_index == 0)
+ return 0;
+
+ if ((!reload_completed
+ || !safe_group_barrier_needed (insn))
+ && (!mflag_sched_mem_insns_hard_limit
+ || !is_load_p (insn)
+ || mem_ops_in_group[current_cycle % 4] < ia64_max_memory_insns))
+ return 0;
+
+ return 1;
}
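The ALAT-gating part of the rewritten guard can be isolated as a small pure function. This is a simplified sketch of just that branch (it omits the group-barrier and memory-ops checks, and the parameter names are hypothetical): with a 32-entry ALAT kept half-empty, a data-speculative insn is stalled when 16 speculations are already pending.

```c
/* Simplified model of the ia64 ALAT check above.  Return values follow
   the new guard convention: 0 = insn may be chosen, positive = drop it
   from this round, negative = stall it for some cycles (used for the
   insn at ready-list position 0, which cannot simply be skipped).  */
int alat_guard (int is_data_spec, int pending_data_specs, int ready_index)
{
  if (is_data_spec && pending_data_specs >= 16)
    return ready_index == 0 ? -1 : 1;
  return 0;
}
```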
/* The following variable value is pseudo-insn used by the DFA insn
@@ -7943,17 +7936,9 @@ ia64_set_sched_flags (spec_info_t spec_info)
spec_info->flags = 0;
- if ((mask & DATA_SPEC) && mflag_sched_prefer_non_data_spec_insns)
- spec_info->flags |= PREFER_NON_DATA_SPEC;
-
- if (mask & CONTROL_SPEC)
- {
- if (mflag_sched_prefer_non_control_spec_insns)
- spec_info->flags |= PREFER_NON_CONTROL_SPEC;
-
- if (sel_sched_p () && mflag_sel_sched_dont_check_control_spec)
- spec_info->flags |= SEL_SCHED_SPEC_DONT_CHECK_CONTROL;
- }
+ if ((mask & CONTROL_SPEC)
+ && sel_sched_p () && mflag_sel_sched_dont_check_control_spec)
+ spec_info->flags |= SEL_SCHED_SPEC_DONT_CHECK_CONTROL;
if (sched_verbose >= 1)
spec_info->dump = sched_dump;
diff --git a/gcc/config/ia64/ia64.opt b/gcc/config/ia64/ia64.opt
index 0fd43922685..60577a1528d 100644
--- a/gcc/config/ia64/ia64.opt
+++ b/gcc/config/ia64/ia64.opt
@@ -164,12 +164,10 @@ Target Report Var(mflag_sched_spec_control_ldc) Init(0)
Use simple data speculation check for control speculation
msched-prefer-non-data-spec-insns
-Target Report Var(mflag_sched_prefer_non_data_spec_insns) Init(0)
-If set, data speculative instructions will be chosen for schedule only if there are no other choices at the moment
+Target Ignore Warn(switch %qs is no longer supported)
msched-prefer-non-control-spec-insns
-Target Report Var(mflag_sched_prefer_non_control_spec_insns) Init(0)
-If set, control speculative instructions will be chosen for schedule only if there are no other choices at the moment
+Target Ignore Warn(switch %qs is no longer supported)
msched-count-spec-in-critical-path
Target Report Var(mflag_sched_count_spec_in_critical_path) Init(0)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f1068f7e9d8..8778011016a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26949,22 +26949,25 @@ rs6000_use_sched_lookahead (void)
}
}
-/* We are choosing insn from the ready queue. Return nonzero if INSN can be chosen. */
+/* We are choosing insn from the ready queue. Return zero if INSN can be
+ chosen. */
static int
-rs6000_use_sched_lookahead_guard (rtx insn)
+rs6000_use_sched_lookahead_guard (rtx insn, int ready_index)
{
+ if (ready_index == 0)
+ return 0;
+
if (rs6000_cpu_attr != CPU_CELL)
- return 1;
+ return 0;
- if (insn == NULL_RTX || !INSN_P (insn))
- abort ();
+ gcc_assert (insn != NULL_RTX && INSN_P (insn));
if (!reload_completed
|| is_nonpipeline_insn (insn)
|| is_microcoded_insn (insn))
- return 0;
+ return 1;
- return 1;
+ return 0;
}
/* Determine if PAT refers to memory. If so, set MEM_REF to the MEM rtx
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 1720084c119..c5888c10fcb 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6688,26 +6688,32 @@ schedules to choose the best one.
The default is no multipass scheduling.
@end deftypefn
-@deftypefn {Target Hook} int TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD (rtx @var{insn})
+@deftypefn {Target Hook} int TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD (rtx @var{insn}, int @var{ready_index})
This hook controls what insns from the ready insn queue will be
considered for the multipass insn scheduling. If the hook returns
-zero for @var{insn}, the insn will be not chosen to
-be issued.
+zero for @var{insn}, the insn will be considered in multipass scheduling.
+Positive return values will remove @var{insn} from consideration on
+the current round of multipass scheduling.
+Negative return values will remove @var{insn} from consideration for the
+given number of cycles.
+Backends should be careful about returning non-zero for the highest-priority
+instruction at position 0 in the ready list.  @var{ready_index} is passed
+to allow backends to make correct judgements.
The default is that any ready insns can be chosen to be issued.
@end deftypefn
-@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_BEGIN (void *@var{data}, char *@var{ready_try}, int @var{n_ready}, bool @var{first_cycle_insn_p})
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_BEGIN (void *@var{data}, signed char *@var{ready_try}, int @var{n_ready}, bool @var{first_cycle_insn_p})
This hook prepares the target backend for a new round of multipass
scheduling.
@end deftypefn
-@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_ISSUE (void *@var{data}, char *@var{ready_try}, int @var{n_ready}, rtx @var{insn}, const void *@var{prev_data})
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_ISSUE (void *@var{data}, signed char *@var{ready_try}, int @var{n_ready}, rtx @var{insn}, const void *@var{prev_data})
This hook is called when multipass scheduling evaluates instruction INSN.
@end deftypefn
-@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_BACKTRACK (const void *@var{data}, char *@var{ready_try}, int @var{n_ready})
+@deftypefn {Target Hook} void TARGET_SCHED_FIRST_CYCLE_MULTIPASS_BACKTRACK (const void *@var{data}, signed char *@var{ready_try}, int @var{n_ready})
This is called when multipass scheduling backtracks from evaluation of
an instruction.
@end deftypefn
@@ -6815,19 +6821,6 @@ a pattern for a branchy check corresponding to a simple check denoted by
@var{insn} should be generated. In this case @var{label} can't be null.
@end deftypefn
-@deftypefn {Target Hook} bool TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD_SPEC (const_rtx @var{insn})
-This hook is used as a workaround for
-@samp{TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD} not being
-called on the first instruction of the ready list. The hook is used to
-discard speculative instructions that stand first in the ready list from
-being scheduled on the current cycle. If the hook returns @code{false},
-@var{insn} will not be chosen to be issued.
-For non-speculative instructions,
-the hook should always return @code{true}. For example, in the ia64 backend
-the hook is used to cancel data speculative insns when the ALAT table
-is nearly full.
-@end deftypefn
-
@deftypefn {Target Hook} void TARGET_SCHED_SET_SCHED_FLAGS (struct spec_info_def *@var{spec_info})
This hook is used by the insn scheduler to find out what features should be
enabled/used.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6c17f6d479e..7c3a4bc6a14 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4968,8 +4968,6 @@ them: try the first ones in this list first.
@hook TARGET_SCHED_GEN_SPEC_CHECK
-@hook TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD_SPEC
-
@hook TARGET_SCHED_SET_SCHED_FLAGS
@hook TARGET_SCHED_SMS_RES_MII
diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 55dc3e945fe..73d915ff261 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -239,6 +239,13 @@ struct common_sched_info_def *common_sched_info;
/* The minimal value of the INSN_TICK of an instruction. */
#define MIN_TICK (-max_insn_queue_index)
+/* Original order of insns in the ready list.
+ Used to keep order of normal insns while separating DEBUG_INSNs. */
+#define INSN_RFS_DEBUG_ORIG_ORDER(INSN) (HID (INSN)->rfs_debug_orig_order)
+
+/* The deciding reason for INSN's place in the ready list. */
+#define INSN_LAST_RFS_WIN(INSN) (HID (INSN)->last_rfs_win)
+
/* List of important notes we must keep around. This is a pointer to the
last element in the list. */
rtx note_list;
@@ -345,7 +352,7 @@ size_t dfa_state_size;
/* The following array is used to find the best insn from ready when
the automaton pipeline interface is used. */
-char *ready_try = NULL;
+signed char *ready_try = NULL;
/* The ready list. */
struct ready_list ready = {NULL, 0, 0, 0, 0};
@@ -827,6 +834,7 @@ add_delay_dependencies (rtx insn)
/* Forward declarations. */
static int priority (rtx);
+static int autopref_rank_for_schedule (const rtx, const rtx);
static int rank_for_schedule (const void *, const void *);
static void swap_sort (rtx *, int);
static void queue_insn (rtx, int, const char *);
@@ -859,8 +867,6 @@ static rtx ready_remove_first_dispatch (struct ready_list *ready);
static void queue_to_ready (struct ready_list *);
static int early_queue_to_ready (state_t, struct ready_list *);
-static void debug_ready_list (struct ready_list *);
-
/* The following functions are used to implement multi-pass scheduling
on the first cycle. */
static rtx ready_remove (struct ready_list *, int);
@@ -930,6 +936,13 @@ static bitmap saved_reg_live;
/* Registers mentioned in the current region. */
static bitmap region_ref_regs;
+/* Effective number of available registers of a given class (see comment
+ in sched_pressure_start_bb). */
+static int sched_class_regs_num[N_REG_CLASSES];
+/* Number of call_used_regs.  This is a helper for calculating
+   sched_class_regs_num.  */
+static int call_used_regs_num[N_REG_CLASSES];
+
/* Initiate register pressure relative info for scheduling the current
region. Currently it is only clearing register mentioned in the
current region. */
@@ -1113,7 +1126,7 @@ print_curr_reg_pressure (void)
gcc_assert (curr_reg_pressure[cl] >= 0);
fprintf (sched_dump, " %s:%d(%d)", reg_class_names[cl],
curr_reg_pressure[cl],
- curr_reg_pressure[cl] - ira_class_hard_regs_num[cl]);
+ curr_reg_pressure[cl] - sched_class_regs_num[cl]);
}
fprintf (sched_dump, "\n");
}
@@ -1165,6 +1178,12 @@ update_insn_after_change (rtx insn)
INSN_COST (insn) = -1;
/* Invalidate INSN_TICK, so it'll be recalculated. */
INSN_TICK (insn) = INVALID_TICK;
+
+ /* Invalidate autoprefetch data entry. */
+ INSN_AUTOPREF_MULTIPASS_DATA (insn)[0].status
+ = AUTOPREF_MULTIPASS_DATA_UNINITIALIZED;
+ INSN_AUTOPREF_MULTIPASS_DATA (insn)[1].status
+ = AUTOPREF_MULTIPASS_DATA_UNINITIALIZED;
}
@@ -1690,13 +1709,6 @@ priority (rtx insn)
/* Macros and functions for keeping the priority queue sorted, and
dealing with queuing and dequeuing of instructions. */
-#define SCHED_SORT(READY, N_READY) \
-do { if ((N_READY) == 2) \
- swap_sort (READY, N_READY); \
- else if ((N_READY) > 2) \
- qsort (READY, N_READY, sizeof (rtx), rank_for_schedule); } \
-while (0)
-
/* For each pressure class CL, set DEATH[CL] to the number of registers
in that class that die in INSN. */
@@ -1738,9 +1750,9 @@ setup_insn_reg_pressure_info (rtx insn)
cl = ira_pressure_classes[i];
gcc_assert (curr_reg_pressure[cl] >= 0);
change = (int) pressure_info[i].set_increase - death[cl];
- before = MAX (0, max_reg_pressure[i] - ira_class_hard_regs_num[cl]);
+ before = MAX (0, max_reg_pressure[i] - sched_class_regs_num[cl]);
after = MAX (0, max_reg_pressure[i] + change
- - ira_class_hard_regs_num[cl]);
+ - sched_class_regs_num[cl]);
hard_regno = ira_class_hard_regs[cl][0];
gcc_assert (hard_regno >= 0);
mode = reg_raw_mode[hard_regno];
@@ -2077,7 +2089,7 @@ model_update_pressure (struct model_pressure_group *group,
/* Check whether the maximum pressure in the overall schedule
has increased. (This means that the MODEL_MAX_PRESSURE of
- every point <= POINT will need to increae too; see below.) */
+ every point <= POINT will need to increase too; see below.) */
if (group->limits[pci].pressure < ref_pressure)
group->limits[pci].pressure = ref_pressure;
@@ -2354,7 +2366,7 @@ must_restore_pattern_p (rtx next, dep_t dep)
/* Return the cost of increasing the pressure in class CL from FROM to TO.
Here we use the very simplistic cost model that every register above
- ira_class_hard_regs_num[CL] has a spill cost of 1. We could use other
+ sched_class_regs_num[CL] has a spill cost of 1. We could use other
measures instead, such as one based on MEMORY_MOVE_COST. However:
(1) In order for an instruction to be scheduled, the higher cost
@@ -2378,7 +2390,7 @@ must_restore_pattern_p (rtx next, dep_t dep)
static int
model_spill_cost (int cl, int from, int to)
{
- from = MAX (from, ira_class_hard_regs_num[cl]);
+ from = MAX (from, sched_class_regs_num[cl]);
return MAX (to, from) - from;
}
@@ -2484,7 +2496,7 @@ model_set_excess_costs (rtx *insns, int count)
bool print_p;
/* Record the baseECC value for each instruction in the model schedule,
- except that negative costs are converted to zero ones now rather thatn
+ except that negative costs are converted to zero ones now rather than
later. Do not assign a cost to debug instructions, since they must
not change code-generation decisions. Experiments suggest we also
get better results by not assigning a cost to instructions from
@@ -2532,6 +2544,62 @@ model_set_excess_costs (rtx *insns, int count)
}
}
+
+/* Enum of rank_for_schedule heuristic decisions. */
+enum rfs_decision {
+ RFS_LIVE_RANGE_SHRINK1, RFS_LIVE_RANGE_SHRINK2,
+ RFS_SCHED_GROUP, RFS_PRESSURE_DELAY, RFS_PRESSURE_TICK,
+ RFS_FEEDS_BACKTRACK_INSN, RFS_PRIORITY, RFS_SPECULATION,
+ RFS_SCHED_RANK, RFS_LAST_INSN, RFS_PRESSURE_INDEX,
+ RFS_DEP_COUNT, RFS_TIE, RFS_N };
+
+/* Corresponding strings for printouts.  */
+static const char *rfs_str[RFS_N] = {
+ "RFS_LIVE_RANGE_SHRINK1", "RFS_LIVE_RANGE_SHRINK2",
+ "RFS_SCHED_GROUP", "RFS_PRESSURE_DELAY", "RFS_PRESSURE_TICK",
+ "RFS_FEEDS_BACKTRACK_INSN", "RFS_PRIORITY", "RFS_SPECULATION",
+ "RFS_SCHED_RANK", "RFS_LAST_INSN", "RFS_PRESSURE_INDEX",
+ "RFS_DEP_COUNT", "RFS_TIE" };
+
+/* Statistical breakdown of rank_for_schedule decisions. */
+typedef struct { unsigned stats[RFS_N]; } rank_for_schedule_stats_t;
+static rank_for_schedule_stats_t rank_for_schedule_stats;
+
+/* Return the result of comparing insns TMP and TMP2 and update
+ Rank_For_Schedule statistics. */
+static int
+rfs_result (enum rfs_decision decision, int result, rtx tmp, rtx tmp2)
+{
+ ++rank_for_schedule_stats.stats[decision];
+ if (result < 0)
+ INSN_LAST_RFS_WIN (tmp) = decision;
+ else if (result > 0)
+ INSN_LAST_RFS_WIN (tmp2) = decision;
+ else
+ gcc_unreachable ();
+ return result;
+}
+
+/* Sorting predicate to move DEBUG_INSNs to the top of the ready list,
+   while keeping normal insns in their original order.  */
+
+static int
+rank_for_schedule_debug (const void *x, const void *y)
+{
+ rtx tmp = *(rtx const *) y;
+ rtx tmp2 = *(rtx const *) x;
+
+ /* Schedule debug insns as early as possible. */
+ if (DEBUG_INSN_P (tmp) && !DEBUG_INSN_P (tmp2))
+ return -1;
+ else if (!DEBUG_INSN_P (tmp) && DEBUG_INSN_P (tmp2))
+ return 1;
+ else if (DEBUG_INSN_P (tmp) && DEBUG_INSN_P (tmp2))
+ return INSN_LUID (tmp) - INSN_LUID (tmp2);
+ else
+ return INSN_RFS_DEBUG_ORIG_ORDER (tmp2) - INSN_RFS_DEBUG_ORIG_ORDER (tmp);
+}
+
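The reason for the recorded original order is that `qsort` is not stable, so separating debug insns would otherwise shuffle the real insns. A self-contained sketch in the same spirit (simplified: a plain struct instead of rtx, and the debug-insn tie-break here is ascending, which need not match GCC's ordering exactly):

```c
#include <stdlib.h>

/* Simplified model of an insn for illustration (hypothetical fields).  */
struct insn { int luid; int is_debug; int orig_order; };

/* Comparator in the spirit of rank_for_schedule_debug: debug insns
   cluster at one end of the array, real insns keep their recorded
   original order.  qsort is not stable, hence the explicit
   orig_order tie-breaker.  */
static int cmp_debug (const void *x, const void *y)
{
  const struct insn *a = *(const struct insn *const *) x;
  const struct insn *b = *(const struct insn *const *) y;

  if (a->is_debug != b->is_debug)
    return a->is_debug ? 1 : -1;          /* Debug insns sink to the end.  */
  if (a->is_debug)
    return a->luid - b->luid;             /* Debug insns: by LUID.  */
  return a->orig_order - b->orig_order;   /* Real insns: keep order.  */
}

/* Demo: sort a mixed list and check that the real insns kept their
   relative order and the debug insns ended up last.  */
int demo_ok (void)
{
  struct insn pool[5] = {
    {10, 0, 0}, {11, 1, -1}, {12, 0, 1}, {13, 1, -1}, {14, 0, 2}
  };
  struct insn *v[5];
  for (int i = 0; i < 5; ++i)
    v[i] = &pool[i];
  qsort (v, 5, sizeof (v[0]), cmp_debug);
  return v[0]->luid == 10 && v[1]->luid == 12 && v[2]->luid == 14
         && v[3]->is_debug && v[4]->is_debug;
}
```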
/* Returns a positive value if x is preferred; returns a negative value if
y is preferred. Should never return 0, since that will make the sort
unstable. */
@@ -2544,17 +2612,6 @@ rank_for_schedule (const void *x, const void *y)
int tmp_class, tmp2_class;
int val, priority_val, info_val, diff;
- if (MAY_HAVE_DEBUG_INSNS)
- {
- /* Schedule debug insns as early as possible. */
- if (DEBUG_INSN_P (tmp) && !DEBUG_INSN_P (tmp2))
- return -1;
- else if (!DEBUG_INSN_P (tmp) && DEBUG_INSN_P (tmp2))
- return 1;
- else if (DEBUG_INSN_P (tmp) && DEBUG_INSN_P (tmp2))
- return INSN_LUID (tmp) - INSN_LUID (tmp2);
- }
-
if (live_range_shrinkage_p)
{
/* Don't use SCHED_PRESSURE_MODEL -- it results in much worse
@@ -2564,17 +2621,19 @@ rank_for_schedule (const void *x, const void *y)
|| INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2) < 0)
&& (diff = (INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp)
- INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2))) != 0)
- return diff;
+ return rfs_result (RFS_LIVE_RANGE_SHRINK1, diff, tmp, tmp2);
/* Sort by INSN_LUID (original insn order), so that we make the
sort stable. This minimizes instruction movement, thus
minimizing sched's effect on debugging and cross-jumping. */
- return INSN_LUID (tmp) - INSN_LUID (tmp2);
+ return rfs_result (RFS_LIVE_RANGE_SHRINK2,
+ INSN_LUID (tmp) - INSN_LUID (tmp2), tmp, tmp2);
}
/* The insn in a schedule group should be issued the first. */
if (flag_sched_group_heuristic &&
SCHED_GROUP_P (tmp) != SCHED_GROUP_P (tmp2))
- return SCHED_GROUP_P (tmp2) ? 1 : -1;
+ return rfs_result (RFS_SCHED_GROUP, SCHED_GROUP_P (tmp2) ? 1 : -1,
+ tmp, tmp2);
/* Make sure that priority of TMP and TMP2 are initialized. */
gcc_assert (INSN_PRIORITY_KNOWN (tmp) && INSN_PRIORITY_KNOWN (tmp2));
@@ -2587,18 +2646,15 @@ rank_for_schedule (const void *x, const void *y)
+ insn_delay (tmp)
- INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2)
- insn_delay (tmp2))))
- return diff;
+ return rfs_result (RFS_PRESSURE_DELAY, diff, tmp, tmp2);
}
if (sched_pressure != SCHED_PRESSURE_NONE
- && (INSN_TICK (tmp2) > clock_var || INSN_TICK (tmp) > clock_var))
+ && (INSN_TICK (tmp2) > clock_var || INSN_TICK (tmp) > clock_var)
+ && INSN_TICK (tmp2) != INSN_TICK (tmp))
{
- if (INSN_TICK (tmp) <= clock_var)
- return -1;
- else if (INSN_TICK (tmp2) <= clock_var)
- return 1;
- else
- return INSN_TICK (tmp) - INSN_TICK (tmp2);
+ diff = INSN_TICK (tmp) - INSN_TICK (tmp2);
+ return rfs_result (RFS_PRESSURE_TICK, diff, tmp, tmp2);
}
/* If we are doing backtracking in this schedule, prefer insns that
@@ -2608,14 +2664,21 @@ rank_for_schedule (const void *x, const void *y)
{
priority_val = FEEDS_BACKTRACK_INSN (tmp2) - FEEDS_BACKTRACK_INSN (tmp);
if (priority_val)
- return priority_val;
+ return rfs_result (RFS_FEEDS_BACKTRACK_INSN, priority_val, tmp, tmp2);
}
/* Prefer insn with higher priority. */
priority_val = INSN_PRIORITY (tmp2) - INSN_PRIORITY (tmp);
if (flag_sched_critical_path_heuristic && priority_val)
- return priority_val;
+ return rfs_result (RFS_PRIORITY, priority_val, tmp, tmp2);
+
+ if (PARAM_VALUE (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH) >= 0)
+ {
+ int autopref = autopref_rank_for_schedule (tmp, tmp2);
+ if (autopref != 0)
+ return autopref;
+ }
/* Prefer speculative insn with greater dependencies weakness. */
if (flag_sched_spec_insn_heuristic && spec_info)
@@ -2638,12 +2701,12 @@ rank_for_schedule (const void *x, const void *y)
dw = dw2 - dw1;
if (dw > (NO_DEP_WEAK / 8) || dw < -(NO_DEP_WEAK / 8))
- return dw;
+ return rfs_result (RFS_SPECULATION, dw, tmp, tmp2);
}
info_val = (*current_sched_info->rank) (tmp, tmp2);
if (flag_sched_rank_heuristic && info_val)
- return info_val;
+ return rfs_result (RFS_SCHED_RANK, info_val, tmp, tmp2);
/* Compare insns based on their relation to the last scheduled
non-debug insn. */
@@ -2679,17 +2742,16 @@ rank_for_schedule (const void *x, const void *y)
tmp2_class = 2;
if ((val = tmp2_class - tmp_class))
- return val;
+ return rfs_result (RFS_LAST_INSN, val, tmp, tmp2);
}
/* Prefer instructions that occur earlier in the model schedule. */
- if (sched_pressure == SCHED_PRESSURE_MODEL)
+ if (sched_pressure == SCHED_PRESSURE_MODEL
+ && INSN_BB (tmp) == target_bb && INSN_BB (tmp2) == target_bb)
{
- int diff;
-
diff = model_index (tmp) - model_index (tmp2);
- if (diff != 0)
- return diff;
+ gcc_assert (diff != 0);
+ return rfs_result (RFS_PRESSURE_INDEX, diff, tmp, tmp2);
}
/* Prefer the insn which has more later insns that depend on it.
@@ -2700,12 +2762,12 @@ rank_for_schedule (const void *x, const void *y)
- dep_list_size (tmp, SD_LIST_FORW));
if (flag_sched_dep_count_heuristic && val != 0)
- return val;
+ return rfs_result (RFS_DEP_COUNT, val, tmp, tmp2);
/* If insns are equally good, sort by INSN_LUID (original insn order),
so that we make the sort stable. This minimizes instruction movement,
thus minimizing sched's effect on debugging and cross-jumping. */
- return INSN_LUID (tmp) - INSN_LUID (tmp2);
+ return rfs_result (RFS_TIE, INSN_LUID (tmp) - INSN_LUID (tmp2), tmp, tmp2);
}
/* Resort the array A in which only element at index N may be out of order. */
@@ -2910,25 +2972,98 @@ ready_remove_insn (rtx insn)
gcc_unreachable ();
}
-/* Sort the ready list READY by ascending priority, using the SCHED_SORT
- macro. */
+/* Calculate the difference of the two statistics sets WAS and NOW.
+   The result is returned in WAS.  */
+static void
+rank_for_schedule_stats_diff (rank_for_schedule_stats_t *was,
+ const rank_for_schedule_stats_t *now)
+{
+ for (int i = 0; i < RFS_N; ++i)
+ was->stats[i] = now->stats[i] - was->stats[i];
+}
-void
-ready_sort (struct ready_list *ready)
+/* Print rank_for_schedule statistics. */
+static void
+print_rank_for_schedule_stats (const char *prefix,
+ const rank_for_schedule_stats_t *stats,
+ struct ready_list *ready)
+{
+ for (int i = 0; i < RFS_N; ++i)
+ if (stats->stats[i])
+ {
+ fprintf (sched_dump, "%s%20s: %u", prefix, rfs_str[i], stats->stats[i]);
+
+ if (ready != NULL)
+ /* Print out insns that won due to RFS_<I>. */
+ {
+ rtx *p = ready_lastpos (ready);
+
+ fprintf (sched_dump, ":");
+ /* Start with 1 since least-priority insn didn't have any wins. */
+ for (int j = 1; j < ready->n_ready; ++j)
+ if (INSN_LAST_RFS_WIN (p[j]) == i)
+ fprintf (sched_dump, " %s",
+ (*current_sched_info->print_insn) (p[j], 0));
+ }
+ fprintf (sched_dump, "\n");
+ }
+}
+
+/* Separate DEBUG_INSNs from normal insns: DEBUG_INSNs go to the end
+   of the array.  */
+static void
+ready_sort_debug (struct ready_list *ready)
+{
+ int i;
+ rtx *first = ready_lastpos (ready);
+
+ for (i = 0; i < ready->n_ready; ++i)
+ if (!DEBUG_INSN_P (first[i]))
+ INSN_RFS_DEBUG_ORIG_ORDER (first[i]) = i;
+
+ qsort (first, ready->n_ready, sizeof (rtx), rank_for_schedule_debug);
+}
+
+/* Sort non-debug insns in the ready list READY by ascending priority.
+ Assumes that all debug insns are separated from the real insns. */
+static void
+ready_sort_real (struct ready_list *ready)
{
int i;
rtx *first = ready_lastpos (ready);
+ int n_ready_real = ready->n_ready - ready->n_debug;
if (sched_pressure == SCHED_PRESSURE_WEIGHTED)
+ for (i = 0; i < n_ready_real; ++i)
+ setup_insn_reg_pressure_info (first[i]);
+ else if (sched_pressure == SCHED_PRESSURE_MODEL
+ && model_curr_point < model_num_insns)
+ model_set_excess_costs (first, n_ready_real);
+
+ rank_for_schedule_stats_t stats1;
+ if (sched_verbose >= 4)
+ stats1 = rank_for_schedule_stats;
+
+ if (n_ready_real == 2)
+ swap_sort (first, n_ready_real);
+ else if (n_ready_real > 2)
+ qsort (first, n_ready_real, sizeof (rtx), rank_for_schedule);
+
+ if (sched_verbose >= 4)
{
- for (i = 0; i < ready->n_ready; i++)
- if (!DEBUG_INSN_P (first[i]))
- setup_insn_reg_pressure_info (first[i]);
+ rank_for_schedule_stats_diff (&stats1, &rank_for_schedule_stats);
+ print_rank_for_schedule_stats (";;\t\t", &stats1, ready);
}
- if (sched_pressure == SCHED_PRESSURE_MODEL
- && model_curr_point < model_num_insns)
- model_set_excess_costs (first, ready->n_ready);
- SCHED_SORT (first, ready->n_ready);
+}
+
+/* Sort the ready list READY by ascending priority. */
+static void
+ready_sort (struct ready_list *ready)
+{
+ if (ready->n_debug > 0)
+ ready_sort_debug (ready);
+ else
+ ready_sort_real (ready);
}
/* PREV is an insn that is ready to execute. Adjust its priority if that
@@ -2976,7 +3111,7 @@ HAIFA_INLINE static void
advance_one_cycle (void)
{
advance_state (curr_state);
- if (sched_verbose >= 6)
+ if (sched_verbose >= 4)
fprintf (sched_dump, ";;\tAdvance the current state.\n");
}
@@ -3675,15 +3810,13 @@ model_dump_pressure_summary (void)
scheduling region. */
static void
-model_start_schedule (void)
+model_start_schedule (basic_block bb)
{
- basic_block bb;
-
model_next_priority = 1;
model_schedule.create (sched_max_luid);
model_insns = XCNEWVEC (struct model_insn_info, sched_max_luid);
- bb = BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head));
+ gcc_assert (bb == BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head)));
initiate_reg_pressure_info (df_get_live_in (bb));
model_analyze_insns ();
@@ -3721,6 +3854,53 @@ model_end_schedule (void)
model_finalize_pressure_group (&model_before_pressure);
model_schedule.release ();
}
+
+/* Prepare reg pressure scheduling for basic block BB. */
+static void
+sched_pressure_start_bb (basic_block bb)
+{
+ /* Set the number of available registers for each class taking into account
+ relative probability of current basic block versus function prologue and
+ epilogue.
+ * If the basic block executes much more often than the prologue/epilogue
+ (e.g., inside a hot loop), then cost of spill in the prologue is close to
+ nil, so the effective number of available registers is
+ (ira_class_hard_regs_num[cl] - 0).
+ * If the basic block executes as often as the prologue/epilogue,
+ then spill in the block is as costly as in the prologue, so the effective
+ number of available registers is
+ (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).
+ Note that all-else-equal, we prefer to spill in the prologue, since that
+ allows "extra" registers for other basic blocks of the function.
+ * If the basic block is on the cold path of the function and executes
+ rarely, then we should always prefer to spill in the block, rather than
+ in the prologue/epilogue. The effective number of available register is
+ (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]). */
+ {
+ int i;
+ int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
+ int bb_freq = bb->frequency;
+
+ if (bb_freq == 0)
+ {
+ if (entry_freq == 0)
+ entry_freq = bb_freq = 1;
+ }
+ if (bb_freq < entry_freq)
+ bb_freq = entry_freq;
+
+ for (i = 0; i < ira_pressure_classes_num; ++i)
+ {
+ enum reg_class cl = ira_pressure_classes[i];
+ sched_class_regs_num[cl] = ira_class_hard_regs_num[cl];
+ sched_class_regs_num[cl]
+ -= (call_used_regs_num[cl] * entry_freq) / bb_freq;
+ }
+ }
+
+ if (sched_pressure == SCHED_PRESSURE_MODEL)
+ model_start_schedule (bb);
+}
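The frequency-weighted register count in the comment above reduces to one integer expression. A standalone sketch of that computation (a hypothetical free function, whereas the hunk fills a per-class array in place):

```c
/* Effective number of registers available for scheduling in a block,
   given its execution frequency relative to the function entry.
   Mirrors the arithmetic in sched_pressure_start_bb: a hot block
   (bb_freq >> entry_freq) keeps nearly all hard registers, because a
   spill in the prologue is comparatively free; a block no hotter than
   the entry loses all call-used registers.  */
int effective_regs (int hard_regs, int call_used, int entry_freq, int bb_freq)
{
  if (bb_freq == 0)
    {
      if (entry_freq == 0)
        entry_freq = bb_freq = 1;
    }
  if (bb_freq < entry_freq)
    bb_freq = entry_freq;
  return hard_regs - (call_used * entry_freq) / bb_freq;
}
```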
/* A structure that holds local state for the loop in schedule_block. */
struct sched_block_state
@@ -3755,7 +3935,7 @@ schedule_insn (rtx insn)
if (sched_verbose >= 1)
{
struct reg_pressure_data *pressure_info;
- fprintf (sched_dump, ";;\t%3i--> %s%-40s:",
+ fprintf (sched_dump, ";;\t%3i--> %s %-40s:",
clock_var, (*current_sched_info->print_insn) (insn, 1),
str_pattern_slim (PATTERN (insn)));
@@ -3949,6 +4129,10 @@ schedule_insn (rtx insn)
last_clock_var = clock_var;
}
+ if (nonscheduled_insns_begin != NULL_RTX)
+ /* Indicate to debug counters that INSN is scheduled. */
+ nonscheduled_insns_begin = insn;
+
return advance;
}
@@ -4053,6 +4237,7 @@ struct haifa_saved_data
rtx last_scheduled_insn;
rtx last_nondebug_scheduled_insn;
+ rtx nonscheduled_insns_begin;
int cycle_issued_insns;
/* Copies of state used in the inner loop of schedule_block. */
@@ -4125,6 +4310,7 @@ save_backtrack_point (struct delay_pair *pair,
save->cycle_issued_insns = cycle_issued_insns;
save->last_scheduled_insn = last_scheduled_insn;
save->last_nondebug_scheduled_insn = last_nondebug_scheduled_insn;
+ save->nonscheduled_insns_begin = nonscheduled_insns_begin;
save->sched_block = sched_block;
@@ -4380,6 +4566,7 @@ restore_last_backtrack_point (struct sched_block_state *psched_block)
cycle_issued_insns = save->cycle_issued_insns;
last_scheduled_insn = save->last_scheduled_insn;
last_nondebug_scheduled_insn = save->last_nondebug_scheduled_insn;
+ nonscheduled_insns_begin = save->nonscheduled_insns_begin;
*psched_block = save->sched_block;
@@ -4848,6 +5035,24 @@ undo_all_replacements (void)
}
}
+/* Return the first non-scheduled insn in the current scheduling block.
+   This is mostly used for debug-counter purposes.  */
+static rtx
+first_nonscheduled_insn (void)
+{
+ rtx insn = (nonscheduled_insns_begin != NULL_RTX
+ ? nonscheduled_insns_begin
+ : current_sched_info->prev_head);
+
+ do
+ {
+ insn = next_nonnote_nondebug_insn (insn);
+ }
+ while (QUEUE_INDEX (insn) == QUEUE_SCHEDULED);
+
+ return insn;
+}
+
/* Move insns that became ready to fire from queue to ready list. */
static void
@@ -4860,16 +5065,9 @@ queue_to_ready (struct ready_list *ready)
q_ptr = NEXT_Q (q_ptr);
if (dbg_cnt (sched_insn) == false)
- {
- /* If debug counter is activated do not requeue the first
- nonscheduled insn. */
- skip_insn = nonscheduled_insns_begin;
- do
- {
- skip_insn = next_nonnote_nondebug_insn (skip_insn);
- }
- while (QUEUE_INDEX (skip_insn) == QUEUE_SCHEDULED);
- }
+ /* If debug counter is activated do not requeue the first
+ nonscheduled insn. */
+ skip_insn = first_nonscheduled_insn ();
else
skip_insn = NULL_RTX;
@@ -4899,7 +5097,11 @@ queue_to_ready (struct ready_list *ready)
&& model_index (insn) == model_curr_point)
&& !SCHED_GROUP_P (insn)
&& insn != skip_insn)
- queue_insn (insn, 1, "ready full");
+ {
+ if (sched_verbose >= 2)
+ fprintf (sched_dump, "keeping in queue, ready full\n");
+ queue_insn (insn, 1, "ready full");
+ }
else
{
ready_add (ready, insn, false);
@@ -4944,6 +5146,9 @@ queue_to_ready (struct ready_list *ready)
q_ptr = NEXT_Q_AFTER (q_ptr, stalls);
clock_var += stalls;
+ if (sched_verbose >= 2)
+ fprintf (sched_dump, ";;\tAdvancing clock by %d cycle[s] to %d\n",
+ stalls, clock_var);
}
}
@@ -5104,10 +5309,11 @@ early_queue_to_ready (state_t state, struct ready_list *ready)
}
-/* Print the ready list for debugging purposes. Callable from debugger. */
-
+/* Print the ready list for debugging purposes.
+ If READY_TRY is non-zero then only print insns that max_issue
+ will consider. */
static void
-debug_ready_list (struct ready_list *ready)
+debug_ready_list_1 (struct ready_list *ready, signed char *ready_try)
{
rtx *p;
int i;
@@ -5121,20 +5327,34 @@ debug_ready_list (struct ready_list *ready)
p = ready_lastpos (ready);
for (i = 0; i < ready->n_ready; i++)
{
+ if (ready_try != NULL && ready_try[ready->n_ready - i - 1])
+ continue;
+
fprintf (sched_dump, " %s:%d",
(*current_sched_info->print_insn) (p[i], 0),
INSN_LUID (p[i]));
if (sched_pressure != SCHED_PRESSURE_NONE)
fprintf (sched_dump, "(cost=%d",
INSN_REG_PRESSURE_EXCESS_COST_CHANGE (p[i]));
+ fprintf (sched_dump, ":prio=%d", INSN_PRIORITY (p[i]));
if (INSN_TICK (p[i]) > clock_var)
fprintf (sched_dump, ":delay=%d", INSN_TICK (p[i]) - clock_var);
+ if (sched_pressure == SCHED_PRESSURE_MODEL)
+ fprintf (sched_dump, ":idx=%d",
+ model_index (p[i]));
if (sched_pressure != SCHED_PRESSURE_NONE)
fprintf (sched_dump, ")");
}
fprintf (sched_dump, "\n");
}
+/* Print the ready list. Callable from debugger. */
+static void
+debug_ready_list (struct ready_list *ready)
+{
+ debug_ready_list_1 (ready, NULL);
+}
+
/* Search INSN for REG_SAVE_NOTE notes and convert them back into insn
NOTEs. This is used for NOTE_INSN_EPILOGUE_BEG, so that sched-ebb
replaces the epilogue note in the correct basic block. */
@@ -5257,6 +5477,241 @@ insn_finishes_cycle_p (rtx insn)
return false;
}
+/* Functions to model cache auto-prefetcher.
+
+ Some CPUs have a cache auto-prefetcher, which /seems/ to initiate
+ memory prefetches if it sees instructions with consecutive memory accesses
+ in the instruction stream. Details of such hardware units are not published,
+ so we can only guess what exactly is going on there.
+ In the scheduler, we model an abstract auto-prefetcher. If there are memory
+ insns in the ready list (or the queue) that have the same memory base, but
+ different offsets, then we delay the insns with larger offsets until insns
+ with smaller offsets get scheduled. If PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
+ is "1", then we look at the ready list; if it is N>1, then we also look
+ through N-1 queue entries.
+ If the param is N>=0, then rank_for_schedule will consider auto-prefetching
+ among its heuristics.
+ Param value of "-1" disables modelling of the auto-prefetcher. */
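The heuristic described in the comment above can be sketched outside GCC as a plain offset comparison between accesses that share a base register. This is an illustrative stand-in only; `mem_access` and `autopref_cmp` are hypothetical names, not the GCC identifiers, and the real code compares per-insn `autopref_multipass_data` entries rather than bare structs:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for a memory insn tracked by the model:
   a base register and a constant displacement.  */
struct mem_access
{
  int base;    /* base register number */
  int offset;  /* constant offset from the base */
};

/* Shape of the ranking heuristic: for two accesses with the same
   base, prefer the smaller offset; with different bases the
   heuristic is neutral (returns 0).  */
static int
autopref_cmp (const void *a, const void *b)
{
  const struct mem_access *m1 = (const struct mem_access *) a;
  const struct mem_access *m2 = (const struct mem_access *) b;

  if (m1->base != m2->base)
    return 0;
  return m1->offset - m2->offset;
}
```

Sorting `[r1+8], [r1+0], [r1+4]` with this comparator yields ascending offsets, which is the order the abstract auto-prefetcher is assumed to prefer.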
+
+/* Initialize autoprefetcher model data for INSN. */
+static void
+autopref_multipass_init (const rtx insn, int write)
+{
+ autopref_multipass_data_t data = &INSN_AUTOPREF_MULTIPASS_DATA (insn)[write];
+
+ gcc_assert (data->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED);
+ data->base = NULL_RTX;
+ data->offset = 0;
+ /* Set insn entry initialized, but not relevant for auto-prefetcher. */
+ data->status = AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
+
+ rtx set = single_set (insn);
+ if (set == NULL_RTX)
+ return;
+
+ rtx mem = write ? SET_DEST (set) : SET_SRC (set);
+ if (!MEM_P (mem))
+ return;
+
+ struct address_info info;
+ decompose_mem_address (&info, mem);
+
+ /* TODO: Currently only (base+const) addressing is supported. */
+ if (info.base == NULL || !REG_P (*info.base)
+ || (info.disp != NULL && !CONST_INT_P (*info.disp)))
+ return;
+
+ /* This insn is relevant for auto-prefetcher. */
+ data->base = *info.base;
+ data->offset = info.disp ? INTVAL (*info.disp) : 0;
+ data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+}
+
+/* Helper function for rank_for_schedule sorting. */
+static int
+autopref_rank_for_schedule (const rtx insn1, const rtx insn2)
+{
+ for (int write = 0; write < 2; ++write)
+ {
+ autopref_multipass_data_t data1
+ = &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write];
+ autopref_multipass_data_t data2
+ = &INSN_AUTOPREF_MULTIPASS_DATA (insn2)[write];
+
+ if (data1->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
+ autopref_multipass_init (insn1, write);
+ if (data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
+ continue;
+
+ if (data2->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
+ autopref_multipass_init (insn2, write);
+ if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
+ continue;
+
+ if (!rtx_equal_p (data1->base, data2->base))
+ continue;
+
+ return data1->offset - data2->offset;
+ }
+
+ return 0;
+}
+
+/* True if header of debug dump was printed. */
+static bool autopref_multipass_dfa_lookahead_guard_started_dump_p;
+
+/* Helper for autopref_multipass_dfa_lookahead_guard.
+ Return "1" if INSN1 should be delayed in favor of INSN2. */
+static int
+autopref_multipass_dfa_lookahead_guard_1 (const rtx insn1,
+ const rtx insn2, int write)
+{
+ autopref_multipass_data_t data1
+ = &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write];
+ autopref_multipass_data_t data2
+ = &INSN_AUTOPREF_MULTIPASS_DATA (insn2)[write];
+
+ if (data2->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
+ autopref_multipass_init (insn2, write);
+ if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
+ return 0;
+
+ if (rtx_equal_p (data1->base, data2->base)
+ && data1->offset > data2->offset)
+ {
+ if (sched_verbose >= 2)
+ {
+ if (!autopref_multipass_dfa_lookahead_guard_started_dump_p)
+ {
+ fprintf (sched_dump,
+ ";;\t\tnot trying in max_issue due to autoprefetch "
+ "model: ");
+ autopref_multipass_dfa_lookahead_guard_started_dump_p = true;
+ }
+
+ fprintf (sched_dump, " %d(%d)", INSN_UID (insn1), INSN_UID (insn2));
+ }
+
+ return 1;
+ }
+
+ return 0;
+}
+
+/* General note:
+
+ We could have also hooked autoprefetcher model into
+ first_cycle_multipass_backtrack / first_cycle_multipass_issue hooks
+ to enable intelligent selection of "[r1+0]=r2; [r1+4]=r3" on the same cycle
+ (e.g., once "[r1+0]=r2" is issued in max_issue(), "[r1+4]=r3" gets
+ unblocked). We don't bother about this yet because target of interest
+ (ARM Cortex-A15) can issue only 1 memory operation per cycle. */
+
+/* Implementation of first_cycle_multipass_dfa_lookahead_guard hook.
+ Return "1" if INSN1 should not be considered in max_issue due to
+ auto-prefetcher considerations. */
+int
+autopref_multipass_dfa_lookahead_guard (rtx insn1, int ready_index)
+{
+ int r = 0;
+
+ if (PARAM_VALUE (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH) <= 0)
+ return 0;
+
+ if (sched_verbose >= 2 && ready_index == 0)
+ autopref_multipass_dfa_lookahead_guard_started_dump_p = false;
+
+ for (int write = 0; write < 2; ++write)
+ {
+ autopref_multipass_data_t data1
+ = &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write];
+
+ if (data1->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
+ autopref_multipass_init (insn1, write);
+ if (data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
+ continue;
+
+ if (ready_index == 0
+ && data1->status == AUTOPREF_MULTIPASS_DATA_DONT_DELAY)
+ /* We allow only a single delay on privileged instructions.
+ Doing otherwise would cause an infinite loop. */
+ {
+ if (sched_verbose >= 2)
+ {
+ if (!autopref_multipass_dfa_lookahead_guard_started_dump_p)
+ {
+ fprintf (sched_dump,
+ ";;\t\tnot trying in max_issue due to autoprefetch "
+ "model: ");
+ autopref_multipass_dfa_lookahead_guard_started_dump_p = true;
+ }
+
+ fprintf (sched_dump, " *%d*", INSN_UID (insn1));
+ }
+ continue;
+ }
+
+ for (int i2 = 0; i2 < ready.n_ready; ++i2)
+ {
+ rtx insn2 = get_ready_element (i2);
+ if (insn1 == insn2)
+ continue;
+ r = autopref_multipass_dfa_lookahead_guard_1 (insn1, insn2, write);
+ if (r)
+ {
+ if (ready_index == 0)
+ {
+ r = -1;
+ data1->status = AUTOPREF_MULTIPASS_DATA_DONT_DELAY;
+ }
+ goto finish;
+ }
+ }
+
+ if (PARAM_VALUE (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH) == 1)
+ continue;
+
+ /* Everything from the current queue slot should have been moved to
+ the ready list. */
+ gcc_assert (insn_queue[NEXT_Q_AFTER (q_ptr, 0)] == NULL_RTX);
+
+ int n_stalls = PARAM_VALUE (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH) - 1;
+ if (n_stalls > max_insn_queue_index)
+ n_stalls = max_insn_queue_index;
+
+ for (int stalls = 1; stalls <= n_stalls; ++stalls)
+ {
+ for (rtx link = insn_queue[NEXT_Q_AFTER (q_ptr, stalls)];
+ link != NULL_RTX;
+ link = XEXP (link, 1))
+ {
+ rtx insn2 = XEXP (link, 0);
+ r = autopref_multipass_dfa_lookahead_guard_1 (insn1, insn2,
+ write);
+ if (r)
+ {
+ /* Queue INSN1 until INSN2 can issue. */
+ r = -stalls;
+ if (ready_index == 0)
+ data1->status = AUTOPREF_MULTIPASS_DATA_DONT_DELAY;
+ goto finish;
+ }
+ }
+ }
+ }
+
+ finish:
+ if (sched_verbose >= 2
+ && autopref_multipass_dfa_lookahead_guard_started_dump_p
+ && (ready_index == ready.n_ready - 1 || r < 0))
+ /* This does not /always/ trigger. We don't output EOL if the last
+ insn is not recognized (INSN_CODE < 0) and lookahead_guard is not
+ called. We can live with this. */
+ fprintf (sched_dump, "\n");
+
+ return r;
+}
+
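The guard's return convention (0 = let max_issue consider the insn, positive = drop it from this round, negative = requeue it for that many cycles) can be sketched as a tiny standalone helper. The names below are hypothetical, not GCC's:

```c
#include <assert.h>

enum guard_action { GUARD_TRY, GUARD_SKIP, GUARD_REQUEUE };

/* Decode a lookahead-guard return value R per the convention above.
   On GUARD_REQUEUE, store the number of stall cycles in *STALLS.  */
static enum guard_action
decode_guard_result (int r, int *stalls)
{
  if (r == 0)
    return GUARD_TRY;
  if (r > 0)
    return GUARD_SKIP;
  *stalls = -r;   /* a negative value encodes the stall count */
  return GUARD_REQUEUE;
}
```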
/* Define type for target data used in multipass scheduling. */
#ifndef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DATA_T
# define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DATA_T int
@@ -5296,15 +5751,6 @@ int dfa_lookahead;
could achieve DFA_LOOKAHEAD ** N , where N is the queue length. */
static int max_lookahead_tries;
-/* The following value is value of hook
- `first_cycle_multipass_dfa_lookahead' at the last call of
- `max_issue'. */
-static int cached_first_cycle_multipass_dfa_lookahead = 0;
-
-/* The following value is value of `issue_rate' at the last call of
- `sched_init'. */
-static int cached_issue_rate = 0;
-
/* The following function returns maximal (or close to maximal) number
of insns which can be issued on the same cycle and one of which
insns is insns with the best rank (the first insn in READY). To
@@ -5333,9 +5779,8 @@ max_issue (struct ready_list *ready, int privileged_n, state_t state,
&& privileged_n <= n_ready);
/* Init MAX_LOOKAHEAD_TRIES. */
- if (cached_first_cycle_multipass_dfa_lookahead != dfa_lookahead)
+ if (max_lookahead_tries == 0)
{
- cached_first_cycle_multipass_dfa_lookahead = dfa_lookahead;
max_lookahead_tries = 100;
for (i = 0; i < issue_rate; i++)
max_lookahead_tries *= dfa_lookahead;
@@ -5364,6 +5809,12 @@ max_issue (struct ready_list *ready, int privileged_n, state_t state,
if (!ready_try [i])
all++;
+ if (sched_verbose >= 2)
+ {
+ fprintf (sched_dump, ";;\t\tmax_issue among %d insns:", all);
+ debug_ready_list_1 (ready, ready_try);
+ }
+
/* I is the index of the insn to try next. */
i = 0;
tries_num = 0;
@@ -5492,35 +5943,27 @@ static int
choose_ready (struct ready_list *ready, bool first_cycle_insn_p,
rtx *insn_ptr)
{
- int lookahead;
-
if (dbg_cnt (sched_insn) == false)
{
- rtx insn = nonscheduled_insns_begin;
- do
- {
- insn = next_nonnote_insn (insn);
- }
- while (QUEUE_INDEX (insn) == QUEUE_SCHEDULED);
+ if (nonscheduled_insns_begin == NULL_RTX)
+ nonscheduled_insns_begin = current_sched_info->prev_head;
+
+ rtx insn = first_nonscheduled_insn ();
if (QUEUE_INDEX (insn) == QUEUE_READY)
/* INSN is in the ready_list. */
{
- nonscheduled_insns_begin = insn;
ready_remove_insn (insn);
*insn_ptr = insn;
return 0;
}
/* INSN is in the queue. Advance cycle to move it to the ready list. */
+ gcc_assert (QUEUE_INDEX (insn) >= 0);
return -1;
}
- lookahead = 0;
-
- if (targetm.sched.first_cycle_multipass_dfa_lookahead)
- lookahead = targetm.sched.first_cycle_multipass_dfa_lookahead ();
- if (lookahead <= 0 || SCHED_GROUP_P (ready_element (ready, 0))
+ if (dfa_lookahead <= 0 || SCHED_GROUP_P (ready_element (ready, 0))
|| DEBUG_INSN_P (ready_element (ready, 0)))
{
if (targetm.sched.dispatch (NULL_RTX, IS_DISPATCH_ON))
@@ -5532,11 +5975,9 @@ choose_ready (struct ready_list *ready, bool first_cycle_insn_p,
}
else
{
- /* Try to choose the better insn. */
- int index = 0, i, n;
+ /* Try to choose the best insn. */
+ int index = 0, i;
rtx insn;
- int try_data = 1, try_control = 1;
- ds_t ts;
insn = ready_element (ready, 0);
if (INSN_CODE (insn) < 0)
@@ -5545,84 +5986,57 @@ choose_ready (struct ready_list *ready, bool first_cycle_insn_p,
return 0;
}
- if (spec_info
- && spec_info->flags & (PREFER_NON_DATA_SPEC
- | PREFER_NON_CONTROL_SPEC))
+ /* Filter the search space. */
+ for (i = 0; i < ready->n_ready; i++)
{
- for (i = 0, n = ready->n_ready; i < n; i++)
- {
- rtx x;
- ds_t s;
+ ready_try[i] = 0;
- x = ready_element (ready, i);
- s = TODO_SPEC (x);
+ insn = ready_element (ready, i);
- if (spec_info->flags & PREFER_NON_DATA_SPEC
- && !(s & DATA_SPEC))
- {
- try_data = 0;
- if (!(spec_info->flags & PREFER_NON_CONTROL_SPEC)
- || !try_control)
- break;
- }
+ /* If this insn is recognizable we should have already
+ recognized it earlier.
+ ??? Not very clear where this is supposed to be done.
+ See dep_cost_1. */
+ gcc_checking_assert (INSN_CODE (insn) >= 0
+ || recog_memoized (insn) < 0);
+ if (INSN_CODE (insn) < 0)
+ {
+ /* Non-recognized insns at position 0 are handled above. */
+ gcc_assert (i > 0);
+ ready_try[i] = 1;
+ continue;
+ }
- if (spec_info->flags & PREFER_NON_CONTROL_SPEC
- && !(s & CONTROL_SPEC))
+ if (targetm.sched.first_cycle_multipass_dfa_lookahead_guard)
+ {
+ ready_try[i]
+ = (targetm.sched.first_cycle_multipass_dfa_lookahead_guard
+ (insn, i));
+
+ if (ready_try[i] < 0)
+ /* Queue instruction for several cycles.
+ We need to restart choose_ready as we have changed
+ the ready list. */
{
- try_control = 0;
- if (!(spec_info->flags & PREFER_NON_DATA_SPEC) || !try_data)
- break;
+ change_queue_index (insn, -ready_try[i]);
+ return 1;
}
- }
- }
- ts = TODO_SPEC (insn);
- if ((ts & SPECULATIVE)
- && (((!try_data && (ts & DATA_SPEC))
- || (!try_control && (ts & CONTROL_SPEC)))
- || (targetm.sched.first_cycle_multipass_dfa_lookahead_guard_spec
- && !targetm.sched
- .first_cycle_multipass_dfa_lookahead_guard_spec (insn))))
- /* Discard speculative instruction that stands first in the ready
- list. */
- {
- change_queue_index (insn, 1);
- return 1;
- }
-
- ready_try[0] = 0;
-
- for (i = 1; i < ready->n_ready; i++)
- {
- insn = ready_element (ready, i);
+ /* Make sure that we didn't end up with 0'th insn filtered out.
+ Don't be tempted to make life easier for backends and just
+ requeue 0'th insn if (ready_try[0] == 0) and restart
+ choose_ready. Backends should be very considerate about
+ requeueing instructions -- especially the highest priority
+ one at position 0. */
+ gcc_assert (ready_try[i] == 0 || i > 0);
+ if (ready_try[i])
+ continue;
+ }
- ready_try [i]
- = ((!try_data && (TODO_SPEC (insn) & DATA_SPEC))
- || (!try_control && (TODO_SPEC (insn) & CONTROL_SPEC)));
+ gcc_assert (ready_try[i] == 0);
+ /* INSN made it through the scrutiny of filters! */
}
- /* Let the target filter the search space. */
- for (i = 1; i < ready->n_ready; i++)
- if (!ready_try[i])
- {
- insn = ready_element (ready, i);
-
- /* If this insn is recognizable we should have already
- recognized it earlier.
- ??? Not very clear where this is supposed to be done.
- See dep_cost_1. */
- gcc_checking_assert (INSN_CODE (insn) >= 0
- || recog_memoized (insn) < 0);
-
- ready_try [i]
- = (/* INSN_CODE check can be omitted here as it is also done later
- in max_issue (). */
- INSN_CODE (insn) < 0
- || (targetm.sched.first_cycle_multipass_dfa_lookahead_guard
- && !targetm.sched.first_cycle_multipass_dfa_lookahead_guard
- (insn)));
- }
-
if (max_issue (ready, 1, curr_state, first_cycle_insn_p, &index) == 0)
{
*insn_ptr = ready_remove_first (ready);
@@ -5870,6 +6284,35 @@ verify_shadows (void)
return earliest_fail;
}
+/* Print instructions together with useful scheduling information between
+ HEAD and TAIL (inclusive). */
+static void
+dump_insn_stream (rtx head, rtx tail)
+{
+ fprintf (sched_dump, ";;\t| insn | prio |\n");
+
+ rtx next_tail = NEXT_INSN (tail);
+ for (rtx insn = head; insn != next_tail; insn = NEXT_INSN (insn))
+ {
+ int priority = NOTE_P (insn) ? 0 : INSN_PRIORITY (insn);
+ const char *pattern = (NOTE_P (insn)
+ ? "note"
+ : str_pattern_slim (PATTERN (insn)));
+
+ fprintf (sched_dump, ";;\t| %4d | %4d | %-30s ",
+ INSN_UID (insn), priority, pattern);
+
+ if (sched_verbose >= 4)
+ {
+ if (NOTE_P (insn) || recog_memoized (insn) < 0)
+ fprintf (sched_dump, "nothing");
+ else
+ print_reservation (sched_dump, insn);
+ }
+ fprintf (sched_dump, "\n");
+ }
+}
+
/* Use forward list scheduling to rearrange insns of block pointed to by
TARGET_BB, possibly bringing insns from subsequent blocks in the same
region. */
@@ -5908,7 +6351,16 @@ schedule_block (basic_block *target_bb, state_t init_state)
/* Debug info. */
if (sched_verbose)
- dump_new_block_header (0, *target_bb, head, tail);
+ {
+ dump_new_block_header (0, *target_bb, head, tail);
+
+ if (sched_verbose >= 2)
+ {
+ dump_insn_stream (head, tail);
+ memset (&rank_for_schedule_stats, 0,
+ sizeof (rank_for_schedule_stats));
+ }
+ }
if (init_state == NULL)
state_reset (curr_state);
@@ -5927,8 +6379,9 @@ schedule_block (basic_block *target_bb, state_t init_state)
targetm.sched.init (sched_dump, sched_verbose, ready.veclen);
/* We start inserting insns after PREV_HEAD. */
- last_scheduled_insn = nonscheduled_insns_begin = prev_head;
+ last_scheduled_insn = prev_head;
last_nondebug_scheduled_insn = NULL_RTX;
+ nonscheduled_insns_begin = NULL_RTX;
gcc_assert ((NOTE_P (last_scheduled_insn)
|| DEBUG_INSN_P (last_scheduled_insn))
@@ -5949,8 +6402,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
in try_ready () (which is called through init_ready_list ()). */
(*current_sched_info->init_ready_list) ();
- if (sched_pressure == SCHED_PRESSURE_MODEL)
- model_start_schedule ();
+ if (sched_pressure)
+ sched_pressure_start_bb (*target_bb);
/* The algorithm is O(n^2) in the number of ready insns at any given
time in the worst case. Before reload we are more likely to have
@@ -5958,7 +6411,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
if (!reload_completed
&& ready.n_ready - ready.n_debug > MAX_SCHED_READY_INSNS)
{
- ready_sort (&ready);
+ ready_sort_debug (&ready);
+ ready_sort_real (&ready);
/* Find first free-standing insn past MAX_SCHED_READY_INSNS.
If there are debug insns, we know they're first. */
@@ -5969,7 +6423,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
if (sched_verbose >= 2)
{
fprintf (sched_dump,
- ";;\t\tReady list on entry: %d insns\n", ready.n_ready);
+ ";;\t\tReady list on entry: %d insns: ", ready.n_ready);
+ debug_ready_list (&ready);
fprintf (sched_dump,
";;\t\t before reload => truncated to %d insns\n", i);
}
@@ -5981,7 +6436,7 @@ schedule_block (basic_block *target_bb, state_t init_state)
rtx skip_insn;
if (dbg_cnt (sched_insn) == false)
- skip_insn = next_nonnote_insn (nonscheduled_insns_begin);
+ skip_insn = first_nonscheduled_insn ();
else
skip_insn = NULL_RTX;
@@ -6036,7 +6491,7 @@ schedule_block (basic_block *target_bb, state_t init_state)
if (sched_verbose >= 2)
{
- fprintf (sched_dump, ";;\t\tReady list after queue_to_ready: ");
+ fprintf (sched_dump, ";;\t\tReady list after queue_to_ready:");
debug_ready_list (&ready);
}
advance -= clock_var - start_clock_var;
@@ -6102,7 +6557,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
if (sched_verbose >= 2)
{
- fprintf (sched_dump, ";;\t\tReady list after ready_sort: ");
+ fprintf (sched_dump,
+ ";;\t\tReady list after ready_sort: ");
debug_ready_list (&ready);
}
}
@@ -6493,14 +6949,25 @@ schedule_block (basic_block *target_bb, state_t init_state)
sched_extend_luids ();
}
- if (sched_verbose)
- fprintf (sched_dump, ";; new head = %d\n;; new tail = %d\n\n",
- INSN_UID (head), INSN_UID (tail));
-
/* Update head/tail boundaries. */
head = NEXT_INSN (prev_head);
tail = last_scheduled_insn;
+ if (sched_verbose)
+ {
+ fprintf (sched_dump, ";; new head = %d\n;; new tail = %d\n",
+ INSN_UID (head), INSN_UID (tail));
+
+ if (sched_verbose >= 2)
+ {
+ dump_insn_stream (head, tail);
+ print_rank_for_schedule_stats (";; TOTAL ", &rank_for_schedule_stats,
+ NULL);
+ }
+
+ fprintf (sched_dump, "\n");
+ }
+
head = restore_other_notes (head, NULL);
current_sched_info->head = head;
@@ -6586,6 +7053,19 @@ alloc_global_sched_pressure_data (void)
saved_reg_live = BITMAP_ALLOC (NULL);
region_ref_regs = BITMAP_ALLOC (NULL);
}
+
+ /* Calculate number of CALL_USED_REGS in register classes that
+ we calculate register pressure for. */
+ for (int c = 0; c < ira_pressure_classes_num; ++c)
+ {
+ enum reg_class cl = ira_pressure_classes[c];
+
+ call_used_regs_num[cl] = 0;
+
+ for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
+ if (call_used_regs[ira_class_hard_regs[cl][i]])
+ ++call_used_regs_num[cl];
+ }
}
}
@@ -6665,18 +7145,17 @@ sched_init (void)
else
issue_rate = 1;
- if (cached_issue_rate != issue_rate)
- {
- cached_issue_rate = issue_rate;
- /* To invalidate max_lookahead_tries: */
- cached_first_cycle_multipass_dfa_lookahead = 0;
- }
-
- if (targetm.sched.first_cycle_multipass_dfa_lookahead)
+ if (targetm.sched.first_cycle_multipass_dfa_lookahead
+ /* Don't use max_issue with reg_pressure scheduling. Multipass
+ scheduling and reg_pressure scheduling undo each other's decisions. */
+ && sched_pressure == SCHED_PRESSURE_NONE)
dfa_lookahead = targetm.sched.first_cycle_multipass_dfa_lookahead ();
else
dfa_lookahead = 0;
+ /* Set to "0" so that we recalculate. */
+ max_lookahead_tries = 0;
+
if (targetm.sched.init_dfa_pre_cycle_insn)
targetm.sched.init_dfa_pre_cycle_insn ();
@@ -7160,8 +7639,9 @@ sched_extend_ready_list (int new_sched_ready_n_insns)
gcc_assert (new_sched_ready_n_insns >= sched_ready_n_insns);
- ready_try = (char *) xrecalloc (ready_try, new_sched_ready_n_insns,
- sched_ready_n_insns, sizeof (*ready_try));
+ ready_try = (signed char *) xrecalloc (ready_try, new_sched_ready_n_insns,
+ sched_ready_n_insns,
+ sizeof (*ready_try));
/* We allocate +1 element to save initial state in the choice_stack[0]
entry. */
@@ -8434,6 +8914,10 @@ init_h_i_d (rtx insn)
INSN_EXACT_TICK (insn) = INVALID_TICK;
INTER_TICK (insn) = INVALID_TICK;
TODO_SPEC (insn) = HARD_DEP;
+ INSN_AUTOPREF_MULTIPASS_DATA (insn)[0].status
+ = AUTOPREF_MULTIPASS_DATA_UNINITIALIZED;
+ INSN_AUTOPREF_MULTIPASS_DATA (insn)[1].status
+ = AUTOPREF_MULTIPASS_DATA_UNINITIALIZED;
}
}
@@ -8539,7 +9023,7 @@ sched_create_empty_bb_1 (basic_block after)
rtx
sched_emit_insn (rtx pat)
{
- rtx insn = emit_insn_before (pat, nonscheduled_insns_begin);
+ rtx insn = emit_insn_before (pat, first_nonscheduled_insn ());
haifa_init_insn (insn);
if (current_sched_info->add_remove_insn)
diff --git a/gcc/params.def b/gcc/params.def
index 04a13232695..70c01f65926 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -668,6 +668,11 @@ DEFPARAM (PARAM_SCHED_MEM_TRUE_DEP_COST,
"Minimal distance between possibly conflicting store and load",
1, 0, 0)
+DEFPARAM (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH,
+ "sched-autopref-queue-depth",
+ "Hardware autoprefetcher scheduler model control flag. Number of lookahead cycles the model looks into; at '0' only enable the instruction sorting heuristic. Disabled by default.",
+ -1, 0, 0)
+
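The new knob is driven from the command line like any other `--param`. A hedged example follows; the flag name comes from this patch, while the source file and option mix are illustrative (on targets such as Cortex-A15 the patch enables the model via the tuning tables instead):

```shell
# Sorting heuristic only (no lookahead into the queue):
gcc -O2 --param sched-autopref-queue-depth=0 -c foo.c
# Also look through 1 queue entry (depth N looks through N-1 entries):
gcc -O2 --param sched-autopref-queue-depth=2 -c foo.c
```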
DEFPARAM(PARAM_MAX_LAST_VALUE_RTL,
"max-last-value-rtl",
"The maximum number of RTL nodes that can be recorded as combiner's last value",
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index e99ef7bf772..094202d1af4 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5606,7 +5606,8 @@ get_base_term (rtx *inner)
inner = strip_address_mutations (&XEXP (*inner, 0));
if (REG_P (*inner)
|| MEM_P (*inner)
- || GET_CODE (*inner) == SUBREG)
+ || GET_CODE (*inner) == SUBREG
+ || GET_CODE (*inner) == SCRATCH)
return inner;
return 0;
}
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index d04bf0876b1..4955c0a746c 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -170,7 +170,7 @@ struct ready_list
int n_debug;
};
-extern char *ready_try;
+extern signed char *ready_try;
extern struct ready_list ready;
extern int max_issue (struct ready_list *, int, state_t, bool, int *);
@@ -794,6 +794,32 @@ struct reg_set_data
struct reg_set_data *next_insn_set;
};
+enum autopref_multipass_data_status {
+ /* Entry is irrelevant for auto-prefetcher. */
+ AUTOPREF_MULTIPASS_DATA_IRRELEVANT = -2,
+ /* Entry is uninitialized. */
+ AUTOPREF_MULTIPASS_DATA_UNINITIALIZED = -1,
+ /* Entry is relevant for auto-prefetcher and insn can be delayed
+ to allow another insn through. */
+ AUTOPREF_MULTIPASS_DATA_NORMAL = 0,
+ /* Entry is relevant for auto-prefetcher, but insn should not be
+ delayed as that will break scheduling. */
+ AUTOPREF_MULTIPASS_DATA_DONT_DELAY = 1
+};
+
+/* Data for modeling cache auto-prefetcher. */
+struct autopref_multipass_data_
+{
+ /* Base part of memory address. */
+ rtx base;
+ /* Memory offset. */
+ int offset;
+ /* Entry status. */
+ enum autopref_multipass_data_status status;
+};
+typedef struct autopref_multipass_data_ autopref_multipass_data_def;
+typedef autopref_multipass_data_def *autopref_multipass_data_t;
+
struct _haifa_insn_data
{
/* We can't place 'struct _deps_list' into h_i_d instead of deps_list_t
@@ -888,6 +914,16 @@ struct _haifa_insn_data
pressure excess (between source and target). */
int reg_pressure_excess_cost_change;
int model_index;
+
+ /* Original order of insns in the ready list. */
+ int rfs_debug_orig_order;
+
+ /* The deciding reason for INSN's place in the ready list. */
+ int last_rfs_win;
+
+ /* Two entries for cache auto-prefetcher model: one for mem reads,
+ and one for mem writes. */
+ autopref_multipass_data_def autopref_multipass_data[2];
};
typedef struct _haifa_insn_data haifa_insn_data_def;
@@ -909,6 +945,8 @@ extern vec<haifa_insn_data_def> h_i_d;
(HID (INSN)->reg_pressure_excess_cost_change)
#define INSN_PRIORITY_STATUS(INSN) (HID (INSN)->priority_status)
#define INSN_MODEL_INDEX(INSN) (HID (INSN)->model_index)
+#define INSN_AUTOPREF_MULTIPASS_DATA(INSN) \
+ (HID (INSN)->autopref_multipass_data)
typedef struct _haifa_deps_insn_data haifa_deps_insn_data_def;
typedef haifa_deps_insn_data_def *haifa_deps_insn_data_t;
@@ -1141,9 +1179,7 @@ enum SCHED_FLAGS {
enum SPEC_SCHED_FLAGS {
COUNT_SPEC_IN_CRITICAL_PATH = 1,
- PREFER_NON_DATA_SPEC = COUNT_SPEC_IN_CRITICAL_PATH << 1,
- PREFER_NON_CONTROL_SPEC = PREFER_NON_DATA_SPEC << 1,
- SEL_SCHED_SPEC_DONT_CHECK_CONTROL = PREFER_NON_CONTROL_SPEC << 1
+ SEL_SCHED_SPEC_DONT_CHECK_CONTROL = COUNT_SPEC_IN_CRITICAL_PATH << 1
};
#define NOTE_NOT_BB_P(NOTE) (NOTE_P (NOTE) && (NOTE_KIND (NOTE) \
@@ -1357,7 +1393,8 @@ extern int cycle_issued_insns;
extern int issue_rate;
extern int dfa_lookahead;
-extern void ready_sort (struct ready_list *);
+extern int autopref_multipass_dfa_lookahead_guard (rtx, int);
+
extern rtx ready_element (struct ready_list *, int);
extern rtx *ready_lastpos (struct ready_list *);
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 241bdad146a..0c864acd7f7 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -3502,8 +3502,6 @@ process_pipelined_exprs (av_set_t *av_ptr)
static void
process_spec_exprs (av_set_t *av_ptr)
{
- bool try_data_p = true;
- bool try_control_p = true;
expr_t expr;
av_set_iterator si;
@@ -3529,34 +3527,6 @@ process_spec_exprs (av_set_t *av_ptr)
av_set_iter_remove (&si);
continue;
}
-
- if ((spec_info->flags & PREFER_NON_DATA_SPEC)
- && !(ds & BEGIN_DATA))
- try_data_p = false;
-
- if ((spec_info->flags & PREFER_NON_CONTROL_SPEC)
- && !(ds & BEGIN_CONTROL))
- try_control_p = false;
- }
-
- FOR_EACH_EXPR_1 (expr, si, av_ptr)
- {
- ds_t ds;
-
- ds = EXPR_SPEC_DONE_DS (expr);
-
- if (ds & SPECULATIVE)
- {
- if ((ds & BEGIN_DATA) && !try_data_p)
- /* We don't want any data speculative instructions right
- now. */
- av_set_iter_remove (&si);
-
- if ((ds & BEGIN_CONTROL) && !try_control_p)
- /* We don't want any control speculative instructions right
- now. */
- av_set_iter_remove (&si);
- }
}
}
@@ -4255,7 +4225,7 @@ invoke_dfa_lookahead_guard (void)
if (! have_hook || i == 0)
r = 0;
else
- r = !targetm.sched.first_cycle_multipass_dfa_lookahead_guard (insn);
+ r = targetm.sched.first_cycle_multipass_dfa_lookahead_guard (insn, i);
gcc_assert (INSN_CODE (insn) >= 0);
diff --git a/gcc/target.def b/gcc/target.def
index cf46a691138..944ec7dc0c6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1176,11 +1176,17 @@ DEFHOOK
"\n\
This hook controls what insns from the ready insn queue will be\n\
considered for the multipass insn scheduling. If the hook returns\n\
-zero for @var{insn}, the insn will be not chosen to\n\
-be issued.\n\
+zero for @var{insn}, the insn will be considered in multipass scheduling.\n\
+Positive return values will remove @var{insn} from consideration on\n\
+the current round of multipass scheduling.\n\
+Negative return values will remove @var{insn} from consideration for the\n\
+given number of cycles.\n\
+Backends should be careful about returning non-zero for the highest-priority\n\
+instruction at position 0 in the ready list. @var{ready_index} is passed\n\
+to allow backends to make correct judgements.\n\
\n\
The default is that any ready insns can be chosen to be issued.",
- int, (rtx insn), NULL)
+ int, (rtx insn, int ready_index), NULL)
/* This hook prepares the target for a new round of multipass
scheduling.
@@ -1195,7 +1201,7 @@ DEFHOOK
(first_cycle_multipass_begin,
"This hook prepares the target backend for a new round of multipass\n\
scheduling.",
- void, (void *data, char *ready_try, int n_ready, bool first_cycle_insn_p),
+ void, (void *data, signed char *ready_try, int n_ready, bool first_cycle_insn_p),
NULL)
/* This hook is called when multipass scheduling evaluates instruction INSN.
@@ -1211,7 +1217,7 @@ scheduling.",
DEFHOOK
(first_cycle_multipass_issue,
"This hook is called when multipass scheduling evaluates instruction INSN.",
- void, (void *data, char *ready_try, int n_ready, rtx insn,
+ void, (void *data, signed char *ready_try, int n_ready, rtx insn,
const void *prev_data), NULL)
/* This hook is called when multipass scheduling backtracks from evaluation of
@@ -1227,7 +1233,7 @@ DEFHOOK
(first_cycle_multipass_backtrack,
"This is called when multipass scheduling backtracks from evaluation of\n\
an instruction.",
- void, (const void *data, char *ready_try, int n_ready), NULL)
+ void, (const void *data, signed char *ready_try, int n_ready), NULL)
/* This hook notifies the target about the result of the concluded current
round of multipass scheduling.
@@ -1423,26 +1429,6 @@ a pattern for a branchy check corresponding to a simple check denoted by\n\
@var{insn} should be generated. In this case @var{label} can't be null.",
rtx, (rtx insn, rtx label, unsigned int ds), NULL)
-/* The following member value is a pointer to a function controlling
- what insns from the ready insn queue will be considered for the
- multipass insn scheduling. If the hook returns zero for the insn
- passed as the parameter, the insn will not be chosen to be
- issued. This hook is used to discard speculative instructions,
- that stand at the first position of the ready list. */
-DEFHOOK
-(first_cycle_multipass_dfa_lookahead_guard_spec,
- "This hook is used as a workaround for\n\
-@samp{TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD} not being\n\
-called on the first instruction of the ready list. The hook is used to\n\
-discard speculative instructions that stand first in the ready list from\n\
-being scheduled on the current cycle. If the hook returns @code{false},\n\
-@var{insn} will not be chosen to be issued.\n\
-For non-speculative instructions,\n\
-the hook should always return @code{true}. For example, in the ia64 backend\n\
-the hook is used to cancel data speculative insns when the ALAT table\n\
-is nearly full.",
- bool, (const_rtx insn), NULL)
-
/* The following member value is a pointer to a function that provides
information about the speculation capabilities of the target.
The parameter is a pointer to spec_info variable. */
diff --git a/gcc/testsuite/ChangeLog.linaro b/gcc/testsuite/ChangeLog.linaro
index 3f3d73be739..c17d1b26358 100644
--- a/gcc/testsuite/ChangeLog.linaro
+++ b/gcc/testsuite/ChangeLog.linaro
@@ -1,3 +1,8 @@
+2015-03-24 Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
+
+ Backport from trunk r220808.
+ * gcc.dg/pr64935-1.c, gcc.dg/pr64935-2.c: New tests.
+
2015-03-18 Michael Collison <michael.collison@linaro.org>
Backport from trunk r218012.
diff --git a/gcc/testsuite/gcc.dg/pr64935-1.c b/gcc/testsuite/gcc.dg/pr64935-1.c
new file mode 100644
index 00000000000..0fc6b58caed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr64935-1.c
@@ -0,0 +1,54 @@
+/* PR rtl-optimization/64935 */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu89 -Wno-shift-count-overflow -O2 -fcompare-debug" } */
+
+int a[] = {}, b[] = {}, c[] = {}, d[] = {}, e[] = {}, f[] = {}, h[] = {};
+int g[] = { 0 };
+int i, l, s, w, x, y, z, t2, t3, t5;
+unsigned long j, m, o, t4;
+long k, n, p, q, r, t, u, v, t1;
+fn1 ()
+{
+ int t6;
+ for (; i; i++)
+ {
+ t5 = a[q] ^ b[p >> 1] ^ c[o >> 1 & 1] ^ d[n >> 1 & 1] ^ e[m >> 1 & 1]
+ ^ f[l >> 1 & 1] ^ g[0] ^ h[j & 1];
+ t4 = a[j] ^ b[q >> 1] ^ c[p] ^ d[o] ^ e[n] ^ f[m] ^ g[l >> 8] ^ h[k];
+ t3 = a[k >> 1] ^ b[j & 5] ^ d[p >> 32] ^ e[o >> 4] ^ f[n >> 6]
+ ^ g[m >> 8] ^ h[l];
+ t2 = a[l >> 6] ^ b[k & 1] ^ c[j >> 1] ^ d[q >> 32] ^ e[p >> 4]
+ ^ f[o >> 6] ^ g[n >> 8] ^ h[m & 1];
+ t1 = a[m >> 6] ^ b[l & 1] ^ c[k & 15] ^ d[j >> 2] ^ e[q >> 4] ^ f[p >> 6]
+ ^ g[o >> 8] ^ h[n & 1];
+ z = a[n >> 56] ^ b[m & 15] ^ c[l & 15] ^ d[k >> 2] ^ e[j >> 4]
+ ^ f[q >> 6] ^ g[p >> 8] ^ h[o & 1];
+ y = a[o >> 56] ^ b[n & 15] ^ c[m >> 40] ^ d[l >> 2] ^ e[k >> 4]
+ ^ f[j >> 6] ^ g[q >> 8] ^ h[p & 1];
+ x = a[p >> 56] ^ b[o & 15] ^ c[n >> 40] ^ d[m >> 2] ^ e[l >> 4]
+ ^ f[k >> 6] ^ g[j >> 8] ^ h[q & 1];
+ q = j = t4;
+ k = t3;
+ l = t2;
+ m = t1;
+ n = z;
+ o = y;
+ p = a[t6] ^ b[0] ^ c[w] ^ d[v] ^ e[u] ^ f[t] ^ g[s] ^ h[r];
+ t4 = a[r >> 1] ^ b[t6 & 1] ^ d[w >> 1] ^ e[v >> 1] ^ f[u >> 1]
+ ^ g[t >> 1] ^ h[s];
+ t3 = a[s >> 6] ^ b[r & 1] ^ c[t6 & 5] ^ d[0] ^ e[w >> 4] ^ f[v >> 6]
+ ^ g[u >> 8] ^ h[t & 1];
+ t2 = a[t >> 6] ^ b[s] ^ c[r & 15] ^ d[t6 >> 1] ^ e[0] ^ f[w >> 6]
+ ^ g[v >> 8] ^ h[u & 1];
+ t1 = a[u >> 6] ^ b[t & 15] ^ c[s & 5] ^ d[r >> 32] ^ e[t6 >> 4]
+ ^ g[w >> 8] ^ h[v & 1];
+ z = a[v >> 56] ^ b[u >> 48 & 1] ^ c[t >> 40 & 1] ^ d[s] ^ e[r >> 1 & 1]
+ ^ f[t6 >> 1 & 1] ^ g[0] ^ h[w & 1] ^ z;
+ t6 = t5;
+ r = t4;
+ s = 0;
+ t = u = t1;
+ v = z;
+ w = y;
+ }
+}
diff --git a/gcc/testsuite/gcc.dg/pr64935-2.c b/gcc/testsuite/gcc.dg/pr64935-2.c
new file mode 100644
index 00000000000..6921a21d76a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr64935-2.c
@@ -0,0 +1,14 @@
+/* PR rtl-optimization/64935 */
+/* { dg-do compile } */
+/* { dg-options "-O -fschedule-insns --param=max-sched-ready-insns=0 -fcompare-debug" } */
+
+void
+foo (int *data, unsigned len, const int qlp_coeff[],
+ unsigned order, int lp, int residual[], int i)
+{
+ int sum;
+ sum = 0;
+ sum += qlp_coeff[1] * data[i - 2];
+ sum += qlp_coeff[0] * data[i - 1];
+ residual[i] = data[i] - (sum >> lp);
+}