|
Some architectures and platforms perform CPU frequency transitions
through a non-blocking method, while some might block or sleep. This
distinction is important when trying to change frequency from interrupt
context or in any other non-interruptible context, such as from the
Linux scheduler.
Describe this distinction with a cpufreq driver flag,
CPUFREQ_DRIVER_WILL_NOT_SLEEP. The default is to not have this flag set,
thus erring on the side of caution.
cpufreq_driver_might_sleep() is also introduced in this patch. Setting
the above flag will allow this function to return false.
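A minimal sketch of how the helper could look, assuming the flag is
tested on the registered driver's flags field:

  bool cpufreq_driver_might_sleep(void)
  {
          /* sleep is possible unless the driver promises otherwise */
          return !(cpufreq_driver->flags & CPUFREQ_DRIVER_WILL_NOT_SLEEP);
  }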
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
|
|
On topologies with sched groups containing different numbers of cpus,
scenarios where the busiest group has just one task and the local group
is idle manage to dodge all load-balance bailout conditions, resulting
in the nr_balance_failed counter being incremented. This eventually
causes a pointless active migration of the task. This patch prevents
this by not incrementing the counter when the busiest group only has one
task.
ASYM_PACKING migrations and migrations due to reduced capacity should
still take place as these are explicitly captured by
need_active_balance().
A better solution would be to not attempt the load-balance in the first
place, but that requires significant changes to the order of bailout
conditions and statistics gathering.
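Illustrative sketch of the guard; busiest_nr_running is a hypothetical
stand-in for however the busiest group's task count is tracked:

  /* hypothetical: in load_balance(), after a failed balance attempt;
   * a single remaining task can't be fixed by migrating load
   */
  if (env.busiest_nr_running > 1)
          sd->nr_balance_failed++;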
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
With energy-aware scheduling enabled nohz_kick_needed() generates many
nohz idle-balance kicks which lead to nothing when multiple tasks get
packed on a single cpu to save energy. This causes unnecessary wake-ups
and hence wastes energy. Make these conditions depend on !energy_aware()
for now until the energy-aware nohz story gets sorted out.
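For illustration, the kick conditions could be gated like this (a
sketch, not the exact diff):

  /* sketch: in nohz_kick_needed(); skip the kick when energy-aware
   * scheduling packs tasks on a single cpu deliberately
   */
  if (!energy_aware() && rq->nr_running >= 2)
          return true;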
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
We do not want to miss out on the ability to pull a single remaining
task from a potential source cpu towards an idle destination cpu if the
energy aware system operates above the tipping point.
Add an extra criterion to need_active_balance() to kick off active load
balance if the source cpu is over-utilized and has lower capacity than
the destination cpu.
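A sketch of the extra criterion, reusing the cpu_overutilized() helper
from the tipping-point patch (exact form assumed):

  /* sketch: in need_active_balance() */
  if (energy_aware() &&
      capacity_of(env->src_cpu) < capacity_of(env->dst_cpu) &&
      env->src_rq->cfs.h_nr_running == 1 &&
      cpu_overutilized(env->src_cpu))
          return 1;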
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
In case the system operates below the tipping point indicator,
introduced in ("sched: Add over-utilization/tipping point
indicator"), bail out of find_busiest_group() after the dst and src
group statistics have been checked.
There is simply no need to move usage around because all involved
cpus still have spare cycles available.
For an energy-aware system below its tipping point, we rely on the
task placement of the wakeup path. This works well for short running
tasks.
The existence of long running tasks on one of the involved cpus pushes
the system over its tipping point. To be able to move such a task (whose
load can't be used to average the load among the cpus) from a src cpu
with lower capacity than the dst cpu, an additional rule has to be
implemented in need_active_balance().
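The bail-out itself could look roughly like this (a sketch; the name of
the over-utilization indicator field is an assumption):

  /* sketch: in find_busiest_group(), after gathering the sd statistics */
  if (energy_aware() && !sds.overutilized)
          goto out_balanced;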
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
Let available compute capacity and estimated energy impact select the
wake-up target cpu when energy-aware scheduling is enabled and the
system is not over-utilized (above the tipping point).
energy_aware_wake_cpu() attempts to find a group of cpus with sufficient
compute capacity to accommodate the task, and then a cpu with enough
spare capacity to handle the task within that group. Preference is given
to cpus with enough spare capacity at the current OPP. Finally, the
energy impact of the new target and the previous task cpu is compared to
select the wake-up target cpu.
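Very rough outline of the selection logic; every helper name below is
an assumption used for illustration only:

  static int energy_aware_wake_cpu(struct task_struct *p)
  {
          int prev_cpu = task_cpu(p), target_cpu;
          struct sched_group *sg;

          sg = find_fitting_group(p);          /* enough capacity for p */
          target_cpu = find_spare_cpu(sg, p);  /* prefer fit at current OPP */

          /* pick the new target only if it is estimated to save energy */
          if (energy_diff_task(p, prev_cpu, target_cpu) < 0)
                  return target_cpu;

          return prev_cpu;
  }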
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
In mainline find_idlest_group() selects the wake-up target group purely
based on group load, which leads to suboptimal choices in low load
scenarios. An idle group with reduced capacity (due to RT tasks or
different cpu type) isn't necessarily a better target than a lightly
loaded group with higher capacity.
The patch adds spare capacity as an additional group selection
parameter. The target group is now selected based on the following
criteria listed by highest priority first:
1. If energy-aware scheduling is enabled, the lowest-capacity group
containing a cpu with enough spare capacity to accommodate the task
(with a bit to spare) is selected, if such a group exists.
2. The group containing the cpu with the most spare capacity is
returned, provided that spare capacity is significant, if such a group
exists. Significant currently means at least 20% of the capacity to
spare (see the sketch below).
3. The group with the lowest load is returned, unless it is the local
group, in which case NULL is returned and the search is continued at the
next (lower) level.
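The "significant" spare capacity from rule 2 can be expressed as a
simple helper (a sketch encoding the 20% criterion):

  /* sketch: spare capacity is "significant" at >= 20% of capacity */
  static inline bool spare_capacity_significant(unsigned long spare,
                                                unsigned long capacity)
  {
          return spare * 5 >= capacity;
  }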
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Wakeup balancing is completely unaware of cpu capacity, cpu usage, and
task utilization. The task is preferably placed on a cpu which is idle
at the instant the wakeup happens. New tasks (SD_BALANCE_{FORK,EXEC})
are placed on an idle cpu in the idlest group if such can be found,
otherwise on the least loaded one. Existing tasks (SD_BALANCE_WAKE) are
placed on the previous cpu or an idle cpu sharing the same last level
cache. Hence existing tasks don't get a chance to migrate to a different
group at wakeup in case the current one has reduced cpu capacity (due to
RT/IRQ pressure or a different uarch, e.g. ARM big.LITTLE). They may
eventually get pulled by other cpus doing periodic/idle/nohz_idle
balance, but it may take quite a while before that happens.
This patch adds capacity awareness to find_idlest_{group,queue} (used by
SD_BALANCE_{FORK,EXEC}) such that groups/cpus that can accommodate the
waking task based on task utilization are preferred. In addition, wakeup
of existing tasks (SD_BALANCE_WAKE) is sent through
find_idlest_{group,queue} if the task doesn't fit the capacity of the
previous cpu to allow it to escape (override wake_affine) when
necessary instead of relying on periodic/idle/nohz_idle balance to
eventually sort it out.
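A sketch of the fit test that decides whether the previous cpu can be
escaped (task_utilization() and the margin value are assumptions):

  /* sketch: does the task fit the cpu, leaving some headroom? */
  static bool task_fits_cpu(struct task_struct *p, int cpu)
  {
          unsigned long margin = 1280;    /* ~25% headroom (1024 = none) */

          return capacity_of(cpu) * 1024 >= task_utilization(p) * margin;
  }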
The patch doesn't depend on any energy model infrastructure, but it is
kept behind the energy_aware() static key despite being primarily a
performance optimization as it may increase scheduler overhead slightly.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
To estimate the energy consumption of a sched_group in
sched_group_energy() it is necessary to know which idle-state the group
is in when it is idle. For now, it is assumed that this is the current
idle-state (though it might be wrong). Based on the individual cpu
idle-states, group_idle_state() determines the group idle-state.
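A sketch of the helper, assuming the idle_get_state_idx() accessor added
elsewhere in this series:

  /* sketch: the group can be no deeper asleep than its shallowest cpu */
  static int group_idle_state(struct sched_group *sg)
  {
          int i, state = INT_MAX;

          for_each_cpu(i, sched_group_cpus(sg))
                  state = min(state, idle_get_state_idx(cpu_rq(i)));

          return state;
  }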
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
cpuidle associates all idle-states with each cpu while the energy model
associates them with the sched_group covering the cpus coordinating
entry to the idle-state. To look up the idle-state power consumption in
the energy model it is therefore necessary to translate from cpuidle
idle-state index to energy model index. For this purpose it is helpful
to know how many idle-states are listed in lower level sched_groups
(in struct sched_group_energy).
Example: ARMv8 big.LITTLE JUNO (Cortex A57, A53) idle-states:
  Idle-state              cpuidle    Energy model table indices
                          index      per-cpu sg    per-cluster sg
  WFI                     0          0             (0)
  Core power-down         1          1             0*
  Cluster power-down      2          (1)           1
For per-cpu sgs no translation is required. If cpuidle reports state
index 0 or 1, the cpu is in WFI or core power-down, respectively. We can
look up the idle-power directly in the sg energy model table. The
cluster power-down idle-state is represented in the per-cluster sg
energy model table as index 1. Index 0* is reserved for cluster power
consumption when all cpus are in state 0 or 1 but cpuidle decided not to
go for cluster power-down. Given the index from cpuidle we can compute the
correct index in the energy model tables for the sgs at each level if we
know how many states are in the tables in the child sgs. The actual
translation is implemented in a later patch.
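As a sketch, the translation implied by the table above could be
computed as (names assumed):

  /* sketch: cpuidle states 0..(nr_child_states - 1) collapse into
   * group table index 0; deeper states map to successive entries
   */
  group_idx = max(0, cpuidle_idx - (child_sg_nr_idle_states - 1));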
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
The idle-state of each cpu is currently pointed to by rq->idle_state,
but there isn't any information in struct cpuidle_state that can be used
to look up the idle-state energy model data stored in struct
sched_group_energy. For this purpose it is necessary to store the idle
state index as well. Ideally, the idle-state data should be unified.
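A minimal sketch of storing the index alongside rq->idle_state (accessor
names assumed):

  /* sketch: in kernel/sched/sched.h */
  static inline void idle_set_state_idx(struct rq *rq, int idle_state_idx)
  {
          rq->idle_state_idx = idle_state_idx;
  }

  static inline int idle_get_state_idx(struct rq *rq)
  {
          return rq->idle_state_idx;
  }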
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
To be able to compare the capacity of the target cpu with the highest
cpu capacity of the system in the wakeup path, store the system-wide
maximum cpu capacity in the root domain.
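Sketch of the update; the exact location (e.g. the cpu capacity update
path) is an assumption:

  /* sketch: keep the root domain's max cpu capacity up to date */
  if (capacity > rq->rd->max_cpu_capacity)
          rq->rd->max_cpu_capacity = capacity;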
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
Energy-aware scheduling is only meant to be active while the system is
_not_ over-utilized. That is, there are spare cycles available to shift
tasks around based on their actual utilization to get a more
energy-efficient task distribution without depriving any tasks. When
above the tipping point task placement is done the traditional way,
spreading the tasks across as many cpus as possible based on priority
scaled load to preserve smp_nice.
The over-utilization condition is conservatively chosen to indicate
over-utilization as soon as one cpu is fully utilized at its highest
frequency. We don't consider groups, as lumping usage and capacity
together for a group of cpus may hide the fact that one or more cpus in
the group are over-utilized while group-siblings are partially idle. The
tasks could be served better if moved to another group with completely
idle cpus. This is particularly problematic if some cpus have a
significantly reduced capacity due to RT/IRQ pressure or if the system
has cpus of different capacity (e.g. ARM big.LITTLE).
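A sketch of the per-cpu condition; capacity_margin is an assumed tunable
(e.g. 1280 for a 25% margin):

  /* sketch: usage (plus margin) no longer fits the original capacity */
  static bool cpu_overutilized(int cpu)
  {
          return (capacity_orig_of(cpu) * 1024) <
                 (get_cpu_usage(cpu) * capacity_margin);
  }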
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Adds a generic energy-aware helper function, energy_diff(), that
calculates the energy impact of adding, removing, or migrating
utilization in the system.
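The interface might be used roughly as follows (the struct and field
names and do_the_move() are assumptions based on the description):

  /* sketch: estimate the impact of moving usage_delta from src to dst */
  struct energy_env eenv = {
          .src_cpu     = src_cpu,
          .dst_cpu     = dst_cpu,
          .usage_delta = task_usage,
  };

  if (energy_diff(&eenv) < 0)
          do_the_move();  /* negative diff: the move saves energy */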
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Extended sched_group_energy() to support energy prediction with usage
(tasks) added/removed from a specific cpu or migrated between a pair of
cpus. Useful for load-balancing decision making.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
For energy-aware load-balancing decisions it is necessary to know the
energy consumption estimates of groups of cpus. This patch introduces a
basic function, sched_group_energy(), which estimates the energy
consumption of the cpus in the group and any resources shared by the
members of the group.
NOTE: The function has five levels of indentation and breaks the 80
character limit. Refactoring is necessary.
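Conceptually, each group contributes a busy part and an idle part (a
sketch; the exact arithmetic in the patch may differ):

  /* energy = busy fraction at the busy power + idle fraction at the
   * idle power, both taken from the group's energy model tables
   */
  sg_busy_energy = (group_usage * cap_state->power) / cap_state->cap;
  sg_idle_energy = ((SCHED_CAPACITY_SCALE - group_usage) *
                    idle_state->power) / SCHED_CAPACITY_SCALE;
  total_energy  += sg_busy_energy + sg_idle_energy;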
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Add another member to the family of per-cpu sched_domain shortcut
pointers. This one, sd_ea, points to the highest level at which an
energy model is provided. At this level and all levels below, all
sched_groups have energy model data attached.
Partial energy model information is possible but restricted to providing
energy model data for lower level sched_domains (sd_ea and below) and
leaving load-balancing on levels above to non-energy-aware
load-balancing. For example, it is possible to apply energy-aware
scheduling within each socket on a multi-socket system and let normal
scheduling handle load-balancing between sockets.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Move get_cpu_usage() to an earlier position in fair.c and change its
return type to unsigned long, as negative usage doesn't make much sense.
other load and capacity related functions use unsigned long including
the caller of get_cpu_usage().
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
capacity_orig_of() returns the max available compute capacity of a cpu.
For scale-invariant utilization tracking and energy-aware scheduling
decisions it is useful to know the compute capacity available at the
current OPP of a cpu.
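A sketch of such a helper; the name capacity_curr_of is an assumption:

  /* sketch: capacity at the current OPP = original capacity scaled
   * by the current frequency factor
   */
  static unsigned long capacity_curr_of(int cpu)
  {
          return (capacity_orig_of(cpu) *
                  arch_scale_freq_capacity(NULL, cpu))
                 >> SCHED_CAPACITY_SHIFT;
  }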
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
This patch is only here to be able to test provisioning of energy
related data from an arch topology shim layer to the scheduler. Since
there is no code today which deals with extracting energy related data
from the dtb or acpi and processing it in the topology shim layer, the
contents of the sched_group_energy structures as well as the idle_state
and capacity_state arrays are hard-coded here.
This patch defines the sched_group_energy structure as well as the
idle_state and capacity_state array for the cluster (relates to sched
groups (sgs) in DIE sched domain level) and for the core (relates to sgs
in MC sd level) for a Cortex A7 as well as for a Cortex A15.
It further provides related implementations of the sched_domain_energy_f
functions (cpu_cluster_energy() and cpu_core_energy()).
To be able to propagate this information from the topology shim layer to
the scheduler, the elements of the arm_topology[] table have been
provisioned with the appropriate sched_domain_energy_f functions.
cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
cpufreq currently keeps it a secret which cpus share a clock source.
The scheduler needs to know about clock domains as well to become more
energy aware. The SD_SHARE_CAP_STATES domain flag indicates whether cpus
belonging to the sched_domain share capacity states (P-states).
There is no connection with cpufreq (yet). The flag must be set by
the arch specific topology code.
cc: Russell King <linux@arm.linux.org.uk>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
The per sched group sched_group_energy structure plus the related
idle_state and capacity_state arrays are allocated like the other sched
domain (sd) hierarchy data structures. This includes the freeing of
sched_group_energy structures which are not used.
Energy-aware scheduling allows a system to have energy model data only
up to a certain sd level (the so-called highest energy aware balancing
sd level). A check in init_sched_energy() enforces that all sd's below
this sd level contain energy model data.
One problem is that the number of elements of the idle_state and the
capacity_state arrays is not fixed and has to be retrieved in
__sdt_alloc() to allocate memory for the sched_group_energy structure and
the two arrays in one chunk. The array pointers (idle_states and
cap_states) are initialized here to point to the correct place inside the
memory chunk.
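Sketch of the single-chunk layout (allocation details are assumptions):

  /* sge header followed by both arrays in one allocation */
  sge = kzalloc_node(sizeof(struct sched_group_energy) +
                     nr_idle_states * sizeof(struct idle_state) +
                     nr_cap_states * sizeof(struct capacity_state),
                     GFP_KERNEL, cpu_to_node(j));

  sge->idle_states = (struct idle_state *)((char *)sge + sizeof(*sge));
  sge->cap_states  = (struct capacity_state *)(sge->idle_states +
                                               nr_idle_states);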
The new function init_sched_energy() initializes the sched_group_energy
structure and the two arrays in case the sd topology level contains energy
information.
This patch has been tested with scheduler feature flag FORCE_SD_OVERLAP
enabled as well.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
The struct sched_group_energy represents the per sched_group related
data which is needed for energy aware scheduling. It contains:
(1) atomic reference counter for scheduler internal bookkeeping of
data allocation and freeing
(2) number of elements of the idle state array
(3) pointer to the idle state array which comprises 'power consumption'
for each idle state
(4) number of elements of the capacity state array
(5) pointer to the capacity state array which comprises 'compute
capacity and power consumption' tuples for each capacity state
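As a sketch, the structure described by (1)..(5) above (field names
assumed):

  struct sched_group_energy {
          atomic_t ref;                      /* (1) reference counter */
          int nr_idle_states;                /* (2) */
          struct idle_state *idle_states;    /* (3) power per idle state */
          int nr_cap_states;                 /* (4) */
          struct capacity_state *cap_states; /* (5) cap/power tuples */
  };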
Allocation and freeing of struct sched_group_energy utilizes the existing
infrastructure of the scheduler which is currently used for the other sd
hierarchy data structures (e.g. struct sched_domain) as well. That's why
struct sd_data is provisioned with a per cpu struct sched_group_energy
double pointer.
The struct sched_group obtains a pointer to a struct sched_group_energy.
The function pointer sched_domain_energy_f is introduced into struct
sched_domain_topology_level which will allow the arch to pass a particular
struct sched_group_energy from the topology shim layer into the scheduler
core.
The function pointer sched_domain_energy_f has an 'int cpu' parameter
since the folding of two adjacent sd levels via sd degenerate doesn't work
for all sd levels. I.e. it is not possible for example to use this feature
to provide per-cpu energy in sd level DIE on ARM's TC2 platform.
It was discussed that the folding of sd levels approach is preferable
over the cpu parameter approach, simply because the user (the arch
specifying the sd topology table) can introduce fewer errors. But since
it is not working, the 'int cpu' parameter is the only way out. It's
possible to use the folding of sd levels approach for
sched_domain_flags_f and the cpu parameter approach for the
sched_domain_energy_f at the same time though. With the use of the
'int cpu' parameter, an extra check function has to be provided to make
sure that all cpus spanned by a sched group are provisioned with the same
energy data.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
This patch introduces the ENERGY_AWARE sched feature, which is
implemented using jump labels when SCHED_DEBUG is defined. It is
statically set false when SCHED_DEBUG is not defined. Hence this doesn't
allow energy awareness to be enabled without SCHED_DEBUG. This
sched_feature knob will be replaced later with a more appropriate
control knob when things have matured a bit.
ENERGY_AWARE is based on per-entity load-tracking, hence
FAIR_GROUP_SCHED must be enabled. This dependency isn't checked at
compile time yet.
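Sketch of the knob and the predicate guarding energy-aware code paths:

  /* in features.h: disabled by default */
  SCHED_FEAT(ENERGY_AWARE, false)

  /* predicate used throughout the energy-aware paths */
  static inline bool energy_aware(void)
  {
          return sched_feat(ENERGY_AWARE);
  }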
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
This documentation patch provides an overview of the experimental
scheduler energy costing model, associated data structures, and a
reference recipe on how platforms can be characterized to derive energy
models.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Task load or usage is not currently considered in select_task_rq_fair(),
but if we want that in the future we should make sure it is not zero for
new tasks.
The load-tracking sums are currently initialized using sched_slice(),
which won't work before the task has been assigned a rq. Initialization
is therefore changed to use another semi-arbitrary value, sched_latency,
instead.
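Illustrative shape of the change (a sketch; the exact scaling may
differ):

  /* sketch: in init_task_runnable_average(); sched_slice() needs a
   * rq, so use sched_latency instead
   */
  u32 slice = sysctl_sched_latency >> 10;

  p->se.avg.runnable_avg_sum = slice;
  p->se.avg.runnable_avg_period = slice;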
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Tasks being dequeued for the last time (state == TASK_DEAD) are dequeued
with the DEQUEUE_SLEEP flag, which causes their load and utilization
contributions to be added to the runqueue blocked load and utilization.
Hence the blocked load and utilization will contain contributions that
have gone away. The issue only exists for the root cfs_rq as
cgroup_exit() doesn't set DEQUEUE_SLEEP for task group exits.
If runnable+blocked load is to be used as a better estimate for cpu
load the dead task contributions need to be removed to prevent
load_balance() (idle_balance() in particular) from over-estimating the
cpu load.
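A sketch of the fix in the dequeue path (the exact hook and form are
assumptions):

  /* sketch: a dying task must not be treated as merely sleeping,
   * so its contribution is not added to the blocked sums
   */
  int sleep = (flags & DEQUEUE_SLEEP) && p->state != TASK_DEAD;

  dequeue_entity_load_avg(cfs_rq, se, sleep);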
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Add the blocked utilization contribution to group sched_entity
utilization (se->avg.utilization_avg_contrib) and to get_cpu_usage().
With this change cpu usage now includes recent usage by currently
non-runnable tasks, hence it provides a more stable view of the cpu
usage. It does, however, also mean that the meaning of usage is changed:
A cpu may be momentarily idle while usage is >0. It can no longer be
assumed that cpu usage >0 implies runnable tasks on the rq.
cfs_rq->utilization_load_avg or nr_running should be used instead to get
the current rq status.
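A sketch of the resulting helper, with the blocked field name assumed
from the blocked-utilization patch in this series:

  /* sketch: usage = running + blocked utilization, capped by capacity */
  static unsigned long get_cpu_usage(int cpu)
  {
          unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg +
                                cpu_rq(cpu)->cfs.utilization_blocked_avg;

          return min(usage, capacity_orig_of(cpu));
  }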
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Introduces blocked utilization, the utilization counterpart to
cfs_rq->blocked_load_avg. It is the sum of sched_entity utilization
contributions of entities that were recently on the cfs_rq and are
currently blocked. Combined with the sum of utilization of entities
currently on the cfs_rq or currently running
(cfs_rq->utilization_load_avg) this provides a more stable average
view of the cpu usage.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Since cfs_rq::utilization_load_avg is now not only frequency invariant
but also cpu (uarch plus max system frequency) invariant, both frequency
and cpu scaling happen as part of load tracking.
So cfs_rq::utilization_load_avg no longer has to be scaled by the
original capacity of the cpu.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
Reuses the existing infrastructure for cpu_scale to provide the scheduler
with a cpu scaling correction factor for more accurate load-tracking.
This factor comprises a micro-architectural part, which is based on the
cpu efficiency value of a cpu as well as a platform-wide max frequency
part, which relates to the dtb property clock-frequency of a cpu node.
The calculation of cpu_scale, the return value of
arch_scale_cpu_capacity(), changes from

  capacity / middle_capacity
  (with capacity = (clock_frequency >> 20) * cpu_efficiency)

to

  SCHED_CAPACITY_SCALE * cpu_perf / max_cpu_perf
The range of the cpu_scale value changes from
[0..3*SCHED_CAPACITY_SCALE/2] to [0..SCHED_CAPACITY_SCALE].
The functionality to calculate the middle_capacity which corresponds to an
'average' cpu has been taken out since the scaling is now done
differently.
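In code, the new calculation could look like this (a sketch; cpu_perf is
assumed to be derived from the dtb values as described above):

  /* per-cpu performance from dtb values (as in the old formula) */
  cpu_perf = (clock_frequency >> 20) * cpu_efficiency;

  /* sketch: normalize against the most capable cpu in the system */
  cpu_scale = (SCHED_CAPACITY_SCALE * cpu_perf) / max_cpu_perf;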
In the case that either the cpu efficiency or the clock-frequency value
for a cpu is missing, no cpu scaling is done for any cpu.
The platform-wide max frequency part of the factor should not be confused
with the frequency invariant scheduler load-tracking support which deals
with frequency related scaling due to DFVS functionality on a cpu.
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
Besides the existing frequency scale-invariance correction factor, apply
cpu scale-invariance correction factor to usage tracking.
Cpu scale-invariance takes cpu performance deviations due to
micro-architectural differences (i.e. instructions per second) between
cpus in HMP systems (e.g. big.LITTLE) and differences in the frequency
value of the highest OPP between cpus in SMP systems into consideration.
Each segment of the sched_avg::running_avg_sum geometric series is now
scaled by the cpu performance factor too so the
sched_avg::utilization_avg_contrib of each entity will be invariant from
the particular cpu of the HMP/SMP system it is gathered on.
So the usage level that is returned by get_cpu_usage stays relative to
the max cpu performance of the system.
In contrast to usage, load (sched_avg::runnable_avg_sum) is currently
not considered to be made cpu scale-invariant because this will have a
negative effect on the existing load balance code based on
s[dg]_lb_stats::avg_load in overload scenarios.
Example: 7 always running tasks
         4 on cluster 0 (2 cpus w/ cpu_capacity=512)
         3 on cluster 1 (1 cpu  w/ cpu_capacity=1024)

                    cluster 0      cluster 1
  capacity          1024 (2*512)   1024 (1*1024)
  load              4096           3072
  cpu-scaled load   2048           3072
Simply using cpu-scaled load in the existing lb code would declare
cluster 1 busier than cluster 0, although the compute capacity budget
for one task is higher on cluster 1 (1024/3 = 341) than on cluster 0
(2*512/4 = 256).
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
arch_scale_cpu_capacity() is no longer a weak function but a #define
instead. Include the #define in topology.h.
cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Bring arch_scale_cpu_capacity() in line with the recent change of its
arch_scale_freq_capacity() sibling in commit dfbca41f3479 ("sched:
Optimize freq invariant accounting") from weak function to #define to
allow inlining of the function.
While at it, remove the ARCH_CAPACITY sched_feature as well. With the
change to #define there isn't a straightforward way to allow runtime
switch between an arch implementation and the default implementation of
arch_scale_cpu_capacity() using sched_feature. The default was to use
the arch-specific implementation, but only the arm architecture provided
one.
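For reference, the default definition now has roughly this shape when an
arch doesn't provide its own (a sketch):

  #ifndef arch_scale_cpu_capacity
  static __always_inline
  unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
  {
          return SCHED_CAPACITY_SCALE;
  }
  #endif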
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
To enable the parsing of clock frequency and cpu efficiency values
inside parse_dt_topology [arch/arm/kernel/topology.c] to scale the
relative capacity of the cpus, this property has to be provided within
the cpu nodes of the dts file.
The patch is a copy of commit 8f15973ef8c3 ("ARM: vexpress: Add CPU
clock-frequencies to TC2 device-tree") taken from Linaro Stable Kernel
(LSK) massaged into mainline.
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
|
Apply frequency scale-invariance correction factor to load tracking.
Each segment of the sched_avg::runnable_avg_sum geometric series is now
scaled by the current frequency so the sched_avg::load_avg_contrib of each
entity will be invariant with frequency scaling. As a result,
cfs_rq::runnable_load_avg, which is the sum of sched_avg::load_avg_contrib,
becomes invariant too. So the load level that is returned by
weighted_cpuload() stays relative to the max frequency of the cpu.
Then, we want to keep the load tracking values in a 32-bit type, which
implies that the max value of sched_avg::{runnable|running}_avg_sum must
be lower than 2^32/88761 = 48388 (88761 is the max weight of a task). As
LOAD_AVG_MAX = 47742, arch_scale_freq_capacity() must return a value
less than (48388/47742) << SCHED_CAPACITY_SHIFT = 1037
(SCHED_CAPACITY_SCALE = 1024). So we define the range to
[0..SCHED_CAPACITY_SCALE] in order to avoid overflow.
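Where the factor is applied, the accumulation changes roughly like this
(simplified sketch):

  /* sketch: scale each accumulated segment by the current frequency */
  unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);

  delta = (delta * scale_freq) >> SCHED_CAPACITY_SHIFT;
  sa->runnable_avg_sum += delta;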
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
|
|
Implements an arch-specific function to provide the scheduler with a
frequency scaling correction factor for more accurate load-tracking.
The factor is:

  (current_freq(cpu) << SCHED_CAPACITY_SHIFT) / max_freq(cpu)
This implementation only provides frequency invariance. No cpu
invariance yet.
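Shape of a possible arm implementation (all names here are assumptions;
the per-cpu factor would be updated from cpufreq transition notifiers):

  static DEFINE_PER_CPU(atomic_long_t, cpu_freq_capacity);

  /* called on every frequency change (assumed wiring) */
  void arch_set_freq_scale(int cpu, unsigned long cur, unsigned long max)
  {
          atomic_long_set(&per_cpu(cpu_freq_capacity, cpu),
                          (cur << SCHED_CAPACITY_SHIFT) / max);
  }

  unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
  {
          return atomic_long_read(&per_cpu(cpu_freq_capacity, cpu));
  }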
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
|
|
Add nodes mmc0 ~ mmc3 to mt8173.dtsi.
Add nodes mmc0 and mmc1 to mt8173-evb.dts.
Change-Id: I150e7bbe9d9c110135eef1795bc0719934f2c098
Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
Signed-off-by: Eddie Huang <eddie.huang@mediatek.com>
|
|
This patch adds the required properties in device tree to enable MT8173
cpufreq driver.
Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
|
|
Mediatek MT8173 is an ARMv8 based quad-core (2*Cortex-A53 and
2*Cortex-A72) SoC with dual clusters. For each cluster, two voltage
inputs, Vproc and Vsram, are supplied by two regulators. For the big
cluster, the two regulators come from different PMICs. In this case, when
scaling the voltage inputs of the cluster, the voltages of the two
regulator inputs need to be controlled by software explicitly under the
SoC specific limitation:

  100mV < Vsram - Vproc < 200mV

which is called the 'voltage tracking' mechanism. And when scaling the
frequency of the cluster clock input, the input MUX clock needs to be
parented to another "intermediate" stable PLL first and reparented to
the original PLL once the original PLL is stable at the target
frequency. This patch implements those mechanisms to enable CPU DVFS
support for the Mediatek MT8173 SoC.
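The voltage-tracking constraint can be honoured by stepping the two
regulators alternately; a simplified sketch (regulator handles, targets
and step logic are assumptions):

  /* sketch: raise voltages, keeping 100mV < Vsram - Vproc < 200mV */
  while (vproc < target_vproc) {
          /* raise Vsram first, staying below the 200mV upper bound */
          vsram = min(target_vsram, vproc + 200000 - 1);
          regulator_set_voltage(sram_reg, vsram, vsram);

          /* then raise Vproc, keeping it >100mV below Vsram */
          vproc = min(target_vproc, vsram - 100000 - 1);
          regulator_set_voltage(proc_reg, vproc, vproc);
  }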
Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
|
|
Change-Id: I2c12414879b827346c0329f3a5189299b764811b
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
|
|
The MTK xHCI host controller defines some extra SW scheduling
parameters for HW to minimize the scheduling effort for
synchronous and interrupt endpoints. The parameters are
put into reserved DWs of the slot context and endpoint context.
Change-Id: Ic7644f98e1d747a8e863dd3fb3c99cbcb728cde6
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
|
|
Change-Id: I69c12fa9e9c716249dc8f35f0959425ead1f4265
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
|
|
Add DT binding documentation for the xHCI host controller of the
MT8173 SoC from Mediatek.
Change-Id: I0ac0884c4bc6e3783c4f9cc028c061b019dd2825
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
|
|
Add DT binding documentation for the USB 3.0 PHY of MT65xx
SoCs from Mediatek.
Change-Id: Ia8099589ac3953bfa933a336048e4889450a1360
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
|
|
This patch adds CPU mux clocks which are used by Mediatek cpufreq driver
for intermediate clock source switching.
Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
|
|
Add REF2USB_TX clock support into MT8173 APMIXEDSYS. This clock
is needed by USB 3.0.
Change-Id: I6c94fa8ce5f35dad7a1cda6dbf403b3bd6b81905
Signed-off-by: James Liao <jamesjj.liao@mediatek.com>
|
|
Most multimedia subsystem clocks will be accessed by multiple
drivers, so it's better to manage these clocks in the CCF.
This patch adds clock support for MM, IMG, VDEC, VENC and VENC_LT
subsystems.
Change-Id: I9cc9f8a066806d3c2d0769062182dad419499628
Signed-off-by: James Liao <jamesjj.liao@mediatek.com>
|
|
This adds the binding documentation for the mmsys, imgsys, vdecsys,
vencsys and vencltsys controllers found on Mediatek SoCs.
Change-Id: Id583b6030541292fafeaa1ede7d45ad7bf540226
Signed-off-by: James Liao <jamesjj.liao@mediatek.com>
|
|
On the MT8173 the clocks are provided by different units. To enable
the critical clocks we must be sure that all parent clocks are already
registered, otherwise the parents of the critical clocks end up being
unused and get disabled later. Since there is no single point where all
parents are known to be registered, each time we've registered some
clocks we check whether all known providers are now present, and only
then enable the critical clocks.
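A sketch of that check (function, base-pointer and clock names are
assumptions):

  /* sketch: run after each provider has registered its clocks */
  static void mtk_clk_enable_critical(void)
  {
          if (!mt8173_top_base || !mt8173_pll_base || !mt8173_mcu_base)
                  return;  /* not all providers present yet */

          clk_prepare_enable(clk_critical);  /* hypothetical clock(s) */
  }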
Change-Id: Iee564e380522380ddd06b109e3fcc8f0b6ec7c52
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: James Liao <jamesjj.liao@mediatek.com>
Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
|