|
|
This simple governor takes into account the predictable events: the timer sleep
duration and the next expected irq sleep duration. By mixing both, it deduces
which idle state fits best.
The main purpose of this governor is to handle the guessed next events in a
categorized way:
1. deterministic events : timers
2. guessed events : IOs
3. predictable events : keystroke, incoming network packet, ...
This governor is meant to be moved closer to the scheduler later, so it
can inspect/inject more information and act proactively rather than
reactively.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Conflicts:
drivers/cpuidle/Kconfig
|
|
This is for KVM SATA virtual hardware.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The irq timing framework will track only the interrupts which were explicitly
specified with the IRQF_TIMING flag. That allows fine-grained control over what
we are tracking as sources of events. Unfortunately, each driver must be
modified accordingly, which makes development very difficult.
This patch adds a /proc/irq/<nr>/timing boolean file to specify from userspace the
interrupts we want to track. There is no longer any need to find and change the
drivers in the kernel for each platform; it can be done from userspace now.
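As a usage sketch (the IRQ number 42 is just an example; pick a real one from /proc/interrupts on a kernel with this patch applied):

```shell
# Enable timing tracking for IRQ 42 from userspace, no driver change needed.
echo 1 > /proc/irq/42/timing
# Read it back to confirm the IRQ is now tracked as a source of events.
cat /proc/irq/42/timing
# Disable tracking again.
echo 0 > /proc/irq/42/timing
```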
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
|
|
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The standard deviation may be obtained by computing the square root of the
variance. Given the cost, this is left to user space to do.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Events in the past, if any, are purged, and then the first item on the
list, if any, contains our next predicted IRQ time.
We have access to the standard deviation and could eventually use it to
qualify our confidence in the prediction. For now, only the raw
prediction is returned.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Once a good IRQ prediction is made, we need to enqueue it for later
consumption. While at it, we discard any predictions whose time stamp
is in the past.
There shouldn't be that many expected IRQs at any given time, so a sorted
list is most likely going to be good enough. And, by definition, the most
frequent IRQs will end up near the beginning of the list anyway.
There is no generic way to determine what the IRQ controller is going
to do if the IRQ affinity mask contains multiple CPUs. It is therefore
assumed that the next occurrence of an IRQ is most likely to happen on
the same CPU as the last one. This appears to be the case overall from
observations on x86, despite active migration controlled from user space.
On ARM, the GIC driver selects the first CPU in the affinity mask, so the
assumption holds in that case. If migration frequency becomes significant
compared to IRQ occurrences, then we could consider registering an
affinity notifier.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Many IRQs are quiet most of the time, or they tend to come in bursts of
fairly equal time intervals within each burst. It is therefore possible
to detect those IRQs with stable intervals and guesstimate when the next
IRQ event is most likely to happen.
Examples of such IRQs may include audio related IRQs where the FIFO size
and/or DMA descriptor size with the sample rate create stable intervals,
block devices during large data transfers, etc. Even network streaming
of multimedia content creates patterns of periodic network interface IRQs
in some cases.
This patch adds code to track the mean interval and variance for each IRQ
over a window of time intervals between IRQ events. Those statistics can
be used to assist cpuidle in selecting the most appropriate sleep state
by predicting the most likely time for the next interrupt.
Because the stats are gathered in interrupt context, the core computation
is as light as possible, turning into 3 subs, 3 adds and 6 mults where 4
of those mults involve a small compile-time constant.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
This patch adds some statistics under /sys/devices/system/cpu/cpuX/cpuidle/stats.
The statistics compare the prediction against the chosen idle state:
under_estimate: the sleep duration was longer than expected, and a deeper idle
state could have been chosen
over_estimate: the sleep duration was shorter than expected, and a shallower
state should have been chosen; this increases exit latency
right_estimate: the sleep duration is correct with regard to the target
residency of the chosen idle state.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The menu governor chooses a state with a selection which is the same as the
one we previously defined with the 'cpuidle_find_state' function in the
previous patch. Let's use it and factor out the code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The selection loop follows the logic "choose an idle state fulfilling the sleep
duration and the exit latency constraints". It can be encapsulated into a
function and reused in different places, giving an API to choose an idle
state based on the timing information passed as parameters.
The 'cpuidle_find_deepest_state' function does a similar selection. The new
function can be used instead, with infinite time constraints, as we are going
to suspend.
That is one more step toward the cpuidle/scheduler integration.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
In the code, the convention is that the deeper an idle state is, the greater
its exit latency is. There is no need to check the exit latency between the
states.
Furthermore, that will allow in the next patches to factor out this loop.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The CPUIDLE_DRIVER_STATE_START macro is no longer used.
Remove it.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The poll state no longer exists, so we can safely default to index 0, as it
will not be the poll state anymore.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
All the code avoids using the poll state. The only place where the poll
state is used is when we fail to find an idle state and there is a timer set
to expire within 5us. But there we call cpuidle_poll directly, without going
through the poll state callback in the idle state array of the cpuidle driver.
The poll state in the driver's idle state array is no longer used.
Remove it and clean up this mess.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
In order to remove the poll state from the idle drivers in the next patch,
let's remove the usage of the CPUIDLE_DRIVER_STATE_START from the idle state
selection loops.
1. cpuidle_play_dead will ignore the poll idle state because this one does not
implement the 'enter_dead' callback. We can safely remove the usage of the
CPUIDLE_DRIVER_STATE_START macro and start from the zero index.
2. cpuidle_find_deepest_state will always ignore the poll idle state because
its exit_latency is 0 and because of the check in the loop:
if (s->disabled || su->disable || s->exit_latency <= latency_req)
continue;
... it will always be ignored (exit_latency is 0, and 0 <= latency_req always holds)
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The following applies only to x86.
The current code defaults to C0 only if a timer is about to expire.
As the timer expiration is reliable information, this gives a double guarantee:
* we ensure fast exit from idle
* we ensure we won't be polling for too long a period, which is dangerous from
a thermal point of view
Unfortunately, this code brings a lot of weirdness around the default idle
state and the CPUIDLE_DRIVER_STATE_START macro.
Given how rarely the poll function is called (1/10000 on my server), we can
legitimately ask whether this test is worth keeping in the menu governor with
today's hardware and its very low exit latencies.
Just remove this test. If it turns out to bring a real, measurable gain in
performance, we can re-introduce it in a more clever way later.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
In order to prevent a pointless forward declaration, just move the function
at the beginning of the file.
This patch does not change the behavior of the governor, it is just code
reordering.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Len Brown <len.brown@intel.com>
|
|
The first time 'get_typical_interval' is called, it computes an average of
zero, as no data has been filled in yet. That leads the 'data->predicted_us'
variable to be set to zero too.
The caller, 'menu_select', will then do:
interactivity_req = data->predicted_us /
performance_multiplier(nr_iowaiters, cpu_load);
That sets interactivity_req to zero (0/performance...),
and then
if (latency_req > interactivity_req)
latency_req = interactivity_req;
... sets 'latency_req' to zero too.
No idle state will fulfill this constraint, so we go to the C1 state as the
default, which leads to an update; the next calls will then compute an average
different from zero.
Even though that works with the current code, the semantics are broken, and it
will break with the next patches, where we are stricter with the latency
checks: the first check will fail (latency_req is zero), then no update will
occur, so we would always fail to choose an idle state.
As there are no previous values, it is pointless to compute a standard
deviation for these nonexistent values.
Change the function to return the computed value and use it only if it is
different from zero and greater than the next timer expiration.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
In the current code, the check deciding whether or not to reflect the outcome
is done against the chosen idle state and its value.
Instead of doing a check in each of the reflect functions, just don't call
reflect if something went wrong in the idle path.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
|
|
Following the logic of the previous patch, retrieve from the idle task the
expected timer sleep duration and pass it to the cpuidle framework.
Take the opportunity to remove the unused headers in the menu.c file.
This patch does not change the current behavior.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Len Brown <len.brown@intel.com>
|
|
When the pmqos latency requirement is set to zero, that means "poll in all
cases".
That is correctly implemented on x86 but not on the other archs.
As the code is written, if the latency request is zero, the governor will
return zero, which corresponds, on x86, to the poll function, but on the other
archs to the default idle function. For example, on ARM this is wait-for-
interrupt, with a latency of '1', thus violating the constraint.
In order to fix that, do the latency requirement check *before* calling the
cpuidle framework, so we can jump to the poll function without entering
cpuidle. That has several benefits:
1. It clarifies and unifies the code
2. It fixes the x86 vs other archs behavior
3. It factors out the call to the same function
4. It avoids entering the cpuidle framework, with its expensive
computations
As the latency_req is needed in all cases, change the select API to take
latency_req as a parameter when it is not equal to zero.
As a positive side effect, it introduces an externally specified latency
constraint, one more step toward the cpuidle/scheduler integration.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Len Brown <len.brown@intel.com>
|
|
The poll function is called when a timer has expired or if we force polling
with the cpu_idle_force_poll option.
The poll function does:
local_irq_enable();
while (!tif_need_resched())
cpu_relax();
This default poll function suits the x86 arch because of its 'rep; nop'
hardware power optimization. On other archs this optimization does not
exist, and we are not saving power. The arch-specific bits may want to
optimize this loop with their own optimization.
Give the different platforms the opportunity to specify their own polling
loop by adding a weak cpu_idle_poll_loop function.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
find_idlest_cpu assumes that the rq->idle_stamp information reflects when
the cpu entered the idle state. This is wrong, as the cpu may exit and enter
the idle state several times without rq->idle_stamp being updated.
We have two pieces of information here:
* rq->idle_stamp gives when the idle task was scheduled
* idle->idle_stamp gives when the cpu entered the idle state
The patch fixes that by using the latter information, falling back to
the rq's timestamp when the idle state is not accessible.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The code is a bit poor in comments. Fix that by adding some comments in the
cpuidle enter function.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The scheduler uses the idle timestamp stored in the struct rq to retrieve the
time when the cpu went idle, in order to find the idlest cpu. Unfortunately
this information is wrong, as it does not have the same meaning from the
cpuidle point of view. The idle_stamp in the struct rq gives the time when the
idle task was scheduled, while the idle task can be interrupted several times,
with the cpu going through multiple idle/wakeup cycles.
Add the idle start time in the idle state structure.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The only case where the time is invalid is when the ACPI_CSTATE_FFH entry
method is not set. Otherwise, for all the drivers, the time can be correctly
measured.
Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
for all the states, invert the logic by replacing it with the
CPUIDLE_FLAG_TIME_INVALID flag; we can then set this flag only for the acpi
idle driver, remove the former flag from all the drivers, and invert the
logic with this flag in the different governors.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Three libata fixes for v3.18. Nothing too interesting. PCI ID and
quirk additions to ahci and an error handling path fix in sata_fsl"
* 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: disable MSI on SAMSUNG 0xa800 SSD
sata_fsl: fix error handling of irq_of_parse_and_map
AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller
|
|
Pull watchdog fix from Wim Van Sebroeck:
"Fix the watchdog mask bit offset for Exynos7"
* git://www.linux-watchdog.org/linux-watchdog:
watchdog: s3c2410_wdt: Fix the mask bit offset for Exynos7
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here are two more driver bugfixes for I2C which would be good to have"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cadence: Set the hardware time-out register to maximum value
i2c: davinci: generate STP always when NACK is received
|
|
The watchdog mask bit offset listed for Exynos7 is incorrect.
Fix this.
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Acked-by: Naveen Krishna Chatradhi <naveenkrishna.ch@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two final fixlets for 3.18:
- Prevent microcode reload wreckage on 32bit
- Unbreak cross compilation"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode: Limit the microcode reloading to 64-bit for now
x86: Use $(OBJDUMP) instead of plain objdump
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixlet from Takashi Iwai:
"Just one commit for adding a couple of HD-audio quirk entries"
* tag 'sound-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Add headset Mic support for new Dell machine
|
|
Pull drm intel fixes from Dave Airlie:
"Two intel stable fixes, that should be it from me for this round"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Unlock panel even when LVDS is disabled
drm/i915: More cautious with pch fifo underruns
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI backlight fix from Rafael Wysocki:
"This is a simple fix for an ACPI backlight regression introduced by a
recent commit that overlooked a corner case which should have been
taken into account"
* tag 'pm+acpi-3.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: update condition to check if device is in _DOD list
|
|
git://anongit.freedesktop.org/drm-intel into drm-fixes
Silence some pch fifo underrun reports and panel locking backtraces,
both cc: stable.
* tag 'drm-intel-fixes-2014-12-04' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Unlock panel even when LVDS is disabled
drm/i915: More cautious with pch fifo underruns
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A core fix and some driver fixes:
- regression fix in Remote Controller core affecting RC6 protocol
handling
- fix video buffer handling in cx23885
- race fix in solo6x10
- fix image selection in smiapp
- fix reported payload size on s2255drv
- two updates for MAINTAINERS file"
* tag 'media/v3.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] rc-core: fix toggle handling in the rc6 decoder
MAINTAINERS: Update mchehab's addresses
[media] cx23885: use sg = sg_next(sg) instead of sg++
[media] s2255drv: fix payload size for JPG, MJPEG
[media] Update MAINTAINERS for solo6x10
[media] solo6x10: fix a race in IRQ handler
[media] smiapp: Only some selection targets are settable
|
|
A typo "header=y" was introduced by commit 7071cf7fc435 ("uapi: add
missing network related headers to kbuild").
Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The Cadence I2C controller has a bug wherein it generates invalid read
transactions after a timeout in master receiver mode. This driver does not use
the HW timeout, and this interrupt is disabled, but the feature itself cannot
be disabled. Hence, this patch writes the maximum value (0xFF) to this
register. This is one of the workarounds for this bug; it will not avoid the
issue completely but reduces the chances of error.
Signed-off-by: Vishnu Motghare <vishnum@xilinx.com>
Signed-off-by: Harini Katakam <harinik@xilinx.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
|
|
According to I2C specification the NACK should be handled as follows:
"When SDA remains HIGH during this ninth clock pulse, this is defined as the Not
Acknowledge signal. The master can then generate either a STOP condition to
abort the transfer, or a repeated START condition to start a new transfer."
[I2C spec Rev. 6, 3.1.6: http://www.nxp.com/documents/user_manual/UM10204.pdf]
Currently the Davinci i2c driver interrupts the transfer on receipt of a
NACK but fails to send a STOP in some situations, leaving the bus
stuck until the next I2C IP reset (idle/enable).
For example, the issue will happen during an SMBus read transfer, which
consists of two i2c messages, a write of command/address and a read of data:
S Slave Address Wr A Command Code A Sr Slave Address Rd A D1..Dn A P
<--- write -----------------------> <--- read --------------------->
The I2C client device will send a NACK if it can't recognize the "Command Code",
and the I2C master is expected to generate a STP in this case.
But currently, the Davinci I2C driver will just exit with -EREMOTEIO and STP
will not be generated.
Hence, fix it by always generating a Stop condition (STP) when a NACK is
received.
This patch fixes Davinci I2C in the same way it was done for OMAP I2C
commit cda2109a26eb ("i2c: omap: query STP always when NACK is received").
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reported-by: Hein Tibosch <hein_tibosch@yahoo.es>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
|
|
Just like 0x1600 which got blacklisted by 66a7cbc303f4 ("ahci: disable
MSI instead of NCQ on Samsung pci-e SSDs on macbooks"), 0xa800 chokes
on NCQ commands if MSI is enabled. Disable MSI.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dominik Mierzejewski <dominik@greysector.net>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=89171
Cc: stable@vger.kernel.org
|
|
It appears that some SCHEDULE_USER (asm for schedule_user) callers
in arch/x86/kernel/entry_64.S are called from RCU kernel context,
and schedule_user will return in RCU user context. This causes RCU
warnings and possible failures.
This is intended to be a minimal fix suitable for 3.18.
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|