aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-04-29Provide in-kernel headers to make extending kernel easierJoel Fernandes (Google)
Introduce in-kernel headers which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to run eBPF and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers. A github PR is sent for the corresponding BCC patch at: https://github.com/iovisor/bcc/pull/2312 On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. This is an issue even well known to distros. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time. The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they do not want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.3MB. By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers. The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers. Other approaches were discussed such as having an in-memory mountable filesystem, but that has drawbacks such as requiring an in-kernel xz decompressor which we don't have today, and requiring usage of 42 MB of kernel memory to host the decompressed headers at anytime. Also this approach is simpler than such approaches. Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-28kobject: Improve doc clarity kobject_init_and_add()Tobin C. Harding
Function kobject_init_and_add() is currently misused in a number of places in the kernel. On error return kobject_put() must be called but is at times not. Make the function documentation more explicit about calling kobject_put() in the error path. Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-28kobject: Improve docs for kobject_add/delTobin C. Harding
There is currently some confusion on how to wind back kobject_init_and_add() during the error paths in code that uses this function. Add documentation to kobject_add() and kobject_del() to help clarify the usage. Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25driver core: platform: Fix the usage of platform device name(pdev->name)Venkata Narendra Kumar Gutta
Platform core is using pdev->name as the platform device name to do the binding of the devices with the drivers. But, when the platform driver overrides the platform device name with dev_set_name(), the pdev->name is pointing to a location which is freed and becomes an invalid parameter to do the binding match. use-after-free instance: [ 33.325013] BUG: KASAN: use-after-free in strcmp+0x8c/0xb0 [ 33.330646] Read of size 1 at addr ffffffc10beae600 by task modprobe [ 33.339068] CPU: 5 PID: 518 Comm: modprobe Tainted: G S W O 4.19.30+ #3 [ 33.346835] Hardware name: MTP (DT) [ 33.350419] Call trace: [ 33.352941] dump_backtrace+0x0/0x3b8 [ 33.356713] show_stack+0x24/0x30 [ 33.360119] dump_stack+0x160/0x1d8 [ 33.363709] print_address_description+0x84/0x2e0 [ 33.368549] kasan_report+0x26c/0x2d0 [ 33.372322] __asan_report_load1_noabort+0x2c/0x38 [ 33.377248] strcmp+0x8c/0xb0 [ 33.380306] platform_match+0x70/0x1f8 [ 33.384168] __driver_attach+0x78/0x3a0 [ 33.388111] bus_for_each_dev+0x13c/0x1b8 [ 33.392237] driver_attach+0x4c/0x58 [ 33.395910] bus_add_driver+0x350/0x560 [ 33.399854] driver_register+0x23c/0x328 [ 33.403886] __platform_driver_register+0xd0/0xe0 So, use dev_name(&pdev->dev), which fetches the platform device name from the kobject(dev->kobj->name) of the device instead of the pdev->name. Signed-off-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25livepatch: Replace klp_ktype_patch's default_attrs with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace klp_ktype_patch's default_attrs field with default_groups and use the ATTRIBUTE_GROUPS macro to create klp_patch_groups. This patch was tested by loading the livepatch-sample module and verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25cpufreq: schedutil: Replace default_attrs field with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace sugov_tunables_ktype's default_attrs field with default groups. Change "sugov_attributes" to "sugov_attrs" and use the ATTRIBUTE_GROUPS macro to create sugov_groups. This patch was tested by setting the scaling governor to schedutil and verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25padata: Replace padata_attr_type default_attrs field with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace padata_attr_type's default_attrs field with default_groups and use the ATTRIBUTE_GROUPS macro to create padata_default_groups. This patch was tested by loading the pcrypt module and verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25irqdesc: Replace irq_kobj_type's default_attrs field with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace irq_kobj_type's default_attrs field with default_groups and use the ATTRIBUTE_GROUPS macro to create irq_groups. This patch was tested by verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25net-sysfs: Replace ktype default_attrs field with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace the default_attrs fields in rx_queue_ktype and netdev_queue_ktype with default_groups. Use the ATTRIBUTE_GROUPS macro to create rx_queue_default_groups and netdev_queue_default_groups. This patch was tested by verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25block: Replace all ktype default_attrs with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace all of the ktype default_attrs fields in the block subsystem with default_groups and use the ATTRIBUTE_GROUPS macro to create the default groups. Remove default_ctx_attrs[] because it doesn't contain any attributes. This patch was tested by verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25samples/kobject: Replace foo_ktype's default_attrs field with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace foo_ktype's default_attrs field with default_groups and use the ATTRIBUTE_GROUPS macro to create foo_default_groups. This patch was tested by loading the kset-example module and verifying that the sysfs files for the attributes in the default group were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25kobject: Add support for default attribute groups to kobj_typeKimberly Brown
kobj_type currently uses a list of individual attributes to store default attributes. Attribute groups are more flexible than a list of attributes because groups provide support for attribute visibility. So, add support for default attribute groups to kobj_type. In future patches, the existing uses of kobj_type’s attribute list will be converted to attribute groups. When that is complete, kobj_type’s attribute list, “default_attrs”, will be removed. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25driver core: Postpone DMA tear-down until after devres release for probe failureJohn Garry
In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after devres release"), we changed the ordering of tearing down the device DMA ops and releasing all the device's resources; this was because the DMA ops should be maintained until we release the device's managed DMA memories. However, we have seen another crash on an arm64 system when a device driver probe fails: hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2 scsi host1: hisi_sas_v3_hw BUG: Bad page state in process swapper/0 pfn:313f5 page:ffff7e0000c4fd40 count:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0xfffe00000001000(reserved) raw: 0fffe00000001000 ffff7e0000c4fd48 ffff7e0000c4fd48 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: 0x1000(reserved) Modules linked in: CPU: 49 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-43081-g22d97fd-dirty #1433 Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI RC0 - V1.12.01 01/29/2019 Call trace: dump_backtrace+0x0/0x118 show_stack+0x14/0x1c dump_stack+0xa4/0xc8 bad_page+0xe4/0x13c free_pages_check_bad+0x4c/0xc0 __free_pages_ok+0x30c/0x340 __free_pages+0x30/0x44 __dma_direct_free_pages+0x30/0x38 dma_direct_free+0x24/0x38 dma_free_attrs+0x9c/0xd8 dmam_release+0x20/0x28 release_nodes+0x17c/0x220 devres_release_all+0x34/0x54 really_probe+0xc4/0x2c8 driver_probe_device+0x58/0xfc device_driver_attach+0x68/0x70 __driver_attach+0x94/0xdc bus_for_each_dev+0x5c/0xb4 driver_attach+0x20/0x28 bus_add_driver+0x14c/0x200 driver_register+0x6c/0x124 __pci_register_driver+0x48/0x50 sas_v3_pci_driver_init+0x20/0x28 do_one_initcall+0x40/0x25c kernel_init_freeable+0x2b8/0x3c0 kernel_init+0x10/0x100 ret_from_fork+0x10/0x18 Disabling lock debugging due to kernel taint BUG: Bad page state in process swapper/0 pfn:313f6 page:ffff7e0000c4fd80 count:1 mapcount:0 mapping:0000000000000000 index:0x0 [ 89.322983] flags: 0xfffe00000001000(reserved) raw: 0fffe00000001000 ffff7e0000c4fd88 ffff7e0000c4fd88 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 The crash occurs for the same reason. In this case, on the really_probe() failure path, we are still clearing the DMA ops prior to releasing the device's managed memories. This patch fixes this issue by reordering the DMA ops teardown and the call to devres_release_all() on the failure path. Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25driver core: platform: Propagate error from insert_resource()Andy Shevchenko
Since insert_resource() might return an error we don't need to shadow its error code and would safely propagate to the user. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25kernfs: fix barrier usage in __kernfs_new_node()Andrea Parri
smp_mb__before_atomic() can not be applied to atomic_set(). Remove the barrier and rely on RELEASE synchronization. Fixes: ba16b2846a8c6 ("kernfs: add an API to get kernfs node from inode number") Cc: stable@vger.kernel.org Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25fs: kernfs: Corrected spelling mistakeChristina Quast
flies => files Signed-off-by: Christina Quast <cquast@hanoverdisplays.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25acpi/hmat: fix an uninitialized memory_targetQian Cai
The commit 665ac7e92757 ("acpi/hmat: Register processor domain to its memory") introduced an uninitialized "struct memory_target" that could cause an incorrect branching. drivers/acpi/hmat/hmat.c:385:6: warning: variable 'target' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] if (p->flags & ACPI_HMAT_MEMORY_PD_VALID) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/acpi/hmat/hmat.c:392:6: note: uninitialized use occurs here if (target && p->flags & ACPI_HMAT_PROCESSOR_PD_VALID) { ^~~~~~ drivers/acpi/hmat/hmat.c:385:2: note: remove the 'if' if its condition is always true if (p->flags & ACPI_HMAT_MEMORY_PD_VALID) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/acpi/hmat/hmat.c:369:30: note: initialize the variable 'target' to silence this warning struct memory_target *target; ^ = NULL Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Fixes: 665ac7e92757 ("acpi/hmat: Register processor domain to its memory") Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25acpi/hmat: Update acpi_hmat_type enum with ACPI_HMAT_TYPE_PROXIMITYAlison Schofield
ACPI 6.3 changed the subtable "Memory Subsystem Address Range Structure" to "Memory Proximity Domain Attributes Structure". Updating and renaming of the structure was included in commit: ACPICA: ACPI 6.3: HMAT updates (9a8d961f1ef835b0d338fbe13da03cb424e87ae5) Rename the enum type to match the subtable and structure naming. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25acpi/hmat: fix memory leaks in hmat_init()Qian Cai
The commit 665ac7e92757 ("acpi/hmat: Register processor domain to its memory") introduced some memory leaks below due to it fails to release the heap memory in an error path, and then those statically-allocated __initdata memory which reference them get freed during boot renders those heap memory as leaks. Since it is valid to pass NULL to acpi_put_table(), it is fine to call it even if acpi_get_table() returns an error. unreferenced object 0xc8ff8008349e9400 (size 128): comm "swapper/0", pid 1, jiffies 4294709236 (age 48121.476s) hex dump (first 32 bytes): 00 d0 9e 34 08 80 ff 84 d8 00 43 11 00 10 ff ff ...4......C..... 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000869d4503>] __kmalloc+0x568/0x600 [<0000000070fd6afb>] alloc_memory_target+0x50/0xd8 [<00000000efa2081e>] srat_parse_mem_affinity+0x58/0x5c [<000000008bfaef74>] acpi_parse_entries_array+0x1c8/0x2c0 [<0000000022804877>] acpi_table_parse_entries_array+0x11c/0x138 [<00000000ffe9cd34>] acpi_table_parse_entries+0x7c/0xac [<00000000a7023afd>] hmat_init+0x90/0x174 [<00000000694a86c1>] do_one_initcall+0x2d8/0x5f8 [<0000000024889da9>] do_initcall_level+0x37c/0x3fc [<000000009be02908>] do_basic_setup+0x38/0x50 [<0000000037b3ac0a>] kernel_init_freeable+0x194/0x258 [<00000000f5741184>] kernel_init+0x18/0x334 [<000000007b30f423>] ret_from_fork+0x10/0x18 [<000000006c7147a8>] 0xffffffffffffffff Signed-off-by: Qian Cai <cai@lca.pw> Fixes: 665ac7e92757 ("acpi/hmat: Register processor domain to its memory") Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25drivers: fix a typo in the kernel doc for devm_platform_ioremap_resource()Bartosz Golaszewski
It should have been 'management' not 'managemend'. Fixes: 7945f929f1a7 ("drivers: provide devm_platform_ioremap_resource()") Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25mm/memory_hotplug: Do not unlock when fails to take the device_hotplug_lockzhong jiang
When adding the memory by probing memory block in sysfs interface, there is an obvious issue that we will unlock the device_hotplug_lock when fails to takes it. That issue was introduced in Commit 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock") We should drop out in time when fails to take the device_hotplug_lock. Fixes: 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock") Reported-by: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25driver core: Clarify which counterparts to use to device_add()Borislav Petkov
It is not absolutely clear from the docs how the cleanup path after device_add() should look like so spell it out explicitly. No functional changes, just documentation. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25debugfs: update documented return values of debugfs helpersRonald Tschalär
Since commit ff9fb72bc077 ("debugfs: return error values, not NULL") these helper functions do not return NULL anymore (with the exception of debugfs_create_u32_array()). Fixes: ff9fb72bc077 ("debugfs: return error values, not NULL") Signed-off-by: Ronald Tschalär <ronald@innovation.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04drivers: base: power: add proper SPDX identifiers on files that did not have ↵Greg Kroah-Hartman
them. There were a few files in the driver core power code that did not have SPDX identifiers on them, so fix that up. At the same time, remove the "free form" text that specified the license of the file, as that is impossible for any tool to properly parse. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04drivers: base: firmware_loader: add proper SPDX identifiers on files that ↵Greg Kroah-Hartman
did not have them. There were two files in the firmware_loader code that did not have SPDX identifiers on them, so fix that up. Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04drivers: base: test: add proper SPDX identifier to MakefileGreg Kroah-Hartman
The Makefile in the drivers/base/test/ directory did not have a SPDX identifier on it, so fix that up. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04arch_topology: Make cpu_capacity sysfs node as read-onlyLingutla Chandrasekhar
If user updates any cpu's cpu_capacity, then the new value is going to be applied to all its online sibling cpus. But this need not to be correct always, as sibling cpus (in ARM, same micro architecture cpus) would have different cpu_capacity with different performance characteristics. So, updating the user supplied cpu_capacity to all cpu siblings is not correct. And another problem is, current code assumes that 'all cpus in a cluster or with same package_id (core_siblings), would have same cpu_capacity'. But with commit '5bdd2b3f0f8 ("arm64: topology: add support to remove cpu topology sibling masks")', when a cpu hotplugged out, the cpu information gets cleared in its sibling cpus. So, user supplied cpu_capacity would be applied to only online sibling cpus at the time. After that, if any cpu hotplugged in, it would have different cpu_capacity than its siblings, which breaks the above assumption. So, instead of mucking around the core sibling mask for user supplied value, use device-tree to set cpu capacity. And make the cpu_capacity node as read-only to know the asymmetry between cpus in the system. While at it, remove cpu_scale_mutex usage, which used for sysfs write protection. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Quentin Perret <quentin.perret@arm.com> Reviewed-by: Quentin Perret <quentin.perret@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04doc/mm: New documentation for memory performanceKeith Busch
Platforms may provide system memory where some physical address ranges perform differently than others, or is cached by the system on the memory side. Add documentation describing a high level overview of such systems and the perforamnce and caching attributes the kernel provides for applications wishing to query this information. Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04acpi/hmat: Register memory side cache attributesKeith Busch
Register memory side cache attributes with the memory's node if HMAT provides the side cache iniformation table. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04acpi/hmat: Register performance attributesKeith Busch
Save the best performance access attributes and register these with the memory's node if HMAT provides the locality table. While HMAT does make it possible to know performance for all possible initiator-target pairings, we export only the local pairings at this time. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04acpi/hmat: Register processor domain to its memoryKeith Busch
If the HMAT Subsystem Address Range provides a valid processor proximity domain for a memory domain, or a processor domain matches the performance access of the valid processor proximity domain, register the memory target with that initiator so this relationship will be visible under the node's sysfs directory. Since HMAT requires valid address ranges have an equivalent SRAT entry, verify each memory target satisfies this requirement. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Brice Goglin <Brice.Goglin@inria.fr> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04node: Add memory-side caching attributesKeith Busch
System memory may have caches to help improve access speed to frequently requested address ranges. While the system provided cache is transparent to the software accessing these memory ranges, applications can optimize their own access based on cache attributes. Provide a new API for the kernel to register these memory-side caches under the memory node that provides it. The new sysfs representation is modeled from the existing cpu cacheinfo attributes, as seen from /sys/devices/system/cpu/<cpu>/cache/. Unlike CPU cacheinfo though, the node cache level is reported from the view of the memory. A higher level number is nearer to the CPU, while lower levels are closer to the last level memory. The exported attributes are the cache size, the line size, associativity indexing, and write back policy, and add the attributes for the system memory caches to sysfs stable documentation. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Brice Goglin <Brice.Goglin@inria.fr> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04node: Add heterogenous memory access attributesKeith Busch
Heterogeneous memory systems provide memory nodes with different latency and bandwidth performance attributes. Provide a new kernel interface for subsystems to register the attributes under the memory target node's initiator access class. If the system provides this information, applications may query these attributes when deciding which node to request memory. The following example shows the new sysfs hierarchy for a node exporting performance attributes: # tree -P "read*|write*"/sys/devices/system/node/nodeY/accessZ/initiators/ /sys/devices/system/node/nodeY/accessZ/initiators/ |-- read_bandwidth |-- read_latency |-- write_bandwidth `-- write_latency The bandwidth is exported as MB/s and latency is reported in nanoseconds. The values are taken from the platform as reported by the manufacturer. Memory accesses from an initiator node that is not one of the memory's access "Z" initiator nodes linked in the same directory may observe different performance than reported here. When a subsystem makes use of this interface, initiators of a different access number may not have the same performance relative to initiators in other access numbers, or omitted from the any access class' initiators. Descriptions for memory access initiator performance access attributes are added to sysfs stable documentation. Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04node: Link memory nodes to their compute nodesKeith Busch
Systems may be constructed with various specialized nodes. Some nodes may provide memory, some provide compute devices that access and use that memory, and others may provide both. Nodes that provide memory are referred to as memory targets, and nodes that can initiate memory access are referred to as memory initiators. Memory targets will often have varying access characteristics from different initiators, and platforms may have ways to express those relationships. In preparation for these systems, provide interfaces for the kernel to export the memory relationship among different nodes memory targets and their initiators with symlinks to each other. If a system provides access locality for each initiator-target pair, nodes may be grouped into ranked access classes relative to other nodes. The new interface allows a subsystem to register relationships of varying classes if available and desired to be exported. A memory initiator may have multiple memory targets in the same access class. The target memory's initiators in a given class indicate the nodes access characteristics share the same performance relative to other linked initiator nodes. Each target within an initiator's access class, though, do not necessarily perform the same as each other. A memory target node may have multiple memory initiators. All linked initiators in a target's class have the same access characteristics to that target. The following example show the nodes' new sysfs hierarchy for a memory target node 'Y' with access class 0 from initiator node 'X': # symlinks -v /sys/devices/system/node/nodeX/access0/ relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY # symlinks -v /sys/devices/system/node/nodeY/access0/ relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX The new attributes are added to the sysfs stable documentation. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04acpi/hmat: Parse and report heterogeneous memoryKeith Busch
Systems may provide different memory types and export this information in the ACPI Heterogeneous Memory Attribute Table (HMAT). Parse these tables provided by the platform and report the memory access and caching attributes to the kernel messages. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04acpi: Add HMAT to generic parsing tablesKeith Busch
The Heterogeneous Memory Attribute Table (HMAT) header has different field lengths than the existing parsing uses. Add the HMAT type to the parsing rules so it may be generically parsed. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04acpi: Create subtable parsing infrastructureKeith Busch
Parsing entries in an ACPI table had assumed a generic header structure. There is no standard ACPI header, though, so less common layouts with different field sizes required custom parsers to go through their subtable entry list. Create the infrastructure for adding different table types so parsing the entries array may be more reused for all ACPI system tables and the common code doesn't need to be duplicated. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-01kobject: Don't trigger kobject_uevent(KOBJ_REMOVE) twice.Tetsuo Handa
syzbot is hitting use-after-free bug in uinput module [1]. This is because kobject_uevent(KOBJ_REMOVE) is called again due to commit 0f4dafc0563c6c49 ("Kobject: auto-cleanup on final unref") after memory allocation fault injection made kobject_uevent(KOBJ_REMOVE) from device_del() from input_unregister_device() fail, while uinput_destroy_device() is expecting that kobject_uevent(KOBJ_REMOVE) is not called after device_del() from input_unregister_device() completed. That commit intended to catch cases where nobody even attempted to send "remove" uevents. But there is no guarantee that an event will ultimately be sent. We are at the point of no return as far as the rest of the kernel is concerned; there are no repeats or do-overs. Also, it is not clear whether some subsystem depends on that commit. If no subsystem depends on that commit, it will be better to remove the state_{add,remove}_uevent_sent logic. But we don't want to risk a regression (in a patch which will be backported) by trying to remove that logic. Therefore, as a first step, let's avoid the use-after-free bug by making sure that kobject_uevent(KOBJ_REMOVE) won't be triggered twice. [1] https://syzkaller.appspot.com/bug?id=8b17c134fe938bbddd75a45afaa9e68af43a362d Reported-by: syzbot <syzbot+f648cfb7e0b52bf7ae32@syzkaller.appspotmail.com> Analyzed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Fixes: 0f4dafc0563c6c49 ("Kobject: auto-cleanup on final unref") Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-01driver: base: Disable CONFIG_UEVENT_HELPER by defaultGeert Uytterhoeven
Since commit 7934779a69f1184f ("Driver-Core: disable /sbin/hotplug by default"), the help text for the /sbin/hotplug fork-bomb says "This should not be used today [...] creates a high system load, or [...] out-of-memory situations during bootup". The rationale for this was that no recent mainstream system used this anymore (in 2010!). A few years later, the complete uevent helper support was made optional in commit 86d56134f1b67d0c ("kobject: Make support for uevent_helper optional."). However, if was still left enabled by default, to support ancient userland. Time passed by, and nothing should use this anymore, so it can be disabled by default. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-01device.h: reorganize struct deviceGreg Kroah-Hartman
struct device is big, around 760 bytes on x86_64. It's not a critical structure, but it is embedded everywhere, so making it smaller is always a good thing. With a recent patch that moved a field from struct device to the private structure, some benchmarks showed a very odd regression, despite this structure having nothing to do with those benchmarks. That caused me to look into the layout of the structure. Using 'pahole', it showed a number of holes and ways that the structure could be reordered in order to align some cachelines better, as well as reduce the size of the overall structure. Move 'struct kobj' to the start of the structure, to keep that access in the first cacheline, and try to organize things a bit more compactly where possible By doing these few moves, the result removes at least 8 bytes from 'struct device' on a 64bit system. Given we know there are systems with at least 30k devices in memory at once, every little byte counts, and this change could be a savings of 240k of kernel memory for them. On "normal" systems the overall memory savings would be much less. Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-31Linux 5.1-rc3Linus Torvalds
2019-03-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "A collection of x86 and ARM bugfixes, and some improvements to documentation. On top of this, a cleanup of kvm_para.h headers, which were exported by some architectures even though they not support KVM at all. This is responsible for all the Kbuild changes in the diffstat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION KVM: doc: Document the life cycle of a VM and its resources KVM: selftests: complete IO before migrating guest state KVM: selftests: disable stack protector for all KVM tests KVM: selftests: explicitly disable PIE for tests KVM: selftests: assert on exit reason in CR4/cpuid sync test KVM: x86: update %rip after emulating IO x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts kvm: don't redefine flags as something else kvm: mmu: Used range based flushing in slot_handle_level_range KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region() kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) KVM: Reject device ioctls from processes other than the VM's creator KVM: doc: Fix incorrect word ordering regarding supported use of APIs KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT ...
2019-03-31Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A pile of x86 updates: - Prevent exceeding he valid physical address space in the /dev/mem limit checks. - Move all header content inside the header guard to prevent compile failures. - Fix the bogus __percpu annotation in this_cpu_has() which makes sparse very noisy. - Disable switch jump tables completely when retpolines are enabled. - Prevent leaking the trampoline address" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/realmode: Make set_real_mode_mem() static inline x86/cpufeature: Fix __percpu annotation in this_cpu_has() x86/mm: Don't exceed the valid physical address space x86/retpolines: Disable switch jump tables when retpolines are enabled x86/realmode: Don't leak the trampoline kernel address x86/boot: Fix incorrect ifdeffery scope x86/resctrl: Remove unused variable
2019-03-31Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling fixes from Thomas Gleixner: "Core libraries: - Fix max perf_event_attr.precise_ip detection. - Fix parser error for uncore event alias - Fixup ordering of kernel maps after obtaining the main kernel map address. Intel PT: - Fix TSC slip where A TSC packet can slip past MTC packets so that the timestamp appears to go backwards. - Fixes for exported-sql-viewer GUI conversion to python3. ARM coresight: - Fix the build by adding a missing case value for enumeration value introduced in newer library, that now is the required one. tool headers: - Syncronize kernel headers with the kernel, getting new io_uring and pidfd_send_signal syscalls so that 'perf trace' can handle them" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf pmu: Fix parser error for uncore event alias perf scripts python: exported-sql-viewer.py: Fix python3 support perf scripts python: exported-sql-viewer.py: Fix never-ending loop perf machine: Update kernel map address and re-order properly tools headers uapi: Sync powerpc's asm/kvm.h copy with the kernel sources tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd tools headers uapi: Update drm/i915_drm.h tools arch x86: Sync asm/cpufeatures.h with the kernel sources tools headers uapi: Sync linux/fcntl.h to get the F_SEAL_FUTURE_WRITE addition tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h perf evsel: Fix max perf_event_attr.precise_ip detection perf intel-pt: Fix TSC slip perf cs-etm: Add missing case value
2019-03-31Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fixes from Thomas Gleixner: "Two SMT/hotplug related fixes: - Prevent crash when HOTPLUG_CPU is disabled and the CPU bringup aborts. This is triggered with the 'nosmt' command line option, but can happen by any abort condition. As the real unplug code is not compiled in, prevent the fail by keeping the CPU in zombie state. - Enforce HOTPLUG_CPU for SMP on x86 to avoid the above situation completely. With 'nosmt' being a popular option it's required to unplug the half brought up sibling CPUs (due to the MCE wreckage) completely" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
2019-03-31Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixlet from Thomas Gleixner: "Trivial update to the maintainers file" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove deleted file from futex file pattern
2019-03-31Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: "A small set of core updates: - Make the watchdog respect the selected CPU mask again. That was broken by the rework of the watchdog thread management and caused inconsistent state and NMI watchdog being unstoppable. - Ensure that the objtool build can find the libelf location. - Remove dead kcore stub code" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: watchdog: Respect watchdog cpumask on CPU hotplug objtool: Query pkg-config for libelf location proc/kcore: Remove unused kclist_add_remap()
2019-03-31Merge tag 'powerpc-5.1-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Three non-regression fixes. - Our optimised memcmp could read past the end of one of the buffers and potentially trigger a page fault leading to an oops. - Some of our code to read energy management data on PowerVM had an endian bug leading to bogus results. - When reporting a machine check exception we incorrectly reported TLB multihits as D-Cache multhits due to a missing entry in the array of causes. Thanks to: Chandan Rajendra, Gautham R. Shenoy, Mahesh Salgaonkar, Segher Boessenkool, Vaidyanathan Srinivasan" * tag 'powerpc-5.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries/mce: Fix misleading print for TLB mutlihit powerpc/pseries/energy: Use OF accessor functions to read ibm,drc-indexes powerpc/64: Fix memcmp reading past the end of src/dest
2019-03-31Merge tag 'dmaengine-fix-5.1-rc3' of ↵Linus Torvalds
git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: - Revert "dmaengine: stm32-mdma: Add a check on read_u32_array" as that caused regression - Fix MAINTAINER file uniphier-mdmac.c file path * tag 'dmaengine-fix-5.1-rc3' of git://git.infradead.org/users/vkoul/slave-dma: MAINTAINERS: Fix uniphier-mdmac.c file path dmaengine: stm32-mdma: Revert "dmaengine: stm32-mdma: Add a check on read_u32_array"
2019-03-30Merge tag 'led-fixes-for-5.1-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED fixes from Jacek Anaszewski: - fix refcnt leak on interface rename - use memcpy in device_name_store() to avoid including garbage from a previous, longer value in the device_name - fix a potential NULL pointer dereference in case of_match_device() cannot find a match * tag 'led-fixes-for-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: leds: trigger: netdev: use memcpy in device_name_store leds: pca9532: fix a potential NULL pointer dereference leds: trigger: netdev: fix refcnt leak on interface rename