aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-02-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "118 patches: - The rest of MM. Includes kfence - another runtime memory validator. Not as thorough as KASAN, but it has unmeasurable overhead and is intended to be usable in production builds. - Everything else Subsystems affected by this patch series: alpha, procfs, sysctl, misc, core-kernel, MAINTAINERS, lib, bitops, checkpatch, init, coredump, seq_file, gdb, ubsan, initramfs, and mm (thp, cma, vmstat, memory-hotplug, mlock, rmap, zswap, zsmalloc, cleanups, kfence, kasan2, and pagemap2)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) MIPS: make userspace mapping young by default initramfs: panic with memory information ubsan: remove overflow checks kgdb: fix to kill breakpoints on initmem after boot scripts/gdb: fix list_for_each x86: fix seq_file iteration for pat/memtype.c seq_file: document how per-entry resources are managed. fs/coredump: use kmap_local_page() init/Kconfig: fix a typo in CC_VERSION_TEXT help text init: clean up early_param_on_off() macro init/version.c: remove Version_<LINUX_VERSION_CODE> symbol checkpatch: do not apply "initialise globals to 0" check to BPF progs checkpatch: don't warn about colon termination in linker scripts checkpatch: add kmalloc_array_node to unnecessary OOM message check checkpatch: add warning for avoiding .L prefix symbols in assembly files checkpatch: improve TYPECAST_INT_CONSTANT test message checkpatch: prefer ftrace over function entry/exit printks checkpatch: trivial style fixes checkpatch: ignore warning designated initializers using NR_CPUS checkpatch: improve blank line after declaration test ...
2021-02-26MIPS: make userspace mapping young by defaultHuang Pei
MIPS page fault path(except huge page) takes 3 exceptions (1 TLB Miss + 2 TLB Invalid), butthe second TLB Invalid exception is just triggered by __update_tlb from do_page_fault writing tlb without _PAGE_VALID set. With this patch, user space mapping prot is made young by default (with both _PAGE_VALID and _PAGE_YOUNG set), and it only take 1 TLB Miss + 1 TLB Invalid exception Remove pte_sw_mkyoung without polluting MM code and make page fault delay of MIPS on par with other architecture Link: https://lkml.kernel.org/r/20210204013942.8398-1-huangpei@loongson.cn Signed-off-by: Huang Pei <huangpei@loongson.cn> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: <huangpei@loongson.cn> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: <ambrosehua@gmail.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Paul Burton <paulburton@kernel.org> Cc: Li Xuefeng <lixuefeng@loongson.cn> Cc: Yang Tiezhu <yangtiezhu@loongson.cn> Cc: Gao Juxin <gaojuxin@loongson.cn> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Huacai Chen <chenhc@lemote.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kgdb: fix to kill breakpoints on initmem after bootSumit Garg
Currently breakpoints in kernel .init.text section are not handled correctly while allowing to remove them even after corresponding pages have been freed. Fix it via killing .init.text section breakpoints just prior to initmem pages being freed. Doug: "HW breakpoints aren't handled by this patch but it's probably not such a big deal". Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.org Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Suggested-by: Doug Anderson <dianders@chromium.org> Acked-by: Doug Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26init: clean up early_param_on_off() macroMasahiro Yamada
Use early_param() to define early_param_on_off(). Link: https://lkml.kernel.org/r/20210201041532.4025025-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Johan Hovold <johan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Nick Desaulniers <ndesaulniers@gooogle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26include/linux/bitops.h: spelling s/synomyn/synonym/Geert Uytterhoeven
Fix a misspelling of "synonym". Link: https://lkml.kernel.org/r/20210108105305.2028120-1-geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26lib: stackdepot: add support to disable stack depotVijayanand Jitta
Add a kernel parameter stack_depot_disable to disable stack depot. So that stack hash table doesn't consume any memory when stack depot is disabled. The use case is CONFIG_PAGE_OWNER without page_owner=on. Without this patch, stackdepot will consume the memory for the hashtable. By default, it's 8M which is never trivial. With this option, in CONFIG_PAGE_OWNER configured system, page_owner=off, stack_depot_disable in kernel command line, we could save the wasted memory for the hashtable. [akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build] Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeaurora.org Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Alexander Potapenko <glider@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Yogesh Lal <ylal@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26string.h: move fortified functions definitions in a dedicated header.Francis Laniel
This patch adds fortify-string.h to contain fortified functions definitions. Thus, the code is more separated and compile time is approximately 1% faster for people who do not set CONFIG_FORTIFY_SOURCE. Link: https://lkml.kernel.org/r/20210111092141.22946-1-laniel_francis@privacyrequired.com Link: https://lkml.kernel.org/r/20210111092141.22946-2-laniel_francis@privacyrequired.com Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26groups: use flexible-array member in struct group_infoHubert Jasudowicz
Replace zero-size array with flexible array member, as recommended by the docs. Link: https://lkml.kernel.org/r/155995eed35c3c1bdcc56e69d8997c8e4c46740a.1611620846.git.hubert.jasudowicz@gmail.com Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Micah Morton <mortonm@chromium.org> Cc: Gao Xiang <xiang@kernel.org> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Thomas Cedeno <thomascedeno@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26treewide: Miguel has movedMiguel Ojeda
Update contact info. Link: https://lkml.kernel.org/r/20210206162524.GA11520@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26include/linux: remove repeated wordsRandy Dunlap
Drop the doubled word "for" in a comment. {firewire-cdev.h} Drop the doubled word "in" in a comment. {input.h} Drop the doubled word "a" in a comment. {mdev.h} Drop the doubled word "the" in a comment. {ptrace.h} Link: https://lkml.kernel.org/r/20210126232444.22861-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan: unify large kfree checksAndrey Konovalov
Unify checks in kasan_kfree_large() and in kasan_slab_free_mempool() for large allocations as it's done for small kfree() allocations. With this change, kasan_slab_free_mempool() starts checking that the first byte of the memory that's being freed is accessible. Link: https://lkml.kernel.org/r/14ffc4cd867e0b1ed58f7527e3b748a1b4ad08aa.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kasan, mm: don't save alloc stacks twiceAndrey Konovalov
Patch series "kasan: optimizations and fixes for HW_TAGS", v4. This patchset makes the HW_TAGS mode more efficient, mostly by reworking poisoning approaches and simplifying/inlining some internal helpers. With this change, the overhead of HW_TAGS annotations excluding setting and checking memory tags is ~3%. The performance impact caused by tags will be unknown until we have hardware that supports MTE. As a side-effect, this patchset speeds up generic KASAN by ~15%. This patch (of 13): Currently KASAN saves allocation stacks in both kasan_slab_alloc() and kasan_kmalloc() annotations. This patch changes KASAN to save allocation stacks for slab objects from kmalloc caches in kasan_kmalloc() only, and stacks for other slab objects in kasan_slab_alloc() only. This change requires ____kasan_kmalloc() knowing whether the object belongs to a kmalloc cache. This is implemented by adding a flag field to the kasan_info structure. That flag is only set for kmalloc caches via a new kasan_cache_create_kmalloc() annotation. Link: https://lkml.kernel.org/r/cover.1612546384.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/7c673ebca8d00f40a7ad6f04ab9a2bddeeae2097.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26tracing: add error_report_end trace pointAlexander Potapenko
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3. This patchset adds a tracepoint, error_repor_end, that is to be used by KFENCE, KASAN, and potentially other bug detection tools, when they print an error report. One of the possible use cases is userspace collection of kernel error reports: interested parties can subscribe to the tracing event via tracefs, and get notified when an error report occurs. This patch (of 3): Introduce error_report_end tracepoint. It can be used in debugging tools like KASAN, KFENCE, etc. to provide extensions to the error reporting mechanisms (e.g. allow tests hook into error reporting, ease error report collection from production kernels). Another benefit would be making use of ftrace for debugging or benchmarking the tools themselves. Should we need it, the tracepoint name leaves us with the possibility to introduce a complementary error_report_start tracepoint in the future. Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kfence: add test suiteMarco Elver
Add KFENCE test suite, testing various error detection scenarios. Makes use of KUnit for test organization. Since KFENCE's interface to obtain error reports is via the console, the test verifies that KFENCE outputs expected reports to the console. [elver@google.com: fix typo in test] Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com [elver@google.com: show access type in report] Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm, kfence: insert KFENCE hooks for SLUBAlexander Potapenko
Inserts KFENCE hooks into the SLUB allocator. To pass the originally requested size to KFENCE, add an argument 'orig_size' to slab_alloc*(). The additional argument is required to preserve the requested original size for kmalloc() allocations, which uses size classes (e.g. an allocation of 272 bytes will return an object of size 512). Therefore, kmem_cache::size does not represent the kmalloc-caller's requested size, and we must introduce the argument 'orig_size' to propagate the originally requested size to KFENCE. Without the originally requested size, we would not be able to detect out-of-bounds accesses for objects placed at the end of a KFENCE object page if that object is not equal to the kmalloc-size class it was bucketed into. When KFENCE is disabled, there is no additional overhead, since slab_alloc*() functions are __always_inline. Link: https://lkml.kernel.org/r/20201103175841.3495947-6-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Jann Horn <jannh@google.com> Co-developed-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm, kfence: insert KFENCE hooks for SLABAlexander Potapenko
Inserts KFENCE hooks into the SLAB allocator. To pass the originally requested size to KFENCE, add an argument 'orig_size' to slab_alloc*(). The additional argument is required to preserve the requested original size for kmalloc() allocations, which uses size classes (e.g. an allocation of 272 bytes will return an object of size 512). Therefore, kmem_cache::size does not represent the kmalloc-caller's requested size, and we must introduce the argument 'orig_size' to propagate the originally requested size to KFENCE. Without the originally requested size, we would not be able to detect out-of-bounds accesses for objects placed at the end of a KFENCE object page if that object is not equal to the kmalloc-size class it was bucketed into. When KFENCE is disabled, there is no additional overhead, since slab_alloc*() functions are __always_inline. Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Marco Elver <elver@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26kfence: use pt_regs to generate stack trace on faultsMarco Elver
Instead of removing the fault handling portion of the stack trace based on the fault handler's name, just use struct pt_regs directly. Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it through to kfence_report_error() for out-of-bounds, use-after-free, or invalid access errors, where pt_regs is used to generate the stack trace. If the kernel is a DEBUG_KERNEL, also show registers for more information. Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add Kernel Electric-Fence infrastructureAlexander Potapenko
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7. This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors. This series enables KFENCE for the x86 and arm64 architectures, and adds KFENCE hooks to the SLAB and SLUB allocators. KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades performance for precision. The main motivation behind KFENCE's design, is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is when the tool is deployed across a large fleet of machines. KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error. Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, the next allocation through the main allocator (SLAB or SLUB) returns a guarded allocation from the KFENCE object pool. At this point, the timer is reset, and the next allocation is set up after the expiration of the interval. To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE. The KFENCE memory pool is of fixed size, and if the pool is exhausted no further KFENCE allocations occur. The default config is conservative with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB pages). We have verified by running synthetic benchmarks (sysbench I/O, hackbench) and production server-workload benchmarks that a kernel with KFENCE (using sample intervals 100-500ms) is performance-neutral compared to a non-KFENCE baseline kernel. KFENCE is inspired by GWP-ASan [1], a userspace tool with similar properties. The name "KFENCE" is a homage to the Electric Fence Malloc Debugger [2]. For more details, see Documentation/dev-tools/kfence.rst added in the series -- also viewable here: https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst [1] http://llvm.org/docs/GwpAsan.html [2] https://linux.die.net/man/3/efence This patch (of 9): This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors. KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades performance for precision. The main motivation behind KFENCE's design, is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is when the tool is deployed across a large fleet of machines. KFENCE objects each reside on a dedicated page, at either the left or right page boundaries. The pages to the left and right of the object page are "guard pages", whose attributes are changed to a protected state, and cause page faults on any attempted access to them. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting a memory access error. To detect out-of-bounds writes to memory within the object's page itself, KFENCE also uses pattern-based redzones. The following figure illustrates the page layout: ---+-----------+-----------+-----------+-----------+-----------+--- | xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx | | xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx | | x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x | | xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx | | xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx | | xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx | ---+-----------+-----------+-----------+-----------+-----------+--- Guarded allocations are set up based on a sample interval (can be set via kfence.sample_interval). After expiration of the sample interval, a guarded allocation from the KFENCE object pool is returned to the main allocator (SLAB or SLUB). At this point, the timer is reset, and the next allocation is set up after the expiration of the interval. To enable/disable a KFENCE allocation through the main allocator's fast-path without overhead, KFENCE relies on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE. To date, we have verified by running synthetic benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE is performance-neutral compared to the non-KFENCE baseline. For more details, see Documentation/dev-tools/kfence.rst (added later in the series). [elver@google.com: fix parameter description for kfence_object_start()] Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com [elver@google.com: avoid stalling work queue task without allocations] Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com [elver@google.com: fix potential deadlock due to wake_up()] Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com [elver@google.com: add option to use KFENCE without static keys] Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com [elver@google.com: add missing copyright and description headers] Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: SeongJae Park <sjpark@amazon.de> Co-developed-by: Marco Elver <elver@google.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Joern Engel <joern@purestorage.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: page-flags.h: Typo fix (It -> If)Guo Ren
The "If" was wrongly spelled as "It". Link: https://lkml.kernel.org/r/1608959036-91409-1-git-send-email-guoren@kernel.org Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26zsmalloc: account the number of compacted pages correctlyRokudo Yan
There exists multiple path may do zram compaction concurrently. 1. auto-compaction triggered during memory reclaim 2. userspace utils write zram<id>/compaction node So, multiple threads may call zs_shrinker_scan/zs_compact concurrently. But pages_compacted is a per zsmalloc pool variable and modification of the variable is not serialized(through under class->lock). There are two issues here: 1. the pages_compacted may not equal to total number of pages freed(due to concurrently add). 2. zs_shrinker_scan may not return the correct number of pages freed(issued by current shrinker). The fix is simple: 1. account the number of pages freed in zs_compact locally. 2. use actomic variable pages_compacted to accumulate total number. Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages") Signed-off-by: Rokudo Yan <wu-yan@tcl.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/zswap: add the flag can_sleep_mappedTian Tao
Patch series "Fix the compatibility of zsmalloc and zswap". Patch #1 adds a flag to zpool, then zswap used to determine if zpool drivers such as zbud/z3fold/zsmalloc will enter an atomic context after mapping. The difference between zbud/z3fold and zsmalloc is that zsmalloc requires an atomic context that since its map function holds a preempt-disabled, but zbud/z3fold don't require an atomic context. So patch #2 sets flag sleep_mapped to true indicating that zbud/z3fold can sleep after mapping. zsmalloc didn't support sleep after mapping, so don't set that flag to true. This patch (of 2): Add a flag to zpool, named is "can_sleep_mapped", and have it set true for zbud/z3fold, not set this flag for zsmalloc, so its default value is false. Then zswap could go the current path if the flag is true; and if it's false, copy data from src to a temporary buffer, then unmap the handle, take the mutex, process the buffer instead of src to avoid sleeping function called from atomic context. [natechancellor@gmail.com: add return value in zswap_frontswap_load] Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com [tiantao6@hisilicon.com: fix potential memory leak] Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com [colin.king@canonical.com: fix potential uninitialized pointer read on tmp] Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com [tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used] Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Mike Galbraith <efault@gmx.de> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/rmap: fix potential pte_unmap on an not mapped pteMiaohe Lin
For PMD-mapped page (usually THP), pvmw->pte is NULL. For PTE-mapped THP, pvmw->pte is mapped. But for HugeTLB pages, pvmw->pte is not mapped and set to the relevant page table entry. So in page_vma_mapped_walk_done(), we may do pte_unmap() for HugeTLB pte which is not mapped. Fix this by checking pvmw->page against PageHuge before trying to do pte_unmap(). Link: https://lkml.kernel.org/r/20210127093349.39081-1-linmiaohe@huawei.com Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hongxiang Lou <louhongxiang@huawei.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michel Lespinasse <walken@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/memory_hotplug: prevalidate the address range being added with platformAnshuman Khandual
Patch series "mm/memory_hotplug: Pre-validate the address range with platform", v5. This series adds a mechanism allowing platforms to weigh in and prevalidate incoming address range before proceeding further with the memory hotplug. This helps prevent potential platform errors for the given address range, down the hotplug call chain, which inevitably fails the hotplug itself. This mechanism was suggested by David Hildenbrand during another discussion with respect to a memory hotplug fix on arm64 platform. https://lore.kernel.org/linux-arm-kernel/1600332402-30123-1-git-send-email-anshuman.khandual@arm.com/ This mechanism focuses on the addressibility aspect and not [sub] section alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span() have been left unchanged. This patch (of 4): This introduces mhp_range_allowed() which can be called in various memory hotplug paths to prevalidate the address range which is being added, with the platform. Then mhp_range_allowed() calls mhp_get_pluggable_range() which provides applicable address range depending on whether linear mapping is required or not. For ranges that require linear mapping, it calls a new arch callback arch_get_mappable_range() which the platform can override. So the new callback, in turn provides the platform an opportunity to configure acceptable memory hotplug address ranges in case there are constraints. This mechanism will help prevent platform specific errors deep down during hotplug calls. This drops now redundant check_hotplug_memory_addressable() check in __add_pages() but instead adds a VM_BUG_ON() check which would ensure that the range has been validated with mhp_range_allowed() earlier in the call chain. Besides mhp_get_pluggable_range() also can be used by potential memory hotplug callers to avail the allowed physical range which would go through on a given platform. This does not really add any new range check in generic memory hotplug but instead compensates for lost checks in arch_add_memory() where applicable and check_hotplug_memory_addressable(), with unified mhp_range_allowed(). [akpm@linux-foundation.org: make pagemap_range() return -EINVAL when mhp_range_allowed() fails] Link: https://lkml.kernel.org/r/1612149902-7867-1-git-send-email-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/1612149902-7867-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> # s390 Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26drivers/base/memory: don't store phys_device in memory blocksDavid Hildenbrand
No need to store the value for each and every memory block, as we can easily query the value at runtime. Reshuffle the members to optimize the memory layout. Also, let's clarify what the interface once was used for and why it's legacy nowadays. "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3], back when they were still part of s390x-tools. They were later replaced by the variants in linux-utils. For example, RHEL6 and RHEL7 contain lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux on s390x [4]. "phys_device" was added with sysfs support for memory hotplug in commit 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions") in 2005. It always returned 0. s390x started returning something != 0 on some setups (if sclp.rzm is set by HW) in 2010 via commit 57b552ba0b2f ("memory hotplug/s390: set phys_device"). For s390x, it allowed for identifying which memory block devices belong to the same storage increment (RZM). Only if all memory block devices comprising a single storage increment were offline, the memory could actually be removed in the hypervisor. Since commit e5d709bb5fb7 ("s390/memory hotplug: provide memory_block_size_bytes() function") in 2013 a memory block device spans at least one storage increment - which is why the interface isn't really helpful/used anymore (except by old lsmem/chmem tools). There were once RFC patches to make use of "phys_device" in ACPI context; however, the underlying problem could be solved using different interfaces [1]. [1] https://patchwork.kernel.org/patch/2163871/ [2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem [3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem [4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134 Link: https://lkml.kernel.org/r/20210201181347.13262-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Tom Rix <trix@redhat.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCEDavid Hildenbrand
Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and "mhp_flags". As discussed recently [1], "mhp" is our internal acronym for memory hotplug now. [1] https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c334d@redhat.com/ Link: https://lkml.kernel.org/r/20210126115829.10909-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/memory_hotplug: rename all existing 'memhp' into 'mhp'Anshuman Khandual
This renames all 'memhp' instances to 'mhp' except for memhp_default_state for being a kernel command line option. This is just a clean up and should not cause a functional change. Let's make it consistent rater than mixing the two prefixes. In preparation for more users of the 'mhp' terminology. Link: https://lkml.kernel.org/r/1611554093-27316-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: fix memory_failure() handling of dax-namespace metadataDan Williams
Given 'struct dev_pagemap' spans both data pages and metadata pages be careful to consult the altmap if present to delineate metadata. In fact the pfn_first() helper already identifies the first valid data pfn, so export that helper for other code paths via pgmap_pfn_valid(). Other usage of get_dev_pagemap() are not a concern because those are operating on known data pfns having been looked up by get_user_pages(). I.e. metadata pfns are never user mapped. Link: https://lkml.kernel.org/r/161058501758.1840162.4239831989762604527.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 6100e34b2526 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: teach pfn_to_online_page() about ZONE_DEVICE section collisionsDan Williams
While pfn_to_online_page() is able to determine pfn_valid() at subsection granularity it is not able to reliably determine if a given pfn is also online if the section is mixes ZONE_{NORMAL,MOVABLE} with ZONE_DEVICE. This means that pfn_to_online_page() may return invalid @page objects. For example with a memory map like: 100000000-1fbffffff : System RAM 142000000-143002e16 : Kernel code 143200000-143713fff : Kernel rodata 143800000-143b15b7f : Kernel data 144227000-144ffffff : Kernel bss 1fc000000-2fbffffff : Persistent Memory (legacy) 1fc000000-2fbffffff : namespace0.0 This command: echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page ...succeeds when it should fail. When it succeeds it touches an uninitialized page and may crash or cause other damage (see dissolve_free_huge_page()). While the memory map above is contrived via the memmap=ss!nn kernel command line option, the collision happens in practice on shipping platforms. The memory controller resources that decode spans of physical address space are a limited resource. One technique platform-firmware uses to conserve those resources is to share a decoder across 2 devices to keep the address range contiguous. Unfortunately the unit of operation of a decoder is 64MiB while the Linux section size is 128MiB. This results in situations where, without subsection hotplug memory mappings with different lifetimes collide into one object that can only express one lifetime. Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a section that mixes ZONE_DEVICE pfns with other online pfns. With SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall back to a slow-path check for ZONE_DEVICE pfns in an online section. In the fast path online_section() for a full ZONE_DEVICE section returns false. Because the collision case is rare, and for simplicity, the SECTION_TAINT_ZONE_DEVICE flag is never cleared once set. [dan.j.williams@intel.com: fix CONFIG_ZONE_DEVICE=n build] Link: https://lkml.kernel.org/r/CAPcyv4iX+7LAgAeSqx7Zw-Zd=ZV9gBv8Bo7oTbwCOOqJoZ3+Yg@mail.gmail.com Link: https://lkml.kernel.org/r/161058500675.1840162.7887862152161279354.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: move pfn_to_online_page() out of lineDan Williams
Patch series "mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE", v4. A pfn-walker that uses pfn_to_online_page() may inadvertently translate a pfn as online and in the page allocator, when it is offline managed by a ZONE_DEVICE mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions")). The 2 proposals under consideration are teach pfn_to_online_page() to be precise in the presence of mixed-zone sections, or teach the memory-add code to drop the System RAM associated with ZONE_DEVICE collisions. In order to not regress memory capacity by a few 10s to 100s of MiB the approach taken in this set is to add precision to pfn_to_online_page(). In the course of validating pfn_to_online_page() a couple other fixes fell out: 1/ soft_offline_page() fails to drop the reference taken in the madvise(..., MADV_SOFT_OFFLINE) case. 2/ memory_failure() uses get_dev_pagemap() to lookup ZONE_DEVICE pages, however that mapping may contain data pages and metadata raw pfns. Introduce pgmap_pfn_valid() to delineate the 2 types and fail the handling of raw metadata pfns. This patch (of 4); pfn_to_online_page() is already too large to be a macro or an inline function. In anticipation of further logic changes / growth, move it out of line. No functional change, just code movement. Link: https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/161058499608.1840162.10165648147615238793.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: vmstat: add some comments on internal storage of byte itemsJohannes Weiner
Byte-accounted items are used for slab object accounting at the cgroup level, because the objects in a slab page can belong to different cgroups. At the global level these items always change in multiples of whole slab pages. The vmstat code exploits this and stores these items as pages internally, which allows for more compact per-cpu data. This optimization isn't self-evident from the asserts and the division in the stat update functions. Provide the reader with some context. Link: https://lkml.kernel.org/r/20210202184411.118614-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfoDavid Hildenbrand
Let's count the number of CMA pages per zone and print them in /proc/zoneinfo. Having access to the total number of CMA pages per zone is helpful for debugging purposes to know where exactly the CMA pages ended up, and to figure out how many pages of a zone might behave differently, even after some of these pages might already have been allocated. As one example, CMA pages part of a kernel zone cannot be used for ordinary kernel allocations but instead behave more like ZONE_MOVABLE. For now, we are only able to get the global nr+free cma pages from /proc/meminfo and the free cma pages per zone from /proc/zoneinfo. Example after this patch when booting a 6 GiB QEMU VM with "hugetlb_cma=2G": # cat /proc/zoneinfo | grep cma cma 0 nr_free_cma 0 cma 0 nr_free_cma 0 cma 524288 nr_free_cma 493016 cma 0 cma 0 # cat /proc/meminfo | grep Cma CmaTotal: 2097152 kB CmaFree: 1972064 kB Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way, one can be sure when spotting "cma 0", that there are definetly no CMA pages located in a zone. [david@redhat.com: v2] Link: https://lkml.kernel.org/r/20210128164533.18566-1-david@redhat.com [david@redhat.com: v3] Link: https://lkml.kernel.org/r/20210129113451.22085-1-david@redhat.com Link: https://lkml.kernel.org/r/20210127101813.6370-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm,thp,shmem: make khugepaged obey tmpfs mount flagsRik van Riel
Currently if thp enabled=[madvise], mounting a tmpfs filesystem with huge=always and mmapping files from that tmpfs does not result in khugepaged collapsing those mappings, despite the mount flag indicating that it should. Fix that by breaking up the blocks of tests in hugepage_vma_check a little bit, and testing things in the correct order. Link: https://lkml.kernel.org/r/20201124194925.623931-4-riel@surriel.com Fixes: c2231020ea7b ("mm: thp: register mm for khugepaged when merging vma for shmem") Signed-off-by: Rik van Riel <riel@surriel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xu Yu <xuyu@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm,thp,shmem: limit shmem THP alloc gfp_maskRik van Riel
Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6. The allocation flags of anonymous transparent huge pages can be controlled through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can help the system from getting bogged down in the page reclaim and compaction code when many THPs are getting allocated simultaneously. However, the gfp_mask for shmem THP allocations were not limited by those configuration settings, and some workloads ended up with all CPUs stuck on the LRU lock in the page reclaim code, trying to allocate dozens of THPs simultaneously. This patch applies the same configurated limitation of THPs to shmem hugepage allocations, to prevent that from happening. This way a THP defrag setting of "never" or "defer+madvise" will result in quick allocation failures without direct reclaim when no 2MB free pages are available. With this patch applied, THP allocations for tmpfs will be a little more aggressive than today for files mmapped with MADV_HUGEPAGE, and a little less aggressive for files that are not mmapped or mapped without that flag. This patch (of 4): The allocation flags of anonymous transparent huge pages can be controlled through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can help the system from getting bogged down in the page reclaim and compaction code when many THPs are getting allocated simultaneously. However, the gfp_mask for shmem THP allocations were not limited by those configuration settings, and some workloads ended up with all CPUs stuck on the LRU lock in the page reclaim code, trying to allocate dozens of THPs simultaneously. This patch applies the same configurated limitation of THPs to shmem hugepage allocations, to prevent that from happening. Controlling the gfp_mask of THP allocations through the knobs in sysfs allows users to determine the balance between how aggressively the system tries to allocate THPs at fault time, and how much the application may end up stalling attempting those allocations. This way a THP defrag setting of "never" or "defer+madvise" will result in quick allocation failures without direct reclaim when no 2MB free pages are available. With this patch applied, THP allocations for tmpfs will be a little more aggressive than today for files mmapped with MADV_HUGEPAGE, and a little less aggressive for files that are not mmapped or mapped without that flag. Link: https://lkml.kernel.org/r/20201124194925.623931-1-riel@surriel.com Link: https://lkml.kernel.org/r/20201124194925.623931-2-riel@surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Xu Yu <xuyu@linux.alibaba.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: remove pagevec_lookup_entriesMatthew Wilcox (Oracle)
pagevec_lookup_entries() is now just a wrapper around find_get_entries() so remove it and convert all its callers. Link: https://lkml.kernel.org/r/20201112212641.27837-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: pass pvec directly to find_get_entriesMatthew Wilcox (Oracle)
All callers of find_get_entries() use a pvec, so pass it directly instead of manipulating it in the caller. Link: https://lkml.kernel.org/r/20201112212641.27837-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: remove nr_entries parameter from pagevec_lookup_entriesMatthew Wilcox (Oracle)
All callers want to fetch the full size of the pvec. Link: https://lkml.kernel.org/r/20201112212641.27837-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add an 'end' parameter to pagevec_lookup_entriesMatthew Wilcox (Oracle)
Simplifies the callers and uses the existing functionality in find_get_entries(). We can also drop the final argument of truncate_exceptional_pvec_entries() and simplify the logic in that function. Link: https://lkml.kernel.org/r/20201112212641.27837-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add an 'end' parameter to find_get_entriesMatthew Wilcox (Oracle)
This simplifies the callers and leads to a more efficient implementation since the XArray has this functionality already. Link: https://lkml.kernel.org/r/20201112212641.27837-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm/filemap: add mapping_seek_hole_dataMatthew Wilcox (Oracle)
Rewrite shmem_seek_hole_data() and move it to filemap.c. [willy@infradead.org: don't put an xa_is_value() page] Link: https://lkml.kernel.org/r/20201124041507.28996-4-willy@infradead.org Link: https://lkml.kernel.org/r/20201112212641.27837-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26mm: add FGP_ENTRYMatthew Wilcox (Oracle)
The functionality of find_lock_entry() and find_get_entry() can be provided by pagecache_get_page(), which lets us delete find_lock_entry() and make find_get_entry() static. Link: https://lkml.kernel.org/r/20201112212641.27837-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26Merge tag 'nfs-for-5.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS Client Updates from Anna Schumaker: "New Features: - Support for eager writes, and the write=eager and write=wait mount options - Other Bugfixes and Cleanups: - Fix typos in some comments - Fix up fall-through warnings for Clang - Cleanups to the NFS readpage codepath - Remove FMR support in rpcrdma_convert_iovs() - Various other cleanups to xprtrdma - Fix xprtrdma pad optimization for servers that don't support RFC 8797 - Improvements to rpcrdma tracepoints - Fix up nfs4_bitmask_adjust() - Optimize sparse writes past the end of files" * tag 'nfs-for-5.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFS: Support the '-owrite=' option in /proc/self/mounts and mountinfo NFS: Set the stable writes flag when initialising the super block NFS: Add mount options supporting eager writes NFS: Add support for eager writes NFS: 'flags' field should be unsigned in struct nfs_server NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache NFS: Always clear an invalid mapping when attempting a buffered write NFS: Optimise sparse writes past the end of file NFS: Fix documenting comment for nfs_revalidate_file_size() NFSv4: Fixes for nfs4_bitmask_adjust() xprtrdma: Clean up rpcrdma_prepare_readch() rpcrdma: Capture bytes received in Receive completion tracepoints xprtrdma: Pad optimization, revisited rpcrdma: Fix comments about reverse-direction operation xprtrdma: Refactor invocations of offset_in_page() xprtrdma: Simplify rpcrdma_convert_kvec() and frwr_map() xprtrdma: Remove FMR support in rpcrdma_convert_iovs() NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async() NFS: Call readpage_async_filler() from nfs_readpage_async() NFS: Refactor nfs_readpage() and nfs_readpage_async() to use nfs_readdesc ...
2021-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - new vdpa features to allow creation and deletion of new devices - virtio-blk support per-device queue depth - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (31 commits) virtio-input: add multi-touch support virtio_mmio: fix one typo vdpa/mlx5: fix param validation in mlx5_vdpa_get_config() virtio_net: Fix fall-through warnings for Clang virtio_input: Prevent EV_MSC/MSC_TIMESTAMP loop storm for MT. virtio-blk: support per-device queue depth virtio_vdpa: don't warn when fail to disable vq virtio-pci: introduce modern device module virito-pci-modern: rename map_capability() to vp_modern_map_capability() virtio-pci-modern: introduce helper to get notification offset virtio-pci-modern: introduce helper for getting queue nums virtio-pci-modern: introduce helper for setting/geting queue size virtio-pci-modern: introduce helper to set/get queue_enable virtio-pci-modern: introduce vp_modern_queue_address() virtio-pci-modern: introduce vp_modern_set_queue_vector() virtio-pci-modern: introduce vp_modern_generation() virtio-pci-modern: introduce helpers for setting and getting features virtio-pci-modern: introduce helpers for setting and getting status virtio-pci-modern: introduce helper to set config vector virtio-pci-modern: introduce vp_modern_remove() ...
2021-02-25Merge tag 'mips_5.12_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull more MIPS updates from Thomas Bogendoerfer: - added n64 block driver - fix for ubsan warnings - fix for bcm63xx platform - update of linux-mips mailinglist * tag 'mips_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: arch: mips: update references to current linux-mips list mips: bmips: init clocks earlier vmlinux.lds.h: catch even more instrumentation symbols into .data n64: store dev instance into disk private data n64: cleanup n64cart_probe() n64: cosmetics changes n64: remove curly brackets n64: use sector SECTOR_SHIFT instead 512 n64: use enums for reg n64: move module param at the top n64: move module info at the end n64: use pr_fmt to avoid duplicate string block: Add n64 cart driver
2021-02-25Merge tag 'drm-next-2021-02-26' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull more drm updates from Dave Airlie: "This is mostly fixes but I missed msm-next pull last week. It's been in drm-next. Otherwise it's a selection of i915, amdgpu and misc fixes, one TTM memory leak, nothing really major stands out otherwise. core: - vblank fence timing improvements dma-buf: - improve error handling ttm: - memory leak fix msm: - a6xx speedbin support - a508, a509, a512 support - various a5xx fixes - various dpu fixes - qseed3lite support for sm8250 - dsi fix for msm8994 - mdp5 fix for framerate bug with cmd mode panels - a6xx GMU OOB race fixes that were showing up in CI - various addition and removal of semicolons - gem submit fix for legacy userspace relocs path amdgpu: - clang warning fix - S0ix platform shutdown/poweroff fix - misc display fixes i915: - color format fix - -Wuninitialised reenabled - GVT ww locking, cmd parser fixes atyfb: - fix build rockchip: - AFBC modifier fix" * tag 'drm-next-2021-02-26' of git://anongit.freedesktop.org/drm/drm: (60 commits) drm/panel: kd35t133: allow using non-continuous dsi clock drm/rockchip: Require the YTR modifier for AFBC drm/ttm: Fix a memory leak drm/drm_vblank: set the dma-fence timestamp during send_vblank_event dma-fence: allow signaling drivers to set fence timestamp dma-buf: heaps: Rework heap allocation hooks to return struct dma_buf instead of fd dma-buf: system_heap: Make sure to return an error if we abort drm/amd/display: Fix system hang after multiple hotplugs (v3) drm/amdgpu: fix shutdown and poweroff process failed with s0ix drm/i915: Nuke INTEL_OUTPUT_FORMAT_INVALID drm/i915: Enable -Wuninitialized drm/amd/display: Remove Assert from dcn10_get_dig_frontend drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1 Revert "drm/amd/display: reuse current context instead of recreating one" drm/amd/pm/swsmu: Avoid using structure_size uninitialized in smu_cmn_init_soft_gpu_metrics fbdev: atyfb: add stubs for aty_{ld,st}_lcd() drm/i915/gvt: Introduce per object locking in GVT scheduler. drm/i915/gvt: Purge dev_priv->gt drm/i915/gvt: Parse default state to update reg whitelist dt-bindings: dp-connector: Drop maxItems from -supply ...
2021-02-25Merge tag 'net-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Rather small batch this time. Current release - regressions: - bcm63xx_enet: fix sporadic kernel panic due to queue length mis-accounting Current release - new code bugs: - bcm4908_enet: fix RX path possible mem leak - bcm4908_enet: fix NAPI poll returned value - stmmac: fix missing spin_lock_init in visconti_eth_dwmac_probe() - sched: cls_flower: validate ct_state for invalid and reply flags Previous releases - regressions: - net: introduce CAN specific pointer in the struct net_device to prevent mis-interpreting memory - phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081 - psample: fix netlink skb length with tunnel info Previous releases - always broken: - icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending - wireguard: device: do not generate ICMP for non-IP packets - mptcp: provide subflow aware release function to avoid a mem leak - hsr: add support for EntryForgetTime - r8169: fix jumbo packet handling on RTL8168e - octeontx2-af: fix an off by one in rvu_dbg_qsize_write() - i40e: fix flow for IPv6 next header (extension header) - phy: icplus: call phy_restore_page() when phy_select_page() fails - dpaa_eth: fix the access method for the dpaa_napi_portal" * tag 'net-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits) r8169: fix jumbo packet handling on RTL8168e net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081 net: psample: Fix netlink skb length with tunnel info net: broadcom: bcm4908_enet: fix NAPI poll returned value net: broadcom: bcm4908_enet: fix RX path possible mem leak net: hsr: add support for EntryForgetTime net: dsa: sja1105: Remove unneeded cast in sja1105_crc32() ibmvnic: fix a race between open and reset net: stmmac: Fix missing spin_lock_init in visconti_eth_dwmac_probe() net: introduce CAN specific pointer in the struct net_device net: usb: qmi_wwan: support ZTE P685M modem wireguard: kconfig: use arm chacha even with no neon wireguard: queueing: get rid of per-peer ring buffers wireguard: device: do not generate ICMP for non-IP packets wireguard: peer: put frequently used members above cache lines wireguard: selftests: test multiple parallel streams wireguard: socket: remove bogus __be32 annotation wireguard: avoid double unlikely() notation when using IS_ERR() net: qrtr: Fix memory leak in qrtr_tun_open vxlan: move debug check after netdev unregister ...
2021-02-25Merge tag 'acpi-5.12-rc1-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These make additional changes to the platform profile interface merged recently and add support for the FPDT ACPI table. Specifics: - Rearrange Kconfig handling of ACPI_PLATFORM_PROFILE, add "balanced-performance" to the list of supported platform profiles and fix up some file references in a comment (Maximilian Luz). - Add support for parsing the ACPI Firmware Performance Data Table (FPDT) and exposing the data from there via sysfs (Zhang Rui)" * tag 'acpi-5.12-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: platform: Add balanced-performance platform profile ACPI: platform: Fix file references in comment ACPI: platform: Hide ACPI_PLATFORM_PROFILE option ACPI: tables: introduce support for FPDT table
2021-02-25Merge tag 'kbuild-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Fix false-positive build warnings for ARCH=ia64 builds - Optimize dictionary size for module compression with xz - Check the compiler and linker versions in Kconfig - Fix misuse of extra-y - Support DWARF v5 debug info - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x exceeded the limit - Add generic syscall{tbl,hdr}.sh for cleanups across arches - Minor cleanups of genksyms - Minor cleanups of Kconfig * tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits) initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD kbuild: remove deprecated 'always' and 'hostprogs-y/m' kbuild: parse C= and M= before changing the working directory kbuild: reuse this-makefile to define abs_srctree kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig kconfig: omit --oldaskconfig option for 'make config' kconfig: fix 'invalid option' for help option kconfig: remove dead code in conf_askvalue() kconfig: clean up nested if-conditionals in check_conf() kconfig: Remove duplicate call to sym_get_string_value() Makefile: Remove # characters from compiler string Makefile: reuse CC_VERSION_TEXT kbuild: check the minimum linker version in Kconfig kbuild: remove ld-version macro scripts: add generic syscallhdr.sh scripts: add generic syscalltbl.sh arch: syscalls: remove $(srctree)/ prefix from syscall tables arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work gen_compile_commands: prune some directories kbuild: simplify access to the kernel's version ...
2021-02-25Merge tag 'pci-v5.12-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Remove unnecessary locking around _OSC (Bjorn Helgaas) - Clarify message about _OSC failure (Bjorn Helgaas) - Remove notification of PCIe bandwidth changes (Bjorn Helgaas) - Tidy checking of syscall user config accessors (Heiner Kallweit) Resource management: - Decline to resize resources if boot config must be preserved (Ard Biesheuvel) - Fix pci_register_io_range() memory leak (Geert Uytterhoeven) Error handling (Keith Busch): - Clear error status from the correct device - Retain error recovery status so drivers can use it after reset - Log the type of Port (Root or Switch Downstream) that we reset - Always request a reset for Downstream Ports in frozen state Endpoint framework and NTB (Kishon Vijay Abraham I): - Make *_get_first_free_bar() take into account 64 bit BAR - Add helper API to get the 'next' unreserved BAR - Make *_free_bar() return error codes on failure - Remove unused pci_epf_match_device() - Add support to associate secondary EPC with EPF - Add support in configfs to associate two EPCs with EPF - Add pci_epc_ops to map MSI IRQ - Add pci_epf_ops to expose function-specific attrs - Allow user to create sub-directory of 'EPF Device' directory - Implement ->msi_map_irq() ops for cadence - Configure LM_EP_FUNC_CFG based on epc->function_num_map for cadence - Add EP function driver to provide NTB functionality - Add support for EPF PCI Non-Transparent Bridge - Add specification for PCI NTB function device - Add PCI endpoint NTB function user guide - Add configfs binding documentation for pci-ntb endpoint function Broadcom STB PCIe controller driver: - Add support for BCM4908 and external PERST# signal controller (Rafał Miłecki) Cadence PCIe controller driver: - Retrain Link to work around Gen2 training defect (Nadeem Athani) - Fix merge botch in cdns_pcie_host_map_dma_ranges() (Krzysztof Wilczyński) Freescale Layerscape PCIe controller driver: - Add LX2160A rev2 EP mode support (Hou Zhiqiang) - Convert to builtin_platform_driver() (Michael Walle) MediaTek PCIe controller driver: - Fix OF node reference leak (Krzysztof Wilczyński) Microchip PolarFlare PCIe controller driver: - Add Microchip PolarFire PCIe controller driver (Daire McNamara) Qualcomm PCIe controller driver: - Use PHY_REFCLK_USE_PAD only for ipq8064 (Ansuel Smith) - Add support for ddrss_sf_tbu clock for sm8250 (Dmitry Baryshkov) Renesas R-Car PCIe controller driver: - Drop PCIE_RCAR config option (Lad Prabhakar) - Always allocate MSI addresses in 32bit space (Marek Vasut) Rockchip PCIe controller driver: - Add FriendlyARM NanoPi M4B DT binding (Chen-Yu Tsai) - Make 'ep-gpios' DT property optional (Chen-Yu Tsai) Synopsys DesignWare PCIe controller driver: - Work around ECRC configuration hardware defect (Vidya Sagar) - Drop support for config space in DT 'ranges' (Rob Herring) - Change size to u64 for EP outbound iATU (Shradha Todi) - Add upper limit address for outbound iATU (Shradha Todi) - Make dw_pcie ops optional (Jisheng Zhang) - Remove unnecessary dw_pcie_ops from al driver (Jisheng Zhang) Xilinx Versal CPM PCIe controller driver: - Fix OF node reference leak (Pan Bian) Miscellaneous: - Remove tango host controller driver (Arnd Bergmann) - Remove IRQ handler & data together (altera-msi, brcmstb, dwc) (Martin Kaiser) - Fix xgene-msi race in installing chained IRQ handler (Martin Kaiser) - Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He) - Fix pci-bridge-emul array overruns (Russell King) - Remove obsolete uses of WARN_ON(in_interrupt()) (Sebastian Andrzej Siewior)" * tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (69 commits) PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064 PCI: qcom: Add support for ddrss_sf_tbu clock dt-bindings: PCI: qcom: Document ddrss_sf_tbu clock for sm8250 PCI: al: Remove useless dw_pcie_ops PCI: dwc: Don't assume the ops in dw_pcie always exist PCI: dwc: Add upper limit address for outbound iATU PCI: dwc: Change size to u64 for EP outbound iATU PCI: dwc: Drop support for config space in 'ranges' PCI: layerscape: Convert to builtin_platform_driver() PCI: layerscape: Add LX2160A rev2 EP mode support dt-bindings: PCI: layerscape: Add LX2160A rev2 compatible strings PCI: dwc: Work around ECRC configuration issue PCI/portdrv: Report reset for frozen channel PCI/AER: Specify the type of Port that was reset PCI/ERR: Retain status from error notification PCI/AER: Clear AER status from Root Port when resetting Downstream Port PCI/ERR: Clear status of the reporting device dt-bindings: arm: rockchip: Add FriendlyARM NanoPi M4B PCI: rockchip: Make 'ep-gpios' DT property optional Documentation: PCI: Add PCI endpoint NTB function user guide ...
2021-02-24Merge tag 'x86-entry-2021-02-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq entry updates from Thomas Gleixner: "The irq stack switching was moved out of the ASM entry code in course of the entry code consolidation. It ended up being suboptimal in various ways. This reworks the X86 irq stack handling: - Make the stack switching inline so the stackpointer manipulation is not longer at an easy to find place. - Get rid of the unnecessary indirect call. - Avoid the double stack switching in interrupt return and reuse the interrupt stack for softirq handling. - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused about the stack pointer manipulation" * tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix stack-swizzle for FRAME_POINTER=y um: Enforce the usage of asm-generic/softirq_stack.h x86/softirq/64: Inline do_softirq_own_stack() softirq: Move do_softirq_own_stack() to generic asm header softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK x86/softirq: Remove indirection in do_softirq_own_stack() x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall x86/entry: Convert device interrupts to inline stack switching x86/entry: Convert system vectors to irq stack macro x86/irq: Provide macro for inlining irq stack switching x86/apic: Split out spurious handling code x86/irq/64: Adjust the per CPU irq stack pointer by 8 x86/irq: Sanitize irq stack tracking x86/entry: Fix instrumentation annotation
2021-02-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "A few small subsystems and some of MM. 172 patches. Subsystems affected by this patch series: hexagon, scripts, ntfs, ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap, memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, mempolicy, oom-kill, hugetlbfs, and migration)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (172 commits) mm/migrate: remove unneeded semicolons hugetlbfs: remove unneeded return value of hugetlb_vmtruncate() hugetlbfs: fix some comment typos hugetlbfs: correct some obsolete comments about inode i_mutex hugetlbfs: make hugepage size conversion more readable hugetlbfs: remove meaningless variable avoid_reserve hugetlbfs: correct obsolete function name in hugetlbfs_read_iter() hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr() hugetlbfs: remove special hugetlbfs_set_page_dirty() mm/hugetlb: change hugetlb_reserve_pages() to type bool mm, oom: fix a comment in dump_task() mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk() numa balancing: migrate on fault among multiple bound nodes mm, compaction: make fast_isolate_freepages() stay within zone mm/compaction: fix misbehaviors of fast_find_migrateblock() mm/compaction: correct deferral logic for proactive compaction mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked mm/compaction: remove rcu_read_lock during page compaction z3fold: simplify the zhdr initialization code in init_z3fold_page() ...