diff options
author | Stephen Rothwell <sfr@canb.auug.org.au> | 2017-06-02 15:10:00 +1000 |
---|---|---|
committer | Stephen Rothwell <sfr@canb.auug.org.au> | 2017-06-02 15:10:00 +1000 |
commit | 29dd111c576bcdd7a17483405d67ec8a81a6a2fc (patch) | |
tree | 86e1a1560b6b93d035d3e6448874e820c4e25857 /Documentation | |
parent | 1bc60b48157f140f72a98357b7f1f4eef12a887b (diff) | |
parent | d25f0d9c6100a46e1e35cde84b4bb8b892254180 (diff) |
Merge branch 'akpm-current/current'
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-block-zram | 10 | ||||
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 7 | ||||
-rw-r--r-- | Documentation/blockdev/zram.txt | 3 | ||||
-rw-r--r-- | Documentation/cgroup-v2.txt | 48 | ||||
-rw-r--r-- | Documentation/dev-tools/kmemleak.rst | 1 | ||||
-rw-r--r-- | Documentation/fault-injection/fault-injection.txt | 79 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 6 | ||||
-rw-r--r-- | Documentation/kdump/kdump.txt | 12 | ||||
-rw-r--r-- | Documentation/vm/ksm.txt | 63 |
9 files changed, 219 insertions, 10 deletions
diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram index 451b6d882b2c..3c1945f4d5f2 100644 --- a/Documentation/ABI/testing/sysfs-block-zram +++ b/Documentation/ABI/testing/sysfs-block-zram @@ -90,3 +90,13 @@ Description: device's debugging info useful for kernel developers. Its format is not documented intentionally and may change anytime without any notice. + +What: /sys/block/zram<id>/use_dedup +Date: March 2017 +Contact: Joonsoo Kim <iamjoonsoo.kim@lge.com> +Description: + The use_dedup file is read-write and specifies deduplication + feature is used or not. If enabled, duplicated data is + managed by reference count and will not be stored in memory + twice. Benefit of this feature largely depends on the workload + so keep attention when use. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 0d16c9b81309..9b852eb3b620 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2286,8 +2286,11 @@ that the amount of memory usable for all allocations is not too small. - movable_node [KNL] Boot-time switch to enable the effects - of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details. + movable_node [KNL] Boot-time switch to make hotplugable memory + NUMA nodes to be movable. This means that the memory + of such nodes will be usable only for movable + allocations which rules out almost all kernel + allocations. Use with caution! MTD_Partition= [MTD] Format: <name>,<region-number>,<size>,<offset> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 4fced8a21307..cbbe39b39953 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -168,6 +168,7 @@ max_comp_streams RW the number of possible concurrent compress operations comp_algorithm RW show and change the compression algorithm compact WO trigger memory compaction debug_stat RO this file is used for zram debugging purposes +use_dedup RW show and set deduplication feature User space is advised to use the following files to read the device statistics. @@ -217,6 +218,8 @@ line of text and contains the following stats separated by whitespace: same_pages the number of same element filled pages written to this disk. No memory is allocated for such pages. pages_compacted the number of pages freed during compaction + dup_data_size deduplicated data size + meta_data_size the amount of metadata allocated for deduplication feature 9) Deactivate: swapoff /dev/zram0 diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index dc5e2dcdbef4..a86f3cb88125 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -826,13 +826,25 @@ PAGE_SIZE multiple when read back. The number of times the cgroup's memory usage was about to go over the max boundary. If direct reclaim - fails to bring it down, the OOM killer is invoked. + fails to bring it down, the cgroup goes to OOM state. oom - The number of times the OOM killer has been invoked in - the cgroup. This may not exactly match the number of - processes killed but should generally be close. + The number of time the cgroup's memory usage was + reached the limit and allocation was about to fail. + + Depending on context result could be invocation of OOM + killer and retrying allocation or failing alloction. + + Failed allocation in its turn could be returned into + userspace as -ENOMEM or siletly ignored in cases like + disk readahead. For now OOM in memory cgroup kills + tasks iff shortage has happened inside page fault. + + oom_kill + + The number of processes belonging to this cgroup + killed by any kind of OOM killer. memory.stat @@ -930,6 +942,34 @@ PAGE_SIZE multiple when read back. Number of times a shadow node has been reclaimed + pgrefill + + Amount of scanned pages (in an active LRU list) + + pgscan + + Amount of scanned pages (in an inactive LRU list) + + pgsteal + + Amount of reclaimed pages + + pgactivate + + Amount of pages moved to the active LRU list + + pgdeactivate + + Amount of pages moved to the inactive LRU lis + + pglazyfree + + Amount of pages postponed to be freed under memory pressure + + pglazyfreed + + Amount of reclaimed lazyfree pages + memory.swap.current A read-only single value file which exists on non-root diff --git a/Documentation/dev-tools/kmemleak.rst b/Documentation/dev-tools/kmemleak.rst index b2391b829169..cb8862659178 100644 --- a/Documentation/dev-tools/kmemleak.rst +++ b/Documentation/dev-tools/kmemleak.rst @@ -150,6 +150,7 @@ See the include/linux/kmemleak.h header for the functions prototype. - ``kmemleak_init`` - initialize kmemleak - ``kmemleak_alloc`` - notify of a memory block allocation - ``kmemleak_alloc_percpu`` - notify of a percpu memory block allocation +- ``kmemleak_vmalloc`` - notify of a vmalloc() memory allocation - ``kmemleak_free`` - notify of a memory block freeing - ``kmemleak_free_part`` - notify of a partial memory block freeing - ``kmemleak_free_percpu`` - notify of a percpu memory block freeing diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt index 415484f3d59a..918972babcd8 100644 --- a/Documentation/fault-injection/fault-injection.txt +++ b/Documentation/fault-injection/fault-injection.txt @@ -134,6 +134,23 @@ use the boot option: fail_futex= mmc_core.fail_request=<interval>,<probability>,<space>,<times> +o proc entries + +- /proc/<pid>/fail-nth: +- /proc/self/task/<tid>/fail-nth: + + Write to this file of integer N makes N-th call in the task fail. + Read from this file returns a integer value. A value of '0' indicates + that the fault setup with a previous write to this file was injected. + A positive integer N indicates that the fault wasn't yet injected. + Note that this file enables all types of faults (slab, futex, etc). + This setting takes precedence over all other generic debugfs settings + like probability, interval, times, etc. But per-capability settings + (e.g. fail_futex/ignore-private) take precedence over it. + + This feature is intended for systematic testing of faults in a single + system call. See an example below. + How to add new fault injection capability ----------------------------------------- @@ -278,3 +295,65 @@ allocation failure. # env FAILCMD_TYPE=fail_page_alloc \ ./tools/testing/fault-injection/failcmd.sh --times=100 \ -- make -C tools/testing/selftests/ run_tests + +Systematic faults using fail-nth +--------------------------------- + +The following code systematically faults 0-th, 1-st, 2-nd and so on +capabilities in the socketpair() system call. + +#include <sys/types.h> +#include <sys/stat.h> +#include <sys/socket.h> +#include <sys/syscall.h> +#include <fcntl.h> +#include <unistd.h> +#include <string.h> +#include <stdlib.h> +#include <stdio.h> +#include <errno.h> + +int main() +{ + int i, err, res, fail_nth, fds[2]; + char buf[128]; + + system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait"); + sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid)); + fail_nth = open(buf, O_RDWR); + for (i = 1;; i++) { + sprintf(buf, "%d", i); + write(fail_nth, buf, strlen(buf)); + res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds); + err = errno; + pread(fail_nth, buf, sizeof(buf), 0); + if (res == 0) { + close(fds[0]); + close(fds[1]); + } + printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y', + res, err); + if (atoi(buf)) + break; + } + return 0; +} + +An example output: + +1-th fault Y: res=-1/23 +2-th fault Y: res=-1/23 +3-th fault Y: res=-1/12 +4-th fault Y: res=-1/12 +5-th fault Y: res=-1/23 +6-th fault Y: res=-1/23 +7-th fault Y: res=-1/23 +8-th fault Y: res=-1/12 +9-th fault Y: res=-1/12 +10-th fault Y: res=-1/12 +11-th fault Y: res=-1/12 +12-th fault Y: res=-1/12 +13-th fault Y: res=-1/12 +14-th fault Y: res=-1/12 +15-th fault Y: res=-1/12 +16-th fault N: res=0/12 diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 4cddbce85ac9..adba21b5ada7 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -1786,12 +1786,16 @@ pair provide additional information particular to the objects they represent. pos: 0 flags: 02 mnt_id: 9 - tfd: 5 events: 1d data: ffffffffffffffff + tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7 where 'tfd' is a target file descriptor number in decimal form, 'events' is events mask being watched and the 'data' is data associated with a target [see epoll(7) for more details]. + The 'pos' is current offset of the target file in decimal form + [see lseek(2)], 'ino' and 'sdev' are inode and device numbers + where target file resides, all in hex format. + Fsnotify files ~~~~~~~~~~~~~~ For inotify files the format is the following diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index 615434d81108..51814450a7f8 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -112,8 +112,8 @@ There are two possible methods of using Kdump. 2) Or use the system kernel binary itself as dump-capture kernel and there is no need to build a separate dump-capture kernel. This is possible only with the architectures which support a relocatable kernel. As - of today, i386, x86_64, ppc64, ia64 and arm architectures support relocatable - kernel. + of today, i386, x86_64, ppc64, ia64, arm and arm64 architectures support + relocatable kernel. Building a relocatable kernel is advantageous from the point of view that one does not have to build a second kernel for capturing the dump. But @@ -339,7 +339,7 @@ For arm: For arm64: - Use vmlinux or Image -If you are using a uncompressed vmlinux image then use following command +If you are using an uncompressed vmlinux image then use following command to load dump-capture kernel. kexec -p <dump-capture-kernel-vmlinux-image> \ @@ -361,6 +361,12 @@ to load dump-capture kernel. --dtb=<dtb-for-dump-capture-kernel> \ --append="root=<root-dev> <arch-specific-options>" +If you are using an uncompressed Image, then use following command +to load dump-capture kernel. + + kexec -p <dump-capture-kernel-Image> \ + --initrd=<initrd-for-dump-capture-kernel> \ + --append="root=<root-dev> <arch-specific-options>" Please note, that --args-linux does not need to be specified for ia64. It is planned to make this a no-op on that architecture, but for now diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt index 6b0ca7feb135..6686bd267dc9 100644 --- a/Documentation/vm/ksm.txt +++ b/Documentation/vm/ksm.txt @@ -98,6 +98,50 @@ use_zero_pages - specifies whether empty pages (i.e. allocated pages it is only effective for pages merged after the change. Default: 0 (normal KSM behaviour as in earlier releases) +max_page_sharing - Maximum sharing allowed for each KSM page. This + enforces a deduplication limit to avoid the virtual + memory rmap lists to grow too large. The minimum + value is 2 as a newly created KSM page will have at + least two sharers. The rmap walk has O(N) + complexity where N is the number of rmap_items + (i.e. virtual mappings) that are sharing the page, + which is in turn capped by max_page_sharing. So + this effectively spread the the linear O(N) + computational complexity from rmap walk context + over different KSM pages. The ksmd walk over the + stable_node "chains" is also O(N), but N is the + number of stable_node "dups", not the number of + rmap_items, so it has not a significant impact on + ksmd performance. In practice the best stable_node + "dup" candidate will be kept and found at the head + of the "dups" list. The higher this value the + faster KSM will merge the memory (because there + will be fewer stable_node dups queued into the + stable_node chain->hlist to check for pruning) and + the higher the deduplication factor will be, but + the slowest the worst case rmap walk could be for + any given KSM page. Slowing down the rmap_walk + means there will be higher latency for certain + virtual memory operations happening during + swapping, compaction, NUMA balancing and page + migration, in turn decreasing responsiveness for + the caller of those virtual memory operations. The + scheduler latency of other tasks not involved with + the VM operations doing the rmap walk is not + affected by this parameter as the rmap walks are + always schedule friendly themselves. + +stable_node_chains_prune_millisecs - How frequently to walk the whole + list of stable_node "dups" linked in the + stable_node "chains" in order to prune stale + stable_nodes. Smaller milllisecs values will free + up the KSM metadata with lower latency, but they + will make ksmd use more CPU during the scan. This + only applies to the stable_node chains so it's a + noop if not a single KSM page hit the + max_page_sharing yet (there would be no stable_node + chains in such case). + The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/: pages_shared - how many shared pages are being used @@ -106,10 +150,29 @@ pages_unshared - how many pages unique but repeatedly checked for merging pages_volatile - how many pages changing too fast to be placed in a tree full_scans - how many times all mergeable areas have been scanned +stable_node_chains - number of stable node chains allocated, this is + effectively the number of KSM pages that hit the + max_page_sharing limit +stable_node_dups - number of stable node dups queued into the + stable_node chains + A high ratio of pages_sharing to pages_shared indicates good sharing, but a high ratio of pages_unshared to pages_sharing indicates wasted effort. pages_volatile embraces several different kinds of activity, but a high proportion there would also indicate poor use of madvise MADV_MERGEABLE. +The maximum possible page_sharing/page_shared ratio is limited by the +max_page_sharing tunable. To increase the ratio max_page_sharing must +be increased accordingly. + +The stable_node_dups/stable_node_chains ratio is also affected by the +max_page_sharing tunable, and an high ratio may indicate fragmentation +in the stable_node dups, which could be solved by introducing +fragmentation algorithms in ksmd which would refile rmap_items from +one stable_node dup to another stable_node dup, in order to freeup +stable_node "dups" with few rmap_items in them, but that may increase +the ksmd CPU usage and possibly slowdown the readonly computations on +the KSM pages of the applications. + Izik Eidus, Hugh Dickins, 17 Nov 2009 |