aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorStephen Rothwell <sfr@canb.auug.org.au>2017-06-02 15:10:00 +1000
committerStephen Rothwell <sfr@canb.auug.org.au>2017-06-02 15:10:00 +1000
commit29dd111c576bcdd7a17483405d67ec8a81a6a2fc (patch)
tree86e1a1560b6b93d035d3e6448874e820c4e25857 /Documentation
parent1bc60b48157f140f72a98357b7f1f4eef12a887b (diff)
parentd25f0d9c6100a46e1e35cde84b4bb8b892254180 (diff)
Merge branch 'akpm-current/current'
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-block-zram10
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt7
-rw-r--r--Documentation/blockdev/zram.txt3
-rw-r--r--Documentation/cgroup-v2.txt48
-rw-r--r--Documentation/dev-tools/kmemleak.rst1
-rw-r--r--Documentation/fault-injection/fault-injection.txt79
-rw-r--r--Documentation/filesystems/proc.txt6
-rw-r--r--Documentation/kdump/kdump.txt12
-rw-r--r--Documentation/vm/ksm.txt63
9 files changed, 219 insertions, 10 deletions
diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 451b6d882b2c..3c1945f4d5f2 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -90,3 +90,13 @@ Description:
device's debugging info useful for kernel developers. Its
format is not documented intentionally and may change
anytime without any notice.
+
+What: /sys/block/zram<id>/use_dedup
+Date: March 2017
+Contact: Joonsoo Kim <iamjoonsoo.kim@lge.com>
+Description:
+ The use_dedup file is read-write and specifies deduplication
+ feature is used or not. If enabled, duplicated data is
+ managed by reference count and will not be stored in memory
+ twice. Benefit of this feature largely depends on the workload
+ so keep attention when use.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0d16c9b81309..9b852eb3b620 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2286,8 +2286,11 @@
that the amount of memory usable for all allocations
is not too small.
- movable_node [KNL] Boot-time switch to enable the effects
- of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
+ movable_node [KNL] Boot-time switch to make hotplugable memory
+ NUMA nodes to be movable. This means that the memory
+ of such nodes will be usable only for movable
+ allocations which rules out almost all kernel
+ allocations. Use with caution!
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 4fced8a21307..cbbe39b39953 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -168,6 +168,7 @@ max_comp_streams RW the number of possible concurrent compress operations
comp_algorithm RW show and change the compression algorithm
compact WO trigger memory compaction
debug_stat RO this file is used for zram debugging purposes
+use_dedup RW show and set deduplication feature
User space is advised to use the following files to read the device statistics.
@@ -217,6 +218,8 @@ line of text and contains the following stats separated by whitespace:
same_pages the number of same element filled pages written to this disk.
No memory is allocated for such pages.
pages_compacted the number of pages freed during compaction
+ dup_data_size deduplicated data size
+ meta_data_size the amount of metadata allocated for deduplication feature
9) Deactivate:
swapoff /dev/zram0
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dc5e2dcdbef4..a86f3cb88125 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -826,13 +826,25 @@ PAGE_SIZE multiple when read back.
The number of times the cgroup's memory usage was
about to go over the max boundary. If direct reclaim
- fails to bring it down, the OOM killer is invoked.
+ fails to bring it down, the cgroup goes to OOM state.
oom
- The number of times the OOM killer has been invoked in
- the cgroup. This may not exactly match the number of
- processes killed but should generally be close.
+ The number of time the cgroup's memory usage was
+ reached the limit and allocation was about to fail.
+
+ Depending on context result could be invocation of OOM
+ killer and retrying allocation or failing alloction.
+
+ Failed allocation in its turn could be returned into
+ userspace as -ENOMEM or siletly ignored in cases like
+ disk readahead. For now OOM in memory cgroup kills
+ tasks iff shortage has happened inside page fault.
+
+ oom_kill
+
+ The number of processes belonging to this cgroup
+ killed by any kind of OOM killer.
memory.stat
@@ -930,6 +942,34 @@ PAGE_SIZE multiple when read back.
Number of times a shadow node has been reclaimed
+ pgrefill
+
+ Amount of scanned pages (in an active LRU list)
+
+ pgscan
+
+ Amount of scanned pages (in an inactive LRU list)
+
+ pgsteal
+
+ Amount of reclaimed pages
+
+ pgactivate
+
+ Amount of pages moved to the active LRU list
+
+ pgdeactivate
+
+ Amount of pages moved to the inactive LRU lis
+
+ pglazyfree
+
+ Amount of pages postponed to be freed under memory pressure
+
+ pglazyfreed
+
+ Amount of reclaimed lazyfree pages
+
memory.swap.current
A read-only single value file which exists on non-root
diff --git a/Documentation/dev-tools/kmemleak.rst b/Documentation/dev-tools/kmemleak.rst
index b2391b829169..cb8862659178 100644
--- a/Documentation/dev-tools/kmemleak.rst
+++ b/Documentation/dev-tools/kmemleak.rst
@@ -150,6 +150,7 @@ See the include/linux/kmemleak.h header for the functions prototype.
- ``kmemleak_init`` - initialize kmemleak
- ``kmemleak_alloc`` - notify of a memory block allocation
- ``kmemleak_alloc_percpu`` - notify of a percpu memory block allocation
+- ``kmemleak_vmalloc`` - notify of a vmalloc() memory allocation
- ``kmemleak_free`` - notify of a memory block freeing
- ``kmemleak_free_part`` - notify of a partial memory block freeing
- ``kmemleak_free_percpu`` - notify of a percpu memory block freeing
diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
index 415484f3d59a..918972babcd8 100644
--- a/Documentation/fault-injection/fault-injection.txt
+++ b/Documentation/fault-injection/fault-injection.txt
@@ -134,6 +134,23 @@ use the boot option:
fail_futex=
mmc_core.fail_request=<interval>,<probability>,<space>,<times>
+o proc entries
+
+- /proc/<pid>/fail-nth:
+- /proc/self/task/<tid>/fail-nth:
+
+ Write to this file of integer N makes N-th call in the task fail.
+ Read from this file returns a integer value. A value of '0' indicates
+ that the fault setup with a previous write to this file was injected.
+ A positive integer N indicates that the fault wasn't yet injected.
+ Note that this file enables all types of faults (slab, futex, etc).
+ This setting takes precedence over all other generic debugfs settings
+ like probability, interval, times, etc. But per-capability settings
+ (e.g. fail_futex/ignore-private) take precedence over it.
+
+ This feature is intended for systematic testing of faults in a single
+ system call. See an example below.
+
How to add new fault injection capability
-----------------------------------------
@@ -278,3 +295,65 @@ allocation failure.
# env FAILCMD_TYPE=fail_page_alloc \
./tools/testing/fault-injection/failcmd.sh --times=100 \
-- make -C tools/testing/selftests/ run_tests
+
+Systematic faults using fail-nth
+---------------------------------
+
+The following code systematically faults 0-th, 1-st, 2-nd and so on
+capabilities in the socketpair() system call.
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+
+int main()
+{
+ int i, err, res, fail_nth, fds[2];
+ char buf[128];
+
+ system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
+ sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
+ fail_nth = open(buf, O_RDWR);
+ for (i = 1;; i++) {
+ sprintf(buf, "%d", i);
+ write(fail_nth, buf, strlen(buf));
+ res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
+ err = errno;
+ pread(fail_nth, buf, sizeof(buf), 0);
+ if (res == 0) {
+ close(fds[0]);
+ close(fds[1]);
+ }
+ printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y',
+ res, err);
+ if (atoi(buf))
+ break;
+ }
+ return 0;
+}
+
+An example output:
+
+1-th fault Y: res=-1/23
+2-th fault Y: res=-1/23
+3-th fault Y: res=-1/12
+4-th fault Y: res=-1/12
+5-th fault Y: res=-1/23
+6-th fault Y: res=-1/23
+7-th fault Y: res=-1/23
+8-th fault Y: res=-1/12
+9-th fault Y: res=-1/12
+10-th fault Y: res=-1/12
+11-th fault Y: res=-1/12
+12-th fault Y: res=-1/12
+13-th fault Y: res=-1/12
+14-th fault Y: res=-1/12
+15-th fault Y: res=-1/12
+16-th fault N: res=0/12
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 4cddbce85ac9..adba21b5ada7 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1786,12 +1786,16 @@ pair provide additional information particular to the objects they represent.
pos: 0
flags: 02
mnt_id: 9
- tfd: 5 events: 1d data: ffffffffffffffff
+ tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7
where 'tfd' is a target file descriptor number in decimal form,
'events' is events mask being watched and the 'data' is data
associated with a target [see epoll(7) for more details].
+ The 'pos' is current offset of the target file in decimal form
+ [see lseek(2)], 'ino' and 'sdev' are inode and device numbers
+ where target file resides, all in hex format.
+
Fsnotify files
~~~~~~~~~~~~~~
For inotify files the format is the following
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 615434d81108..51814450a7f8 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -112,8 +112,8 @@ There are two possible methods of using Kdump.
2) Or use the system kernel binary itself as dump-capture kernel and there is
no need to build a separate dump-capture kernel. This is possible
only with the architectures which support a relocatable kernel. As
- of today, i386, x86_64, ppc64, ia64 and arm architectures support relocatable
- kernel.
+ of today, i386, x86_64, ppc64, ia64, arm and arm64 architectures support
+ relocatable kernel.
Building a relocatable kernel is advantageous from the point of view that
one does not have to build a second kernel for capturing the dump. But
@@ -339,7 +339,7 @@ For arm:
For arm64:
- Use vmlinux or Image
-If you are using a uncompressed vmlinux image then use following command
+If you are using an uncompressed vmlinux image then use following command
to load dump-capture kernel.
kexec -p <dump-capture-kernel-vmlinux-image> \
@@ -361,6 +361,12 @@ to load dump-capture kernel.
--dtb=<dtb-for-dump-capture-kernel> \
--append="root=<root-dev> <arch-specific-options>"
+If you are using an uncompressed Image, then use following command
+to load dump-capture kernel.
+
+ kexec -p <dump-capture-kernel-Image> \
+ --initrd=<initrd-for-dump-capture-kernel> \
+ --append="root=<root-dev> <arch-specific-options>"
Please note, that --args-linux does not need to be specified for ia64.
It is planned to make this a no-op on that architecture, but for now
diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt
index 6b0ca7feb135..6686bd267dc9 100644
--- a/Documentation/vm/ksm.txt
+++ b/Documentation/vm/ksm.txt
@@ -98,6 +98,50 @@ use_zero_pages - specifies whether empty pages (i.e. allocated pages
it is only effective for pages merged after the change.
Default: 0 (normal KSM behaviour as in earlier releases)
+max_page_sharing - Maximum sharing allowed for each KSM page. This
+ enforces a deduplication limit to avoid the virtual
+ memory rmap lists to grow too large. The minimum
+ value is 2 as a newly created KSM page will have at
+ least two sharers. The rmap walk has O(N)
+ complexity where N is the number of rmap_items
+ (i.e. virtual mappings) that are sharing the page,
+ which is in turn capped by max_page_sharing. So
+ this effectively spread the the linear O(N)
+ computational complexity from rmap walk context
+ over different KSM pages. The ksmd walk over the
+ stable_node "chains" is also O(N), but N is the
+ number of stable_node "dups", not the number of
+ rmap_items, so it has not a significant impact on
+ ksmd performance. In practice the best stable_node
+ "dup" candidate will be kept and found at the head
+ of the "dups" list. The higher this value the
+ faster KSM will merge the memory (because there
+ will be fewer stable_node dups queued into the
+ stable_node chain->hlist to check for pruning) and
+ the higher the deduplication factor will be, but
+ the slowest the worst case rmap walk could be for
+ any given KSM page. Slowing down the rmap_walk
+ means there will be higher latency for certain
+ virtual memory operations happening during
+ swapping, compaction, NUMA balancing and page
+ migration, in turn decreasing responsiveness for
+ the caller of those virtual memory operations. The
+ scheduler latency of other tasks not involved with
+ the VM operations doing the rmap walk is not
+ affected by this parameter as the rmap walks are
+ always schedule friendly themselves.
+
+stable_node_chains_prune_millisecs - How frequently to walk the whole
+ list of stable_node "dups" linked in the
+ stable_node "chains" in order to prune stale
+ stable_nodes. Smaller milllisecs values will free
+ up the KSM metadata with lower latency, but they
+ will make ksmd use more CPU during the scan. This
+ only applies to the stable_node chains so it's a
+ noop if not a single KSM page hit the
+ max_page_sharing yet (there would be no stable_node
+ chains in such case).
+
The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/:
pages_shared - how many shared pages are being used
@@ -106,10 +150,29 @@ pages_unshared - how many pages unique but repeatedly checked for merging
pages_volatile - how many pages changing too fast to be placed in a tree
full_scans - how many times all mergeable areas have been scanned
+stable_node_chains - number of stable node chains allocated, this is
+ effectively the number of KSM pages that hit the
+ max_page_sharing limit
+stable_node_dups - number of stable node dups queued into the
+ stable_node chains
+
A high ratio of pages_sharing to pages_shared indicates good sharing, but
a high ratio of pages_unshared to pages_sharing indicates wasted effort.
pages_volatile embraces several different kinds of activity, but a high
proportion there would also indicate poor use of madvise MADV_MERGEABLE.
+The maximum possible page_sharing/page_shared ratio is limited by the
+max_page_sharing tunable. To increase the ratio max_page_sharing must
+be increased accordingly.
+
+The stable_node_dups/stable_node_chains ratio is also affected by the
+max_page_sharing tunable, and an high ratio may indicate fragmentation
+in the stable_node dups, which could be solved by introducing
+fragmentation algorithms in ksmd which would refile rmap_items from
+one stable_node dup to another stable_node dup, in order to freeup
+stable_node "dups" with few rmap_items in them, but that may increase
+the ksmd CPU usage and possibly slowdown the readonly computations on
+the KSM pages of the applications.
+
Izik Eidus,
Hugh Dickins, 17 Nov 2009