summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2022-06-06Merge 4.14.282 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.282 x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests staging: rtl8723bs: prevent ->Ssid overflow in rtw_wx_set_scan() tcp: change source port randomizarion at connect() time secure_seq: use the 64 bits of the siphash for port offset calculation ACPI: sysfs: Make sparse happy about address space in use ACPI: sysfs: Fix BERT error region memory mapping net: af_key: check encryption module availability consistency net: ftgmac100: Disable hardware checksum on AST2600 drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers assoc_array: Fix BUG_ON during garbage collect drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency() block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern exec: Force single empty string when argv is empty netfilter: conntrack: re-fetch conntrack after insertion zsmalloc: fix races between asynchronous zspage free and page migration dm integrity: fix error code in dm_integrity_ctr() dm crypt: make printing of the key constant-time dm stats: add cond_resched when looping over entries dm verity: set DM_TARGET_IMMUTABLE feature flag tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe() docs: submitting-patches: Fix crossref to 'The canonical patch format' NFSD: Fix possible sleep during nfsd4_release_lockowner() bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes Linux 4.14.282 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ieb45fbca1afbc9bf3a97a5beda24be2f82b65abe
2022-06-06NFSD: Fix possible sleep during nfsd4_release_lockowner()Chuck Lever
commit ce3c4ad7f4ce5db7b4f08a1e237d8dd94b39180b upstream. nfsd4_release_lockowner() holds clp->cl_lock when it calls check_for_locks(). However, check_for_locks() calls nfsd_file_get() / nfsd_file_put() to access the backing inode's flc_posix list, and nfsd_file_put() can sleep if the inode was recently removed. Let's instead rely on the stateowner's reference count to gate whether the release is permitted. This should be a reliable indication of locks-in-use since file lock operations and ->lm_get_owner take appropriate references, which are released appropriately when file locks are removed. Reported-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06exec: Force single empty string when argv is emptyKees Cook
commit dcd46d897adb70d63e025f175a00a89797d31a43 upstream. Quoting[1] Ariadne Conill: "In several other operating systems, it is a hard requirement that the second argument to execve(2) be the name of a program, thus prohibiting a scenario where argc < 1. POSIX 2017 also recommends this behaviour, but it is not an explicit requirement[2]: The argument arg0 should point to a filename string that is associated with the process being started by one of the exec functions. ... Interestingly, Michael Kerrisk opened an issue about this in 2008[3], but there was no consensus to support fixing this issue then. Hopefully now that CVE-2021-4034 shows practical exploitative use[4] of this bug in a shellcode, we can reconsider. This issue is being tracked in the KSPP issue tracker[5]." While the initial code searches[6][7] turned up what appeared to be mostly corner case tests, trying to that just reject argv == NULL (or an immediately terminated pointer list) quickly started tripping[8] existing userspace programs. The next best approach is forcing a single empty string into argv and adjusting argc to match. The number of programs depending on argc == 0 seems a smaller set than those calling execve with a NULL argv. Account for the additional stack space in bprm_stack_limits(). Inject an empty string when argc == 0 (and set argc = 1). Warn about the case so userspace has some notice about the change: process './argc0' launched './argc0' with NULL argv: empty string added Additionally WARN() and reject NULL argv usage for kernel threads. [1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.org/ [2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html [3] https://bugzilla.kernel.org/show_bug.cgi?id=8408 [4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt [5] https://github.com/KSPP/linux/issues/176 [6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+*NULL&literal=0 [7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%2C%5Cs*NULL&literal=0 [8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/ Reported-by: Ariadne Conill <ariadne@dereferenced.org> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Ariadne Conill <ariadne@dereferenced.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org [vegard: fixed conflicts due to missing 886d7de631da71e30909980fdbf318f7caade262^- and 3950e975431bc914f7e81b8f2a2dbdf2064acb0f^- and 655c16a8ce9c15842547f40ce23fd148aeccc074] Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-16Merge 4.14.279 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.279 MIPS: Use address-of operator on section symbols block: drbd: drbd_nl: Make conversion to 'enum drbd_ret_code' explicit can: grcan: grcan_probe(): fix broken system id check for errata workaround needs can: grcan: only use the NAPI poll budget for RX Bluetooth: Fix the creation of hdev->name mmc: rtsx: add 74 Clocks in power on flow mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() ALSA: pcm: Fix races among concurrent hw_params and hw_free calls ALSA: pcm: Fix races among concurrent read/write and buffer changes ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls ALSA: pcm: Fix races among concurrent prealloc proc writes ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock VFS: Fix memory leak caused by concurrently mounting fs with subtype Linux 4.14.279 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iffa711de48afa20364d743cb3a59d668b2c36b6e
2022-05-15VFS: Fix memory leak caused by concurrently mounting fs with subtypeChenXiaoSong
If two processes mount same superblock, memory leak occurs: CPU0 | CPU1 do_new_mount | do_new_mount fs_set_subtype | fs_set_subtype kstrdup | | kstrdup memrory leak | The following reproducer triggers the problem: 1. shell command: mount -t ntfs /dev/sda1 /mnt & 2. c program: mount("/dev/sda1", "/mnt", "fuseblk", 0, "...") with kmemleak report being along the lines of unreferenced object 0xffff888235f1a5c0 (size 8): comm "mount.ntfs", pid 2860, jiffies 4295757824 (age 43.423s) hex dump (first 8 bytes): 00 a5 f1 35 82 88 ff ff ...5.... backtrace: [<00000000656e30cc>] __kmalloc_track_caller+0x16e/0x430 [<000000008e591727>] kstrdup+0x3e/0x90 [<000000008430d12b>] do_mount.cold+0x7b/0xd9 [<0000000078d639cd>] ksys_mount+0xb2/0x150 [<000000006015988d>] __x64_sys_mount+0x29/0x40 [<00000000e0a7c118>] do_syscall_64+0xc1/0x1d0 [<00000000bcea7df5>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<00000000803a4067>] 0xffffffffffffffff Linus's tree already have refactoring patchset [1], one of them can fix this bug: c30da2e981a7 ("fuse: convert to use the new mount API") After refactoring, init super_block->s_subtype in fuse_fill_super. Since we did not merge the refactoring patchset in this branch, I create this patch. This patch fix this by adding a write lock while calling fs_set_subtype. [1] https://patchwork.kernel.org/project/linux-fsdevel/patch/20190903113640.7984-3-mszeredi@redhat.com/ Fixes: 79c0b2df79eb ("add filesystem subtype support") Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12Merge 4.14.278 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.278 floppy: disable FDRAWCMD by default hamradio: defer 6pack kfree after unregister_netdev hamradio: remove needs_free_netdev to avoid UAF net/sched: cls_u32: fix netns refcount changes in u32_change() Revert "net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link" lightnvm: disable the subsystem usb: mtu3: fix USB 3.0 dual-role-switch from device to host USB: quirks: add a Realtek card reader USB: quirks: add STRING quirk for VCOM device USB: serial: whiteheat: fix heap overflow in WHITEHEAT_GET_DTR_RTS USB: serial: cp210x: add PIDs for Kamstrup USB Meter Reader USB: serial: option: add support for Cinterion MV32-WA/MV32-WB USB: serial: option: add Telit 0x1057, 0x1058, 0x1075 compositions xhci: stop polling roothubs after shutdown iio: dac: ad5592r: Fix the missing return value. iio: dac: ad5446: Fix read_raw not returning set value iio: magnetometer: ak8975: Fix the error handling in ak8975_power_on() usb: misc: fix improper handling of refcount in uss720_probe() usb: gadget: uvc: Fix crash when encoding data for usb request usb: gadget: configfs: clear deactivation flag in configfs_composite_unbind() serial: 8250: Also set sticky MCR bits in console restoration serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device hex2bin: make the function hex_to_bin constant-time hex2bin: fix access beyond string end USB: Fix xhci event ring dequeue pointer ERDP update issue ARM: dts: imx6qdl-apalis: Fix sgtl5000 detection issue phy: samsung: Fix missing of_node_put() in exynos_sata_phy_probe phy: samsung: exynos5250-sata: fix missing device put in probe error paths ARM: OMAP2+: Fix refcount leak in omap_gic_of_init ARM: dts: Fix mmc order for omap3-gta04 ipvs: correctly print the memory size of ip_vs_conn_tab mtd: rawnand: Fix return value check of wait_for_completion_timeout sctp: check asoc strreset_chunk in sctp_generate_reconf_event pinctrl: pistachio: fix use of irq_of_parse_and_map() ip_gre: Make o_seqno start from 0 in native mode tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create() clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource() net: bcmgenet: hide status block before TX timestamping bnx2x: fix napi API usage sequence ASoC: wm8731: Disable the regulator when probing fails x86: __memcpy_flushcache: fix wrong alignment if size > 2^32 cifs: destage any unwritten data to the server before calling copychunk_write drivers: net: hippi: Fix deadlock in rr_close() x86/cpu: Load microcode during restore_processor_state() tty: n_gsm: fix wrong signal octet encoding in convergence layer type 2 tty: n_gsm: fix malformed counter for out of frame data tty: n_gsm: fix insufficient txframe size tty: n_gsm: fix missing explicit ldisc flush tty: n_gsm: fix wrong command retry handling tty: n_gsm: fix wrong command frame length field encoding tty: n_gsm: fix incorrect UA handling drm/vgem: Close use-after-free race in vgem_gem_create MIPS: Fix CP0 counter erratum detection for R4k CPUs parisc: Merge model and model name into one line in /proc/cpuinfo ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes Revert "SUNRPC: attempt AF_LOCAL connect on setup" firewire: fix potential uaf in outbound_phy_packet_callback() firewire: remove check of list iterator against head past the loop body firewire: core: extend card->lock in fw_core_handle_bus_reset ASoC: wm8958: Fix change notifications for DSP controls can: grcan: grcan_close(): fix deadlock can: grcan: use ofdev->dev when allocating DMA memory nfc: replace improper check device_is_registered() in netlink related functions nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs NFC: netlink: fix sleep in atomic bug when firmware download timeout hwmon: (adt7470) Fix warning on module removal ASoC: dmaengine: Restore NULL prepare_slave_config() callback net: emaclite: Add error handling for of_address_to_resource() smsc911x: allow using IRQ0 btrfs: always log symlinks in full mode net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU net: ipv6: ensure we call ipv6_mc_down() at most once dm: fix mempool NULL pointer race when completing IO dm: interlock pending dm_io and dm_wait_for_bios_completion PCI: aardvark: Clear all MSIs at setup PCI: aardvark: Fix reading MSI interrupt number Linux 4.14.278 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic228df2ab4834dc5c32776a73c80f3d649dbbcd9
2022-05-12btrfs: always log symlinks in full modeFilipe Manana
commit d0e64a981fd841cb0f28fcd6afcac55e6f1e6994 upstream. On Linux, empty symlinks are invalid, and attempting to create one with the system call symlink(2) results in an -ENOENT error and this is explicitly documented in the man page. If we rename a symlink that was created in the current transaction and its parent directory was logged before, we actually end up logging the symlink without logging its content, which is stored in an inline extent. That means that after a power failure we can end up with an empty symlink, having no content and an i_size of 0 bytes. It can be easily reproduced like this: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt $ mkdir /mnt/testdir $ sync # Create a file inside the directory and fsync the directory. $ touch /mnt/testdir/foo $ xfs_io -c "fsync" /mnt/testdir # Create a symlink inside the directory and then rename the symlink. $ ln -s /mnt/testdir/foo /mnt/testdir/bar $ mv /mnt/testdir/bar /mnt/testdir/baz # Now fsync again the directory, this persist the log tree. $ xfs_io -c "fsync" /mnt/testdir <power failure> $ mount /dev/sdc /mnt $ stat -c %s /mnt/testdir/baz 0 $ readlink /mnt/testdir/baz $ Fix this by always logging symlinks in full mode (LOG_INODE_ALL), so that their content is also logged. A test case for fstests will follow. CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12cifs: destage any unwritten data to the server before calling copychunk_writeRonnie Sahlberg
[ Upstream commit f5d0f921ea362636e4a2efb7c38d1ead373a8700 ] because the copychunk_write might cover a region of the file that has not yet been sent to the server and thus fail. A simple way to reproduce this is: truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile the issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with the 'fcollapse 16k 24k' due to write-back caching. fcollapse is implemented in cifs.ko as a SMB2 IOCTL(COPYCHUNK_WRITE) call and it will fail serverside since the file is still 0b in size serverside until the writes have been destaged. To avoid this we must ensure that we destage any unwritten data to the server before calling COPYCHUNK_WRITE. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373 Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27Merge 4.14.277 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.277 etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead mm: page_alloc: fix building error on -Werror=array-compare tracing: Have traceon and traceoff trigger honor the instance tracing: Dump stacktrace trigger to the corresponding instance can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path gfs2: assign rgrp glock before compute_bitstructs ALSA: usb-audio: Clear MIDI port active flag after draining tcp: fix race condition when creating child sockets from syncookies tcp: Fix potential use-after-free due to double kfree() dmaengine: imx-sdma: Fix error checking in sdma_event_remap net/packet: fix packet_sock xmit return value checking netlink: reset network and mac headers in netlink_dump() ARM: vexpress/spc: Avoid negative array index when !SMP platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative ALSA: usb-audio: Fix undefined behavior due to shift overflowing the constant vxlan: fix error return code in vxlan_fdb_append cifs: Check the IOCB_DIRECT flag, not O_DIRECT brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant drm/msm/mdp5: check the return of kzalloc() net: macb: Restart tx only if queue pointer is lagging stat: fix inconsistency between struct stat and struct compat_stat ata: pata_marvell: Check the 'bmdma_addr' beforing reading dma: at_xdmac: fix a missing check on list iterator powerpc/perf: Fix power9 event alternatives openvswitch: fix OOB access in reserve_sfa_size() ASoC: soc-dapm: fix two incorrect uses of list iterator e1000e: Fix possible overflow in LTR decoding ARC: entry: fix syscall_trace_exit argument ext4: fix symlink file size not match to file content ext4: limit length to bitmap_maxbytes - blocksize in punch_hole ext4: fix overhead calculation to account for the reserved gdt blocks ext4: force overhead calculation if the s_overhead_cluster makes no sense staging: ion: Prevent incorrect reference counting behavour block/compat_ioctl: fix range check in BLKGETSIZE ax25: add refcount in ax25_dev to avoid UAF bugs ax25: fix reference count leaks of ax25_dev ax25: fix UAF bugs of net_device caused by rebinding operation ax25: Fix refcount leaks caused by ax25_cb_del() ax25: fix UAF bug in ax25_send_control() ax25: fix NPD bug in ax25_disconnect ax25: Fix NULL pointer dereferences in ax25 timers ax25: Fix UAF bugs in ax25 timers Revert "net: micrel: fix KS8851_MLL Kconfig" Linux 4.14.277 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I8df77df882183aec177fa9d6d0f8cbe8ebbf54c2
2022-04-27ext4: force overhead calculation if the s_overhead_cluster makes no senseTheodore Ts'o
commit 85d825dbf4899a69407338bae462a59aa9a37326 upstream. If the file system does not use bigalloc, calculating the overhead is cheap, so force the recalculation of the overhead so we don't have to trust the precalculated overhead in the superblock. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: fix overhead calculation to account for the reserved gdt blocksTheodore Ts'o
commit 10b01ee92df52c8d7200afead4d5e5f55a5c58b1 upstream. The kernel calculation was underestimating the overhead by not taking into account the reserved gdt blocks. With this change, the overhead calculated by the kernel matches the overhead calculation in mke2fs. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: limit length to bitmap_maxbytes - blocksize in punch_holeTadeusz Struk
commit 2da376228a2427501feb9d15815a45dbdbdd753e upstream. Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the bitmap_maxbytes for the inode. This triggers a BUG in ext4_ind_remove_space(). According to the comments in this function the 'end' parameter needs to be one block after the last block to be removed. In the case when the BUG is triggered it points to the last block. Modify the ext4_punch_hole() function and add constraint that caps the length to satisfy the one before laster block requirement. LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331 LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000 Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality") Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: fix symlink file size not match to file contentYe Bin
commit a2b0b205d125f27cddfb4f7280e39affdaf46686 upstream. We got issue as follows: [home]# fsck.ext4 -fn ram0yb e2fsck 1.45.6 (20-Mar-2020) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Symlink /p3/d14/d1a/l3d (inode #3494) is invalid. Clear? no Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0). Fix? no As the symlink file size does not match the file content. If the writeback of the symlink data block failed, ext4_finish_bio() handles the end of IO. However this function fails to mark the buffer with BH_write_io_error and so when unmount does journal checkpoint it cannot detect the writeback error and will cleanup the journal. Thus we've lost the correct data in the journal area. To solve this issue, mark the buffer as BH_write_io_error in ext4_finish_bio(). Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27stat: fix inconsistency between struct stat and struct compat_statMikulas Patocka
[ Upstream commit 932aba1e169090357a77af18850a10c256b50819 ] struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27cifs: Check the IOCB_DIRECT flag, not O_DIRECTDavid Howells
[ Upstream commit 994fd530a512597ffcd713b0f6d5bc916c5698f0 ] Use the IOCB_DIRECT indicator flag on the I/O context rather than checking to see if the file was opened O_DIRECT. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27gfs2: assign rgrp glock before compute_bitstructsBob Peterson
commit 428f651cb80b227af47fc302e4931791f2fb4741 upstream. Before this patch, function read_rindex_entry called compute_bitstructs before it allocated a glock for the rgrp. But if compute_bitstructs found a problem with the rgrp, it called gfs2_consist_rgrpd, and that called gfs2_dump_glock for rgd->rd_gl which had not yet been assigned. read_rindex_entry compute_bitstructs gfs2_consist_rgrpd gfs2_dump_glock <---------rgd->rd_gl was not set. This patch changes read_rindex_entry so it assigns an rgrp glock before calling compute_bitstructs so gfs2_dump_glock does not reference an unassigned pointer. If an error is discovered, the glock must also be put, so a new goto and label were added. Reported-by: syzbot+c6fd14145e2f62ca0784@syzkaller.appspotmail.com Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-21Merge 4.14.276 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.276 USB: serial: pl2303: add IBM device IDs USB: serial: simple: add Nokia phone driver netdevice: add the case if dev is NULL virtio_console: break out of buf poll on remove ethernet: sun: Free the coherent when failing in probing spi: Fix invalid sgs value spi: Fix erroneous sgs value with min_t() af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register fuse: fix pipe buffer lifetime for direct_io tpm: fix reference counting for struct tpm_chip block: Add a helper to validate the block size virtio-blk: Use blk_validate_block_size() to validate block size USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c coresight: Fix TRCCONFIGR.QE sysfs interface iio: inkern: apply consumer scale on IIO_VAL_INT cases iio: inkern: apply consumer scale when no channel scale is available iio: inkern: make a best effort on offset calculation clk: uniphier: Fix fixed-rate initialization ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE Documentation: add link to stable release candidate tree Documentation: update stable tree link SUNRPC: avoid race between mod_timer() and del_timer_sync() NFSD: prevent underflow in nfssvc_decode_writeargs() pinctrl: samsung: drop pin banks references on error paths can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path jffs2: fix use-after-free in jffs2_clear_xattr_subsystem jffs2: fix memory leak in jffs2_do_mount_fs jffs2: fix memory leak in jffs2_scan_medium mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node mempolicy: mbind_range() set_policy() after vma_merge() scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands qed: display VF trust config qed: validate and restrict untrusted VFs vlan promisc mode Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads" ALSA: cs4236: fix an incorrect NULL check on list iterator drbd: fix potential silent data corruption ACPI: properties: Consistently return -ENOENT if there are no more references drivers: hamradio: 6pack: fix UAF bug caused by mod_timer() video: fbdev: sm712fb: Fix crash in smtcfb_read() video: fbdev: atari: Atari 2 bpp (STe) palette bugfix ARM: dts: at91: sama5d2: Fix PMERRLOC resource size ARM: dts: exynos: fix UART3 pins configuration in Exynos5250 ARM: dts: exynos: add missing HDMI supplies on SMDK5250 ARM: dts: exynos: add missing HDMI supplies on SMDK5420 carl9170: fix missing bit-wise or operator for tx_params thermal: int340x: Increase bitmap size lib/raid6/test: fix multiple definition linking error DEC: Limit PMAX memory probing to R3k systems media: davinci: vpif: fix unbalanced runtime PM get brcmfmac: firmware: Allocate space for default boardrev in nvram brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio PCI: pciehp: Clear cmd_busy bit in polling mode crypto: authenc - Fix sleep in atomic context in decrypt_tail crypto: mxs-dcp - Fix scatterlist processing spi: tegra114: Add missing IRQ check in tegra_spi_probe selftests/x86: Add validity check and allow field splitting spi: pxa2xx-pci: Balance reference count for PCI DMA device hwmon: (pmbus) Add mutex to regulator ops hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING PM: hibernate: fix __setup handler error handling PM: suspend: fix return value of __setup handler hwrng: atmel - disable trng on failure path crypto: vmx - add missing dependencies ACPI: APEI: fix return value of __setup handlers crypto: ccp - ccp_dmaengine_unregister release dma channels hwmon: (pmbus) Add Vin unit off handling clocksource: acpi_pm: fix return value of __setup handler sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa perf/core: Fix address filter parser for multiple filters perf/x86/intel/pt: Fix address filter config for 32-bit kernel media: coda: Fix missing put_device() call in coda_get_vdoa_data video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe() video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name() ARM: dts: qcom: ipq4019: fix sleep clock soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe media: usb: go7007: s2250-board: fix leak in probe() ASoC: ti: davinci-i2s: Add check for clk_enable() ALSA: spi: Add check for clk_enable() arm64: dts: ns2: Fix spi-cpol and spi-cpha property arm64: dts: broadcom: Fix sata nodename printk: fix return value of printk.devkmsg __setup handler ASoC: mxs-saif: Handle errors for clk_enable ASoC: atmel_ssc_dai: Handle errors for clk_enable memory: emif: Add check for setup_interrupts memory: emif: check the pointer temp in get_device_details() ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe ASoC: wm8350: Handle error for wm8350_register_irq ASoC: fsi: Add check for clk_enable video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of ASoC: dmaengine: do not use a NULL prepare_slave_config() callback ASoC: mxs: Fix error handling in mxs_sgtl5000_probe ASoC: imx-es8328: Fix error return code in imx_es8328_probe() ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe mtd: onenand: Check for error irq drm/edid: Don't clear formats if using deep color ath9k_htc: fix uninit value bugs power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe ray_cs: Check ioremap return value power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports iwlwifi: Fix -EIO error code that is never returned dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS scsi: pm8001: Fix command initialization in pm80XX_send_read_log() scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req() scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config() scsi: pm8001: Fix abort all task initialization TOMOYO: fix __setup handlers return values ext2: correct max file size computing drm/tegra: Fix reference leak in tegra_dsi_ganged_probe power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return KVM: x86: Fix emulation in writing cr8 KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor() i2c: xiic: Make bus names unique power: supply: wm8350-power: Handle error for wm8350_register_irq power: supply: wm8350-power: Add missing free in free_charger_irq PCI: Reduce warnings on possible RW1C corruption powerpc/sysdev: fix incorrect use to determine if list is empty mfd: mc13xxx: Add check for mc13xxx_irq_request vxcan: enable local echo for sent CAN frames MIPS: RB532: fix return value of __setup handler mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init USB: storage: ums-realtek: fix error code in rts51x_read_mem() af_netlink: Fix shift out of bounds in group mask calculation i2c: mux: demux-pinctrl: do not deactivate a master that is not active tcp: ensure PMTU updates are processed during fastopen mfd: asic3: Add missing iounmap() on error asic3_mfd_probe mxser: fix xmit_buf leak in activate when LSR == 0xff pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add() staging:iio:adc:ad7280a: Fix handing of device address bit reversing. serial: 8250_mid: Balance reference count for PCI DMA device serial: 8250: Fix race condition in RTS-after-send handling iio: adc: Add check for devm_request_threaded_irq clk: qcom: clk-rcg2: Update the frac table for pixel clock remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region clk: loongson1: Terminate clk_div_table with sentinel element clk: clps711x: Terminate clk_div_table with sentinel element clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver NFS: remove unneeded check in decode_devicenotify_args() pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe tty: hvc: fix return value of __setup handler kgdboc: fix return value of __setup handler kgdbts: fix return value of __setup handler jfs: fix divide error in dbNextAG netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options xen: fix is_xen_pmu() net: phy: broadcom: Fix brcm_fet_config_init() qlcnic: dcb: default to returning -EOPNOTSUPP net/x25: Fix null-ptr-deref caused by x25_disconnect NFSv4/pNFS: Fix another issue with a list iterator pointing to the head lib/test: use after free in register_test_dev_kmod() selinux: use correct type for context length loop: use sysfs_emit() in the sysfs xxx show() Fix incorrect type in assignment of ipv6 port for audit irqchip/nvic: Release nvic_base upon failure ACPICA: Avoid walking the ACPI Namespace if it is not there ACPI/APEI: Limit printable size of BERT table data PM: core: keep irq flags in device_pm_check_callbacks() spi: tegra20: Use of_device_get_match_data() ext4: don't BUG if someone dirty pages without asking ext4 first ntfs: add sanity check on allocation size video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow video: fbdev: w100fb: Reset global state video: fbdev: cirrusfb: check pixclock to avoid divide by zero video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960 ARM: dts: bcm2837: Add the missing L1/L2 cache information video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf() video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf() ASoC: soc-core: skip zero num_dai component in searching dai name media: cx88-mpeg: clear interrupt status register before streaming video ARM: tegra: tamonten: Fix I2C3 pad setting ARM: mmp: Fix failure to remove sram device video: fbdev: sm712fb: Fix crash in smtcfb_write() media: hdpvr: initialize dev->worker at hdpvr_register_videodev mmc: host: Return an error when ->enable_sdio_irq() ops is missing powerpc/lib/sstep: Fix 'sthcx' instruction powerpc/lib/sstep: Fix build errors with newer binutils scsi: qla2xxx: Fix warning for missing error code scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair() KVM: Prevent module exit until all VMs are freed ubifs: rename_whiteout: Fix double free for whiteout_ui->data ubifs: Add missing iput if do_tmpfile() failed in rename whiteout ubifs: setflags: Make dirtied_ino_d 8 bytes aligned ubifs: rename_whiteout: correct old_dir size computing can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path can: mcba_usb: properly check endpoint type gfs2: Make sure FITRIM minlen is rounded up to fs block size pinctrl: pinconf-generic: Print arguments for bias-pull-* ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl ACPI: CPPC: Avoid out of bounds access when parsing _CPC data mm/mmap: return 1 from stack_guard_gap __setup() handler mm/memcontrol: return 1 from cgroup.memory __setup() handler ubi: fastmap: Return error code if memory allocation fails in add_aeb() ASoC: topology: Allow TLV control to be either read or write ARM: dts: spear1340: Update serial node properties ARM: dts: spear13xx: Update SPI dma properties openvswitch: Fixed nd target mask field in the flow dump. KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated ubifs: Rectify space amount budget for mkdir/tmpfile operations rtc: wm8350: Handle error for wm8350_register_irq ARM: 9187/1: JIVE: fix return value of __setup handler KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111 ptp: replace snprintf with sysfs_emit powerpc: dts: t104xrdb: fix phy type for FMAN 4/5 scsi: mvsas: Replace snprintf() with sysfs_emit() scsi: bfa: Replace snprintf() with sysfs_emit() power: supply: axp20x_battery: properly report current when discharging powerpc: Set crashkernel offset to mid of RMA region PCI: aardvark: Fix support for MSI interrupts iommu/arm-smmu-v3: fix event handling soft lockup dm ioctl: prevent potential spectre v1 gadget scsi: pm8001: Fix pm8001_mpi_task_abort_resp() scsi: aha152x: Fix aha152x_setup() __setup handler return value net/smc: correct settings of RMB window update limit macvtap: advertise link netns via netlink bnxt_en: Eliminate unintended link toggle during FW reset MIPS: fix fortify panic when copying asm exception handlers scsi: libfc: Fix use after free in fc_exch_abts_resp() usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm xtensa: fix DTC warning unit_address_format Bluetooth: Fix use after free in hci_send_acl init/main.c: return 1 from handled __setup() functions w1: w1_therm: fixes w1_seq for ds28ea00 sensors SUNRPC/call_alloc: async tasks mustn't block waiting for memory NFS: swap IO handling is slightly different for O_DIRECT IO NFS: swap-out must always use STABLE writes. serial: samsung_tty: do not unlock port->lock for uart_write_wakeup() virtio_console: eliminate anonymous module_init & module_exit jfs: prevent NULL deref in diFree parisc: Fix CPU affinity for Lasi, WAX and Dino chips ipv6: add missing tx timestamping on IPPROTO_RAW net: add missing SOF_TIMESTAMPING_OPT_ID support mm: fix race between MADV_FREE reclaim and blkdev direct IO read drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire() scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one() net: stmmac: Fix unset max_speed difference between DT and non-DT platforms drm/imx: Fix memory leak in imx_pd_connector_get_modes drbd: Fix five use after free bugs in get_initial_state Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning" mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) mm/mempolicy: fix mpol_new leak in shared_policy_replace x86/pm: Save the MSR validity status at context setup x86/speculation: Restore speculation related MSRs during S3 resume btrfs: fix qgroup reserve overflow the qgroup limit arm64: patch_text: Fixup last cpu should be master perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error" mm: don't skip swap entry even if zap_details specified arm64: module: remove (NOLOAD) from linker script mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning cgroup: Use open-time credentials for process migraton perm checks cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv cgroup: Use open-time cgroup namespace for process migration perm checks xfrm: policy: match with both mark and mask on user interfaces memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe veth: Ensure eth header is in skb's linear part gpiolib: acpi: use correct format characters mlxsw: i2c: Fix initialization error flow net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link nfc: nci: add flush_workqueue to prevent uaf cifs: potential buffer overflow in handling symlinks drm/amd: Add USBC connector ID drm/amdkfd: Check for potential null return of kmalloc_array() Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer scsi: target: tcmu: Fix possible page UAF scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024 net: micrel: fix KS8851_MLL Kconfig ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs gpu: ipu-v3: Fix dev_dbg frequency output scsi: mvsas: Add PCI ID of RocketRaid 2640 drivers: net: slip: fix NPD bug in sl_tx_timeout() mm, page_alloc: fix build_zonerefs_node() mm: kmemleak: take a full lowmem check in kmemleak_*_phys() gcc-plugins: latent_entropy: use /dev/urandom ALSA: pcm: Test for "silence" field in struct "pcm_format_data" ARM: davinci: da850-evm: Avoid NULL pointer dereference smp: Fix offline cpu check in flush_smp_call_function_queue() i2c: pasemi: Wait for write xfers to finish Linux 4.14.276 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I45d8292ce654c0236758030a89b4618cf3a3d87b
2022-04-20cifs: potential buffer overflow in handling symlinksHarshit Mogalapalli
[ Upstream commit 64c4a37ac04eeb43c42d272f6e6c8c12bfcf4304 ] Smatch printed a warning: arch/x86/crypto/poly1305_glue.c:198 poly1305_update_arch() error: __memcpy() 'dctx->buf' too small (16 vs u32max) It's caused because Smatch marks 'link_len' as untrusted since it comes from sscanf(). Add a check to ensure that 'link_len' is not larger than the size of the 'link_str' buffer. Fixes: c69c1b6eaea1 ("cifs: implement CIFSParseMFSymlink()") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20btrfs: fix qgroup reserve overflow the qgroup limitEthan Lien
commit b642b52d0b50f4d398cb4293f64992d0eed2e2ce upstream. We use extent_changeset->bytes_changed in qgroup_reserve_data() to record how many bytes we set for EXTENT_QGROUP_RESERVED state. Currently the bytes_changed is set as "unsigned int", and it will overflow if we try to fallocate a range larger than 4GiB. The result is we reserve less bytes and eventually break the qgroup limit. Unlike regular buffered/direct write, which we use one changeset for each ordered extent, which can never be larger than 256M. For fallocate, we use one changeset for the whole range, thus it no longer respects the 256M per extent limit, and caused the problem. The following example test script reproduces the problem: $ cat qgroup-overflow.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT # Set qgroup limit to 2GiB. btrfs quota enable $MNT btrfs qgroup limit 2G $MNT # Try to fallocate a 3GiB file. This should fail. echo echo "Try to fallocate a 3GiB file..." fallocate -l 3G $MNT/3G.file # Try to fallocate a 5GiB file. echo echo "Try to fallocate a 5GiB file..." fallocate -l 5G $MNT/5G.file # See we break the qgroup limit. echo sync btrfs qgroup show -r $MNT umount $MNT When running the test: $ ./qgroup-overflow.sh (...) Try to fallocate a 3GiB file... fallocate: fallocate failed: Disk quota exceeded Try to fallocate a 5GiB file... qgroupid         rfer         excl     max_rfer --------         ----         ----     -------- 0/5           5.00GiB      5.00GiB      2.00GiB Since we have no control of how bytes_changed is used, it's better to set it to u64. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Ethan Lien <ethanlien@synology.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20jfs: prevent NULL deref in diFreeHaimin Zhang
[ Upstream commit a53046291020ec41e09181396c1e829287b48d47 ] Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref in diFree since diFree uses it without do any validations. When function jfs_mount calls diMount to initialize fileset inode allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be initialized. Then it calls diFreeSpecial to close fileset inode allocation map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref. Reported-by: TCS Robot <tcs_robot@tencent.com> Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20NFS: swap-out must always use STABLE writes.NeilBrown
[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ] The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads. swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change. For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20NFS: swap IO handling is slightly different for O_DIRECT IONeilBrown
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ] 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20ubifs: Rectify space amount budget for mkdir/tmpfile operationsZhihao Cheng
[ Upstream commit a6dab6607d4681d227905d5198710b575dbdb519 ] UBIFS should make sure the flash has enough space to store dirty (Data that is newer than disk) data (in memory), space budget is exactly designed to do that. If space budget calculates less data than we need, 'make_reservation()' will do more work(return -ENOSPC if no free space lelf, sometimes we can see "cannot reserve xxx bytes in jhead xxx, error -28" in ubifs error messages) with ubifs inodes locked, which may effect other syscalls. A simple way to decide how much space do we need when make a budget: See how much space is needed by 'make_reservation()' in ubifs_jnl_xxx() function according to corresponding operation. It's better to report ENOSPC in ubifs_budget_space(), as early as we can. Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE") Fixes: 1e51764a3c2ac05 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20gfs2: Make sure FITRIM minlen is rounded up to fs block sizeAndrew Price
commit 27ca8273fda398638ca994a207323a85b6d81190 upstream. Per fstrim(8) we must round up the minlen argument to the fs block size. The current calculation doesn't take into account devices that have a discard granularity and requested minlen less than 1 fs block, so the value can get shifted away to zero in the translation to fs blocks. The zero minlen passed to gfs2_rgrp_send_discards() then allows sb_issue_discard() to be called with nr_sects == 0 which returns -EINVAL and results in gfs2_rgrp_send_discards() returning -EIO. Make sure minlen is never < 1 fs block by taking the max of the requested minlen and the fs block size before comparing to the device's discard granularity and shifting to fs blocks. Fixes: 076f0faa764ab ("GFS2: Fix FITRIM argument handling") Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ubifs: rename_whiteout: correct old_dir size computingBaokun Li
commit 705757274599e2e064dd3054aabc74e8af31a095 upstream. When renaming the whiteout file, the old whiteout file is not deleted. Therefore, we add the old dentry size to the old dir like XFS. Otherwise, an error may be reported due to `fscki->calc_sz != fscki->size` in check_indes. Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT") Reported-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ubifs: setflags: Make dirtied_ino_d 8 bytes alignedZhihao Cheng
commit 1b83ec057db16b4d0697dc21ef7a9743b6041f72 upstream. Make 'ui->data_len' aligned with 8 bytes before it is assigned to dirtied_ino_d. Since 8871d84c8f8b0c6b("ubifs: convert to fileattr") applied, 'setflags()' only affects regular files and directories, only xattr inode, symlink inode and special inode(pipe/char_dev/block_dev) have none- zero 'ui->data_len' field, so assertion '!(req->dirtied_ino_d & 7)' cannot fail in ubifs_budget_space(). To avoid assertion fails in future evolution(eg. setflags can operate special inodes), it's better to make dirtied_ino_d 8 bytes aligned, after all aligned size is still zero for regular files. Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ubifs: Add missing iput if do_tmpfile() failed in rename whiteoutZhihao Cheng
commit 716b4573026bcbfa7b58ed19fe15554bac66b082 upstream. whiteout inode should be put when do_tmpfile() failed if inode has been initialized. Otherwise we will get following warning during umount: UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930 VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds. Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ubifs: rename_whiteout: Fix double free for whiteout_ui->dataZhihao Cheng
commit 40a8f0d5e7b3999f096570edab71c345da812e3e upstream. 'whiteout_ui->data' will be freed twice if space budget fail for rename whiteout operation as following process: rename_whiteout dev = kmalloc whiteout_ui->data = dev kfree(whiteout_ui->data) // Free first time iput(whiteout) ubifs_free_inode kfree(ui->data) // Double free! KASAN reports: ================================================================== BUG: KASAN: double-free or invalid-free in ubifs_free_inode+0x4f/0x70 Call Trace: kfree+0x117/0x490 ubifs_free_inode+0x4f/0x70 [ubifs] i_callback+0x30/0x60 rcu_do_batch+0x366/0xac0 __do_softirq+0x133/0x57f Allocated by task 1506: kmem_cache_alloc_trace+0x3c2/0x7a0 do_rename+0x9b7/0x1150 [ubifs] ubifs_rename+0x106/0x1f0 [ubifs] do_syscall_64+0x35/0x80 Freed by task 1506: kfree+0x117/0x490 do_rename.cold+0x53/0x8a [ubifs] ubifs_rename+0x106/0x1f0 [ubifs] do_syscall_64+0x35/0x80 The buggy address belongs to the object at ffff88810238bed8 which belongs to the cache kmalloc-8 of size 8 ================================================================== Let ubifs_free_inode() free 'whiteout_ui->data'. BTW, delete unused assignment 'whiteout_ui->data_len = 0', process 'ubifs_evict_inode() -> ubifs_jnl_delete_inode() -> ubifs_jnl_write_inode()' doesn't need it (because 'inc_nlink(whiteout)' won't be excuted by 'goto out_release', and the nlink of whiteout inode is 0). Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ntfs: add sanity check on allocation sizeDongliang Mu
[ Upstream commit 714fbf2647b1a33d914edd695d4da92029c7e7c0 ] ntfs_read_inode_mount invokes ntfs_malloc_nofs with zero allocation size. It triggers one BUG in the __ntfs_malloc function. Fix this by adding sanity check on ni->attr_list_size. Link: https://lkml.kernel.org/r/20220120094914.47736-1-dzm91@hust.edu.cn Reported-by: syzbot+3c765c5248797356edaa@syzkaller.appspotmail.com Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Acked-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20ext4: don't BUG if someone dirty pages without asking ext4 firstTheodore Ts'o
[ Upstream commit cc5095747edfb054ca2068d01af20be3fcc3634f ] [un]pin_user_pages_remote is dirtying pages without properly warning the file system in advance. A related race was noted by Jan Kara in 2018[1]; however, more recently instead of it being a very hard-to-hit race, it could be reliably triggered by process_vm_writev(2) which was discovered by Syzbot[2]. This is technically a bug in mm/gup.c, but arguably ext4 is fragile in that if some other kernel subsystem dirty pages without properly notifying the file system using page_mkwrite(), ext4 will BUG, while other file systems will not BUG (although data will still be lost). So instead of crashing with a BUG, issue a warning (since there may be potential data loss) and just mark the page as clean to avoid unprivileged denial of service attacks until the problem can be properly fixed. More discussion and background can be found in the thread starting at [2]. [1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz [2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@syzkaller.appspotmail.com Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/YiDS9wVfq4mM2jGK@mit.edu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20NFSv4/pNFS: Fix another issue with a list iterator pointing to the headTrond Myklebust
[ Upstream commit 7c9d845f0612e5bcd23456a2ec43be8ac43458f1 ] In nfs4_callback_devicenotify(), if we don't find a matching entry for the deviceid, we're left with a pointer to 'struct nfs_server' that actually points to the list of super blocks associated with our struct nfs_client. Furthermore, even if we have a valid pointer, nothing pins the super block, and so the struct nfs_server could end up getting freed while we're using it. Since all we want is a pointer to the struct pnfs_layoutdriver_type, let's skip all the iteration over super blocks, and just use APIs to find the layout driver directly. Reported-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Fixes: 1be5683b03a7 ("pnfs: CB_NOTIFY_DEVICEID") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20jfs: fix divide error in dbNextAGPavel Skripkin
[ Upstream commit 2cc7cc01c15f57d056318c33705647f87dcd4aab ] Syzbot reported divide error in dbNextAG(). The problem was in missing validation check for malicious image. Syzbot crafted an image with bmp->db_numag equal to 0. There wasn't any validation checks, but dbNextAG() blindly use bmp->db_numag in divide expression Fix it by validating bmp->db_numag in dbMount() and return an error if image is malicious Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+46f5c25af73eb8330eb6@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20NFS: remove unneeded check in decode_devicenotify_args()Alexey Khoroshilov
[ Upstream commit cb8fac6d2727f79f211e745b16c9abbf4d8be652 ] [You don't often get email from khoroshilov@ispras.ru. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] Overflow check in not needed anymore after we switch to kmalloc_array(). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Fixes: a4f743a6bb20 ("NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20ext2: correct max file size computingZhang Yi
[ Upstream commit 50b3a818991074177a56c87124c7a7bdf5fa4f67 ] We need to calculate the max file size accurately if the total blocks that can address by block tree exceed the upper_limit. But this check is not correct now, it only compute the total data blocks but missing metadata blocks are needed. So in the case of "data blocks < upper_limit && total blocks > upper_limit", we will get wrong result. Fortunately, this case could not happen in reality, but it's confused and better to correct the computing. bits data blocks metadatablocks upper_limit 10 16843020 66051 2147483647 11 134480396 263171 1073741823 12 1074791436 1050627 536870911 (*) 13 8594130956 4198403 268435455 (*) 14 68736258060 16785411 134217727 (*) 15 549822930956 67125251 67108863 (*) 16 4398314962956 268468227 33554431 (*) [*] Need to calculate in depth. Fixes: 1c2d14212b15 ("ext2: Fix underflow in ext2_max_size()") Link: https://lore.kernel.org/r/20220212050532.179055-1-yi.zhang@huawei.com Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20jffs2: fix memory leak in jffs2_scan_mediumBaokun Li
commit 9cdd3128874f5fe759e2c4e1360ab7fb96a8d1df upstream. If an error is returned in jffs2_scan_eraseblock() and some memory has been added to the jffs2_summary *s, we can observe the following kmemleak report: -------------------------------------------- unreferenced object 0xffff88812b889c40 (size 64): comm "mount", pid 692, jiffies 4294838325 (age 34.288s) hex dump (first 32 bytes): 40 48 b5 14 81 88 ff ff 01 e0 31 00 00 00 50 00 @H........1...P. 00 00 01 00 00 00 01 00 00 00 02 00 00 00 09 08 ................ backtrace: [<ffffffffae93a3a3>] __kmalloc+0x613/0x910 [<ffffffffaf423b9c>] jffs2_sum_add_dirent_mem+0x5c/0xa0 [<ffffffffb0f3afa8>] jffs2_scan_medium.cold+0x36e5/0x4794 [<ffffffffb0f3dbe1>] jffs2_do_mount_fs.cold+0xa7/0x2267 [<ffffffffaf40acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffaf40c00a>] jffs2_fill_super+0x2ea/0x4c0 [<ffffffffb0315d64>] mtd_get_sb+0x254/0x400 [<ffffffffb0315f5f>] mtd_get_sb_by_nr+0x4f/0xd0 [<ffffffffb0316478>] get_tree_mtd+0x498/0x840 [<ffffffffaf40bd15>] jffs2_get_tree+0x25/0x30 [<ffffffffae9f358d>] vfs_get_tree+0x8d/0x2e0 [<ffffffffaea7a98f>] path_mount+0x50f/0x1e50 [<ffffffffaea7c3d7>] do_mount+0x107/0x130 [<ffffffffaea7c5c5>] __se_sys_mount+0x1c5/0x2f0 [<ffffffffaea7c917>] __x64_sys_mount+0xc7/0x160 [<ffffffffb10142f5>] do_syscall_64+0x45/0x70 unreferenced object 0xffff888114b54840 (size 32): comm "mount", pid 692, jiffies 4294838325 (age 34.288s) hex dump (first 32 bytes): c0 75 b5 14 81 88 ff ff 02 e0 02 00 00 00 02 00 .u.............. 00 00 84 00 00 00 44 00 00 00 6b 6b 6b 6b 6b a5 ......D...kkkkk. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423b04>] jffs2_sum_add_inode_mem+0x54/0x90 [<ffffffffb0f3bd44>] jffs2_scan_medium.cold+0x4481/0x4794 [...] unreferenced object 0xffff888114b57280 (size 32): comm "mount", pid 692, jiffies 4294838393 (age 34.357s) hex dump (first 32 bytes): 10 d5 6c 11 81 88 ff ff 08 e0 05 00 00 00 01 00 ..l............. 00 00 38 02 00 00 28 00 00 00 6b 6b 6b 6b 6b a5 ..8...(...kkkkk. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423c34>] jffs2_sum_add_xattr_mem+0x54/0x90 [<ffffffffb0f3a24f>] jffs2_scan_medium.cold+0x298c/0x4794 [...] unreferenced object 0xffff8881116cd510 (size 16): comm "mount", pid 692, jiffies 4294838395 (age 34.355s) hex dump (first 16 bytes): 00 00 00 00 00 00 00 00 09 e0 60 02 00 00 6b a5 ..........`...k. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423cc4>] jffs2_sum_add_xref_mem+0x54/0x90 [<ffffffffb0f3b2e3>] jffs2_scan_medium.cold+0x3a20/0x4794 [...] -------------------------------------------- Therefore, we should call jffs2_sum_reset_collected(s) on exit to release the memory added in s. In addition, a new tag "out_buf" is added to prevent the NULL pointer reference caused by s being NULL. (thanks to Zhang Yi for this analysis) Fixes: e631ddba5887 ("[JFFS2] Add erase block summary support (mount time improvement)") Cc: stable@vger.kernel.org Co-developed-with: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20jffs2: fix memory leak in jffs2_do_mount_fsBaokun Li
commit d051cef784de4d54835f6b6836d98a8f6935772c upstream. If jffs2_build_filesystem() in jffs2_do_mount_fs() returns an error, we can observe the following kmemleak report: -------------------------------------------- unreferenced object 0xffff88811b25a640 (size 64): comm "mount", pid 691, jiffies 4294957728 (age 71.952s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffa493be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffa5423a06>] jffs2_sum_init+0x86/0x130 [<ffffffffa5400e58>] jffs2_do_mount_fs+0x798/0xac0 [<ffffffffa540acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffa540c00a>] jffs2_fill_super+0x2ea/0x4c0 [...] unreferenced object 0xffff88812c760000 (size 65536): comm "mount", pid 691, jiffies 4294957728 (age 71.952s) hex dump (first 32 bytes): bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ backtrace: [<ffffffffa493a449>] __kmalloc+0x6b9/0x910 [<ffffffffa5423a57>] jffs2_sum_init+0xd7/0x130 [<ffffffffa5400e58>] jffs2_do_mount_fs+0x798/0xac0 [<ffffffffa540acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffa540c00a>] jffs2_fill_super+0x2ea/0x4c0 [...] -------------------------------------------- This is because the resources allocated in jffs2_sum_init() are not released. Call jffs2_sum_exit() to release these resources to solve the problem. Fixes: e631ddba5887 ("[JFFS2] Add erase block summary support (mount time improvement)") Cc: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20jffs2: fix use-after-free in jffs2_clear_xattr_subsystemBaokun Li
commit 4c7c44ee1650677fbe89d86edbad9497b7679b5c upstream. When we mount a jffs2 image, assume that the first few blocks of the image are normal and contain at least one xattr-related inode, but the next block is abnormal. As a result, an error is returned in jffs2_scan_eraseblock(). jffs2_clear_xattr_subsystem() is then called in jffs2_build_filesystem() and then again in jffs2_do_fill_super(). Finally we can observe the following report: ================================================================== BUG: KASAN: use-after-free in jffs2_clear_xattr_subsystem+0x95/0x6ac Read of size 8 at addr ffff8881243384e0 by task mount/719 Call Trace: dump_stack+0x115/0x16b jffs2_clear_xattr_subsystem+0x95/0x6ac jffs2_do_fill_super+0x84f/0xc30 jffs2_fill_super+0x2ea/0x4c0 mtd_get_sb+0x254/0x400 mtd_get_sb_by_nr+0x4f/0xd0 get_tree_mtd+0x498/0x840 jffs2_get_tree+0x25/0x30 vfs_get_tree+0x8d/0x2e0 path_mount+0x50f/0x1e50 do_mount+0x107/0x130 __se_sys_mount+0x1c5/0x2f0 __x64_sys_mount+0xc7/0x160 do_syscall_64+0x45/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Allocated by task 719: kasan_save_stack+0x23/0x60 __kasan_kmalloc.constprop.0+0x10b/0x120 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x1c0/0x870 jffs2_alloc_xattr_ref+0x2f/0xa0 jffs2_scan_medium.cold+0x3713/0x4794 jffs2_do_mount_fs.cold+0xa7/0x2253 jffs2_do_fill_super+0x383/0xc30 jffs2_fill_super+0x2ea/0x4c0 [...] Freed by task 719: kmem_cache_free+0xcc/0x7b0 jffs2_free_xattr_ref+0x78/0x98 jffs2_clear_xattr_subsystem+0xa1/0x6ac jffs2_do_mount_fs.cold+0x5e6/0x2253 jffs2_do_fill_super+0x383/0xc30 jffs2_fill_super+0x2ea/0x4c0 [...] The buggy address belongs to the object at ffff8881243384b8 which belongs to the cache jffs2_xattr_ref of size 48 The buggy address is located 40 bytes inside of 48-byte region [ffff8881243384b8, ffff8881243384e8) [...] ================================================================== The triggering of the BUG is shown in the following stack: ----------------------------------------------------------- jffs2_fill_super jffs2_do_fill_super jffs2_do_mount_fs jffs2_build_filesystem jffs2_scan_medium jffs2_scan_eraseblock <--- ERROR jffs2_clear_xattr_subsystem <--- free jffs2_clear_xattr_subsystem <--- free again ----------------------------------------------------------- An error is returned in jffs2_do_mount_fs(). If the error is returned by jffs2_sum_init(), the jffs2_clear_xattr_subsystem() does not need to be executed. If the error is returned by jffs2_build_filesystem(), the jffs2_clear_xattr_subsystem() also does not need to be executed again. So move jffs2_clear_xattr_subsystem() from 'out_inohash' to 'out_root' to fix this UAF problem. Fixes: aa98d7cf59b5 ("[JFFS2][XATTR] XATTR support on JFFS2 (version. 5)") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20NFSD: prevent underflow in nfssvc_decode_writeargs()Dan Carpenter
commit 184416d4b98509fb4c3d8fc3d6dc1437896cc159 upstream. Smatch complains: fs/nfsd/nfsxdr.c:341 nfssvc_decode_writeargs() warn: no lower bound on 'args->len' Change the type to unsigned to prevent this issue. Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20fuse: fix pipe buffer lifetime for direct_ioMiklos Szeredi
commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 upstream. In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then imports the write buffer with fuse_get_user_pages(), which uses iov_iter_get_pages() to grab references to userspace pages instead of actually copying memory. On the filesystem device side, these pages can then either be read to userspace (via fuse_dev_read()), or splice()d over into a pipe using fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops. This is wrong because after fuse_dev_do_read() unlocks the FUSE request, the userspace filesystem can mark the request as completed, causing write() to return. At that point, the userspace filesystem should no longer have access to the pipe buffer. Fix by copying pages coming from the user address space to new pipe buffers. Reported-by: Jann Horn <jannh@google.com> Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08ANDROID: incremental-fs: limit mount stack depthTadeusz Struk
Syzbot recently found a number of issues related to incremental-fs (see bug numbers below). All have to do with the fact that incr-fs allows mounts of the same source and target multiple times. This is a design decision and the user space component "Data Loader" expects this to work for app re-install use case. The mounting depth needs to be controlled, however, and only allowed to be two levels deep. In case of more than two mount attempts the driver needs to return an error. In case of the issues listed below the common pattern is that the reproducer calls: mount("./file0", "./file0", "incremental-fs", 0, NULL) many times and then invokes a file operation like chmod, setxattr, or open on the ./file0. This causes a recursive call for all the mounted instances, which eventually causes a stack overflow and a kernel crash: BUG: stack guard page was hit at ffffc90000c0fff8 kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP KASAN This change also cleans up the mount error path to properly clean allocated resources and call deactivate_locked_super(), which causes the incfs_kill_sb() to be called, where the sb is freed. Bug: 211066171 Bug: 213140206 Bug: 213215835 Bug: 211914587 Bug: 211213635 Bug: 213137376 Bug: 211161296 Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Change-Id: I08d9b545a2715423296bf4beb67bdbbed78d1be1
2022-03-23Merge 4.14.273 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.273 sctp: fix the processing for INIT chunk sctp: fix the processing for INIT_ACK chunk xfrm: Fix xfrm migrate issues when address family changes arm64: dts: rockchip: fix rk3399-puma eMMC HS400 signal integrity ARM: dts: rockchip: fix a typo on rk3288 crypto-controller MIPS: smp: fill in sibling and core maps earlier ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready atm: firestream: check the return value of ioremap() in fs_init() nl80211: Update bss channel on channel switch for P2P_CLIENT tcp: make tcp_read_sock() more robust sfc: extend the locking on mcdi->seqno kselftest/vm: fix tests build with old libc fs: sysfs_emit: Remove PAGE_SIZE alignment check efi: fix return value of __setup handlers net/packet: fix slab-out-of-bounds access in packet_recvmsg() atm: eni: Add check for dma_map_single net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit() usb: gadget: rndis: prevent integer overflow in rndis_set_response() usb: gadget: Fix use-after-free bug by not setting udc->dev.driver Input: aiptek - properly check endpoint type perf symbols: Fix symbol size calculation condition Linux 4.14.273 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I98286ffad5f40e4f41f7001ec0ba5c6ccaf2532b
2022-03-23fs: sysfs_emit: Remove PAGE_SIZE alignment checkLucas Wei
For kernel releases older than 4.20, using the SLUB alloctor will cause this alignment check to fail as that allocator did NOT align kmalloc allocations on a PAGE_SIZE boundry. Remove the check for these older kernels as it is a false-positive and causes problems on many devices. Signed-off-by: Lucas Wei <lucaswei@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16Merge 4.14.272 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.272 net: qlogic: check the return value of dma_alloc_coherent() in qed_vf_hw_prepare() qed: return status of qed_iov_get_link ethernet: Fix error handling in xemaclite_of_probe net: ethernet: ti: cpts: Handle error for clk_enable net: ethernet: lpc_eth: Handle error for clk_enable ax25: Fix NULL pointer dereference in ax25_kill_by_device net/mlx5: Fix size field in bufferx_reg struct NFC: port100: fix use-after-free in port100_send_complete gpio: ts4900: Do not set DAT and OE together sctp: fix kernel-infoleak for SCTP sockets net-sysfs: add check for netdevice being present to speed_show Revert "xen-netback: remove 'hotplug-status' once it has served its purpose" Revert "xen-netback: Check for hotplug-status existence before watching" tracing: Ensure trace buffer is at least 4096 bytes large selftests/memfd: clean up mapping in mfd_fail_write ARM: Spectre-BHB: provide empty stub for non-config staging: gdm724x: fix use after free in gdm_lte_rx() virtio: unexport virtio_finalize_features virtio: acknowledge all features before access ARM: fix Thumb2 regression with Spectre BHB ext4: add check to prevent attempting to resize an fs with sparse_super2 btrfs: unlock newly allocated extent buffer after error Linux 4.14.272 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ibf801ecb8549f8aee1b0acd8705fc75dcd5442ed
2022-03-16btrfs: unlock newly allocated extent buffer after errorQu Wenruo
commit 19ea40dddf1833db868533958ca066f368862211 upstream. [BUG] There is a bug report that injected ENOMEM error could leave a tree block locked while we return to user-space: BTRFS info (device loop0): enabling ssd optimizations FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 0 PID: 7579 Comm: syz-executor Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106 fail_dump lib/fault-inject.c:52 [inline] should_fail+0x13c/0x160 lib/fault-inject.c:146 should_failslab+0x5/0x10 mm/slab_common.c:1328 slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494 slab_alloc_node mm/slub.c:3120 [inline] slab_alloc mm/slub.c:3214 [inline] kmem_cache_alloc+0x44/0x280 mm/slub.c:3219 btrfs_alloc_delayed_extent_op fs/btrfs/delayed-ref.h:299 [inline] btrfs_alloc_tree_block+0x38c/0x670 fs/btrfs/extent-tree.c:4833 __btrfs_cow_block+0x16f/0x7d0 fs/btrfs/ctree.c:415 btrfs_cow_block+0x12a/0x300 fs/btrfs/ctree.c:570 btrfs_search_slot+0x6b0/0xee0 fs/btrfs/ctree.c:1768 btrfs_insert_empty_items+0x80/0xf0 fs/btrfs/ctree.c:3905 btrfs_new_inode+0x311/0xa60 fs/btrfs/inode.c:6530 btrfs_create+0x12b/0x270 fs/btrfs/inode.c:6783 lookup_open+0x660/0x780 fs/namei.c:3282 open_last_lookups fs/namei.c:3352 [inline] path_openat+0x465/0xe20 fs/namei.c:3557 do_filp_open+0xe3/0x170 fs/namei.c:3588 do_sys_openat2+0x357/0x4a0 fs/open.c:1200 do_sys_open+0x87/0xd0 fs/open.c:1216 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x46ae99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f46711b9c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 000000000078c0a0 RCX: 000000000046ae99 RDX: 0000000000000000 RSI: 00000000000000a1 RDI: 0000000020005800 RBP: 00007f46711b9c80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000017 R13: 0000000000000000 R14: 000000000078c0a0 R15: 00007ffc129da6e0 ================================================ WARNING: lock held when returning to user space! 5.15.0-rc1 #16 Not tainted ------------------------------------------------ syz-executor/7579 is leaving the kernel with locks still held! 1 lock held by syz-executor/7579: #0: ffff888104b73da8 (btrfs-tree-01/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x2e/0x1a0 fs/btrfs/locking.c:112 [CAUSE] In btrfs_alloc_tree_block(), after btrfs_init_new_buffer(), the new extent buffer @buf is locked, but if later operations like adding delayed tree ref fail, we just free @buf without unlocking it, resulting above warning. [FIX] Unlock @buf in out_free_buf: label. Reported-by: Hao Sun <sunhao.th@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CACkBjsZ9O6Zr0KK1yGn=1rQi6Crh1yeCRdTSBxx9R99L4xdn-Q@mail.gmail.com/ CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Denis Efremov <denis.e.efremov@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16ext4: add check to prevent attempting to resize an fs with sparse_super2Josh Triplett
commit b1489186cc8391e0c1e342f9fbc3eedf6b944c61 upstream. The in-kernel ext4 resize code doesn't support filesystem with the sparse_super2 feature. It fails with errors like this and doesn't finish the resize: EXT4-fs (loop0): resizing filesystem from 16640 to 7864320 blocks EXT4-fs warning (device loop0): verify_reserved_gdb:760: reserved GDT 2 missing grp 1 (32770) EXT4-fs warning (device loop0): ext4_resize_fs:2111: error (-22) occurred during file system resize EXT4-fs (loop0): resized filesystem to 2097152 To reproduce: mkfs.ext4 -b 4096 -I 256 -J size=32 -E resize=$((256*1024*1024)) -O sparse_super2 ext4.img 65M truncate -s 30G ext4.img mount ext4.img /mnt python3 -c 'import fcntl, os, struct ; fd = os.open("/mnt", os.O_RDONLY | os.O_DIRECTORY) ; fcntl.ioctl(fd, 0x40086610, struct.pack("Q", 30 * 1024 * 1024 * 1024 // 4096), False) ; os.close(fd)' dmesg | tail e2fsck ext4.img The userspace resize2fs tool has a check for this case: it checks if the filesystem has sparse_super2 set and if the kernel provides /sys/fs/ext4/features/sparse_super2. However, the former check requires manually reading and parsing the filesystem superblock. Detect this case in ext4_resize_begin and error out early with a clear error message. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/74b8ae78405270211943cd7393e65586c5faeed1.1623093259.git.josh@joshtriplett.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11Merge 4.14.270 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.270 mac80211_hwsim: report NOACK frames in tx_status mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work i2c: bcm2835: Avoid clock stretching timeouts Input: clear BTN_RIGHT/MIDDLE on buttonpads cifs: fix double free race when mount fails in cifs_get_root() dmaengine: shdma: Fix runtime PM imbalance on error i2c: cadence: allow COMPILE_TEST i2c: qup: allow COMPILE_TEST net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 usb: gadget: don't release an existing dev->buf usb: gadget: clear related members when goto fail ata: pata_hpt37x: fix PCI clock detection ALSA: intel_hdmi: Fix reference to PCM buffer address ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min xfrm: fix MTU regression netfilter: fix use-after-free in __nf_register_net_hook() xfrm: enforce validity of offload input flags netfilter: nf_queue: don't assume sk is full socket netfilter: nf_queue: fix possible use-after-free batman-adv: Request iflink once in batadv-on-batadv check batman-adv: Request iflink once in batadv_get_real_netdevice batman-adv: Don't expect inter-netns unique iflink indices net: dcb: flush lingering app table entries for unregistered devices net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server firmware: Fix a reference count leak. firmware: qemu_fw_cfg: fix kobject leak in probe error path mac80211: fix forwarded mesh frames AC & queue selection net: stmmac: fix return value of __setup handler net: sxgbe: fix return value of __setup handler net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe() efivars: Respect "block" flag in efivar_entry_set_safe() can: gs_usb: change active_channels's type from atomic_t to u8 ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions soc: fsl: qe: Check of ioremap return value net: chelsio: cxgb3: check the return value of pci_find_capability() nl80211: Handle nla_memdup failures in handle_nan_filter Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power() Input: elan_i2c - fix regulator enable count imbalance after suspend/resume HID: add mapping for KEY_ALL_APPLICATIONS memfd: fix F_SEAL_WRITE after shmem huge page allocated net: dcb: disable softirqs in dcbnl_flush_dev() hamradio: fix macro redefine warning Linux 4.14.270 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I347904c6d8e62225c5d0b642cddec4baeca819f2
2022-03-08Revert "ANDROID: incremental-fs: fix mount_fs issue"Tadeusz Struk
This reverts commit 4ab5bac1598e3ed91a6267f6cada336467312112 and 2da7ed1f781983a2bc818f1d6f7cc9f2fa3ff1d2. This is to fix the incrementalinstall test. Can now install the same apk twice, and repeated installs are stable. Bug: 217661925 Bug: 219731048 Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Change-Id: I77f038b296041b329060ff9350e4c9711f5ed91d
2022-03-08cifs: fix double free race when mount fails in cifs_get_root()Ronnie Sahlberg
[ Upstream commit 3d6cc9898efdfb062efb74dc18cfc700e082f5d5 ] When cifs_get_root() fails during cifs_smb3_do_mount() we call deactivate_locked_super() which eventually will call delayed_free() which will free the context. In this situation we should not proceed to enter the out: section in cifs_smb3_do_mount() and free the same resources a second time. [Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0 [Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G OE 5.17.0-rc3+ #4 [Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019 [Thu Feb 10 12:59:06 2022] Call Trace: [Thu Feb 10 12:59:06 2022] <IRQ> [Thu Feb 10 12:59:06 2022] dump_stack_lvl+0x5d/0x78 [Thu Feb 10 12:59:06 2022] print_address_description.constprop.0+0x24/0x150 [Thu Feb 10 12:59:06 2022] ? rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] kasan_report.cold+0x7d/0x117 [Thu Feb 10 12:59:06 2022] ? rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] __asan_load8+0x86/0xa0 [Thu Feb 10 12:59:06 2022] rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] rcu_core+0x547/0xca0 [Thu Feb 10 12:59:06 2022] ? call_rcu+0x3c0/0x3c0 [Thu Feb 10 12:59:06 2022] ? __this_cpu_preempt_check+0x13/0x20 [Thu Feb 10 12:59:06 2022] ? lock_is_held_type+0xea/0x140 [Thu Feb 10 12:59:06 2022] rcu_core_si+0xe/0x10 [Thu Feb 10 12:59:06 2022] __do_softirq+0x1d4/0x67b [Thu Feb 10 12:59:06 2022] __irq_exit_rcu+0x100/0x150 [Thu Feb 10 12:59:06 2022] irq_exit_rcu+0xe/0x30 [Thu Feb 10 12:59:06 2022] sysvec_hyperv_stimer0+0x9d/0xc0 ... [Thu Feb 10 12:59:07 2022] Freed by task 58179: [Thu Feb 10 12:59:07 2022] kasan_save_stack+0x26/0x50 [Thu Feb 10 12:59:07 2022] kasan_set_track+0x25/0x30 [Thu Feb 10 12:59:07 2022] kasan_set_free_info+0x24/0x40 [Thu Feb 10 12:59:07 2022] ____kasan_slab_free+0x137/0x170 [Thu Feb 10 12:59:07 2022] __kasan_slab_free+0x12/0x20 [Thu Feb 10 12:59:07 2022] slab_free_freelist_hook+0xb3/0x1d0 [Thu Feb 10 12:59:07 2022] kfree+0xcd/0x520 [Thu Feb 10 12:59:07 2022] cifs_smb3_do_mount+0x149/0xbe0 [cifs] [Thu Feb 10 12:59:07 2022] smb3_get_tree+0x1a0/0x2e0 [cifs] [Thu Feb 10 12:59:07 2022] vfs_get_tree+0x52/0x140 [Thu Feb 10 12:59:07 2022] path_mount+0x635/0x10c0 [Thu Feb 10 12:59:07 2022] __x64_sys_mount+0x1bf/0x210 [Thu Feb 10 12:59:07 2022] do_syscall_64+0x5c/0xc0 [Thu Feb 10 12:59:07 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae [Thu Feb 10 12:59:07 2022] Last potentially related work creation: [Thu Feb 10 12:59:07 2022] kasan_save_stack+0x26/0x50 [Thu Feb 10 12:59:07 2022] __kasan_record_aux_stack+0xb6/0xc0 [Thu Feb 10 12:59:07 2022] kasan_record_aux_stack_noalloc+0xb/0x10 [Thu Feb 10 12:59:07 2022] call_rcu+0x76/0x3c0 [Thu Feb 10 12:59:07 2022] cifs_umount+0xce/0xe0 [cifs] [Thu Feb 10 12:59:07 2022] cifs_kill_sb+0xc8/0xe0 [cifs] [Thu Feb 10 12:59:07 2022] deactivate_locked_super+0x5d/0xd0 [Thu Feb 10 12:59:07 2022] cifs_smb3_do_mount+0xab9/0xbe0 [cifs] [Thu Feb 10 12:59:07 2022] smb3_get_tree+0x1a0/0x2e0 [cifs] [Thu Feb 10 12:59:07 2022] vfs_get_tree+0x52/0x140 [Thu Feb 10 12:59:07 2022] path_mount+0x635/0x10c0 [Thu Feb 10 12:59:07 2022] __x64_sys_mount+0x1bf/0x210 [Thu Feb 10 12:59:07 2022] do_syscall_64+0x5c/0xc0 [Thu Feb 10 12:59:07 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae Reported-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-02Merge 4.14.269 into android-4.14-stableGreg Kroah-Hartman
Changes in 4.14.269 cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug vhost/vsock: don't check owner in vhost_vsock_stop() while releasing parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel parisc/unaligned: Fix ldw() and stw() unalignment handlers sr9700: sanity check for packet length USB: zaurus: support another broken Zaurus serial: 8250: of: Fix mapped region size when using reg-offset property ping: remove pr_err from ping_lookup net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends gso: do not skip outer ip header in case of ipip and net_failover openvswitch: Fix setting ipv6 fields causing hw csum failure drm/edid: Always set RGB444 net/mlx5e: Fix wrong return value on ioctl EEPROM query failure configfs: fix a race in configfs_{,un}register_subsystem() RDMA/ib_srp: Fix a deadlock iio: adc: men_z188_adc: Fix a resource leak in an error handling path ata: pata_hpt37x: disable primary channel on HPT371 Revert "USB: serial: ch341: add new Product ID for CH341A" usb: gadget: rndis: add spinlock for rndis response list USB: gadget: validate endpoint index for xilinx udc tracefs: Set the group ownership in apply_options() not parse_options() USB: serial: option: add support for DW5829e USB: serial: option: add Telit LE910R1 compositions usb: dwc3: gadget: Let the interrupt handler disable bottom halves. xhci: re-initialize the HC during resume if HCE was set xhci: Prevent futile URB re-submissions due to incorrect return value. tty: n_gsm: fix encoding of control signal octet bit DV tty: n_gsm: fix proper link termination after failed open Revert "drm/nouveau/pmu/gm200-: avoid touching PMU outside of DEVINIT/PREOS/ACR" memblock: use kfree() to release kmalloced memblock regions fget: clarify and improve __fget_files() implementation Linux 4.14.269 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I0c7a1a638cac0693161ad06dd369075a6dd42402
2022-03-02fget: clarify and improve __fget_files() implementationLinus Torvalds
commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream. Commit 054aa8d439b9 ("fget: check that the fd still exists after getting a ref to it") fixed a race with getting a reference to a file just as it was being closed. It was a fairly minimal patch, and I didn't think re-checking the file pointer lookup would be a measurable overhead, since it was all right there and cached. But I was wrong, as pointed out by the kernel test robot. The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed quite noticeably. Admittedly it seems to be a very artificial test: doing "poll()" system calls on regular files in a very tight loop in multiple threads. That means that basically all the time is spent just looking up file descriptors without ever doing anything useful with them (not that doing 'poll()' on a regular file is useful to begin with). And as a result it shows the extra "re-check fd" cost as a sore thumb. Happily, the regression is fixable by just writing the code to loook up the fd to be better and clearer. There's still a cost to verify the file pointer, but now it's basically in the noise even for that benchmark that does nothing else - and the code is more understandable and has better comments too. [ Side note: this patch is also a classic case of one that looks very messy with the default greedy Myers diff - it's much more legible with either the patience of histogram diff algorithm ] Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/ Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Carel Si <beibei.si@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>