Merge 4.9.144 into android-4.9
Changes in 4.9.144
Kbuild: suppress packed-not-aligned warning for default setting only
disable stringop truncation warnings for now
test_hexdump: use memcpy instead of strncpy
kobject: Replace strncpy with memcpy
unifdef: use memcpy instead of strncpy
kernfs: Replace strncpy with memcpy
ip_tunnel: Fix name string concatenate in __ip_tunnel_create()
drm: gma500: fix logic error
scsi: bfa: convert to strlcpy/strlcat
staging: rts5208: fix gcc-8 logic error warning
kdb: use memmove instead of overlapping memcpy
x86/power/64: Use char arrays for asm function names
iser: set sector for ambiguous mr status errors
uprobes: Fix handle_swbp() vs. unregister() + register() race once more
MIPS: ralink: Fix mt7620 nd_sd pinmux
mips: fix mips_get_syscall_arg o32 check
IB/mlx5: Avoid load failure due to unknown link width
drm/ast: Fix incorrect free on ioregs
drm: set is_master to 0 upon drm_new_set_master() failure
scsi: scsi_devinfo: cleanly zero-pad devinfo strings
ALSA: trident: Suppress gcc string warning
scsi: csiostor: Avoid content leaks and casts
kgdboc: Fix restrict error
kgdboc: Fix warning with module build
binder: fix proc->files use-after-free
svm: Add mutex_lock to protect apic_access_page_done on AMD systems
drm/mediatek: fix OF sibling-node lookup
Input: xpad - quirk all PDP Xbox One gamepads
Input: matrix_keypad - check for errors from of_get_named_gpio()
Input: elan_i2c - add ELAN0620 to the ACPI table
Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15ARR
Input: elan_i2c - add support for ELAN0621 touchpad
btrfs: Always try all copies when reading extent buffers
Btrfs: fix use-after-free when dumping free space
ARC: change defconfig defaults to ARCv2
arc: [devboards] Add support of NFSv3 ACL
udf: Allow mounting volumes with incorrect identification strings
reset: make optional functions really optional
reset: core: fix reset_control_put
reset: fix optional reset_control_get stubs to return NULL
reset: add exported __reset_control_get, return NULL if optional
reset: make device_reset_optional() really optional
reset: remove remaining WARN_ON() in <linux/reset.h>
mm: cleancache: fix corruption on missed inode invalidation
usb: gadget: dummy: fix nonsensical comparisons
net: qed: use correct strncpy() size
tipc: use destination length for copy string
libceph: drop len argument of *verify_authorizer_reply()
libceph: no need to drop con->mutex for ->get_authorizer()
libceph: store ceph_auth_handshake pointer in ceph_connection
libceph: factor out __prepare_write_connect()
libceph: factor out __ceph_x_decrypt()
libceph: factor out encrypt_authorizer()
libceph: add authorizer challenge
libceph: implement CEPHX_V2 calculation mode
libceph: weaken sizeof check in ceph_x_verify_authorizer_reply()
libceph: check authorizer reply/challenge length before reading
bpf/verifier: Add spi variable to check_stack_write()
bpf/verifier: Pass instruction index to check_mem_access() and check_xadd()
bpf: Prevent memory disambiguation attack
wil6210: missing length check in wmi_set_ie
mm/hugetlb.c: don't call region_abort if region_chg fails
hugetlbfs: fix offset overflow in hugetlbfs mmap
hugetlbfs: check for pgoff value overflow
btrfs: validate type when reading a chunk
btrfs: Verify that every chunk has corresponding block group at mount time
btrfs: Refactor check_leaf function for later expansion
btrfs: Check if item pointer overlaps with the item itself
btrfs: Add sanity check for EXTENT_DATA when reading out leaf
btrfs: Add checker for EXTENT_CSUM
btrfs: Move leaf and node validation checker to tree-checker.c
btrfs: struct-funcs, constify readers
btrfs: tree-checker: Enhance btrfs_check_node output
btrfs: tree-checker: Fix false panic for sanity test
btrfs: tree-checker: Add checker for dir item
btrfs: tree-checker: use %zu format string for size_t
btrfs: tree-check: reduce stack consumption in check_dir_item
btrfs: tree-checker: Verify block_group_item
btrfs: tree-checker: Detect invalid and empty essential trees
btrfs: Check that each block group has corresponding chunk at mount time
btrfs: tree-checker: Check level for leaves and nodes
btrfs: tree-checker: Fix misleading group system information
f2fs: fix a panic caused by NULL flush_cmd_control
f2fs: fix race condition in between free nid allocator/initializer
f2fs: detect wrong layout
f2fs: return error during fill_super
f2fs: check blkaddr more accuratly before issue a bio
f2fs: sanity check on sit entry
f2fs: enhance sanity_check_raw_super() to avoid potential overflow
f2fs: clean up with is_valid_blkaddr()
f2fs: introduce and spread verify_blkaddr
f2fs: fix to do sanity check with secs_per_zone
f2fs: fix to do sanity check with user_block_count
f2fs: Add sanity_check_inode() function
f2fs: fix to do sanity check with node footer and iblocks
f2fs: fix to do sanity check with block address in main area
f2fs: fix missing up_read
f2fs: fix to do sanity check with block address in main area v2
f2fs: free meta pages if sanity check for ckpt is failed
f2fs: fix to do sanity check with cp_pack_start_sum
xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE
hugetlbfs: fix bug in pgoff overflow checking
Linux 4.9.144
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 1cbb1f454e5321e47fc1e6b233066c7ccc979d15 upstream.
We have reader helpers for most of the on-disk structures that use
an extent_buffer and pointer as offset into the buffer that are
read-only. We should mark them as const and, in turn, allow consumers
of these interfaces to mark the buffers const as well.
No impact on code, but serves as documentation that a buffer is intended
not to be modified.
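As a rough userspace illustration of the idea, the sketch below marks a read-only helper's buffer argument as const. The names `struct demo_buffer` and `demo_read_le32()` are hypothetical stand-ins and are not the btrfs extent_buffer interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent buffer; not the btrfs structure. */
struct demo_buffer {
	const uint8_t *data;
	size_t len;
};

/*
 * Read-only helper: the const qualifier documents (and lets the compiler
 * enforce) that the buffer is not modified through this pointer.
 * Returns 0 when the 4-byte read would run past the end of the buffer.
 */
static uint32_t demo_read_le32(const struct demo_buffer *eb, size_t offset)
{
	if (offset > eb->len || eb->len - offset < 4)
		return 0;

	const uint8_t *p = eb->data + offset;

	/* Assemble a little-endian 32-bit value byte by byte. */
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Callers that only hold a `const struct demo_buffer *` can still use the helper, which is the property the change is after.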
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge 4.9.136 into android-4.9
Also revert commit b91d532928df ("ipv6: set rt6i_protocol properly in
the route when it is installed") as it breaks the test systems.
Changes in 4.9.136
xfrm: Validate address prefix lengths in the xfrm selector.
xfrm6: call kfree_skb when skb is toobig
mac80211: Always report TX status
cfg80211: reg: Init wiphy_idx in regulatory_hint_core()
mac80211: fix pending queue hang due to TX_DROP
cfg80211: Address some corner cases in scan result channel updating
mac80211: TDLS: fix skb queue/priority assignment
ARM: 8799/1: mm: fix pci_ioremap_io() offset check
xfrm: validate template mode
ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
net: macb: Clean 64b dma addresses if they are not detected
soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
nl80211: Fix possible Spectre-v1 for NL80211_TXRATE_HT
mac80211_hwsim: do not omit multicast announce of first added radio
Bluetooth: SMP: fix crash in unpairing
pxa168fb: prepare the clock
qed: Avoid implicit enum conversion in qed_roce_mode_to_flavor
qed: Avoid constant logical operation warning in qed_vf_pf_acquire
asix: Check for supported Wake-on-LAN modes
ax88179_178a: Check for supported Wake-on-LAN modes
lan78xx: Check for supported Wake-on-LAN modes
sr9800: Check for supported Wake-on-LAN modes
r8152: Check for supported Wake-on-LAN Modes
smsc75xx: Check for Wake-on-LAN modes
smsc95xx: Check for Wake-on-LAN modes
perf/ring_buffer: Prevent concurent ring buffer access
perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
net: fec: fix rare tx timeout
declance: Fix continuation with the adapter identification message
net: cxgb3_main: fix a missing-check bug
perf symbols: Fix memory corruption because of zero length symbols
mm/memory_hotplug.c: fix overflow in test_pages_in_a_zone()
MIPS: microMIPS: Fix decoding of swsp16 instruction
MIPS: Handle non word sized instructions when examining frame
scsi: aacraid: Fix typo in blink status
f2fs: fix multiple f2fs_add_link() having same name for inline dentry
igb: Remove superfluous reset to PHY and page 0 selection
ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs
PCI: Disable MSI for HiSilicon Hip06/Hip07 only in Root Port mode
i2c: bcm2835: Avoid possible NULL ptr dereference
efi/fb: Correct PCI_STD_RESOURCE_END usage
ipv6: set rt6i_protocol properly in the route when it is installed
platform/x86: acer-wmi: setup accelerometer when ACPI device was found
IB/ipoib: Do not warn if IPoIB debugfs doesn't exist
IB/core: Fix the validations of a multicast LID in attach or detach operations
orangefs: off by ones in xattr size checks
rxe: Fix a sleep-in-atomic bug in post_one_send
nvme-pci: fix CMB sysfs file removal in reset path
net: phy: marvell: Limit 88m1101 autoneg errata to 88E1145 as well.
net/mlx5: Fix command completion after timeout access invalid structure
tipc: Fix tipc_sk_reinit handling of -EAGAIN
tipc: fix a race condition of releasing subscriber object
bnxt_en: Don't use rtnl lock to protect link change logic in workqueue.
ath10k: fix NAPI enable/disable symmetry for AHB interface
ARM: dts: bcm283x: Reserve first page for firmware
btrfs: fiemap: Cache and merge fiemap extent before submit it to user
ata: sata_rcar: Handle return value of clk_prepare_enable
reset: hi6220: Set module license so that it can be loaded
ASoC: Intel: Skylake: Fix to parse consecutive string tkns in manifest
arch/sparc: increase CONFIG_NODES_SHIFT on SPARC64 to 5
mac80211: fix TX aggregation start/stop callback race
libata: fix error checking in in ata_parse_force_one()
net: ethernet: stmmac: Fix altr_tse_pcs SGMII Initialization
qlcnic: Fix tunnel offload for 82xx adapters
x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC
ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
gpu: ipu-v3: Fix CSI selection for VDIC
elevator: fix truncation of icq_cache_name
net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_io
ufs: we need to sync inode before freeing it
net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare
ip6_tunnel: Correct tos value in collect_md mode
net/mlx5: Fix driver load error flow when firmware is stuck
perf evsel: Fix probing of precise_ip level for default cycles event
perf probe: Fix probe definition for inlined functions
net/mlx5: Fix health work queue spin lock to IRQ safe
usb: renesas_usbhs: gadget: fix spin_lock_init() for &uep->lock
usb: renesas_usbhs: gadget: fix unused-but-set-variable warning
usb: dwc3: omap: remove IRQ_NOAUTOEN used with shared irq
clk: samsung: Fix m2m scaler clock on Exynos542x
ptr_ring: fix up after recent ptr_ring changes
staging: wilc1000: Fix problem with wrong vif index
rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp
iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications"
qed: Warn PTT usage by wrong hw-function
ocfs2: fix deadlock caused by recursive locking in xattr
net: cdc_ncm: GetNtbFormat endian fix
sctp: use right member as the param of list_for_each_entry
ALSA: hda - No loopback on ALC299 codec
ath10k: convert warning about non-existent OTP board id to debug message
ipv6: fix cleanup ordering for ip6_mr failure
IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
IB/rxe: put the pool on allocation failure
nbd: only set MSG_MORE when we have more to send
mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'
IB/mlx5: Avoid passing an invalid QP type to firmware
scsi: qla2xxx: Avoid double completion of abort command
drm: bochs: Don't remove uninitialized fbdev framebuffer
i40e: avoid NVM acquire deadlock during NVM update
Revert "IB/ipoib: Update broadcast object if PKey value was changed in index 0"
Btrfs: incremental send, fix invalid memory access
drm/msm: Fix possible null dereference on failure of get_pages()
module: fix DEBUG_SET_MODULE_RONX typo
iio: pressure: zpa2326: Remove always-true check which confuses gcc
l2tp: remove configurable payload offset
macsec: fix memory leaks when skb_to_sgvec fails
perf/core: Fix locking for children siblings group read
cifs: Use ULL suffix for 64-bit constant
futex: futex_wake_op, do not fail on invalid op
ALSA: hda - Fix incorrect usage of IS_REACHABLE()
test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches
xen-netfront: Update features after registering netdev
sparc64: Fix regression in pmdp_invalidate().
xen-netfront: Fix mismatched rtnl_unlock
enic: do not overwrite error code
bonding: ratelimit failed speed/duplex update warning
nvmet: fix space padding in serial number
iio: buffer: fix the function signature to match implementation
x86/paravirt: Fix some warning messages
IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()'
libertas: call into generic suspend code before turning off power
xhci: Fix USB3 NULL pointer dereference at logical disconnect.
perf tests: Fix indexing when invoking subtests
ARM: dts: imx53-qsb: disable 1.2GHz OPP
rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window()
rxrpc: Only take the rwind and mtu values from latest ACK
net: ena: fix NULL dereference due to untimely napi initialization
fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()
mtd: spi-nor: Add support for is25wp series chips
Revert "netfilter: ipv6: nf_defrag: drop skb dst before queueing"
perf tools: Disable parallelism for 'make clean'
bridge: do not add port to router list when receives query with source 0.0.0.0
net: bridge: remove ipv6 zero address check in mcast queries
ipv6: mcast: fix a use-after-free in inet6_mc_check
ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
llc: set SOCK_RCU_FREE in llc_sap_add_socket()
net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
net: sched: gred: pass the right attribute to gred_change_table_def()
net: socket: fix a missing-check bug
net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
net: udp: fix handling of CHECKSUM_COMPLETE packets
r8169: fix NAPI handling under high load
sctp: fix race on sctp_id2asoc
vhost: Fix Spectre V1 vulnerability
ethtool: fix a privilege escalation bug
bonding: fix length of actor system
net: drop skb on failure in ip_check_defrag()
net: fix pskb_trim_rcsum_slow() with odd trim offset
rtnetlink: Disallow FDB configuration for non-Ethernet device
ip6_tunnel: Fix encapsulation layout
Revert "x86/mm: Expand static page table for fixmap space"
crypto: shash - Fix a sleep-in-atomic bug in shash_setkey_unaligned
ahci: don't ignore result code of ahci_reset_controller()
gpio: mxs: Get rid of external API call
xfs: truncate transaction does not modify the inobt
cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)
ptp: fix Spectre v1 vulnerability
drm/edid: Add 6 bpc quirk for BOE panel in HP Pavilion 15-n233sl
RDMA/ucma: Fix Spectre v1 vulnerability
IB/ucm: Fix Spectre v1 vulnerability
cdc-acm: correct counting of UART states in serial state notification
usb: gadget: storage: Fix Spectre v1 vulnerability
USB: fix the usbfs flag sanitization for control transfers
Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15IGM
sched/fair: Fix throttle_list starvation with low CFS quota
x86/percpu: Fix this_cpu_read()
x86/time: Correct the attribute on jiffies' definition
net: fs_enet: do not call phy_stop() in interrupts
posix-timers: Sanitize overrun handling
Linux 4.9.136
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 4751832da990a927c37526ae67b9226ea01eb99e ]
[BUG]
Cycle-mounting btrfs (unmounting and mounting it again) can cause fiemap
to return a different result.
For example:
# mount /dev/vdb5 /mnt/btrfs
# dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file
# xfs_io -c "fiemap -v" /mnt/btrfs/file
/mnt/test/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 25088..25215 128 0x1
# umount /mnt/btrfs
# mount /dev/vdb5 /mnt/btrfs
# xfs_io -c "fiemap -v" /mnt/btrfs/file
/mnt/test/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..31]: 25088..25119 32 0x0
1: [32..63]: 25120..25151 32 0x0
2: [64..95]: 25152..25183 32 0x0
3: [96..127]: 25184..25215 32 0x1
But after the fiemap call above, calling fiemap again returns the correct
merged result.
# xfs_io -c "fiemap -v" /mnt/btrfs/file
/mnt/test/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 25088..25215 128 0x1
[REASON]
Btrfs tries to merge extent maps when inserting a new extent map.
btrfs_fiemap(start=0 len=(u64)-1)
|- extent_fiemap(start=0 len=(u64)-1)
   |- get_extent_skip_holes(start=0 len=64k)
   |  |- btrfs_get_extent_fiemap(start=0 len=64k)
   |     |- btrfs_get_extent(start=0 len=64k)
   |     |  Found on-disk (ino, EXTENT_DATA, 0)
   |     |- add_extent_mapping()
   |     |- Return (em->start=0, len=16k)
   |
   |- fiemap_fill_next_extent(logic=0 phys=X len=16k)
   |
   |- get_extent_skip_holes(start=0 len=64k)
   |  |- btrfs_get_extent_fiemap(start=0 len=64k)
   |     |- btrfs_get_extent(start=16k len=48k)
   |     |  Found on-disk (ino, EXTENT_DATA, 16k)
   |     |- add_extent_mapping()
   |     |  |- try_merge_map()
   |     |     Merge with previous em start=0 len=16k
   |     |     resulting em start=0 len=32k
   |     |- Return (em->start=0, len=32K)    << Merged result
   |- Strip off the unrelated range (0~16K) of return em
   |- fiemap_fill_next_extent(logic=16K phys=X+16K len=16K)
      ^^^ Causing split fiemap extent.
Since add_extent_mapping() has already merged the em, the next fiemap()
call returns the merged result.
[FIX]
Introduce a new structure, fiemap_cache, which records the previous
fiemap extent.
Always try to merge the current fiemap extent into the fiemap_cache
result before calling fiemap_fill_next_extent().
Only when the current fiemap extent cannot be merged with the cached one
do we call fiemap_fill_next_extent() to submit the cached one.
With this method, we can merge all fiemap extents.
This could also be done in fs/ioctl.c; however, if
fieinfo->fi_extents_max == 0, there is no space to cache the previous
fiemap extent.
So I chose to do the merge in btrfs.
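The cache-then-merge approach can be sketched in plain C as follows. This is a minimal userspace model under stated assumptions, not the btrfs code: `struct demo_extent`, `struct demo_sink`, `struct demo_cache`, `demo_submit()`, `demo_flush()` and `demo_emit()` are all hypothetical names, `demo_submit()` merely stands in for fiemap_fill_next_extent(), and flags are compared exactly as a simplification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One reported extent; hypothetical stand-in for what userspace receives. */
struct demo_extent {
	uint64_t logical;
	uint64_t phys;
	uint64_t len;
	uint32_t flags;
};

/* Hypothetical output sink standing in for fiemap_fill_next_extent(). */
struct demo_sink {
	struct demo_extent out[16];
	size_t count;
};

/* Holds the previous extent until we know it cannot be merged further. */
struct demo_cache {
	struct demo_extent ext;
	bool cached;
};

static void demo_submit(struct demo_sink *sink, const struct demo_extent *e)
{
	if (sink->count < sizeof(sink->out) / sizeof(sink->out[0]))
		sink->out[sink->count++] = *e;
}

/* Write out whatever is still cached; call once after the last extent. */
static void demo_flush(struct demo_sink *sink, struct demo_cache *cache)
{
	if (cache->cached) {
		demo_submit(sink, &cache->ext);
		cache->cached = false;
	}
}

/*
 * Merge the new extent into the cached one when it is logically and
 * physically contiguous and carries the same flags; otherwise submit the
 * cached extent and start caching the new one.
 */
static void demo_emit(struct demo_sink *sink, struct demo_cache *cache,
		      struct demo_extent e)
{
	if (cache->cached &&
	    cache->ext.logical + cache->ext.len == e.logical &&
	    cache->ext.phys + cache->ext.len == e.phys &&
	    cache->ext.flags == e.flags) {
		cache->ext.len += e.len;
		return;
	}
	demo_flush(sink, cache);
	cache->ext = e;
	cache->cached = true;
}
```

Feeding four contiguous 16K extents through `demo_emit()` and then calling `demo_flush()` leaves a single 64K extent in the sink, which mirrors the merged result shown in the bug report above.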
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Merge 4.9.119 into android-4.9
Changes in 4.9.119
scsi: qla2xxx: Fix ISP recovery on unload
scsi: qla2xxx: Return error when TMF returns
genirq: Make force irq threading setup more robust
nohz: Fix local_timer_softirq_pending()
netlink: Do not subscribe to non-existent groups
netlink: Don't shift with UB on nlk->ngroups
netlink: Don't shift on 64 for ngroups
ext4: fix false negatives *and* false positives in ext4_check_descriptors()
ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle
ring_buffer: tracing: Inherit the tracing setting to next ring buffer
i2c: imx: Fix reinit_completion() use
Btrfs: fix file data corruption after cloning a range and fsync
tcp: add tcp_ooo_try_coalesce() helper
kmemleak: clear stale pointers from task stacks
fork: unconditionally clear stack on fork
IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
jfs: Fix inconsistency between memory allocation and ea_buf->max_size
Linux 4.9.119
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit bd3599a0e142cd73edd3b6801068ac3f48ac771a upstream.
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
latter ends up calling try_release_extent_mapping(), which releases the
extent map if certain conditions are met: the file size is greater than
16Mb, the gfp flags allow blocking, the range is not locked (which is the
case during the clone operation), and the extent map is not flagged as
pinned (also the case for cloning).
The following example, turned into a test for fstests, reproduces the
issue:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
$ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
# reflink destination offset corresponds to the size of file bar,
# 2728Kb minus 4Kb.
$ xfs_io -c "reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
$ md5sum /mnt/bar
95a95813a8c2abc9aa75a6c2914a077e /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
$ md5sum /mnt/bar
207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar
# digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
# power failure
In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.
Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger than 16Mb or gfp flags do not allow blocking).
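The interaction between releasing an extent map and the next fsync can be modelled roughly as below. This is only a toy model under stated assumptions: `DEMO_NEEDS_FULL_SYNC`, `struct demo_inode`, `demo_release_extent_map()` and `demo_fsync_needs_full_scan()` are hypothetical names and do not reproduce the btrfs data structures or locking.

```c
#include <stdbool.h>

#define DEMO_NEEDS_FULL_SYNC (1u << 0)	/* hypothetical flag bit */

/* Hypothetical inode state; not the btrfs inode. */
struct demo_inode {
	unsigned int flags;
	int nr_modified_maps;	/* modified extent maps still in memory */
};

/*
 * Dropping an extent map means a fast fsync can no longer see it, so
 * force the next fsync onto the full path.
 */
static void demo_release_extent_map(struct demo_inode *inode)
{
	if (inode->nr_modified_maps > 0)
		inode->nr_modified_maps--;
	inode->flags |= DEMO_NEEDS_FULL_SYNC;
}

/*
 * Returns true when fsync must search the on-disk tree instead of relying
 * on in-memory extent maps. The flag is consumed by the check.
 */
static bool demo_fsync_needs_full_scan(struct demo_inode *inode)
{
	bool full = (inode->flags & DEMO_NEEDS_FULL_SYNC) != 0;

	inode->flags &= ~DEMO_NEEDS_FULL_SYNC;
	return full;
}
```

In this model an inode whose maps were never released keeps using the fast path, while a single release forces exactly one full scan.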
CC: stable@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cherry-picked from origin/upstream-f2fs-stable-linux-4.9.y:
f950fa4816 treewide: Use array_size in f2fs_kvzalloc()
61f4864691 treewide: Use array_size() in f2fs_kzalloc()
c4eb50dac3 treewide: Use array_size() in f2fs_kmalloc()
96a232926b overflow.h: Add allocation size calculation helpers
9a857bde9f f2fs: fix to clear FI_VOLATILE_FILE correctly
deb78d4c9a f2fs: let sync node IO interrupt async one
7507ad250a f2fs: don't change wbc->sync_mode
accb4064fb f2fs: fix to update mtime correctly
6285d972c3 fs: f2fs: insert space around that ':' and ', '
9ef10313f5 fs: f2fs: add missing blank lines after declarations
ee62ea28d5 fs: f2fs: changed variable type of offset "unsigned" to "loff_t"
6b4d6a8150 f2fs: clean up symbol namespace
6de81c8d5e f2fs: make set_de_type() static
fdd569a78f f2fs: make __f2fs_write_data_pages() static
48fa534336 f2fs: fix to avoid accessing cross the boundary
34880e00cf f2fs: fix to let caller retry allocating block address
8c46965183 disable loading f2fs module on PAGE_SIZE > 4KB
408285a40a f2fs: fix error path of move_data_page
9780d68db8 f2fs: don't drop dentry pages after fs shutdown
b9921f022b f2fs: fix to avoid race during access gc_thread pointer
bcbcda43c3 f2fs: clean up with clear_radix_tree_dirty_tag
4636af963f f2fs: fix to don't trigger writeback during recovery
5bc68f3050 f2fs: clear discard_wake earlier
8d74ddc1b2 f2fs: let discard thread wait a little longer if dev is busy
caf10c6f12 f2fs: avoid stucking GC due to atomic write
0390d83fdd f2fs: introduce sbi->gc_mode to determine the policy
bbab2dcb22 f2fs: keep migration IO order in LFS mode
2f7e488b70 f2fs: fix to wait page writeback during revoking atomic write
664de5990a f2fs: Fix deadlock in shutdown ioctl
458e47f3b2 f2fs: detect synchronous writeback more earlier
4d93a43daf mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
12034c751d ceph: use pagevec_lookup_range_nr_tag()
565d7441b9 mm: add variant of pagevec_lookup_range_tag() taking number of pages
bef1c39148 mm: use pagevec_lookup_range_tag() in write_cache_pages()
7e95dd50c6 mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range()
88c6c1f247 nilfs2: use pagevec_lookup_range_tag()
aa70b7019e gfs2: use pagevec_lookup_range_tag()
c4464307d4 f2fs: use find_get_pages_tag() for looking up single page
45ec63b247 f2fs: simplify page iteration loops
e06736589b f2fs: use pagevec_lookup_range_tag()
d7a592f660 ext4: use pagevec_lookup_range_tag()
b433151665 ceph: use pagevec_lookup_range_tag()
e286666cb1 btrfs: use pagevec_lookup_range_tag()
fb296a23c2 mm: implement find_get_pages_range_tag()
158d9bbbe1 f2fs: clean up with is_valid_blkaddr()
20893172b7 f2fs: fix to initialize min_mtime with ULLONG_MAX
a301e76180 f2fs: fix to let checkpoint guarantee atomic page persistence
2d96ad5b28 f2fs: fix to initialize i_current_depth according to inode type
6b7c7b4171 Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc"
04fa5dce03 f2fs: don't drop any page on f2fs_cp_error() case
0aca14fc01 f2fs: fix spelling mistake: "extenstion" -> "extension"
6abb03385b f2fs: enhance sanity_check_raw_super() to avoid potential overflows
f921fa8496 f2fs: treat volatile file's data as hot one
a2cbd9b5ee f2fs: introduce release_discard_addr() for cleanup
54f43787c9 f2fs: fix potential overflow
f6bd7d451a f2fs: rename dio_rwsem to i_gc_rwsem
48698e46f3 f2fs: move mnt_want_write_file after range check
707f7ae7db f2fs: fix missing clear FI_NO_PREALLOC in some error case
d487b1588e f2fs: enforce fsync_mode=strict for renamed directory
1c7d5f02bf f2fs: sanity check for total valid node blocks
0f0b18adcd f2fs: sanity check on sit entry
60143bfdb3 f2fs: avoid bug_on on corrupted inode
b733d0175e f2fs: give message and set need_fsck given broken node id
0b4d0d03c9 f2fs: clean up commit_inmem_pages()
80ee152c05 f2fs: do not check F2FS_INLINE_DOTS in recover
78d089f531 f2fs: remove duplicated dquot_initialize and fix error handling
198f637cdf f2fs: stop issue discard if something wrong with f2fs
2f903ef932 f2fs: fix return value in f2fs_ioc_commit_atomic_write
5bdfc7ee96 f2fs: allocate hot_data for atomic write more strictly
51ad1795a5 f2fs: check if inmem_pages list is empty correctly
72823f4c7a f2fs: fix race in between GC and atomic open
97d0c4e72c f2fs: change le32 to le16 of f2fs_inode->i_extra_size
5a60d4cde9 f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries
aa8b4b9926 f2fs: correct return value of f2fs_trim_fs
03a22433da f2fs: fix to show missing bits in FS_IOC_GETFLAGS
a0bbb36238 f2fs: remove unneeded F2FS_PROJINHERIT_FL
7aee058b66 f2fs: don't use GFP_ZERO for page caches
addb448b7e f2fs: issue all big range discards in umount process
dc93e586b3 f2fs: remove redundant block plug
d1aee08aab f2fs: remove unmatched zero_user_segment when convert inline dentry
9bafde62f5 f2fs: introduce private inode status mapping
a4842a1869 fscrypt: log the crypto algorithm implementations
c16d27ebb9 fscrypt: add Speck128/256 support
08ac7224b5 fscrypt: only derive the needed portion of the key
24cc7a8cbb fscrypt: separate key lookup from key derivation
78275d80f7 fscrypt: use a common logging function
56733c6207 fscrypt: remove internal key size constants
9add02d5c6 fscrypt: remove unnecessary check for non-logon key type
4ddc3a807e fscrypt: make fscrypt_operations.max_namelen an integer
6cf4ea2a0c fscrypt: drop empty name check from fname_decrypt()
e2a57840e1 fscrypt: drop max_namelen check from fname_decrypt()
d3679390f9 fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
1cfd158268 fscrypt: don't clear flags on crypto transform
cfd1d7bab1 fscrypt: remove stale comment from fscrypt_d_revalidate()
1e04ac8a20 fscrypt: remove error messages for skcipher_request_alloc() failure
c6b42b9bc7 fscrypt: remove unnecessary NULL check when allocating skcipher
3866025c17 fscrypt: clean up after fscrypt_prepare_lookup() conversions
b711ad8a21 ext4: switch to fscrypt_prepare_lookup()
f9866debb5 fscrypt: use unbound workqueue for decryption
Change-Id: Ia9a47ef30e9e47c4c3e1e9e91e37ae1dc018384f
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
All users of pagevec_lookup_tag() and pagevec_lookup_range_tag() now pass
PAGEVEC_SIZE as a desired number of pages. Just drop the argument.
Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want only pages from given range in btree_write_cache_pages() and
extent_write_cache_pages(). Use pagevec_lookup_range_tag() instead of
pagevec_lookup_tag() and remove unnecessary code.
Link: http://lkml.kernel.org/r/20171009151359.31984-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: David Sterba <dsterba@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ Upstream commit bff5baf8aa37a97293725a16c03f49872249c07e ]
The setting of return code ret should be based on the error code
passed into function end_extent_writepage and not on ret. Thanks
to Liu Bo for spotting this mistake in the original fix I submitted.
Detected by CoverityScan, CID#1414312 ("Logically dead code")
Fixes: 5dca6eea91 ("Btrfs: mark mapping with error flag to report errors to userspace")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We cast 0 to a u8 but then because of type promotion, it's immediately
cast back to int before we do a bitwise negate. The cast doesn't
matter in this case, the code works as intended. It causes a static
checker warning though so let's remove it.
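A minimal standalone sketch of the promotion rule described above. It
assumes u8 is an alias for uint8_t; the function names are illustrative
and are not taken from the patch:

```c
#include <stdint.h>

typedef uint8_t u8; /* assumption: mirrors the kernel's u8 typedef */

/*
 * The u8 operand is promoted to int before '~' is applied, so the
 * result is ~0 as an int (all bits set), not 0xff.
 */
int negate_with_cast(void)
{
	return ~(u8)0;
}

/* The same expression without the redundant cast. */
int negate_without_cast(void)
{
	return ~0;
}
```

Both functions return the same int value, which is why dropping the
cast does not change behaviour.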
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In convert_free_space_to_{bitmaps,extents}(), we buffer the free space
bitmaps in memory and copy them directly to/from the extent buffers with
{read,write}_extent_buffer(). The extent buffer bitmap helpers use byte
granularity, which is equivalent to a little-endian bitmap. This means
that on big-endian systems, the in-memory bitmaps will be written to
disk byte-swapped. To fix this, use byte-granularity for the bitmaps in
memory.
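A rough illustration of why the two layouts differ, assuming a 64-bit
word size. The helpers below are hypothetical and only model the byte
order; they are not the extent buffer bitmap helpers themselves:

```c
#include <stdint.h>

/* Byte-granular bitmap: bit n lives in byte n / 8 on every host. */
void set_bit_bytewise(uint8_t *map, unsigned int n)
{
	map[n / 8] |= (uint8_t)(1u << (n % 8));
}

/*
 * Models how a bitmap made of 64-bit words would sit in memory on a
 * big-endian host: the most significant byte of each word comes first.
 */
void set_bit_word_big_endian(uint8_t *map, unsigned int n)
{
	uint8_t *p = map + (n / 64) * 8;
	uint64_t word = 0;
	unsigned int i;

	/* Load the existing word in big-endian byte order. */
	for (i = 0; i < 8; i++)
		word = (word << 8) | p[i];

	word |= (uint64_t)1 << (n % 64);

	/* Store it back, most significant byte first. */
	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(word >> (56 - 8 * i));
}
```

Setting bit 0 touches byte 0 in the byte-granular layout but byte 7 in
the modelled big-endian word layout, so copying the latter to disk
unchanged would produce a byte-swapped bitmap.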
Fixes: a5ed918285 ("Btrfs: implement the free space B-tree")
Cc: stable@vger.kernel.org # 4.5+
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is an additional patch to
"Btrfs: memset to avoid stale content in btree node block".
This uses memset to initialize the unused space in a leaf to avoid
potential stale content, which may be incurred by pushing items
between sibling leaves.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For many printks, we want to know which file system issued the message.
This patch converts most pr_* calls to use the btrfs_* versions instead.
In some cases, this means adding plumbing to allow call sites access to
an fs_info pointer.
fs/btrfs/check-integrity.c is left alone for another day.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch converts printk(KERN_* style messages to use the pr_* versions.
One side effect is that anything that was KERN_DEBUG is now automatically
a dynamic debug message.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
CodingStyle chapter 2:
"[...] never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."
This patch unsplits user-visible strings.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While updating the btree, we could push items between sibling
nodes/leaves. For leaves, the data sections start in reverse from
the end of the block, while for nodes we only have key pairs
which are stored one by one from the start of the block.
So we could try to push key pairs from one node to the next
node right in the tree, and after that, we update the node's
nritems to reflect the correct end while leaving the stale
content in the node. One may intentionally corrupt the fs
image and access the stale content by bumping the nritems,
causing various crashes.
This takes the in-memory @nritems as the correct one and
gets to memset the unused part of a btree node.
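The idea can be sketched as follows. The structures and NODE_SLOTS are
hypothetical stand-ins for illustration and do not match the on-disk
btrfs node layout:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_SLOTS 8 /* hypothetical capacity, for illustration only */

struct key_ptr { /* hypothetical stand-in for a node's key pair */
	uint64_t key;
	uint64_t blockptr;
};

struct node {
	uint32_t nritems;
	struct key_ptr ptrs[NODE_SLOTS];
};

/* Zero every slot past nritems so stale key pairs are not kept. */
void node_clear_unused(struct node *n)
{
	size_t used = n->nritems;

	if (used < NODE_SLOTS)
		memset(&n->ptrs[used], 0,
		       (NODE_SLOTS - used) * sizeof(n->ptrs[0]));
}
```

With the tail zeroed, bumping nritems in a crafted image exposes only
zeroes rather than leftover key pairs.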
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nobody uses this, it makes no sense to do partial reads of extent buffers.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a lot of random ints in btrfs_fs_info that can be put into flags. This
is mostly equivalent, with the exception of how we deal with quota going on or
off: we now set a flag when we are turning it on or off and deal with
that appropriately, rather than just having a pending state that the current
quota_enabled gets set to. Thanks,
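A simplified sketch of the flag-based approach. The enum values, the
struct and the helper names are hypothetical, and these helpers are
plain (non-atomic) bit operations rather than the kernel's bitops:

```c
#include <stdbool.h>

enum fs_flag_bit { /* hypothetical bit numbers */
	FS_FLAG_QUOTA_ENABLED,
	FS_FLAG_QUOTA_ENABLING,
	FS_FLAG_QUOTA_DISABLING,
};

struct fs_info_sketch {
	unsigned long flags;
};

void fs_set_flag(struct fs_info_sketch *fs, enum fs_flag_bit bit)
{
	fs->flags |= 1UL << bit;
}

void fs_clear_flag(struct fs_info_sketch *fs, enum fs_flag_bit bit)
{
	fs->flags &= ~(1UL << bit);
}

bool fs_test_flag(const struct fs_info_sketch *fs, enum fs_flag_bit bit)
{
	return (fs->flags >> bit) & 1UL;
}

/* Apply a pending quota transition recorded in the flags. */
void fs_apply_quota_transition(struct fs_info_sketch *fs)
{
	if (fs_test_flag(fs, FS_FLAG_QUOTA_ENABLING)) {
		fs_clear_flag(fs, FS_FLAG_QUOTA_ENABLING);
		fs_set_flag(fs, FS_FLAG_QUOTA_ENABLED);
	}
	if (fs_test_flag(fs, FS_FLAG_QUOTA_DISABLING)) {
		fs_clear_flag(fs, FS_FLAG_QUOTA_DISABLING);
		fs_clear_flag(fs, FS_FLAG_QUOTA_ENABLED);
	}
}
```

The transition flags replace a separate "pending" integer: the request
is recorded as a bit and resolved later in one place.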
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Extend btrfs_set_extent_delalloc() and extent_clear_unlock_delalloc()
parameters for both the in-band dedupe and subpage sector size patchsets.
This should reduce conflicts between the two patchsets and the effort to
rebase them.
Cc: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
So we can read a btree block via readahead or intentional read,
and we can end up with a memory leak when something happens as
follows,
1) readahead starts to read block A but does not wait for read
completion,
2) btree_readpage_end_io_hook finds that block A is corrupted,
and it needs to clear all block A's pages' uptodate bit.
3) meanwhile an intentional read kicks in and checks block A's
pages' uptodate to decide which page needs to be read.
4) when some pages have the uptodate bit set during 3)'s check,
3) doesn't count them for eb->io_pages, but they are later
cleared by 2) so we have to readpage on those pages; we get
the wrong eb->io_pages, which results in a memory leak of
this block.
This fixes the problem by first locking all pages and
then checking the pages' uptodate bit.
t1 (readahead):
  read_extent_buffer_pages
    for pg in eb:
      if pg is NOT uptodate:
        num_reads++
    eb->io_pages = num_reads
    for pg in eb:
      if pg is NOT uptodate:
        __extent_read_full_page(pg)
t2 (readahead endio):
  end_bio_extent_readpage
    for page 0,1,2 in eb:
      btree_readpage_end_io_hook(pg)
      if uptodate:
        SetPageUptodate(pg)
    for page 3 in eb:
      btree_readpage_end_io_hook(pg)
        sanity check reports something wrong
        clear_extent_buffer_uptodate(eb)
          for pg in eb:
            ClearPageUptodate(page)
t3 (the following read):
  read_extent_buffer_pages
    (starts after t2 has set pages 0,1,2 uptodate)
    for pg in eb:
      if pg is NOT uptodate:
        num_reads++
    eb->io_pages = num_reads
    (t2 clears the uptodate bits at this point)
    for pg in eb:
      if pg is NOT uptodate:
        __extent_read_full_page(pg)
So t3's eb->io_pages is not consistent with the number of pages it's reading,
and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative
number so that we're not able to free the eb.
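The accounting mismatch can be modelled in a few lines. This is a
single-threaded sketch with a hypothetical helper name; it only shows
the counter arithmetic, not the real locking or I/O paths:

```c
#include <stdatomic.h>

/*
 * io_pages starts at the number of pages that were counted as needing
 * a read; each submitted page later decrements it once on completion.
 * Returns the final counter value.
 */
int io_pages_after_endio(int counted_pages, int submitted_pages)
{
	atomic_int io_pages;
	int i;

	atomic_init(&io_pages, counted_pages);
	for (i = 0; i < submitted_pages; i++)
		atomic_fetch_sub(&io_pages, 1);

	return atomic_load(&io_pages);
}
```

When the counted and submitted numbers agree the counter ends at zero.
If only one page was counted but four were submitted, it ends at -3, so
the completions no longer line up with the point at which the counter
is expected to reach zero.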
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
check_shared identified an extent as shared only in the case of a
different root_id or a different object_id. However, if an extent is
referred to by different offsets of the same file, it should also be
identified as shared. In addition, check_shared's loop scale is at
least n^3, so if an extent has too many references, it can even cause
a soft hang.
First, add all delayed_refs to the ref_tree and calculate the
unique_refs; if unique_refs is greater than one, return
BACKREF_FOUND_SHARED. Then individually add the on-disk references
(inline/keyed) to the ref_tree and recalculate the unique_refs of the
ref_tree to check if it is greater than one. Because we return SHARED
as soon as there are two references, the time complexity is close to
constant.
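A simplified sketch of the early-exit idea. It uses a flat array
instead of the ref_tree, and struct ref and its fields are hypothetical
stand-ins for whatever identifies a reference:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ref { /* hypothetical reference key */
	uint64_t root_id;
	uint64_t object_id;
	uint64_t offset;
};

static bool ref_equal(const struct ref *a, const struct ref *b)
{
	return a->root_id == b->root_id &&
	       a->object_id == b->object_id &&
	       a->offset == b->offset;
}

/*
 * The extent is shared as soon as any reference differs from the
 * first one, i.e. as soon as a second unique reference is seen.
 */
bool extent_is_shared(const struct ref *refs, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (!ref_equal(&refs[0], &refs[i]))
			return true;

	return false;
}
```

Two references from the same file at different offsets compare as
distinct, so the extent is reported as shared without walking the
remaining references.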
Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull block fixes from Jens Axboe:
"Here's the second round of block updates for this merge window.
It's a mix of fixes for changes that went in previously in this round,
and fixes in general. This pull request contains:
- Fixes for loop from Christoph
- A bdi vs gendisk lifetime fix from Dan, worth two cookies.
- A blk-mq timeout fix, when on frozen queues. From Gabriel.
- Writeback fix from Jan, ensuring that __writeback_single_inode()
does the right thing.
- Fix for bio->bi_rw usage in f2fs from me.
- Error path deadlock fix in blk-mq sysfs registration from me.
- Floppy O_ACCMODE fix from Jiri.
- Fix to the new bio op methods from Mike.
One more followup will be coming here, ensuring that we don't
propagate the block types outside of block. That, and a rename of
bio->bi_rw is coming right after -rc1 is cut.
- Various little fixes"
* 'for-linus' of git://git.kernel.dk/linux-block:
mm/block: convert rw_page users to bio op use
loop: make do_req_filebacked more robust
loop: don't try to use AIO for discards
blk-mq: fix deadlock in blk_mq_register_disk() error path
Include: blkdev: Removed duplicate 'struct request;' declaration.
Fixup direct bi_rw modifiers
block: fix bdi vs gendisk lifetime mismatch
blk-mq: Allow timeouts to run while queue is freezing
nbd: fix race in ioctl
block: fix use-after-free in seq file
f2fs: drop bio->bi_rw manual assignment
block: add missing group association in bio-cloning functions
blkcg: kill unused field nr_undestroyed_grps
writeback: Write dirty times for WB_SYNC_ALL writeback
floppy: fix open(O_ACCMODE) for ioctl-only open
Pull more btrfs updates from Chris Mason:
"This is part two of my btrfs pull, which is some cleanups and a batch
of fixes.
Most of the code here is from Jeff Mahoney, making the pointers we
pass around internally more consistent and less confusing overall. I
noticed a small problem right before I sent this out yesterday, so I
fixed it up and re-tested overnight"
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits)
Btrfs: fix __MAX_CSUM_ITEMS
btrfs: btrfs_abort_transaction, drop root parameter
btrfs: add btrfs_trans_handle->fs_info pointer
btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction
btrfs: convert nodesize macros to static inlines
btrfs: introduce BTRFS_MAX_ITEM_SIZE
btrfs: cleanup, remove prototype for btrfs_find_root_ref
btrfs: copy_to_sk drop unused root parameter
btrfs: simpilify btrfs_subvol_inherit_props
btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root
btrfs: tests, require fs_info for root
btrfs: tests, move initialization into tests/
btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
btrfs: prefix fsid to all trace events
btrfs: plumb fs_info into btrfs_work
btrfs: remove obsolete part of comment in statfs
btrfs: hide test-only member under ifdef
btrfs: Ratelimit "no csum found" info message
btrfs: Add ratelimit to btrfs printing
Btrfs: fix unexpected balance crash due to BUG_ON
...
bi_rw should be using bio_set_op_attrs to set bi_rw.
Signed-off-by: Shaun Tancheff <shaun@tancheff.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
When a bio is cloned, the newly created bio must be associated with
the same blkcg as the original bio (if BLK_CGROUP is enabled). If
this operation is not performed, then the new bio is not associated
with any group, and the group of the current task is returned when
the group of the bio is requested.
Depending on the cloning frequency, this may cause a large
percentage of the bios belonging to a given group to be treated
as if belonging to other groups (in most cases as if belonging to
the root group). The expected group isolation may thereby be broken.
This commit adds the missing association in bio-cloning functions.
Fixes: da2f0f74cf ("Btrfs: add support for blkio controllers")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Nikolay Borisov <kernel@kyup.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2
- most(?) of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in <linux/compaction.h>
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...
Vladimir has noticed that we might declare memcg oom even during
readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
restriction) while __do_page_cache_readahead uses
page_cache_alloc_readahead which adds __GFP_NORETRY to prevent
OOMs. This gfp mask discrepancy is really unfortunate and easily
fixable. Drop page_cache_alloc_readahead() which only has one user and
outsource the gfp_mask logic into readahead_gfp_mask and propagate this
mask from __do_page_cache_readahead down to read_pages.
This alone would have only very limited impact as most filesystems are
implementing ->readpages and the common implementation mpage_readpages
does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
use readahead_gfp_mask instead as this function is called only during
readahead as well. The same applies to read_cache_pages.
ext4 has its own ext4_mpage_readpages but the path which has pages !=
NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
doing a very similar pattern to mpage_readpages so the same can be
applied to them as well.
[akpm@linux-foundation.org: coding-style fixes]
[mhocko@suse.com: restrict gfp mask in mpage_alloc]
Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Chris Mason <clm@fb.com>
Cc: Steve French <sfrench@samba.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core block updates from Jens Axboe:
- the big change is the cleanup from Mike Christie, cleaning up our
uses of command types and modified flags. This is what will throw
some merge conflicts
- regression fix for the above for btrfs, from Vincent
- following up to the above, better packing of struct request from
Christoph
- a 2038 fix for blktrace from Arnd
- a few trivial/spelling fixes from Bart Van Assche
- a front merge check fix from Damien, which could cause issues on
SMR drives
- Atari partition fix from Gabriel
- convert cfq to highres timers, since jiffies isn't granular enough
for some devices these days. From Jan and Jeff
- CFQ priority boost fix for idle classes, from me
- cleanup series from Ming, improving our bio/bvec iteration
- a direct issue fix for blk-mq from Omar
- fix for plug merging not involving the IO scheduler, like we do for
other types of merges. From Tahsin
- expose DAX type internally and through sysfs. From Toshi and Yigal
* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
block: Fix front merge check
block: do not merge requests without consulting with io scheduler
block: Fix spelling in a source code comment
block: expose QUEUE_FLAG_DAX in sysfs
block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
Btrfs: fix comparison in __btrfs_map_block()
block: atari: Return early for unsupported sector size
Doc: block: Fix a typo in queue-sysfs.txt
cfq-iosched: Charge at least 1 jiffie instead of 1 ns
cfq-iosched: Fix regression in bonnie++ rewrite performance
cfq-iosched: Convert slice_resid from u64 to s64
block: Convert fifo_time from ulong to u64
blktrace: avoid using timespec
block/blk-cgroup.c: Declare local symbols static
block/bio-integrity.c: Add #include "blk.h"
block/partition-generic.c: Remove a set-but-not-used variable
block: bio: kill BIO_MAX_SIZE
cfq-iosched: temporarily boost queue priority for idle classes
block: drbd: avoid to use BIO_MAX_SIZE
block: bio: remove BIO_MAX_SECTORS
...
eb->io_pages is set in read_extent_buffer_pages().
In case of readpage failure, for pages that have been added to bio,
it calls bio_endio and later readpage_io_failed_hook() does the work.
When one of this eb's pages (it cannot be the 1st page) fails to be added
to the bio due to a failure in merge_bio(), it cannot decrease eb->io_pages
via bio_endio, and ends up with a memory leak eventually.
This lets __do_readpage propagate errors to callers and adds the
'atomic_dec(&eb->io_pages)'.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
One can use btrfs-corrupt-block to hit BUG_ON() in merge_bio(),
thus this aims to stop anyone from panicking the whole system by using
their btrfs.
Since the error in merge_bio can only come from __btrfs_map_block()
when chunk tree mapping has something insane and __btrfs_map_block()
has already printed the reason, we can just return errors in
merge_bio.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTRFS is using a variety of slab caches to satisfy internal needs.
Those slab caches are always allocated with the SLAB_RECLAIM_ACCOUNT,
meaning allocations from the caches are going to be accounted as
SReclaimable. At the same time btrfs is not registering any shrinkers
whatsoever, thus preventing memory from the slabs from being shrunk. This
means those caches are not in fact reclaimable.
To fix this remove the SLAB_RECLAIM_ACCOUNT on all caches apart from the
inode cache, since this one is being freed by the generic VFS super_block
shrinker. Also set the transaction related caches as SLAB_TEMPORARY,
to better document the lifetime of the objects (it just translates
to SLAB_RECLAIM_ACCOUNT).
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
map_private_extent_buffer() can return -EINVAL in two different cases,
1. when the requested contents span two pages if nodesize is larger
than pagesize,
2. when it detects something insane.
The 2nd one used to be only a WARN_ON(1), and we decided to return an error
to callers, but we didn't fix up all its callers, which is
addressed by this patch.
Without this, btrfs may end up with a 'general protection' fault, i.e.
reading invalid memory.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
via alloc_extent_buffer(). An unaligned eb can have more pages than it
should have, which ends up leaking the extent buffer or corrupting some
content in the extent buffer.
This adds a warning to let us quickly know what was happening.
Now that alloc_extent_buffer() no longer returns NULL, this changes its
caller and callers of its caller to match the new error
handling.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The bio REQ_OP and bi_rw rq_flag_bits are now always set up, so there is
no need to pass around the rq_flag_bits bits too. btrfs users should
access the bio instead.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch has btrfs's submit_one_bio users set the bio op using
bio_set_op_attrs and get the op using bio_op.
The next patches will continue to convert btrfs,
so submit_bio_hook and merge_bio_hook
related code will be modified to take only the bio. I did
not do it in this patch to try and keep it smaller.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Fixed up fs/ext4/crypto.c
Signed-off-by: Jens Axboe <axboe@fb.com>
The self-tests code assumes 4k as the sectorsize and nodesize. This commit
fixes the hardcoded 4K and enables the self-tests code to be executed on
non-4k page sized systems (e.g. ppc64).
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While we are finishing a device replace operation we can have a concurrent
task trying to do a read repair operation, in which case it will call
btrfs_map_block() to get a struct btrfs_bio which can have a stripe that
points to the source device of the device replace operation. This allows
for the read repair task to dereference the stripe's device pointer after
the device replace operation has freed the source device, resulting in
an invalid memory access. This is similar to the problem solved by my
previous patch in the same series and named "Btrfs: fix race between
device replace and discard".
So fix this by surrounding the call to btrfs_map_block() and the code
that uses the returned struct btrfs_bio with calls to
btrfs_bio_counter_inc_blocked() and btrfs_bio_counter_dec(), giving the
proper serialization with the finishing phase of the device replace
operation.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
btrfs's fiemap is supposed to return 0 on success and return < 0 on
error. However, ret becomes 1 after looking up the last file extent:
btrfs_lookup_file_extent ->
btrfs_search_slot(..., ins_len=0, cow=0)
and if the offset is beyond EOF, we'll get 'path' pointed to the place
of potential insertion, and ret == 1.
This may confuse applications using ioctl(FS_IOC_FIEMAP).
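A sketch of the return-value convention at issue. Both functions are
hypothetical stubs; the real patch may reset ret at a different point:

```c
/*
 * Hypothetical stand-in for a tree search: returns 0 on an exact
 * match and 1 when the key was not found and the path points at the
 * would-be insertion slot. A real search can also return < 0.
 */
int search_slot_stub(unsigned long long offset, unsigned long long eof)
{
	return offset >= eof ? 1 : 0;
}

/* Sketch of a fiemap-style wrapper: success must be reported as 0. */
int fiemap_stub(unsigned long long offset, unsigned long long eof)
{
	int ret = search_slot_stub(offset, eof);

	if (ret > 0)
		ret = 0; /* "not found" is not an error for the caller */

	return ret;
}
```

Without the normalisation, a lookup beyond EOF would hand the positive
search result straight back to userspace.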
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It seems to have been unused for a long time, since 2008 and
6885f308b5 ("Btrfs: Misc 2.6.25 updates").
Propagating the removal touches some code but has no functional effect.
Signed-off-by: David Sterba <dsterba@suse.com>
Single caller passes GFP_NOFS. We can get rid of the
gfpflags_allow_blocking checks as NOFS can block but does not recurse to
filesystem through reclaim.
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to __clear_extent_bit, do not fail if the state preallocation
fails as we might not need it. One less BUG_ON.
Signed-off-by: David Sterba <dsterba@suse.com>