-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlt/6CsACgkQONu9yGCS
aT7KOw//boywfZ/3pWvs5bp6MsNBrYxa9mzvv7MDQFj10nibTdwY+XFf7O370j3q
Udt3A3V4a57KnploVglMSv4452LETDhMFdlQlxVTMcWAISiZoviVemDTi6GoRCwI
PJg30COP3kE4u51O5i4sGDrff9mRiUPH2XR4JP27CY1aYbjE1IYoOTfB7jUSZhNn
nzghkYLuxIpXc3GlNiWwXSxjRoDz9hw/0jaY4oZC9CRGi/Z/YBqvXLvj9dMO/2R2
0R1ftiI/CEpuQ10XZ9z3h2WlgiWz9Pj183qBMwCUVR0IDVRo+A6lN1Zz7X+eG0no
+CsDAhwfxu2BmDZ4eTDCgcKTk1LDmSVAo3cOoIdW+UkNbeoz99gFivy+jhEyKwQA
KYfMciX31qmrbVfqGT9mZTsDgrYnLCqOmRnhIoAbA3Tb/4gzO6pQXt08eGK4l+LI
u2Yv4L3BiNtZlRPBZYaibtnAAWOPZEuMIf/cRxXJ4//AcWsLirS1Dy3Z70z7J6ew
jkxtDf2F34YYrj1QGRAKfELiZJOJYX7Ig81y80M4tBD2sjRfb29z7QEK5XbDrIID
OJ8Kh57YrdnkGBOFbG8Yvfy4QV1js8VsReasK1pgad8iL8suBYPTPQwmUztGWXz5
AQsU+aQxNYicTtsRIfEN4Tr1Nb/2IqfShC+lwFpcagMvjzjDT/Q=
=GH5x
-----END PGP SIGNATURE-----
Merge 4.9.124 into android-4.9
Changes in 4.9.124
x86/entry/64: Remove %ebx handling from error_entry/exit
ARC: Explicitly add -mmedium-calls to CFLAGS
usb: dwc3: of-simple: fix use-after-free on remove
netfilter: ipv6: nf_defrag: reduce struct net memory waste
selftests: pstore: return Kselftest Skip code for skipped tests
selftests: static_keys: return Kselftest Skip code for skipped tests
selftests: user: return Kselftest Skip code for skipped tests
selftests: zram: return Kselftest Skip code for skipped tests
selftests: sync: add config fragment for testing sync framework
ARM: dts: NSP: Fix i2c controller interrupt type
ARM: dts: NSP: Fix PCIe controllers interrupt types
ARM: dts: Cygnus: Fix I2C controller interrupt type
ARM: dts: Cygnus: Fix PCIe controller interrupt type
arm64: dts: ns2: Fix I2C controller interrupt type
drm: mali-dp: Enable Global SE interrupts mask for DP500
IB/rxe: Fix missing completion for mem_reg work requests
libahci: Fix possible Spectre-v1 pmp indexing in ahci_led_store()
usb: dwc2: fix isoc split in transfer with no data
usb: gadget: composite: fix delayed_status race condition when set_interface
usb: gadget: dwc2: fix memory leak in gadget_init()
xen: add error handling for xenbus_printf
scsi: xen-scsifront: add error handling for xenbus_printf
xen/scsiback: add error handling for xenbus_printf
arm64: make secondary_start_kernel() notrace
qed: Add sanity check for SIMD fastpath handler.
enic: initialize enic->rfs_h.lock in enic_probe
net: hamradio: use eth_broadcast_addr
net: propagate dev_get_valid_name return code
net: stmmac: socfpga: add additional ocp reset line for Stratix10
nvmet: reset keep alive timer in controller enable
ARC: Enable machine_desc->init_per_cpu for !CONFIG_SMP
net: davinci_emac: match the mdio device against its compatible if possible
KVM: arm/arm64: Drop resource size check for GICV window
locking/lockdep: Do not record IRQ state within lockdep code
ipv6: mcast: fix unsolicited report interval after receiving querys
Smack: Mark inode instant in smack_task_to_inode
batman-adv: Fix bat_ogm_iv best gw refcnt after netlink dump
batman-adv: Fix bat_v best gw refcnt after netlink dump
cxgb4: when disabling dcb set txq dcb priority to 0
iio: pressure: bmp280: fix relative humidity unit
brcmfmac: stop watchdog before detach and free everything
ARM: dts: am437x: make edt-ft5x06 a wakeup source
ALSA: seq: Fix UBSAN warning at SNDRV_SEQ_IOCTL_QUERY_NEXT_CLIENT ioctl
usb: xhci: remove the code build warning
usb: xhci: increase CRS timeout value
NFC: pn533: Fix wrong GFP flag usage
perf test session topology: Fix test on s390
perf report powerpc: Fix crash if callchain is empty
perf bench: Fix numa report output code
netfilter: nf_log: fix uninit read in nf_log_proc_dostring
ceph: fix dentry leak in splice_dentry()
selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
selftests/x86/sigreturn: Do minor cleanups
ARM: dts: da850: Fix interrups property for gpio
dmaengine: pl330: report BURST residue granularity
dmaengine: k3dma: Off by one in k3_of_dma_simple_xlate()
md/raid10: fix that replacement cannot complete recovery after reassemble
nl80211: relax ht operation checks for mesh
drm/exynos: gsc: Fix support for NV16/61, YUV420/YVU420 and YUV422 modes
drm/exynos: decon5433: Fix per-plane global alpha for XRGB modes
drm/exynos: decon5433: Fix WINCONx reset value
bpf, s390: fix potential memleak when later bpf_jit_prog fails
PCI: xilinx: Add missing of_node_put()
PCI: xilinx-nwl: Add missing of_node_put()
bnx2x: Fix receiving tx-timeout in error or recovery state.
acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value
m68k: fix "bad page state" oops on ColdFire boot
objtool: Support GCC 8 '-fnoreorder-functions'
ipvlan: call dev_change_flags when ipvlan mode is reset
HID: wacom: Correct touch maximum XY of 2nd-gen Intuos
ARM: imx_v6_v7_defconfig: Select ULPI support
ARM: imx_v4_v5_defconfig: Select ULPI support
tracing: Use __printf markup to silence compiler
kasan: fix shadow_size calculation error in kasan_module_alloc
smsc75xx: Add workaround for gigabit link up hardware errata.
samples/bpf: add missing <linux/if_vlan.h>
samples/bpf: Check the error of write() and read()
ieee802154: 6lowpan: set IFLA_LINK
netfilter: x_tables: set module owner for icmp(6) matches
ipv6: make ipv6_renew_options() interrupt/kernel safe
net: qrtr: Broadcast messages only from control port
sh_eth: fix invalid context bug while calling auto-negotiation by ethtool
sh_eth: fix invalid context bug while changing link options by ethtool
ravb: fix invalid context bug while calling auto-negotiation by ethtool
ravb: fix invalid context bug while changing link options by ethtool
ARM: pxa: irq: fix handling of ICMR registers in suspend/resume
net/sched: act_tunnel_key: fix NULL dereference when 'goto chain' is used
ieee802154: at86rf230: switch from BUG_ON() to WARN_ON() on problem
ieee802154: at86rf230: use __func__ macro for debug messages
ieee802154: fakelb: switch from BUG_ON() to WARN_ON() on problem
drm/armada: fix colorkey mode property
netfilter: nf_conntrack: Fix possible possible crash on module loading.
ARC: Improve cmpxchg syscall implementation
bnxt_en: Always set output parameters in bnxt_get_max_rings().
bnxt_en: Fix for system hang if request_irq fails
perf llvm-utils: Remove bashism from kernel include fetch script
nfit: fix unchecked dereference in acpi_nfit_ctl
RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot
ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores
ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller
ixgbe: Be more careful when modifying MAC filters
tools: build: Use HOSTLDFLAGS with fixdep
packet: reset network header if packet shorter than ll reserved space
qlogic: check kstrtoul() for errors
tcp: remove DELAYED ACK events in DCTCP
pinctrl: nsp: off by ones in nsp_pinmux_enable()
pinctrl: nsp: Fix potential NULL dereference
drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply()
net/ethernet/freescale/fman: fix cross-build error
net: usb: rtl8150: demote allmulti message to dev_dbg()
PCI: OF: Fix I/O space page leak
PCI: versatile: Fix I/O space page leak
net: qca_spi: Avoid packet drop during initial sync
net: qca_spi: Make sure the QCA7000 reset is triggered
net: qca_spi: Fix log level if probe fails
tcp: identify cryptic messages as TCP seq # bugs
KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
ext4: fix spectre gadget in ext4_mb_regular_allocator()
parisc: Remove ordered stores from syscall.S
xfrm_user: prevent leaking 2 bytes of kernel memory
netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no prior state
packet: refine ring v3 block size test to hold one frame
parisc: Remove unnecessary barriers from spinlock.h
PCI: hotplug: Don't leak pci_slot on registration failure
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: pciehp: Fix use-after-free on unplug
PCI: pciehp: Fix unprotected list iteration in IRQ handler
i2c: imx: Fix race condition in dma read
reiserfs: fix broken xattr handling (heap corruption, bad retval)
Linux 4.9.124
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 02a2f000a3629274bfad60bfc4de9edec49e63e7 ]
test_task_rename() and test_urandom_read()
can be failed during write() and read(),
So check the result of them.
Reviewed-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4d5d33a085335ef469c9a87792bcaaaa8e64d8c4 ]
This fixes build error regarding redefinition:
CLANG-bpf samples/bpf/parse_varlen.o
samples/bpf/parse_varlen.c:111:8: error: redefinition of 'vlan_hdr'
struct vlan_hdr {
^
./include/linux/if_vlan.h:38:8: note: previous definition is here
So remove duplicate 'struct vlan_hdr' in sample code and include if_vlan.h
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.
Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X
2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.
3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.
In the future this behavior may be extended with a chain of
non-overridable programs.
Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.
Add several testcases and adjust libbpf.
Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: Change-Id: I3df35d8d3b1261503f9b5bcd90b18c9358f1ac28
("cgroup: add support for eBPF programs")
[AmitP: Refactored original patch for android-4.9 where libbpf sources
are in samples/bpf/ and test_cgrp2_attach2, test_cgrp2_sock,
and test_cgrp2_sock2 sample tests do not exist.]
(cherry picked from commit 7f677633379b4abb3281cdbe7e7006f049305c03)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Cherry-pick from commit d8c5b17f2bc0de09fbbfa14d90e8168163a579e7
Add a simple userpace program to demonstrate the new API to attach eBPF
programs to cgroups. This is what it does:
* Create arraymap in kernel with 4 byte keys and 8 byte values
* Load eBPF program
The eBPF program accesses the map passed in to store two pieces of
information. The number of invocations of the program, which maps
to the number of packets received, is stored to key 0. Key 1 is
incremented on each iteration by the number of bytes stored in
the skb.
* Detach any eBPF program previously attached to the cgroup
* Attach the new program to the cgroup using BPF_PROG_ATTACH
* Once a second, read map[0] and map[1] to see how many bytes and
packets were seen on any socket of tasks in the given cgroup.
The program takes a cgroup path as 1st argument, and either "ingress"
or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
which will make the generated eBPF program return 0 instead of 1, so
the kernel will drop the packet.
libbpf gained two new wrappers for the new syscall commands.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: I011436a755abd62050edd22e47995c166a0bd8a2
[ Upstream commit 332270fdc8b6fba07d059a9ad44df9e1a2ad4529 ]
llvm 4.0 and above generates the code like below:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
This is because verifier marks register r2 as unknown value after #519
where r2 is a stack pointer and r1 holds a constant value.
Teach verifier to recognize "stack_ptr + imm" and
"stack_ptr + reg with const val" as valid stack_ptr with new offset.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The files "sampleip_kern.c" and "trace_event_kern.c" directly access
"ctx->regs.ip" which is not available on s390x. Fix this and use the
PT_REGS_IP() macro instead.
Also fix the macro for s390x and use "psw.addr" from "pt_regs".
Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test creates two netns, ns1 and ns2. The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.
The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].
At ns1, the VIPs are routed _via_ the host.
At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl. The test is configured in a way so
that both ingress and egress can be tested.
At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified. The return path is routed to the dev ipip/ip6tnl.
During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the sample files are causing issues when they are loaded with tc
and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
file as program/map/etc sections are not present, which can be easily
spotted with readelf(1).
Currently, BPF samples are including some of the kernel headers and mid
term we should change them to refrain from this, really. When dynamic
debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which
is easily overlooked in the build as clang spills this along with other
noisy warnings from various header includes, and llc still generates an
ELF file with mentioned characteristics. For just playing around with BPF
examples, this can be a bit of a hurdle to take.
Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is
done in xdp*_kern samples already.
Fixes: 65d472fb00 ("samples/bpf: add 'pointer to packet' tests")
Fixes: 6afb1e28b8 ("samples/bpf: Add tunnel set/get tests.")
Fixes: a3f7461734 ("cgroup: bpf: Add an example to do cgroup checking in BPF")
Reported-by: Chandrasekar Kannan <ckannan@console.to>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppose you have a map array value that is something like this
struct foo {
unsigned iter;
int array[SOME_CONSTANT];
};
You can easily insert this into an array, but you cannot modify the contents of
foo->array[] after the fact. This is because we have no way to verify we won't
go off the end of the array at verification time. This patch provides a start
for this work. We accomplish this by keeping track of a minimum and maximum
value a register could be while we're checking the code. Then at the time we
try to do an access into a MAP_VALUE we verify that the maximum offset into that
region is a valid access into that memory region. So in practice, code such as
this
unsigned index = 0;
if (foo->iter >= SOME_CONSTANT)
foo->iter = index;
else
index = foo->iter++;
foo->array[index] = bar;
would be allowed, as we can verify that index will always be between 0 and
SOME_CONSTANT-1. If you wish to use signed values you'll have to have an extra
check to make sure the index isn't less than 0, or do something like index %=
SOME_CONSTANT.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
seccomp_phase1() does not exist anymore. Instead, update sample to use
__seccomp_filter(). While at it, set max locked memory to unlimited.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These samples fail to compile as 'struct flow_keys' conflicts with
definition in net/flow_dissector.h. Fix the same by renaming the
structure used in the sample.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add couple of test cases for direct write and the negative size issue, and
also adjust the direct packet access test4 since it asserts that writes are
not possible, but since we've just added support for writes, we need to
invert the verdict to ACCEPT, of course. Summary: 133 PASSED, 0 FAILED.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
the test creates 3 namespaces with veth connected via bridge.
First two namespaces simulate two different hosts with the same
IPv4 and IPv6 addresses configured on the tunnel interface and they
communicate with outside world via standard tunnels.
Third namespace creates collect_md tunnel that is driven by BPF
program which selects different remote host (either first or
second namespace) based on tcp dest port number while tcp dst
ip is the same.
This scenario is rough approximation of load balancer use case.
The tests check both traditional tunnel configuration and collect_md mode.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
extend existing tests for vxlan, geneve, gre to include IPIP tunnel.
It tests both traditional tunnel configuration and
dynamic via bpf helpers.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
LLVM can generate code that tests for direct packet access via
skb->data/data_end in a way that currently gets rejected by the
verifier, example:
[...]
7: (61) r3 = *(u32 *)(r6 +80)
8: (61) r9 = *(u32 *)(r6 +76)
9: (bf) r2 = r9
10: (07) r2 += 54
11: (3d) if r3 >= r2 goto pc+12
R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
R9=pkt(id=0,off=0,r=0) R10=fp
12: (18) r4 = 0xffffff7a
14: (05) goto pc+430
[...]
from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
24: (7b) *(u64 *)(r10 -40) = r1
25: (b7) r1 = 0
26: (63) *(u32 *)(r6 +56) = r1
27: (b7) r2 = 40
28: (71) r8 = *(u8 *)(r9 +20)
invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)
The reason why this gets rejected despite a proper test is that we
currently call find_good_pkt_pointers() only in case where we detect
tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
derived, for example, from a register of type pkt(id=Y,off=0,r=0)
pointing to skb->data. find_good_pkt_pointers() then fills the range
in the current branch to pkt(id=Y,off=0,r=Z) on success.
For above case, we need to extend that to recognize pkt_end >= rX
pattern and mark the other branch that is taken on success with the
appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
only two practical options to test for from what LLVM could have
generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
that we would need to take into account as well.
After the fix:
[...]
7: (61) r3 = *(u32 *)(r6 +80)
8: (61) r9 = *(u32 *)(r6 +76)
9: (bf) r2 = r9
10: (07) r2 += 54
11: (3d) if r3 >= r2 goto pc+12
R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
R9=pkt(id=0,off=0,r=0) R10=fp
12: (18) r4 = 0xffffff7a
14: (05) goto pc+430
[...]
from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
24: (7b) *(u64 *)(r10 -40) = r1
25: (b7) r1 = 0
26: (63) *(u32 *)(r6 +56) = r1
27: (b7) r2 = 40
28: (71) r8 = *(u8 *)(r9 +20)
29: (bf) r1 = r8
30: (25) if r8 > 0x3c goto pc+47
R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
R9=pkt(id=0,off=0,r=54) R10=fp
31: (b7) r1 = 1
[...]
Verifier test cases are also added in this work, one that demonstrates
the mentioned example here and one that tries a bad packet access for
the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
with both test variants (>, >=).
Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sample instruction pointer and frequency count in a BPF map
Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf program is called 50 times a second and does hashmap[kern&user_stackid]++
It's primary purpose to check that key bpf helpers like map lookup, update,
get_stackid, trace_printk and ctx access are all working.
It checks:
- PERF_COUNT_HW_CPU_CYCLES on all cpus
- PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children
- PERF_COUNT_SW_CPU_CLOCK on all cpus
- PERF_COUNT_SW_CPU_CLOCK for current process
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key,
and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE. A native
tunnel device is created in a namespace to interact with a lwtunnel
device out of the namespace, with metadata enabled. The bpf_skb_set_*
program is attached to tc egress and bpf_skb_get_* is attached to egress
qdisc. A ping between two tunnels is used to verify correctness and
the result of bpf_skb_get_* printed by bpf_trace_printk.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor overlapping changes for both merge conflicts.
Resolution work done by Stephen Rothwell was used
as a reference.
Signed-off-by: David S. Miller <davem@davemloft.net>
test various corner cases of the helper function access to the packet
via crafted XDP programs.
Signed-off-by: Aaron Yue <haoxuany@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.
Tejun says:
So, I think in_cgroup should mean that the object is in that
particular cgroup while under_cgroup in the subhierarchy of that
cgroup. Let's rename the other subhierarchy test to under too. I
think that'd be a lot less confusing going forward.
[...]
It's more intuitive and gives us the room to implement the real
"in" test if ever necessary in the future.
Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.
Fixes: 4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon <sargun@sargun.me>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
This test has a BPF program which writes the last known pid to call the
sync syscall within a given cgroup to a map.
The user mode program creates its own mount namespace, and mounts the
cgroupsv2 hierarchy in there, as on all current test systems
(Ubuntu 16.04, Debian), the cgroupsv2 vfs is unmounted by default.
Once it does this, it proceeds to test.
The test checks for positive and negative condition. It ensures that
when it's part of a given cgroup, its pid is captured in the map,
and that when it leaves the cgroup, this doesn't happen.
It populate a cgroups arraymap prior to execution in userspace. This means
that the program must be run in the same cgroups namespace as the programs
that are being traced.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 555c8a8623 ("bpf: avoid stack copy and use skb ctx for event output")
started using 20 of initially reserved upper 32-bits of 'flags' argument
in bpf_perf_event_output(). Adjust corresponding prototype in samples/bpf/bpf_helpers.h
Signed-off-by: Adam Barth <arb@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
increase test coverage to check previously missing 'update when full'
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This example shows using a kprobe to act as a dnat mechanism to divert
traffic for arbitrary endpoints. It rewrite the arguments to a syscall
while they're still in userspace, and before the syscall has a chance
to copy the argument into kernel space.
Although this is an example, it also acts as a test because the mapped
address is 255.255.255.255:555 -> real address, and that's not a legal
address to connect to. If the helper is broken, the example will fail
on the intermediate steps, as well as the final step to verify the
rewrite of userspace memory succeeded.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows user memory to be written to during the course of a kprobe.
It shouldn't be used to implement any kind of security mechanism
because of TOC-TOU attacks, but rather to debug, divert, and
manipulate execution of semi-cooperative processes.
Although it uses probe_kernel_write, we limit the address space
the probe can write into by checking the space with access_ok.
We do this as opposed to calling copy_to_user directly, in order
to avoid sleeping. In addition we ensure the threads's current fs
/ segment is USER_DS and the thread isn't exiting nor a kernel thread.
Given this feature is meant for experiments, and it has a risk of
crashing the system, and running programs, we print a warning on
when a proglet that attempts to use this helper is installed,
along with the pid and process name.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The naming choice of index is not terribly descriptive, and dropcnt is
in fact incorrect for xdp2. Pick better names for these: ipproto and
rxcnt.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sample that rewrites and forwards packets out on the same
interface. Observed single core forwarding performance of ~10Mpps.
Since the mlx4 driver under test recycles every single packet page, the
perf output shows almost exclusively just the ring management and bpf
program work. Slowdowns are likely occurring due to cache misses.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
test_cgrp2_array_pin.c:
A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY),
pouplates/updates it with a cgroup2's backed fd and pins it to a
bpf-fs's file. The pinned file can be loaded by tc and then used
by the bpf prog later. This program can also update an existing pinned
array and it could be useful for debugging/testing purpose.
test_cgrp2_tc_kern.c:
A bpf prog which should be loaded by tc. It is to demonstrate
the usage of bpf_skb_in_cgroup.
test_cgrp2_tc.sh:
A script that glues the test_cgrp2_array_pin.c and
test_cgrp2_tc_kern.c together. The idea is like:
1. Load the test_cgrp2_tc_kern.o by tc
2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
with a cgroup fd
3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
dropped because of a match on the cgroup
Most of the lines in test_cgrp2_tc.sh is the boilerplate
to setup the cgroup/bpf-fs/net-devices/netns...etc. It is
not bulletproof on errors but should work well enough and
give enough debug info if things did not go well.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
add few tests for "pointer to packet" logic of the verifier
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
parse_simple.c - packet parser exapmle with single length check that
filters out udp packets for port 9
parse_varlen.c - variable length parser that understand multiple vlan headers,
ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
The packet is parsed layer by layer with multitple length checks.
parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
Same functionality as parse_simple.
simple = 24.1Mpps per core
varlen = 22.7Mpps
ldabs = 21.4Mpps
Parser with LD_ABS instructions is slower than full direct access parser
which does more packet accesses and checks.
These examples demonstrate the choice bpf program authors can make between
flexibility of the parser vs speed.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv4/ip_gre.c
Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Users are likely to manually compile both LLVM 'llc' and 'clang'
tools. Thus, also allow redefining CLANG and verify command exist.
Makefile implementation wise, the target that verify the command have
been generalized.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not intuitive that 'make' must be run from the top level
directory with argument "samples/bpf/" to compile these eBPF samples.
Introduce a kbuild make file trick that allow make to be run from the
"samples/bpf/" directory itself. It basically change to the top level
directory and call "make samples/bpf/" with the "/" slash after the
directory name.
Also add a clean target that only cleans this directory, by taking
advantage of the kbuild external module setting M=$PWD.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Getting started with using examples in samples/bpf/ is not
straightforward. There are several dependencies, and specific
versions of these dependencies.
Just compiling the example tool is also slightly obscure, e.g. one
need to call make like:
make samples/bpf/
Do notice the "/" slash after the directory name.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make compiling samples/bpf more user friendly, by detecting if LLVM
compiler tool 'llc' is available, and also detect if the 'bpf' target
is available in this version of LLVM.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is practical to be-able-to redefine the location of the LLVM
command 'llc', because not all distros have a LLVM version with bpf
target support. Thus, it is sometimes required to compile LLVM from
source, and sometimes it is not desired to overwrite the distros
default LLVM version.
This feature was removed with 128d1514be ("samples/bpf: Use llc in
PATH, rather than a hardcoded value").
Add this features back. Note that it is possible to redefine the LLC
on the make command like:
make samples/bpf/ LLC=~/git/llvm/build/bin/llc
Fixes: 128d1514be ("samples/bpf: Use llc in PATH, rather than a hardcoded value")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
llvm cannot always recognize memset as builtin function and optimize
it away, so just delete it. It was a leftover from testing
of bpf_perf_event_output() with large data structures.
Fixes: 39111695b1 ("samples: bpf: add bpf_perf_event_output example")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
verifier behaviour.
[...]
#84 raw_stack: no skb_load_bytes OK
#85 raw_stack: skb_load_bytes, no init OK
#86 raw_stack: skb_load_bytes, init OK
#87 raw_stack: skb_load_bytes, spilled regs around bounds OK
#88 raw_stack: skb_load_bytes, spilled regs corruption OK
#89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
#90 raw_stack: skb_load_bytes, spilled regs + data OK
#91 raw_stack: skb_load_bytes, invalid access 1 OK
#92 raw_stack: skb_load_bytes, invalid access 2 OK
#93 raw_stack: skb_load_bytes, invalid access 3 OK
#94 raw_stack: skb_load_bytes, invalid access 4 OK
#95 raw_stack: skb_load_bytes, invalid access 5 OK
#96 raw_stack: skb_load_bytes, invalid access 6 OK
#97 raw_stack: skb_load_bytes, large access OK
Summary: 98 PASSED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the zero initialization in the sample programs where appropriate.
Note that this is an optimization which is now possible, old programs
still doing the zero initialization are just fine as well. Also, make
sure we don't have padding issues when we don't memset() the entire
struct anymore.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
the first microbenchmark does
fd=open("/proc/self/comm");
for() {
write(fd, "test");
}
and on 4 cpus in parallel:
writes per sec
base (no tracepoints, no kprobes) 930k
with kprobe at __set_task_comm() 420k
with tracepoint at task:task_rename 730k
For kprobe + full bpf program manully fetches oldcomm, newcomm via bpf_probe_read.
For tracepint bpf program does nothing, since arguments are copied by tracepoint.
2nd microbenchmark does:
fd=open("/dev/urandom");
for() {
read(fd, buf);
}
and on 4 cpus in parallel:
reads per sec
base (no tracepoints, no kprobes) 300k
with kprobe at urandom_read() 279k
with tracepoint at random:urandom_read 290k
bpf progs attached to kprobe and tracepoint are noop.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
modify offwaketime to work with sched/sched_switch tracepoint
instead of kprobe into finish_task_switch
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recognize "tracepoint/" section name prefix and attach the program
to that tracepoint.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the necessary definitions for building bpf samples on ppc.
Since ppc doesn't store function return address on the stack, modify how
PT_REGS_RET() and PT_REGS_FP() work.
Also, introduce PT_REGS_IP() to access the instruction pointer.
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While at it, remove the generation of .s files and fix some typos in the
related comment.
Cc: Alexei Starovoitov <ast@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Building BPF samples is failing with the below error:
samples/bpf/map_perf_test_user.c: In function ‘main’:
samples/bpf/map_perf_test_user.c:134:9: error: variable ‘r’ has
initializer but incomplete type
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
samples/bpf/map_perf_test_user.c:134:21: error: ‘RLIM_INFINITY’
undeclared (first use in this function)
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
samples/bpf/map_perf_test_user.c:134:21: note: each undeclared
identifier is reported only once for each function it appears in
samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
struct initializer [enabled by default]
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
for ‘r’) [enabled by default]
samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
struct initializer [enabled by default]
samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
for ‘r’) [enabled by default]
samples/bpf/map_perf_test_user.c:134:16: error: storage size of ‘r’
isn’t known
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
samples/bpf/map_perf_test_user.c:139:2: warning: implicit declaration of
function ‘setrlimit’ [-Wimplicit-function-declaration]
setrlimit(RLIMIT_MEMLOCK, &r);
^
samples/bpf/map_perf_test_user.c:139:12: error: ‘RLIMIT_MEMLOCK’
undeclared (first use in this function)
setrlimit(RLIMIT_MEMLOCK, &r);
^
samples/bpf/map_perf_test_user.c:134:16: warning: unused variable ‘r’
[-Wunused-variable]
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
make[2]: *** [samples/bpf/map_perf_test_user.o] Error 1
Fix this by including the necessary header file.
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>