aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2018-10-01Merge tag 'v4.4.159' into linux-linaro-lsk-v4.4lsk-v4.4-18.09Mark Brown
This is the 4.4.159 stable release # gpg: Signature made Sat 29 Sep 2018 11:08:56 BST # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Good signature from "Greg Kroah-Hartman <gregkh@linuxfoundation.org>" [unknown] # gpg: aka "Greg Kroah-Hartman (Linux kernel stable release signing key) <greg@kroah.com>" [unknown] # gpg: aka "Greg Kroah-Hartman <gregkh@kernel.org>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 647F 2865 4894 E3BD 4571 99BE 38DB BDC8 6092 693E
2018-09-26perf powerpc: Fix callchain ip filteringSandipan Das
[ Upstream commit c715fcfda5a08edabaa15508742be926b7ee51db ] For powerpc64, redundant entries in the callchain are filtered out by determining the state of the return address and the stack frame using DWARF debug information. For making these filtering decisions we must analyze the debug information for the location corresponding to the program counter value, i.e. the first entry in the callchain, and not the LR value; otherwise, perf may filter out either the second or the third entry in the callchain incorrectly. This can be observed on a powerpc64le system running Fedora 27 as shown below. Case 1 - Attaching a probe at inet_pton+0x8 (binary offset 0x15af28). Return address is still in LR and a new stack frame is not yet allocated. The LR value, i.e. the second entry, should not be filtered out. # objdump -d /usr/lib64/libc-2.26.so | less ... 000000000010eb10 <gaih_inet.constprop.7>: ... 10fa48: 78 bb e4 7e mr r4,r23 10fa4c: 0a 00 60 38 li r3,10 10fa50: d9 b4 04 48 bl 15af28 <inet_pton+0x8> 10fa54: 00 00 00 60 nop 10fa58: ac f4 ff 4b b 10ef04 <gaih_inet.constprop.7+0x3f4> ... 0000000000110450 <getaddrinfo>: ... 1105a8: 54 00 ff 38 addi r7,r31,84 1105ac: 58 00 df 38 addi r6,r31,88 1105b0: 69 e5 ff 4b bl 10eb18 <gaih_inet.constprop.7+0x8> 1105b4: 78 1b 71 7c mr r17,r3 1105b8: 50 01 7f e8 ld r3,336(r31) ... 000000000015af20 <inet_pton>: 15af20: 0b 00 4c 3c addis r2,r12,11 15af24: e0 c1 42 38 addi r2,r2,-15904 15af28: a6 02 08 7c mflr r0 15af2c: f0 ff c1 fb std r30,-16(r1) 15af30: f8 ff e1 fb std r31,-8(r1) ... # perf probe -x /usr/lib64/libc-2.26.so -a inet_pton+0x8 # perf record -e probe_libc:inet_pton -g ping -6 -c 1 ::1 # perf script Before: ping 4507 [002] 514985.546540: probe_libc:inet_pton: (7fffa7dbaf28) 7fffa7dbaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so) 7fffa7d705b4 getaddrinfo+0x164 (/usr/lib64/libc-2.26.so) 13fb52d70 _init+0xbfc (/usr/bin/ping) 7fffa7c836a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so) 7fffa7c83898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so) 0 [unknown] ([unknown]) After: ping 4507 [002] 514985.546540: probe_libc:inet_pton: (7fffa7dbaf28) 7fffa7dbaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so) 7fffa7d6fa54 gaih_inet.constprop.7+0xf44 (/usr/lib64/libc-2.26.so) 7fffa7d705b4 getaddrinfo+0x164 (/usr/lib64/libc-2.26.so) 13fb52d70 _init+0xbfc (/usr/bin/ping) 7fffa7c836a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so) 7fffa7c83898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so) 0 [unknown] ([unknown]) Case 2 - Attaching a probe at _int_malloc+0x180 (binary offset 0x9cf10). Return address in still in LR and a new stack frame has already been allocated but not used. The caller's caller, i.e. the third entry, is invalid and should be filtered out and not the second one. # objdump -d /usr/lib64/libc-2.26.so | less ... 000000000009cd90 <_int_malloc>: 9cd90: 17 00 4c 3c addis r2,r12,23 9cd94: 70 a3 42 38 addi r2,r2,-23696 9cd98: 26 00 80 7d mfcr r12 9cd9c: f8 ff e1 fb std r31,-8(r1) 9cda0: 17 00 e4 3b addi r31,r4,23 9cda4: d8 ff 61 fb std r27,-40(r1) 9cda8: 78 23 9b 7c mr r27,r4 9cdac: 1f 00 bf 2b cmpldi cr7,r31,31 9cdb0: f0 ff c1 fb std r30,-16(r1) 9cdb4: b0 ff c1 fa std r22,-80(r1) 9cdb8: 78 1b 7e 7c mr r30,r3 9cdbc: 08 00 81 91 stw r12,8(r1) 9cdc0: 11 ff 21 f8 stdu r1,-240(r1) 9cdc4: 4c 01 9d 41 bgt cr7,9cf10 <_int_malloc+0x180> 9cdc8: 20 00 a4 2b cmpldi cr7,r4,32 ... 9cf08: 00 00 00 60 nop 9cf0c: 00 00 42 60 ori r2,r2,0 9cf10: e4 06 ff 7b rldicr r31,r31,0,59 9cf14: 40 f8 a4 7f cmpld cr7,r4,r31 9cf18: 68 05 9d 41 bgt cr7,9d480 <_int_malloc+0x6f0> ... 000000000009e3c0 <tcache_init.part.4>: ... 9e420: 40 02 80 38 li r4,576 9e424: 78 fb e3 7f mr r3,r31 9e428: 71 e9 ff 4b bl 9cd98 <_int_malloc+0x8> 9e42c: 00 00 a3 2f cmpdi cr7,r3,0 9e430: 78 1b 7e 7c mr r30,r3 ... 000000000009f7a0 <__libc_malloc>: ... 9f8f8: 00 00 89 2f cmpwi cr7,r9,0 9f8fc: 1c ff 9e 40 bne cr7,9f818 <__libc_malloc+0x78> 9f900: c9 ea ff 4b bl 9e3c8 <tcache_init.part.4+0x8> 9f904: 00 00 00 60 nop 9f908: e8 90 22 e9 ld r9,-28440(r2) ... # perf probe -x /usr/lib64/libc-2.26.so -a _int_malloc+0x180 # perf record -e probe_libc:_int_malloc -g ./test-malloc # perf script Before: test-malloc 6554 [009] 515975.797403: probe_libc:_int_malloc: (7fffa6e6cf10) 7fffa6e6cf10 _int_malloc+0x180 (/usr/lib64/libc-2.26.so) 7fffa6dd0000 [unknown] (/usr/lib64/libc-2.26.so) 7fffa6e6f904 malloc+0x164 (/usr/lib64/libc-2.26.so) 7fffa6e6f9fc malloc+0x25c (/usr/lib64/libc-2.26.so) 100006b4 main+0x38 (/home/testuser/test-malloc) 7fffa6df36a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so) 7fffa6df3898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so) 0 [unknown] ([unknown]) After: test-malloc 6554 [009] 515975.797403: probe_libc:_int_malloc: (7fffa6e6cf10) 7fffa6e6cf10 _int_malloc+0x180 (/usr/lib64/libc-2.26.so) 7fffa6e6e42c tcache_init.part.4+0x6c (/usr/lib64/libc-2.26.so) 7fffa6e6f904 malloc+0x164 (/usr/lib64/libc-2.26.so) 7fffa6e6f9fc malloc+0x25c (/usr/lib64/libc-2.26.so) 100006b4 main+0x38 (/home/sandipan/test-malloc) 7fffa6df36a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so) 7fffa6df3898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so) 0 [unknown] ([unknown]) Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Maynard Johnson <maynard@us.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Fixes: a60335ba3298 ("perf tools powerpc: Adjust callchain based on DWARF debug info") Link: http://lkml.kernel.org/r/24bb726d91ed173aebc972ec3f41a2ef2249434e.1530724939.git.sandipan@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26perf powerpc: Fix callchain ip filtering when return address is in a registerSandipan Das
[ Upstream commit 9068533e4f470daf2b0f29c71d865990acd8826e ] For powerpc64, perf will filter out the second entry in the callchain, i.e. the LR value, if the return address of the function corresponding to the probed location has already been saved on its caller's stack. The state of the return address is determined using debug information. At any point within a function, if the return address is already saved somewhere, a DWARF expression can tell us about its location. If the return address in still in LR only, no DWARF expression would exist. Typically, the instructions in a function's prologue first copy the LR value to R0 and then pushes R0 on to the stack. If LR has already been copied to R0 but R0 is yet to be pushed to the stack, we can still get a DWARF expression that says that the return address is in R0. This is indicating that getting a DWARF expression for the return address does not guarantee the fact that it has already been saved on the stack. This can be observed on a powerpc64le system running Fedora 27 as shown below. # objdump -d /usr/lib64/libc-2.26.so | less ... 000000000015af20 <inet_pton>: 15af20: 0b 00 4c 3c addis r2,r12,11 15af24: e0 c1 42 38 addi r2,r2,-15904 15af28: a6 02 08 7c mflr r0 15af2c: f0 ff c1 fb std r30,-16(r1) 15af30: f8 ff e1 fb std r31,-8(r1) 15af34: 78 1b 7f 7c mr r31,r3 15af38: 78 23 83 7c mr r3,r4 15af3c: 78 2b be 7c mr r30,r5 15af40: 10 00 01 f8 std r0,16(r1) 15af44: c1 ff 21 f8 stdu r1,-64(r1) 15af48: 28 00 81 f8 std r4,40(r1) ... # readelf --debug-dump=frames-interp /usr/lib64/libc-2.26.so | less ... 00027024 0000000000000024 00027028 FDE cie=00000000 pc=000000000015af20..000000000015af88 LOC CFA r30 r31 ra 000000000015af20 r1+0 u u u 000000000015af34 r1+0 c-16 c-8 r0 000000000015af48 r1+64 c-16 c-8 c+16 000000000015af5c r1+0 c-16 c-8 c+16 000000000015af78 r1+0 u u ... # perf probe -x /usr/lib64/libc-2.26.so -a inet_pton+0x18 # perf record -e probe_libc:inet_pton -g ping -6 -c 1 ::1 # perf script Before: ping 2829 [005] 512917.460174: probe_libc:inet_pton: (7fff7e2baf38) 7fff7e2baf38 __GI___inet_pton+0x18 (/usr/lib64/libc-2.26.so) 7fff7e2705b4 getaddrinfo+0x164 (/usr/lib64/libc-2.26.so) 12f152d70 _init+0xbfc (/usr/bin/ping) 7fff7e1836a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so) 7fff7e183898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so) 0 [unknown] ([unknown]) After: ping 2829 [005] 512917.460174: probe_libc:inet_pton: (7fff7e2baf38) 7fff7e2baf38 __GI___inet_pton+0x18 (/usr/lib64/libc-2.26.so) 7fff7e26fa54 gaih_inet.constprop.7+0xf44 (/usr/lib64/libc-2.26.so) 7fff7e2705b4 getaddrinfo+0x164 (/usr/lib64/libc-2.26.so) 12f152d70 _init+0xbfc (/usr/bin/ping) 7fff7e1836a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so) 7fff7e183898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so) 0 [unknown] ([unknown]) Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Maynard Johnson <maynard@us.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/66e848a7bdf2d43b39210a705ff6d828a0865661.1530724939.git.sandipan@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20Merge tag 'v4.4.157' into linux-linaro-lsk-v4.4Mark Brown
This is the 4.4.157 stable release
2018-09-19perf tools: Allow overriding MAX_NR_CPUS at compile timeChristophe Leroy
[ Upstream commit 21b8732eb4479b579bda9ee38e62b2c312c2a0e5 ] After update of kernel, the perf tool doesn't run anymore on my 32MB RAM powerpc board, but still runs on a 128MB RAM board: ~# strace perf execve("/usr/sbin/perf", ["perf"], [/* 12 vars */]) = -1 ENOMEM (Cannot allocate memory) --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} --- +++ killed by SIGSEGV +++ Segmentation fault objdump -x shows that .bss section has a huge size of 24Mbytes: 27 .bss 016baca8 101cebb8 101cebb8 001cd988 2**3 With especially the following objects having quite big size: 10205f80 l O .bss 00140000 runtime_cycles_stats 10345f80 l O .bss 00140000 runtime_stalled_cycles_front_stats 10485f80 l O .bss 00140000 runtime_stalled_cycles_back_stats 105c5f80 l O .bss 00140000 runtime_branches_stats 10705f80 l O .bss 00140000 runtime_cacherefs_stats 10845f80 l O .bss 00140000 runtime_l1_dcache_stats 10985f80 l O .bss 00140000 runtime_l1_icache_stats 10ac5f80 l O .bss 00140000 runtime_ll_cache_stats 10c05f80 l O .bss 00140000 runtime_itlb_cache_stats 10d45f80 l O .bss 00140000 runtime_dtlb_cache_stats 10e85f80 l O .bss 00140000 runtime_cycles_in_tx_stats 10fc5f80 l O .bss 00140000 runtime_transaction_stats 11105f80 l O .bss 00140000 runtime_elision_stats 11245f80 l O .bss 00140000 runtime_topdown_total_slots 11385f80 l O .bss 00140000 runtime_topdown_slots_retired 114c5f80 l O .bss 00140000 runtime_topdown_slots_issued 11605f80 l O .bss 00140000 runtime_topdown_fetch_bubbles 11745f80 l O .bss 00140000 runtime_topdown_recovery_bubbles This is due to commit 4d255766d28b1 ("perf: Bump max number of cpus to 1024"), because many tables are sized with MAX_NR_CPUS This patch gives the opportunity to redefine MAX_NR_CPUS via $ make EXTRA_CFLAGS=-DMAX_NR_CPUS=1 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170922112043.8349468C57@po15668-vm-win7.idsi0.si.c-s.fr Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-10Merge tag 'v4.4.155' into linux-linaro-lsk-v4.4Mark Brown
This is the 4.4.155 stable release
2018-09-09perf auxtrace: Fix queue resizeAdrian Hunter
commit 99cbbe56eb8bede625f410ab62ba34673ffa7d21 upstream. When the number of queues grows beyond 32, the array of queues is resized but not all members were being copied. Fix by also copying 'tid', 'cpu' and 'set'. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Fixes: e502789302a6e ("perf auxtrace: Add helpers for queuing AUX area tracing data") Link: http://lkml.kernel.org/r/20180814084608.6563-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-29Merge tag 'v4.4.153' into linux-linaro-lsk-v4.4Mark Brown
This is the 4.4.153 stable release
2018-08-24perf llvm-utils: Remove bashism from kernel include fetch scriptKim Phillips
[ Upstream commit f6432b9f65001651412dbc3589d251534822d4ab ] Like system(), popen() calls /bin/sh, which may/may not be bash. Script when run on dash and encounters the line, yields: exit: Illegal number: -1 checkbashisms report on script content: possible bashism (exit|return with negative status code): exit -1 Remove the bashism and use the more portable non-zero failure status code 1. Signed-off-by: Kim Phillips <kim.phillips@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20180629124652.8d0af7e2281fd3fd8262cacc@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24perf report powerpc: Fix crash if callchain is emptySandipan Das
[ Upstream commit 143c99f6ac6812d23254e80844d6e34be897d3e1 ] For some cases, the callchain provided by the kernel may be empty. So, the callchain ip filtering code will cause a crash if we do not check whether the struct ip_callchain pointer is NULL before accessing any members. This can be observed on a powerpc64le system running Fedora 27 as shown below. # perf record -b -e cycles:u ls Before: # perf report --branch-history perf: Segmentation fault -------- backtrace -------- perf[0x1027615c] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff856304d8] perf(arch_skip_callchain_idx+0x44)[0x10257c58] perf[0x1017f2e4] perf(thread__resolve_callchain+0x124)[0x1017ff5c] perf(sample__resolve_callchain+0xf0)[0x10172788] ... After: # perf report --branch-history Samples: 25 of event 'cycles:u', Event count (approx.): 2306870 Overhead Source:Line Symbol Shared Object + 11.60% _init+35736 [.] _init ls + 9.84% strcoll_l.c:137 [.] __strcoll_l libc-2.26.so + 9.16% memcpy.S:175 [.] __memcpy_power7 libc-2.26.so + 9.01% gconv_charset.h:54 [.] _nl_find_locale libc-2.26.so + 8.87% dl-addr.c:52 [.] _dl_addr libc-2.26.so + 8.83% _init+236 [.] _init ls ... Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Acked-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20180611104049.11048-1-sandipan@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24perf test session topology: Fix test on s390Thomas Richter
[ Upstream commit b930e62ecd362843002bdf84c2940439822af321 ] On s390 this test case fails because the socket identifiction numbers assigned to the CPU are higher than the CPU identification numbers. F/ix this by adding the platform architecture into the perf data header flag information. This helps identifiing the test platform and handles s390 specifics in process_cpu_topology(). Before: [root@p23lp27 perf]# perf test -vvvvv -F 39 39: Session topology : --- start --- templ file: /tmp/perf-test-iUv755 socket_id number is too big.You may need to upgrade the perf tool. ---- end ---- Session topology: Skip [root@p23lp27 perf]# After: [root@p23lp27 perf]# perf test -vvvvv -F 39 39: Session topology : --- start --- templ file: /tmp/perf-test-8X8VTs CPU 0, core 0, socket 6 CPU 1, core 1, socket 3 ---- end ---- Session topology: Ok [root@p23lp27 perf]# Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Fixes: c84974ed9fb6 ("perf test: Add entry to test cpu topology") Link: http://lkml.kernel.org/r/20180611073153.15592-2-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-19Merge tag 'v4.4.142' into linux-linaro-lsk-v4.4Mark Brown
This is the 4.4.142 stable release
2018-07-19perf tools: Move syscall number fallbacks from perf-sys.h to ↵Arnaldo Carvalho de Melo
tools/arch/x86/include/asm/ commit cec07f53c398f22576df77052c4777dc13f14962 upstream. And remove the empty tools/arch/x86/include/asm/unistd_{32,64}.h files introduced by eae7a755ee81 ("perf tools, x86: Build perf on older user-space as well"). This way we get closer to mirroring the kernel for cases where __NR_ can't be found for some include path/_GNU_SOURCE/whatever scenario. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-kpj6m3mbjw82kg6krk2z529e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-05Merge tag 'v4.4.139' into linux-linaro-lsk-v4.4Mark Brown
This is the 4.4.139 stable release
2018-07-03perf intel-pt: Fix packet decoding of CYC packetsAdrian Hunter
commit 621a5a327c1e36ffd7bb567f44a559f64f76358f upstream. Use a 64-bit type so that the cycle count is not limited to 32-bits. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1528371002-8862-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03perf intel-pt: Fix "Unexpected indirect branch" errorAdrian Hunter
commit 9fb523363f6e3984457fee95bb7019395384ffa7 upstream. Some Atom CPUs can produce FUP packets that contain NLIP (next linear instruction pointer) instead of CLIP (current linear instruction pointer). That will result in "Unexpected indirect branch" errors. Fix by comparing IP to NLIP in that case. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03perf intel-pt: Fix MTC timing after overflowAdrian Hunter
commit dd27b87ab5fcf3ea1c060b5e3ab5d31cc78e9f4c upstream. On some platforms, overflows will clear before MTC wraparound, and there is no following TSC/TMA packet. In that case the previous TMA is valid. Since there will be a valid TMA either way, stop setting 'have_tma' to false upon overflow. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIPAdrian Hunter
commit bd2e49ec48feb1855f7624198849eea4610e2286 upstream. It is possible to have a CBR packet between a FUP packet and corresponding TIP packet. Stop treating it as an error. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACINGAdrian Hunter
commit dbcb82b93f3e8322891e47472c89e63058b81e99 upstream. sync_switch is a facility to synchronize decoding more closely with the point in the kernel when the context actually switched. In one case, INTEL_PT_SS_NOT_TRACING state was not correctly transitioning to INTEL_PT_SS_TRACING state due to a missing case clause. Add it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03perf tools: Fix symbol and object code resolution for vdso32 and vdsox32Adrian Hunter
commit aef4feace285f27c8ed35830a5d575bec7f3e90a upstream. Fix __kmod_path__parse() so that perf tools does not treat vdso32 and vdsox32 as kernel modules and fail to find the object. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: stable@vger.kernel.org Fixes: 1f121b03d058 ("perf tools: Deal with kernel module names in '[]' correctly") Link: http://lkml.kernel.org/r/1528117014-30032-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30Merge tag 'v4.4.134' into linux-linaro-lsk-v4.4Mark Brown
This is the 4.4.134 stable release
2018-05-30perf report: Fix memory corruption in --branch-history mode --branch-historyJiri Olsa
[ Upstream commit e3ebaa465136ecfedf9c6f4671df02bf625f8125 ] Jin Yao reported memory corrupton in perf report with branch info used for stack trace: > Following command lines will cause perf crash. > perf record -j call -g -a <application> > perf report --branch-history > > *** Error in `perf': double free or corruption (!prev): 0x00000000104aa040 *** > ======= Backtrace: ========= > /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] > /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] > /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] > perf[0x51b914] > perf(hist_entry_iter__add+0x1e5)[0x51f305] > perf[0x43cf01] > perf[0x4fa3bf] > perf[0x4fa923] > perf[0x4fd396] > perf[0x4f9614] > perf(perf_session__process_events+0x89e)[0x4fc38e] > perf(cmd_report+0x15d2)[0x43f202] > perf[0x4a059f] > perf(main+0x631)[0x427b71] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] > perf(_start+0x29)[0x427d89] For the cumulative output, we allocate the he_cache array based on the --max-stack option value and populate it with data from 'callchain_cursor'. The --max-stack option value does not ensure now the limit for number of callchain_cursor nodes, so the cumulative iter code will allocate smaller array than it's actually needed and cause above corruption. I think the --max-stack limit does not apply here anyway, because we add callchain data as normal hist entries, while the --max-stack control the limit of single entry callchain depth. Using the callchain_cursor.nr as he_cache array count to fix this. Also removing struct hist_entry_iter::max_stack, because there's no longer any use for it. We need more fixes to ensure that the branch stack code follows properly the logic of --max-stack, which is not the case at the moment. Original-patch-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reported-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180216123619.GA9945@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30perf tests: Use arch__compare_symbol_names to compare symbolsJiri Olsa
[ Upstream commit ab6e9a99345131cd8e54268d1d0dc04a33f7ed11 ] The symbol search called by machine__find_kernel_symbol_by_name is using internally arch__compare_symbol_names function to compare 2 symbol names, because different archs have different ways of comparing symbols. Mostly for skipping '.' prefixes and similar. In test 1 when we try to find matching symbols in kallsyms and vmlinux, by address and by symbol name. When either is found we compare the pair symbol names by simple strcmp, which is not good enough for reasons explained in previous paragraph. On powerpc this can cause lockup, because even thought we found the pair, the compared names are different and don't match simple strcmp. Following code path is executed, that leads to lockup: - we find the pair in kallsyms by sym->start next_pair: - we compare the names and it fails - we find the pair by sym->name - the pair addresses match so we call goto next_pair because we assume the names match in this case Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 031b84c407c3 ("perf probe ppc: Enable matching against dot symbols automatically") Link: http://lkml.kernel.org/r/20180215122635.24029-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30perf callchain: Fix attr.sample_max_stack settingArnaldo Carvalho de Melo
[ Upstream commit 249d98e567e25dd03e015e2d31e1b7b9648f34df ] When setting the "dwarf" unwinder for a specific event and not specifying the max-stack, the attr.sample_max_stack ended up using an uninitialized callchain_param.max_stack, fix it by using designated initializers for that callchain_param variable, zeroing all non explicitely initialized struct members. Here is what happened: # perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1 callchain: type DWARF callchain: stack dump size 8192 perf_event_attr: type 2 size 112 config 0x730 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC exclude_callchain_user 1 { wakeup_events, wakeup_watermark } 1 sample_regs_user 0xff0fff sample_stack_user 8192 sample_max_stack 50656 sys_perf_event_open failed, error -75 Value too large for defined data type # perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1 callchain: type DWARF callchain: stack dump size 8192 perf_event_attr: type 2 size 112 config 0x730 sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC exclude_callchain_user 1 sample_regs_user 0xff0fff sample_stack_user 8192 sample_max_stack 30448 sys_perf_event_open failed, error -75 Value too large for defined data type # Now the attr.sample_max_stack is set to zero and the above works as expected: # perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.072 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms 0.000 probe_libc:inet_pton:(7feb7a998350)) __inet_pton (inlined) gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so) __GI_getaddrinfo (inlined) [0xffffaa39b6108f3f] (/usr/bin/ping) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-is9tramondqa9jlxxsgcm9iz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-18Merge tag 'v4.4.132' into linux-linaro-lsk-v4.4Mark Brown
This is the 4.4.132 stable release
2018-04-24Revert "perf tests: Decompress kernel module before objdump"Greg Kroah-Hartman
This reverts commit b0761b57e0bf11ada4c45e68f4cba1370363d90d which is commit 94df1040b1e6aacd8dec0ba3c61d7e77cd695f26 upstream. It breaks the build of perf on 4.4.y, so I'm dropping it. Reported-by: Pavlos Parissis <pavlos.parissis@gmail.com> Reported-by: Lei Chen <chenl.lei@gmail.com> Reported-by: Maxime Hadjinlian <maxime.hadjinlian@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Cc: kernel-team@lge.com Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24perf intel-pt: Fix timestamp following overflowAdrian Hunter
commit 91d29b288aed3406caf7c454bf2b898c96cfd177 upstream. timestamp_insn_cnt is used to estimate the timestamp based on the number of instructions since the last known timestamp. If the estimate is not accurate enough decoding might not be correctly synchronized with side-band events causing more trace errors. However there are always timestamps following an overflow, so the estimate is not needed and can indeed result in more errors. Suppress the estimate by setting timestamp_insn_cnt to zero. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1520431349-30689-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24perf intel-pt: Fix error recovery from missing TIP packetAdrian Hunter
commit 1c196a6c771c47a2faa63d38d913e03284f73a16 upstream. When a TIP packet is expected but there is a different packet, it is an error. However the unexpected packet might be something important like a TSC packet, so after the error, it is necessary to continue from there, rather than the next packet. That is achieved by setting pkt_step to zero. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1520431349-30689-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24perf intel-pt: Fix sync_switchAdrian Hunter
commit 63d8e38f6ae6c36dd5b5ba0e8c112e8861532ea2 upstream. sync_switch is a facility to synchronize decoding more closely with the point in the kernel when the context actually switched. The flag when sync_switch is enabled was global to the decoding, whereas it is really specific to the CPU. The trace data for different CPUs is put on different queues, so add sync_switch to the intel_pt_queue structure and use that in preference to the global setting in the intel_pt structure. That fixes problems decoding one CPU's trace because sync_switch was disabled on a different CPU's queue. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1520431349-30689-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24perf intel-pt: Fix overlap detection to identify consecutive buffers correctlyAdrian Hunter
commit 117db4b27bf08dba412faf3924ba55fe970c57b8 upstream. Overlap detection was not not updating the buffer's 'consecutive' flag. Marking buffers consecutive has the advantage that decoding begins from the start of the buffer instead of the first PSB. Fix overlap detection to identify consecutive buffers correctly. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1520431349-30689-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13perf tools: Fix copyfile_offset update of output offsetJiri Olsa
[ Upstream commit fa1195ccc0af2d121abe0fe266a1caee8c265eea ] We need to increase output offset in each iteration, not decrease it as we currently do. I guess we were lucky to finish in most cases in first iteration, so the bug never showed. However it shows a lot when working with big (~4GB) size data. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 9c9f5a2f1944 ("perf tools: Introduce copyfile_offset() function") Link: http://lkml.kernel.org/r/20180109133923.25406-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13perf tests: Decompress kernel module before objdumpNamhyung Kim
[ Upstream commit 94df1040b1e6aacd8dec0ba3c61d7e77cd695f26 ] If a kernel modules is compressed, it should be decompressed before running objdump to parse binary data correctly. This fixes a failure of object code reading test for me. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170608073109.30699-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13perf report: Ensure the perf DSO mapping matches what libdw seesMilian Wolff
[ Upstream commit 2538b9e2450ae255337c04356e9e0f8cb9ec48d9 ] In some situations the libdw unwinder stopped working properly. I.e. with libunwind we see: ~~~~~ heaptrack_gui 2228 135073.400112: 641314 cycles: e8ed _dl_fixup (/usr/lib/ld-2.25.so) 15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so) ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0) 608f3 _GLOBAL__sub_I_kdynamicjobtracker.cpp (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0) f199 call_init.part.0 (/usr/lib/ld-2.25.so) f2a5 _dl_init (/usr/lib/ld-2.25.so) db9 _dl_start_user (/usr/lib/ld-2.25.so) ~~~~~ But with libdw and without this patch this sample is not properly unwound: ~~~~~ heaptrack_gui 2228 135073.400112: 641314 cycles: e8ed _dl_fixup (/usr/lib/ld-2.25.so) 15f06 _dl_runtime_resolve_sse_vex (/usr/lib/ld-2.25.so) ed94c KDynamicJobTracker::KDynamicJobTracker (/home/milian/projects/compiled/kf5/lib64/libKF5KIOWidgets.so.5.35.0) ~~~~~ Debug output showed me that libdw found a module for the last frame address, but it thinks it belongs to /usr/lib/ld-2.25.so. This patch double-checks what libdw sees and what perf knows. If the mappings mismatch, we now report the elf known to perf. This fixes the situation above, and the libdw unwinder produces the same stack as libunwind. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20170602143753.16907-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13perf header: Set proper module name when build-id event foundNamhyung Kim
[ Upstream commit 1deec1bd96ccd8beb04d2112a6d12fe20505c3a6 ] When perf processes build-id event, it creates DSOs with the build-id. But it didn't set the module short name (like '[module-name]') so when processing a kernel mmap event of the module, it cannot found the DSO as it only checks the short names. That leads for perf to create a same DSO without the build-id info and it'll lookup the system path even if the DSO is already in the build-id cache. After kernel was updated, perf cannot find the DSO and cannot show symbols in it anymore. You can see this if you have an old data file (w/ old kernel version): $ perf report -i perf.data.old -v |& grep scsi_mod build id event received for /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz : cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1 Failed to open /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz, continuing without symbols ... The second message didn't show the build-id. With this patch: $ perf report -i perf.data.old -v |& grep scsi_mod build id event received for /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz: cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1 /lib/modules/3.19.2-1-ARCH/kernel/drivers/scsi/scsi_mod.ko.gz with build id cafe1ce6ca13a98a5d9ed3425cde249e57a27fc1 not found, continuing without symbols ... Now it shows the build-id but still cannot load the symbol table. This is a different problem which will be fixed in the next patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170531120105.21731-1-namhyung@kernel.org [ Fix the build on older compilers (debian <= 8, fedora <= 21, etc) wrt kmod_path var init ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13perf trace: Add mmap alias for s390Jiri Olsa
[ Upstream commit 54265664c15a68905d8d67d19205e9a767636434 ] The s390 architecture maps sys_mmap (nr 90) into sys_old_mmap. For this reason perf trace can't find the proper syscall event to get args format from and displays it wrongly as 'continued'. To fix that fill the "alias" field with "old_mmap" for trace's mmap record to get the correct translation. Before: 0.042 ( 0.011 ms): vest/43052 fstat(statbuf: 0x3ffff89fd90 ) = 0 0.042 ( 0.028 ms): vest/43052 ... [continued]: mmap()) = 0x3fffd6e2000 0.072 ( 0.025 ms): vest/43052 read(buf: 0x3fffd6e2000, count: 4096 ) = 6 After: 0.045 ( 0.011 ms): fstat(statbuf: 0x3ffff8a0930 ) = 0 0.057 ( 0.018 ms): mmap(arg: 0x3ffff8a0858 ) = 0x3fffd14a000 0.076 ( 0.025 ms): read(buf: 0x3fffd14a000, count: 4096 ) = 6 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170531113557.19175-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13perf probe: Add warning message if there is unexpected event nameMasami Hiramatsu
[ Upstream commit 9f5c6d8777a2d962b0eeacb2a16f37da6bea545b ] This improve the error message so that user can know event-name error before writing new events to kprobe-events interface. E.g. ====== #./perf probe -x /lib64/libc-2.25.so malloc_get_state* Internal error: "malloc_get_state@GLIBC_2" is an invalid event name. Error: Failed to add events. ====== Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Clarke <pc@us.ibm.com> Cc: bhargavb <bhargavaramudu@gmail.com> Cc: linux-rt-users@vger.kernel.org Link: http://lkml.kernel.org/r/151275040665.24652.5188568529237584489.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-26 Merge tag 'v4.4.124' into linux-linaro-lsk-v4.4lsk-v4.4-18.03Alex Shi
This is the 4.4.124 stable release
2018-03-24perf tests kmod-path: Don't fail if compressed modules aren't supportedKim Phillips
[ Upstream commit 805b151a1afd24414706a7f6ae275fbb9649be74 ] __kmod_path__parse() uses is_supported_compression() to determine and parse out compressed module file extensions. On systems without zlib, this test fails and __kmod_path__parse() continues to strcmp "ko" with "gz". Don't do this on those systems. Signed-off-by: Kim Phillips <kim.phillips@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 3c8a67f50a1e ("perf tools: Add kmod_path__parse function") Link: http://lkml.kernel.org/r/20170503131402.c66e314460026c80cd787b34@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-23 Merge tag 'v4.4.123' into linux-linaro-lsk-v4.4Alex Shi
This is the 4.4.123 stable release
2018-03-22perf session: Don't rely on evlist in pipe modeDavid Carrillo-Cisneros
[ Upstream commit 0973ad97c187e06aece61f685b9c3b2d93290a73 ] Session sets a number parameters that rely on evlist. These parameters are not used in pipe-mode and should not be set, since evlist is unavailable. Fix that. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Simon Que <sque@chromium.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170410201432.24807-6-davidcc@google.com [ Check if file != NULL in perf_session__new(), like when used by builtin-top.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22perf inject: Copy events when reordering events in pipe modeDavid Carrillo-Cisneros
[ Upstream commit 1e0d4f0200e4dbdfc38d818f329d8a0955f7c6f5 ] __perf_session__process_pipe_events reuses the same memory buffer to process all events in the pipe. When reordering is needed (e.g. -b option), events are not immediately flushed, but kept around until reordering is possible, causing memory corruption. The problem is usually observed by a "Unknown sample error" output. It can easily be reproduced by: perf record -o - noploop | perf inject -b > output Committer testing: Before: $ perf record -o - stress -t 2 -c 2 | perf inject -b > /dev/null stress: info: [8297] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd stress: info: [8297] successful run completed in 2s [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] Warning: Found 1 unknown events! Is this an older tool processing a perf.data file generated by a more recent tool? If that is not the case, consider reporting to linux-kernel@vger.kernel.org. $ After: $ perf record -o - stress -t 2 -c 2 | perf inject -b > /dev/null stress: info: [9027] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd stress: info: [9027] successful run completed in 2s [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] no symbols found in /usr/bin/stress, maybe install a debug package? no symbols found in /usr/bin/stress, maybe install a debug package? $ Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Simon Que <sque@chromium.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170410201432.24807-3-davidcc@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22perf probe: Return errno when not hitting any eventKefeng Wang
[ Upstream commit 70946723eeb859466f026274b29c6196e39149c4 ] On old perf, when using 'perf probe -d' to delete an inexistent event, it returns errno, eg, -bash-4.3# perf probe -d xxx || echo $? Info: Event "*:xxx" does not exist. Error: Failed to delete events. 255 But now perf_del_probe_events() will always set ret = 0, different from previous del_perf_probe_events(). After this, it returns errno again, eg, -bash-4.3# ./perf probe -d xxx || echo $? "xxx" does not hit any event. Error: Failed to delete events. 254 And it is more appropriate to return -ENOENT instead of -EPERM. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: dddc7ee32fa1 ("perf probe: Fix an error when deleting probes successfully") Link: http://lkml.kernel.org/r/1489738592-61011-1-git-send-email-wangkefeng.wang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22perf tools: Make perf_event__synthesize_mmap_events() scaleStephane Eranian
[ Upstream commit 88b897a30c525c2eee6e7f16e1e8d0f18830845e ] This patch significantly improves the execution time of perf_event__synthesize_mmap_events() when running perf record on systems where processes have lots of threads. It just happens that cat /proc/pid/maps support uses a O(N^2) algorithm to generate each map line in the maps file. If you have 1000 threads, then you have necessarily 1000 stacks. For each vma, you need to check if it corresponds to a thread's stack. With a large number of threads, this can take a very long time. I have seen latencies >> 10mn. As of today, perf does not use the fact that a mapping is a stack, therefore we can work around the issue by using /proc/pid/tasks/pid/maps. This entry does not try to map a vma to stack and is thus much faster with no loss of functonality. The proc-map-timeout logic is kept in case users still want some upper limit. In V2, we fix the file path from /proc/pid/tasks/pid/maps to actual /proc/pid/task/pid/maps, tasks -> task. Thanks Arnaldo for catching this. Committer note: This problem seems to have been elliminated in the kernel since commit : b18cb64ead40 ("fs/proc: Stop trying to report thread stacks"). Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170315135059.GC2177@redhat.com Link: http://lkml.kernel.org/r/1489598233-25586-1-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22perf sort: Fix segfault with basic block 'cycles' sort dimensionChangbin Du
[ Upstream commit 4b0b3aa6a2756e6115fdf275c521e4552a7082f3 ] Skip the sample which doesn't have branch_info to avoid segmentation fault: The fault can be reproduced by: perf record -a perf report -F cycles Signed-off-by: Changbin Du <changbin.du@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 0e332f033a82 ("perf tools: Add support for cycles, weight branch_info field") Link: http://lkml.kernel.org/r/20170313083148.23568-1-changbin.du@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-26 Merge tag 'v4.4.118' into linux-linaro-lsk-v4.4Alex Shi
This is the 4.4.118 stable release
2018-02-25perf bench numa: Fixup discontiguous/sparse numa nodesSatheesh Rajendran
[ Upstream commit 321a7c35c90cc834851ceda18a8ee18f1d032b92 ] Certain systems are designed to have sparse/discontiguous nodes. On such systems, 'perf bench numa' hangs, shows wrong number of nodes and shows values for non-existent nodes. Handle this by only taking nodes that are exposed by kernel to userspace. Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1edbcd353c009e109e93d78f2f46381930c340fe.1511368645.git.sathnaga@linux.vnet.ibm.com Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25perf top: Fix window dimensions change handlingJiri Olsa
[ Upstream commit 89d0aeab4252adc2a7ea693637dd21c588bfa2d1 ] The stdio perf top crashes when we change the terminal window size. The reason is that we assumed we get the perf_top pointer as a signal handler argument which is not the case. Changing the SIGWINCH handler logic to change global resize variable, which is checked in the main thread loop. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ysuzwz77oev1ftgvdscn9bpu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-21 Merge tag 'v4.4.107' into linux-linaro-lsk-v4.4Alex Shi
This is the 4.4.107 stable release
2017-12-20perf symbols: Fix symbols__fixup_end heuristic for corner casesDaniel Borkmann
[ Upstream commit e7ede72a6d40cb3a30c087142d79381ca8a31dab ] The current symbols__fixup_end() heuristic for the last entry in the rb tree is suboptimal as it leads to not being able to recognize the symbol in the call graph in a couple of corner cases, for example: i) If the symbol has a start address (f.e. exposed via kallsyms) that is at a page boundary, then the roundup(curr->start, 4096) for the last entry will result in curr->start == curr->end with a symbol length of zero. ii) If the symbol has a start address that is shortly before a page boundary, then also here, curr->end - curr->start will just be very few bytes, where it's unrealistic that we could perform a match against. Instead, change the heuristic to roundup(curr->start, 4096) + 4096, so that we can catch such corner cases and have a better chance to find that specific symbol. It's still just best effort as the real end of the symbol is unknown to us (and could even be at a larger offset than the current range), but better than the current situation. Alexei reported that he recently run into case i) with a JITed eBPF program (these are all page aligned) as the last symbol which wasn't properly shown in the call graph (while other eBPF program symbols in the rb tree were displayed correctly). Since this is a generic issue, lets try to improve the heuristic a bit. Reported-and-Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Fixes: 2e538c4a1847 ("perf tools: Improve kernel/modules symbol lookup") Link: http://lkml.kernel.org/r/bb5c80d27743be6f12afc68405f1956a330e1bc9.1489614365.git.daniel@iogearbox.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-10 Merge tag 'v4.4.105' into linux-linaro-lsk-v4.4Alex Shi
This is the 4.4.105 stable release