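Linux servers regularly run into system-level and application-level problems, and diagnosing them takes good tools. This post collects commonly used utilities and trace tools, and closes with the distributed-system trace tools that the Hadoop community has been developing recently.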

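Overview:

  • http://www.brendangregg.com/index.html
  • http://www.slideshare.net/brendangregg/linux-performance-analysis-and-tools
  • https://github.com/brendangregg/perf-tools/
  • http://www.brendangregg.com/linuxperf.html

The figure cited from linux-performance-analysis-and-tools shows the layer of the system at which each of these tools applies.

Index of common Linux performance tuning tools: most of the tools mentioned there are in my everyday toolbox or have been used in real troubleshooting cases, and all of them are highly valuable. They are indexed here for convenience: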

  • nicstat: see here
  • oprofile: see here
  • perf: see here
  • systemtap: see here
  • iotop: see here
  • blktrace: see here
  • dstat: see here
  • strace: see here
  • pidstat: see here
  • vmstat: see here
  • slabtop: see here
  • tcpdump: see here
  • free: see here
  • mpstat: see here
  • netstat: see here
  • tcprstat: see here

OS system commands

System information (RHEL/Fedora)

  • uname -a 或 cat /proc/version #print system information
  • Linux hadoopst2.cm6 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
  • uptime
  • 15:42:46 up 674 days, 6 min, 35 users, load average: 1.30, 5.97, 11.53
  • cat /etc/redhat-release
  • Red Hat Enterprise Linux Server release 5.4 (Tikanga)
  • lsb_release
  • LSB Version: :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
  • cat /proc/cpuinfo
  • cat /proc/meminfo
  • lspci – list all PCI devices
  • lsusb – list USB devices
  • last, lastb – show listing of last logged in users
  • lsmod – show the status of modules in the Linux Kernel
  • modprobe – add and remove modules from the Linux Kernel
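
The values that uptime and cat /proc/meminfo print can also be read programmatically. The following is a minimal sketch in Python (chosen here for illustration; the original uses only shell commands). The helper names parse_meminfo and parse_loadavg are hypothetical, and the file layouts assumed are those of a typical Linux /proc.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are bare counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg-style text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    loadavg = Path("/proc/loadavg")
    if meminfo.exists() and loadavg.exists():  # only present on Linux
        mem = parse_meminfo(meminfo.read_text())
        print("MemTotal kB:", mem.get("MemTotal"))
        print("load average:", parse_loadavg(loadavg.read_text()))
```

Keeping the parsers separate from the file reads makes them easy to run against text captured from another machine.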


  • ps
  • To print a process tree: ps -ejH / ps axjf
  • To get info about threads: ps -eLf / ps axms
  • ulimit -a
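  • lsof – list open files (on UNIX, everything is a file)
  • lsof -p PID
  • rpm/yum
  • rpm -qf FILE # which rpm package owns FILE
  • rpm -ql RPM # files contained in the rpm package
  • /var/log/yum.log # yum package update log
  • /etc/XXX # system-wide program configuration directories, e.g.
  • /etc/yum.repos.d/ # yum repository configuration
  • /var/log/XXX # log directories, e.g.
  • /var/log/cron # crontab log; shows whether scheduled jobs ran
  • ntpd – Network Time Protocol (NTP) daemon; keeps the clocks of the cluster machines in sync
  • squid – proxy caching server; used as a proxy for the cluster web UIs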
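
A frequent reason to reach for ulimit and lsof together is a process running out of file descriptors. The sketch below is an illustration under stated assumptions: it is Linux-only, the function names are hypothetical, and counting /proc/<pid>/fd covers only a subset of what lsof -p PID lists (lsof also shows the working directory, mapped files, and so on).

```python
import os
import resource


def open_fd_count(pid="self"):
    """Count the entries in /proc/<pid>/fd, i.e. the descriptors the process holds open."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_limit():
    """Return the (soft, hard) open-file limits of the current process.

    The soft value corresponds to the "open files" row of `ulimit -a`.
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    if os.path.isdir("/proc/self/fd"):  # only present on Linux
        soft, hard = nofile_limit()
        print(f"open fds: {open_fd_count()}  soft limit: {soft}  hard limit: {hard}")
```

Reading another user's /proc/<pid>/fd generally needs the same privileges that lsof -p would need.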


  • mpstat – Report processors related statistics. Watch the %sys and %iowait values
  • vmstat – Report virtual memory statistics
  • iostat – Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.
  • netstat – Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
  • netstat -atpn | grep PID
  • ganglia – a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.
  • sar/tsar – Collect, report, or save system activity information; tsar is Taobao's own improved version
  • Samples on a schedule (every minute) and keeps history that can be queried (default: 5 minutes); complements ganglia by showing more detail
  • iftop – shows the “top” bandwidth consumers. iftop wiki
  • iotop
  • vmtouch, Portable file system cache diagnostics and control


  • telnet/nc IP PORT – check whether the target port is reachable; a successful ping does not guarantee that the port is reachable, because a firewall or similar may block it
  • ifconfig/ifup/ifdown – configure a network interface
  • traceroute – print the route packets trace to network host
  • nslookup – query Internet name servers interactively
  • tcpdump – dump traffic on a network; similar open-source tools include wireshark and netsniff-ng; more tool comparisons
  • lynx – a general purpose distributed information browser for the World Wide Web
  • tcpcp – allows cooperating applications to pass ownership of TCP connection endpoints from one Linux host to another one.
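
The telnet/nc port check in the list above can be scripted when many hosts or ports need testing. This is a minimal sketch using only Python's standard socket module; the function name port_is_open and the 3-second default timeout are assumptions made for the example.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    A refused connection, an unreachable host, or a firewall silently dropping
    packets (which shows up as a timeout) all result in False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:22 is only an example target; substitute the IP and PORT to test.
    print(port_is_open("127.0.0.1", 22, timeout=1.0))
```

As with telnet or nc, a True result only shows that the TCP handshake completed, not that the service behind the port is healthy.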

Programs/processes

Static information

  • ldconfig – configure dynamic linker run time bindings
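  • ldconfig -p | grep SO # check whether the .so is in the linker cache
  • ldd – print shared library dependencies; shows the shared objects that an executable or .so depends on
  • nm – list symbols from object files; grep the output to check whether a symbol exists and whether it is undefined
  • readelf – Displays information about ELF files, such as 32/64-bit class, target OS, and processor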


  • gdb
  • cat /proc/$PID/[cmdline|environ|limits|status|…] – per-process information
  • pstack – print a stack trace of a running process
  • pmap – report memory map of a process
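
The /proc/$PID files listed above are plain text (or NUL-separated bytes in the case of cmdline), so they are easy to consume from a script. The sketch below is illustrative only; parse_status and read_cmdline are hypothetical helper names, and the field names in the comments (Name, Threads, VmRSS) are those found on typical Linux kernels.

```python
from pathlib import Path


def parse_status(text):
    """Parse /proc/<pid>/status-style text into {field: value string}."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def read_cmdline(pid="self"):
    """Return the argument list stored NUL-separated in /proc/<pid>/cmdline."""
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


if __name__ == "__main__":
    status_file = Path("/proc/self/status")
    if status_file.exists():  # only present on Linux
        status = parse_status(status_file.read_text())
        # Name, Threads and VmRSS are common fields; .get() tolerates their absence.
        print(status.get("Name"), status.get("Threads"), status.get("VmRSS"))
        print(read_cmdline())
```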


  • JDK Tools and Utilities
  • Java Troubleshooting Tools
  • jinfo – print java process information, e.g. the classpath and java.library.path (the JNI .so directory)
  • jstack – print a stack trace of a running java process; can reveal deadlocks
  • jmap – report memory map of a java process
  • jmap -histo:live – can trigger a full GC
  • jmap -dump:live,file=$FILE – dumps the heap so that tools such as jhat can analyze how objects occupy it
  • jhat – Heap Dump Browser – Starts a web server on a heap dump file (e.g. produced by jmap -dump), allowing the heap to be browsed.
  • Starts an HTTP service; view it from a browser
  • -J-mxXXXm – increase the heap size when analyzing large dump files
  • If some objects are unusually large or occupy too much memory, a memory leak is very likely
  • Memory Analyzer (MAT) – eclipse plugin, Java heap analyzer
  • A visual tool, but limited by the machine's memory, so it cannot analyze very large heap dump files
  • jdb – can run as a server so that Eclipse and other tools can connect remotely for debugging
  • jstat – Java Virtual Machine Statistics Monitoring Tool
  • jstatd – Virtual Machine jstat Daemon; can be used together with jvisualvm
  • jvisualvm – Java Virtual Machine Monitoring, Troubleshooting, and Profiling Tool; can connect remotely to jstatd/JMX. A visual tool: demo
  • jvmtop – In a top-like manner, displays JVM internal metrics (e.g. memory information) of running java processes.
  • JVM performance optimization – a series of optimization articles written by a JVM developer
  1. Overview
  2. Compilers
  3. Garbage collection
  4. Concurrently compacting GC
  5. Scalability
  • HPROF – Heap Profiler: java -agentlib:hprof

Trace/Debug/Profiling tools

General-purpose tools

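  • Writing logs is the first option, but it does not help when the system is live or the source code is unavailable; in that case:
  • strace – trace system calls and signals
  • Example: strace/ltrace in practice
  • Example: tracing the time spent in system calls, e.g. when a machine shows high CPU %sys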

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     67.90 3966.320849         496   7992161   3050250 futex
     25.80 1507.326693      127093     11860           epoll_wait
    ………………..
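
When the strace summary is long, it helps to parse it and rank syscalls by total time. The following is a hedged sketch: parse_strace_summary is a hypothetical helper, it assumes the whitespace-separated column layout shown in the sample (as produced by strace -c), and SAMPLE repeats only the two rows given in this post.

```python
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.90 3966.320849         496   7992161   3050250 futex
 25.80 1507.326693      127093     11860           epoll_wait
"""


def parse_strace_summary(text):
    """Parse summary rows into dicts, sorted by total seconds (descending)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields (empty errors column) or 6 fields (with errors).
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            row = {
                "pct_time": float(parts[0]),
                "seconds": float(parts[1]),
                "usecs_per_call": int(parts[2]),
                "calls": int(parts[3]),
                "errors": int(parts[4]) if len(parts) == 6 else 0,
                "syscall": parts[-1],
            }
        except ValueError:
            continue  # header, separator, or any other non-data line
        rows.append(row)
    return sorted(rows, key=lambda r: r["seconds"], reverse=True)


if __name__ == "__main__":
    for row in parse_strace_summary(SAMPLE):
        print(row["syscall"], row["seconds"], row["calls"], row["errors"])
```

For the sample rows, futex ranks first by total time and also carries a large error count, which is the kind of signal worth following up when %sys is high.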

  • blktrace, generate traces of the i/o traffic on block devices
  • ltrace – A library call tracer
  • xtrace
  • gprof – a performance analysis tool, sampling and call-graph profiling
  • valgrind – an instrumentation framework for building dynamic analysis tools. It can automatically detect many memory management and threading bugs, and profile your programs in detail
  • systemtap – a simple command line interface and scripting language for writing instrumentation for a live running kernel plus user-space applications for complex tasks that may require live analysis, programmable on-line response, and whole-system symbolic access.
  • The Linux counterpart of DTrace (which Sun developed for Solaris)
  • Powerful: covers the kernel and user-space applications, works across languages (Java, Perl, Python, Ruby), and supports built-in markers (PostgreSQL, MySQL)
  • can write and reuse simple scripts to deeply examine the activities of a live system
  • Data can be extracted, filtered, and summarized quickly and safely, to enable diagnoses of complex performance or functional problems
  • A rich “tapset” script library

Java trace tools

  • btrace – dynamic tracing tool for the Java platform. UserGuide
  • Uses dynamic bytecode modification (HotSwap) to trace and replace code in a running Java program; implementation details
  • BTrace usage summary
  • Detailed introduction
  • byteman – simplifies tracing and testing of Java programs. Can modify a running application without needing to stop and restart it.
  • With byteman you define rules specifying the side effects you want to inject, whereas BTrace uses a Java-like syntax

Distributed Tracing Tools

  • Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
  • x-trace, a network diagnostic tool designed to provide users and network operators with better visibility into increasingly complex Internet applications.
  • HTrace, a tracing framework intended for use with distributed systems written in Java
  • Add Tracing to HDFS
  • Update HTrace for HBase

  • Linux observability tools
  • Linux benchmarking tools
  • Linux tuning tools
  • Linux observability: sar

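Brendan Gregg is currently a senior performance architect at Netflix, where he works on large-scale computer performance design, analysis, and tuning. He is the author of Systems Performance and other technical books, and received the 2013 USENIX LISA award for his achievements in system administration. He was previously a performance lead and kernel engineer at Sun, where he developed the ZFS L2ARC and studied storage and network performance. He has also invented and developed a large number of performance analysis tools, many of which have since been integrated into operating systems. His recent work includes research on performance analysis methodologies and visualizations, with the Linux kernel among its targets.

That is Gregg's bio. As it says, he shares many Linux performance resources on his personal site, all of which he developed himself.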





Source: ITPUB blog, link: http://blog.itpub.net/29578568/viewspace-2140461/. If you repost this article, please credit the source; otherwise legal action may be taken.






