sed or awk: group by paragraph, using lines 2 through n + 1 of each paragraph

Time: 2020-02-24 20:04:21

Tags: awk sed

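I need to count the number of identical sub-stacks in a thread dump. I can't work out how to use sed to extract lines 2 through n + 1 of each paragraph. An awk solution would be fine too.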

For example, given the following sample threaddump.txt:

"RMI TCP Accept-0" Id=11 RUNNABLE (in native)
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:400)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:372)
    at java.lang.Thread.run(Thread.java:745)

"AMQP Connection 10.170.10.128:5672" Id=227 RUNNABLE (in native)
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
    at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
    at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139)
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:536)
    at java.lang.Thread.run(Thread.java:745)

"http-bio-10.104.42.237-16210-exec-12" Id=90 RUNNABLE (in native)
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:534)
    at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:519)
    at org.apache.coyote.http11.Http11Processor.setRequestLineReadTimeout(Http11Processor.java:174)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1048)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:318)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)

"Signal Dispatcher" Id=6 RUNNABLE

"kafcli-poller-10" Id=277 RUNNABLE (in native)
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.kafka.common.network.Selector.select(Selector.java:686)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1171)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"localhost-startStop-1-SendThread(zk0007.svc.prod.wd1.wd:2181)" Id=59 RUNNABLE (in native)
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:345)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)

If n = 3, the output would be (note the count at the start of each sub-stack):

2   at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)

2   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

1   at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)

because

at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)

appears twice in the thread dump, and so on.

This is a three-step process:

  1. Extract all RUNNABLE paragraphs, i.e. the stacks of runnable threads. I have done this successfully with the following command:

cat threaddump.txt | sed -e '/./{H;$!d;}' -e 'x;/ RUNNABLE/!d;' > RUNNABLE.txt

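  2. For each stack (paragraph), extract lines 2 through n + 1. I have tried many different variations of the command below, trying to use sed's "q" command to select the lines, but to no avail. I won't list all the other attempts based on this example. awk would work too, but I couldn't translate the sed hold-space approach into awk.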

cat RUNNABLE.txt | sed -e '/./{H;$!d;}' -e 'x;/{2q}/!d;'

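  3. Finally, group by sub-stack. I haven't got that far yet, but my plan is to collapse each sub-stack into a single line by removing the newlines, then use sort, then uniq -c.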
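
For reference, the plan above could be sketched as one pipeline. This is only a rough sketch under stated assumptions: count_substacks is a made-up function name, the " | " separator is assumed never to occur inside a stack frame line, and awk paragraph mode (RS='') stands in for the sed hold-space idiom of step 1.

```shell
# Hypothetical helper: count_substacks FILE [N]
# Prints "count  frame | frame | ..." for lines 2..N+1 of every RUNNABLE
# paragraph in FILE, most frequent first. N defaults to 3.
count_substacks() {
    awk -v RS='' -v FS='\n' -v n="${2:-3}" '
        # Steps 1 and 2: keep RUNNABLE paragraphs with at least n+1 lines
        # and join lines 2..n+1 into one line, dropping leading whitespace.
        $1 ~ / RUNNABLE/ && NF > n {
            line = $2
            sub(/^[ \t]+/, "", line)
            for (i = 3; i <= n + 1; i++) {
                frame = $i
                sub(/^[ \t]+/, "", frame)
                line = line " | " frame
            }
            print line
        }
    ' "$1" |
    # Step 3: identical sub-stacks are now identical lines, so count them.
    sort | uniq -c | sort -rn
}
```

It would be called as count_substacks threaddump.txt 3. Each sub-stack comes out on a single line, so turning it back into the multi-line layout shown above would need a further formatting step.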

1 answer:

Answer 0: (score: 2)

The following:

# extract lines 2 through n+1 of each paragraph, ending each group with a NUL byte
# (input: RUNNABLE.txt, the file produced by step 1 of the question)
awk -v RS='' -v FS='\n' -v n=3 'NF > n { for (i = 2; i <= n + 1; ++i) print $i; printf "%c", "\0" }' RUNNABLE.txt |
# count identical groups, most frequent first
sort -z | uniq -zc | sort -zrnk1 |
# some messy output formatting
sed 's/\x00//g; s/^ *\([0-9]\+\) */#\n\1#/; 1s/^#\n//; s/^ *at/#at/' | column -t -s'#' -o '   '

Output:

2   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)

2   at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)

1   at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
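
  • The record separator is set to the empty string, which makes awk read one paragraph at a time, since the paragraphs are separated by blank lines. The field separator is a newline, so within each paragraph every line is easy to access through its own $num variable. Then I simply print lines 2 through n+1 of each paragraph; the NF > n condition skips paragraphs that are too short. Each extracted group of lines is terminated with a zero byte.
  • sort -z | uniq -zc then counts the identical groups.
  • sort -zrnk1 then sorts them by the count that uniq printed, highest first.
  • Finally, piping through a messy sed into column produces the nicely aligned output.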
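
To see what paragraph mode does on its own, here is a minimal sketch on made-up input. describe_paragraphs is a hypothetical helper name used only for this demonstration, and the thread names and frames are invented.

```shell
# Print one summary line per blank-line-separated paragraph read from stdin.
describe_paragraphs() {
    awk -v RS='' -v FS='\n' '
        # NR = paragraph number, NF = number of lines in it, $1 = its first line
        { printf "paragraph %d: %d lines, first = %s\n", NR, NF, $1 }
    '
}

# Made-up sample input: one paragraph of three lines, one of two lines.
printf '%s\n' \
    '"t1" Id=1 RUNNABLE' \
    '    at A.x(A.java:1)' \
    '    at B.y(B.java:2)' \
    '' \
    '"t2" Id=2 RUNNABLE' \
    '    at C.z(C.java:3)' |
describe_paragraphs
```

With RS='' each paragraph is one record, and with FS='\n' each line of it is one field, so the first paragraph is reported with 3 lines and the second with 2. That is why a condition such as NF > n can filter on paragraph length and $2 through $(n+1) address the stack frames directly.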