Question

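I have the following space-separated input. The first column is the timestamp, and the next is the thread ID.

I want to convert it into a CSV file.

Sample input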

04/09/15,08:49:05.001210  [Dispatch#3 (0x1b3b738)] NOTI  
04/09/15,08:49:05.118592  [Dispatch#0 (0x1b3b708)] NOTI  
04/09/15,08:49:05.225846  [Dispatch#2 (0x1b3b728)] NOTI  
04/09/15,08:49:05.361914  [Dispatch#1 (0x1b3b718)] NOTI  
04/09/15,08:49:05.469372  [Dispatch#3 (0x1b3b738)] NOTI  
04/09/15,08:49:05.569784  [Dispatch#0 (0x1b3b708)] NOTI  
04/09/15,08:49:05.738324  [Dispatch#2 (0x1b3b728)] NOTI  
04/09/15,08:49:05.851328  [Dispatch#1 (0x1b3b718)] NOTI  
04/09/15,08:49:05.965042  [Dispatch#3 (0x1b3b738)] NOTI  
04/09/15,08:49:06.041505  [Dispatch#0 (0x1b3b708)] NOTI  
04/09/15,08:49:06.151353  [Dispatch#2 (0x1b3b728)] NOTI  
04/09/15,08:49:07.814024  [Dispatch#1 (0xb29718)] NOTI   
04/09/15,08:49:07.588469  [Dispatch#1 (0xb29718)] NOTI   
04/09/15,08:49:07.371815  [Dispatch#0 (0xb29708)] NOTI   
04/09/15,08:49:07.160045  [Dispatch#0 (0xb29708)] NOTI   
04/09/15,08:49:07.979571  [Dispatch#0 (0xb29708)] NOTI   
04/09/15,08:50:08.385921  [Dispatch#0 (0x120e708)] NOTI  
04/09/15,08:50:08.450522  [Dispatch#3 (0x120e738)] NOTI  
04/09/15,08:50:08.550118  [Dispatch#1 (0x120e718)] NOTI  
04/09/15,08:50:08.600923  [Dispatch#0 (0x120e708)] NOTI

Desired output in CSV format

TimeStamp,Thread1,Thread2,Thread3,Thread4    
04/09/15 08:49:05,2,2,2,3    
04/09/15 08:49:06,1,0,1,0    
04/09/15 08:49:07,3,2,0,0    
04/09/15 08:49:08,2,1,0,1

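So I want to print the number of records processed by each thread at a particular time.

So in the example above, at 04/09/15 08:49:07, thread 1 (0x1b3b718) has 3 records, thread 2 (0xb29718) has 2 records, and threads 3 and 4 have no records.

Please suggest whether this information can be obtained with an awk command.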

Answer 1

If I understand correctly what you're trying to do, then

awk -F '[,.# ]+' -v OFS=, 'function ts() { return $1 " " $2 } function dump() { print saved, a[0]+0, a[1]+0, a[2]+0, a[3]+0 } BEGIN { print "TimeStamp", "Thread1", "Thread2", "Thread3", "Thread4" } ts() != saved { if(NR != 1) dump(); delete a; saved = ts() } { ++a[$5] } END { dump() }' filename

is a rough way to do it.
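As a quick illustration, the same program can be tried on three lines of the sample input. This is only a sketch: the shell function name count_per_second is made up for the example, and the awk code is the command above spread over several lines.

```shell
# count_per_second is a hypothetical wrapper name, not part of the answer.
# It reads log lines on stdin and prints one CSV row per second.
count_per_second() {
  awk -F '[,.# ]+' -v OFS=, '
    function ts()   { return $1 " " $2 }
    function dump() { print saved, a[0]+0, a[1]+0, a[2]+0, a[3]+0 }
    BEGIN { print "TimeStamp", "Thread1", "Thread2", "Thread3", "Thread4" }
    ts() != saved { if (NR != 1) dump(); delete a; saved = ts() }
    { ++a[$5] }
    END { dump() }
  '
}

printf '%s\n' \
  '04/09/15,08:49:05.001210  [Dispatch#3 (0x1b3b738)] NOTI' \
  '04/09/15,08:49:05.118592  [Dispatch#0 (0x1b3b708)] NOTI' \
  '04/09/15,08:49:06.041505  [Dispatch#0 (0x1b3b708)] NOTI' |
  count_per_second
```

For these three lines the output is the header followed by 04/09/15 08:49:05,1,0,0,1 and 04/09/15 08:49:06,1,0,0,0.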

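The trick is that with the field separator regex [,.# ]+, each line is split so that the timestamp ends up in fields 1 and 2 and the thread number in field 5. The -v OFS=, option sets the output field separator to a comma, so that the output is CSV. Then: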

function ts() {       # function to build a full timestamp as it is printed
  return $1 " " $2    # later
}

function dump() {     # function to print a result line. The +0 is to force
                      # the fields to be numbers, in case one remained empty.
  print saved, a[0]+0, a[1]+0, a[2]+0, a[3]+0
}

BEGIN {               # in the beginning, print the header line.
  print "TimeStamp", "Thread1", "Thread2", "Thread3", "Thread4"
} 

ts() != saved {       # if the timestamp changed:
  if(NR != 1) dump()  # if we're not just starting, print the result for
                      # the last block
  delete a            # discard counters
  saved = ts()        # save new timestamp
}
{ ++a[$5] }           # increase the counter for the thread this line mentions
END { dump() }        # and in the end, print the result for the last block.
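The field splitting described above can be checked on a single sample line. The following sketch uses a made-up helper name, show_fields, and simply prints the three fields the program relies on.

```shell
# show_fields is a hypothetical helper for this illustration.
# It splits each input line with the same separator regex and
# prints fields 1, 2 and 5 with labels.
show_fields() {
  awk -F '[,.# ]+' '{ printf "date=%s time=%s thread=%s\n", $1, $2, $5 }'
}

printf '%s\n' '04/09/15,08:49:05.001210  [Dispatch#3 (0x1b3b738)] NOTI' |
  show_fields
```

This prints date=04/09/15 time=08:49:05 thread=3. Field 3 holds the microseconds (001210) and field 4 holds the text [Dispatch, which is why the thread number is in field 5.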

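Addendum in response to the comment: for a dynamic number of threads, we need two passes over the file. In the first pass we find out how many threads there are, and in the second pass we print the results. This is because the entries for the first second in the file may not mention every thread. Since this becomes unwieldy as a one-liner, put the following code into a file: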

#!/usr/bin/awk -f

BEGIN {
  FS  = "[,.# ]+"
  OFS = ","
}

function ts() {
  return $1 " " $2
}

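function dump() {
  # Print one CSV row: the saved timestamp, then one counter per thread.
  # %d prints 0 for counters that were never incremented.
  printf("%s", saved);
  for(i = 0; i <= threads; ++i) {
    printf("%s%d", OFS, a[i])
  }
  print ""
}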

# NR == FNR is true only for the first pass.    
NR == FNR {
  threads = $5 > threads ? $5 : threads
  next
}

FNR == 1 {
  printf("TimeStamp");
  for(i = 0; i <= threads; ++i) {
    printf("%sThread%d", OFS, i + 1)
  }
  print "";
} 

ts() != saved {
  if(FNR != 1) {
    dump()
  }

  delete a
  saved = ts()
}
{ ++a[$5] }
END { dump() }

Call it foo.awk, then run

awk -f foo.awk filename filename

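Note that the file name has to be given to awk twice. The script works in much the same way as before, except that there is a pass before printing that finds the largest thread number, and the printing is done in a loop.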
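The two-pass idea can be seen in isolation in the sketch below. It assumes the same log format; the function name max_thread and the temporary file are made up for the example. Because the file is named twice, NR == FNR holds only while awk reads the first copy, and FNR == 1 becomes true again at the start of the second copy.

```shell
# max_thread is a hypothetical helper for this illustration.
# It reads the file named in $1 twice and reports the highest
# thread index found during the first pass.
max_thread() {
  awk -F '[,.# ]+' '
    # First pass only: remember the largest thread index, skip other rules.
    NR == FNR { if ($5 + 0 > max) max = $5 + 0; next }
    # First line of the second pass: the maximum is now known.
    FNR == 1  { printf "highest thread index: %d\n", max }
  ' "$1" "$1"
}

# Example: write two sample lines to a temporary file and scan it.
tmp=$(mktemp)
printf '%s\n' \
  '04/09/15,08:49:05.001210  [Dispatch#3 (0x1b3b738)] NOTI' \
  '04/09/15,08:49:05.118592  [Dispatch#0 (0x1b3b708)] NOTI' > "$tmp"
max_thread "$tmp"
rm -f "$tmp"
```

For these two lines it prints highest thread index: 3, which is the value the script above uses as the upper bound of its loops.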
