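I have the following space-separated input. The first column is the timestamp, followed by the thread ID.
I want to convert it into output in a CSV file.
Sample input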
04/09/15,08:49:05.001210 [Dispatch#3 (0x1b3b738)] NOTI
04/09/15,08:49:05.118592 [Dispatch#0 (0x1b3b708)] NOTI
04/09/15,08:49:05.225846 [Dispatch#2 (0x1b3b728)] NOTI
04/09/15,08:49:05.361914 [Dispatch#1 (0x1b3b718)] NOTI
04/09/15,08:49:05.469372 [Dispatch#3 (0x1b3b738)] NOTI
04/09/15,08:49:05.569784 [Dispatch#0 (0x1b3b708)] NOTI
04/09/15,08:49:05.738324 [Dispatch#2 (0x1b3b728)] NOTI
04/09/15,08:49:05.851328 [Dispatch#1 (0x1b3b718)] NOTI
04/09/15,08:49:05.965042 [Dispatch#3 (0x1b3b738)] NOTI
04/09/15,08:49:06.041505 [Dispatch#0 (0x1b3b708)] NOTI
04/09/15,08:49:06.151353 [Dispatch#2 (0x1b3b728)] NOTI
04/09/15,08:49:07.814024 [Dispatch#1 (0xb29718)] NOTI
04/09/15,08:49:07.588469 [Dispatch#1 (0xb29718)] NOTI
04/09/15,08:49:07.371815 [Dispatch#0 (0xb29708)] NOTI
04/09/15,08:49:07.160045 [Dispatch#0 (0xb29708)] NOTI
04/09/15,08:49:07.979571 [Dispatch#0 (0xb29708)] NOTI
04/09/15,08:50:08.385921 [Dispatch#0 (0x120e708)] NOTI
04/09/15,08:50:08.450522 [Dispatch#3 (0x120e738)] NOTI
04/09/15,08:50:08.550118 [Dispatch#1 (0x120e718)] NOTI
04/09/15,08:50:08.600923 [Dispatch#0 (0x120e708)] NOTI
Desired output in CSV format
TimeStamp,Thread1,Thread2,Thread3,Thread4
04/09/15 08:49:05,2,2,2,3
04/09/15 08:49:06,1,0,1,0
04/09/15 08:49:07,3,2,0,0
04/09/15 08:49:08,2,1,0,1
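So I want to print the number of records processed by each thread during each second.
In the example above, at 04/09/15 08:49:07, thread 1 (0xb29708) has 3 records, thread 2 (0xb29718) has 2 records, and threads 3 and 4 have none.
Please suggest whether this information can be obtained with an awk command.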
Answer 0: (score: 0)
If I understand correctly what you are trying to do, then
awk -F '[,.# ]+' -v OFS=, 'function ts() { return $1 " " $2 } function dump() { print saved, a[0]+0, a[1]+0, a[2]+0, a[3]+0 } BEGIN { print "TimeStamp", "Thread1", "Thread2", "Thread3", "Thread4" } ts() != saved { if(NR != 1) dump(); delete a; saved = ts() } { ++a[$5] } END { dump() }' filename
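is a crude way to do it.
The trick is that with the field-separator regex [,.# ]+ each line is split so that the timestamp ends up in fields 1 and 2 and the thread number in field 5. The -v OFS=, option sets the output field separator to a comma, so the output is CSV. Then: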
function ts() { # function to build a full timestamp as it is printed
return $1 " " $2 # later
}
function dump() { # function to print a result line. The +0 is to force
# the fields to be numbers, in case one remained empty.
print saved, a[0]+0, a[1]+0, a[2]+0, a[3]+0
}
BEGIN { # in the beginning, print the header line.
print "TimeStamp", "Thread1", "Thread2", "Thread3", "Thread4"
}
ts() != saved { # if the timestamp changed:
if(NR != 1) dump() # if we're not just starting, print the result for
# the last block
delete a # discard counters
saved = ts() # save new timestamp
}
{ ++a[$5] } # increase the counter for the thread this line mentions
END { dump() } # and in the end, print the result for the last block.
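To see why the thread number lands in field 5, it can help to split a single line with the same separator and list every field. The snippet below is only a small sketch: it takes the first line of the sample input, and the variable names line and fields are made up for the illustration.

```shell
# Split one sample line with the separator regex [,.# ]+ and
# print each field together with its index.
line='04/09/15,08:49:05.001210 [Dispatch#3 (0x1b3b738)] NOTI'
fields=$(printf '%s\n' "$line" |
  awk -F '[,.# ]+' '{ for (i = 1; i <= NF; ++i) print i ": " $i }')
printf '%s\n' "$fields"
```

Fields 1 and 2 hold the date and the time truncated to whole seconds, field 3 is the fractional part, and field 5 is the bare thread number that is used as the counter index.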
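Addendum re: comment: for a dynamic number of threads, we need two passes over the file. In the first pass we find out how many threads there are, and in the second pass we print the results. This is necessary because the entries for the first second in the file may not mention every thread. Since this becomes unwieldy as a one-liner, put the following code into a file: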
#!/usr/bin/awk -f
BEGIN {
  FS = "[,.# ]+"
  OFS = ","
}
function ts() {
  return $1 " " $2
}
function dump() {
  printf("%s", saved);
  for(i = 0; i <= threads; ++i) {
    printf("%s%d", OFS, a[i])
  }
  print ""
}
# NR == FNR is true only for the first pass.
NR == FNR {
  threads = $5 > threads ? $5 : threads
  next
}
FNR == 1 {
  printf("TimeStamp");
  for(i = 0; i <= threads; ++i) {
    printf("%sThread%d", OFS, i + 1)
  }
  print "";
}
ts() != saved {
  if(FNR != 1) {
    dump()
  }
  delete a
  saved = ts()
}
{ ++a[$5] }
END { dump() }
Call it foo.awk, then run
awk -f foo.awk filename filename
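Note that the filename has to be given to awk twice. It works in nearly the same way, except that there is an extra pass before printing that finds the highest thread number, and the printing is done in a loop.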
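The two-pass idiom can be tried in isolation on a tiny file. The following is a reduced sketch, not the script above: it ignores timestamps and only shows how NR == FNR separates the first pass from the second. The temporary file, the names tmp, result, max and count, and the Thread1=2 output format are assumptions made for the demo; the three log lines are taken from the sample input.

```shell
# Write three lines from the sample input to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  '04/09/15,08:49:05.001210 [Dispatch#3 (0x1b3b738)] NOTI' \
  '04/09/15,08:49:05.118592 [Dispatch#0 (0x1b3b708)] NOTI' \
  '04/09/15,08:49:06.041505 [Dispatch#0 (0x1b3b708)] NOTI' > "$tmp"

# Pass 1 (NR == FNR): remember the highest thread number seen.
# Pass 2: count lines per thread; END prints one column per thread.
result=$(awk -F '[,.# ]+' '
  NR == FNR { if ($5 + 0 > max) max = $5 + 0; next }
  { ++count[$5] }
  END {
    for (i = 0; i <= max; ++i)
      printf("%sThread%d=%d", (i ? "," : ""), i + 1, count[i])
    print ""
  }' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

Because the file is named twice, NR keeps growing across both reads while FNR restarts at 1, so NR == FNR holds only during the first read. Threads 2 and 3 never appear in the demo file, yet they still get a column because the loop runs up to the maximum found in the first pass.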