Question

I have a log file that I need help parsing.

This is what it looks like:

2018-02-19 15:55:50.070 t.a.ApiUploader [INFO] zzz(708473232) uploaded file 'hdfs://fr-de.int.fz.net:4010/user/profile_export/aId=6/empId=4/classId=10/members-x--491eedd6-2e14-488f-8c13-84be2c6f777b.txt.gz' in 4 chunk(s) - total ops: 31, failed ops: 0
2018-02-19 15:55:50.092 t.a.ApiUploader [INFO] zzz(617022301) uploaded file 'hdfs://fr-de.int.fz.net:4010/user/profile_export/aId=6/empId=4/classId=10/members-x-de10af80-4ac5-4b1a-9675-f7aa9da7ecb2.txt.gz' in 5 chunk(s) - total ops: 45, failed ops: 0
2018-02-19 15:55:50.204 t.a.ApiUploader [INFO] zzz(89993157) uploaded file 'hdfs://fr-de.int.fz.net:4010/user/profile_export/aId=6/empId=4/classId=10/members-x-2aa7808e-a209-4bf8-a744-818724cca054.txt.gz' in 4 chunk(s) - total ops: 32, failed ops: 0

What I want to do is put the parsed results into an Excel file, like this:

Expected output:

Date,aId,classId,total ops,failed ops
2018-02-19 15:55:50.070,6,10,31,0
2018-02-19 15:55:50.092,6,10,45,0
2018-02-19 15:55:50.204,6,10,32,0

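I can extract each piece individually, but how do I combine everything into a comma-separated format? Is there a bash sample that does this?

cat twr.log | awk -F" " {'print $8'} | awk -F"/" {'print $6, $8'}

This gives me: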

aId=6 classId=10
aId=6 classId=10
aId=6 classId=10

For the date I did this:

cat twr.log | awk -F" " {'print "Date: " $1, $2'}

Date: 2018-02-19 15:55:50.070
Date: 2018-02-19 15:55:50.092
Date: 2018-02-19 15:55:50.204

Any help is appreciated.

Thanks

Answer 1

A=c(1,0,1)
B=c(1,0,0)
C=c(1,0,1)
D=c(1,0,0)
E=c(0,0,0)

testframe = data.frame(A=A,B=B,C=C,D=D,E=E)
testframe
#   A B C D E
# 1 1 1 1 1 0
# 2 0 0 0 0 0
# 3 1 0 1 0 0

# transpose
testframex <- t(testframe)
testframex

# remove duplicated rows
testframe1 <- unique(testframex)
testframe1

# transpose again
dupsremoved <- as.data.frame(t(testframe1))
dupsremoved

#   A B E
# 1 1 1 0
# 2 0 0 0
# 3 1 0 0

Answer 2

If your URL has a fixed format:

$ awk -v OFS=, 'BEGIN{print "Date,aId,classId,total ops,failed ops"}
                     {split($8,a,"/"); 
                      sub(/.*=/,"",a[6]); 
                      sub(/.*=/,"",a[8]); 
                      print $1 FS $2,a[6],a[8],$15 $18}' file

Date,aId,classId,total ops,failed ops
2018-02-19 15:55:50.070,6,10,31,0
2018-02-19 15:55:50.092,6,10,45,0
2018-02-19 15:55:50.204,6,10,32,0

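Otherwise, you will have to pattern-match the keys you are interested in among the elements of array a.

Note the hacks $1 FS $2 and $15 $18, which handle the output delimiter specially: the first joins the date and time with a space instead of OFS, and the second relies on $15 already ending in a comma.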
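To make the two hacks concrete, here is a small sketch of how print treats a comma versus plain juxtaposition. The record is shaped like the log in the question, but the shortened URL and job id are made up for illustration.

```shell
# One sample record shaped like the log in the question (URL and id are hypothetical).
line="2018-02-19 15:55:50.070 t.a.ApiUploader [INFO] zzz(1) uploaded file 'hdfs://h/user/profile_export/aId=6/empId=4/classId=10/m.txt.gz' in 4 chunk(s) - total ops: 31, failed ops: 0"

# A comma between print arguments inserts OFS; juxtaposition just concatenates.
out=$(printf '%s\n' "$line" | awk -v OFS=, '{
    print $1, $2        # OFS between the fields: date,time
    print $1 FS $2      # FS (a space) between them: date time
    print $15 $18       # $15 is "31," so this already reads 31,0
}')
printf '%s\n' "$out"
```

The second and third forms are what the answer relies on: a space where a comma is not wanted, and no extra comma where the field already carries one.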

Update

Add this to the main block:

sum15+=$15; sum18+=$18

And add this as the last block in the script:

END {print "sum total ops:",sum15, "sum failed ops:",sum18}
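Put together, the script with the two additions might look like the sketch below. The three sample records mirror the log in the question, with a made-up host and file names.

```shell
# Sample records shaped like the log in the question (host and file names are hypothetical).
sample="2018-02-19 15:55:50.070 t.a.ApiUploader [INFO] zzz(1) uploaded file 'hdfs://host:4010/user/profile_export/aId=6/empId=4/classId=10/a.txt.gz' in 4 chunk(s) - total ops: 31, failed ops: 0
2018-02-19 15:55:50.092 t.a.ApiUploader [INFO] zzz(2) uploaded file 'hdfs://host:4010/user/profile_export/aId=6/empId=4/classId=10/b.txt.gz' in 5 chunk(s) - total ops: 45, failed ops: 0
2018-02-19 15:55:50.204 t.a.ApiUploader [INFO] zzz(3) uploaded file 'hdfs://host:4010/user/profile_export/aId=6/empId=4/classId=10/c.txt.gz' in 4 chunk(s) - total ops: 32, failed ops: 0"

out=$(printf '%s\n' "$sample" | awk -v OFS=, '
BEGIN { print "Date,aId,classId,total ops,failed ops" }
{
    split($8, a, "/")
    sub(/.*=/, "", a[6])
    sub(/.*=/, "", a[8])
    print $1 FS $2, a[6], a[8], $15 $18
    sum15 += $15; sum18 += $18      # "31," is converted to the number 31
}
END { print "sum total ops:", sum15, "sum failed ops:", sum18 }')
printf '%s\n' "$out"
```

Because OFS is a comma, the END line comes out as sum total ops:,108,sum failed ops:,0 for this sample; join the pieces with spaces instead of commas if that is not wanted.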

Answer 3

You can use the match function for cases like this (the third, array argument is a GNU awk extension):

awk 'BEGIN { OFS =","; print "Date,aId,classId,total ops,failed ops" } 
{ 
    match($8,/aId=([0-9]*)\/.*\/classId=([0-9]*)/,a)
    print $1 " " $2,a[1],a[2],$(NF-3) $NF 
}' YOURFILE

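Because it matches on the key names rather than on fixed positions, this is likely the less error-prone option when the URL format is not very strict.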
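The three-argument form of match is specific to gawk. Where only a POSIX awk is available, a similar lookup can be sketched with the two-argument match and the RSTART and RLENGTH variables it sets. The helper name value_of and the sample records (made-up host and file names) are assumptions for illustration.

```shell
# Sample records shaped like the log in the question (host and file names are hypothetical).
sample="2018-02-19 15:55:50.070 t.a.ApiUploader [INFO] zzz(1) uploaded file 'hdfs://host:4010/user/profile_export/aId=6/empId=4/classId=10/a.txt.gz' in 4 chunk(s) - total ops: 31, failed ops: 0
2018-02-19 15:55:50.092 t.a.ApiUploader [INFO] zzz(2) uploaded file 'hdfs://host:4010/user/profile_export/aId=6/empId=4/classId=10/b.txt.gz' in 5 chunk(s) - total ops: 45, failed ops: 0"

out=$(printf '%s\n' "$sample" | awk -v OFS=, '
# Hypothetical helper: return the digits following "key=" in s, or "" if absent.
function value_of(s, key) {
    if (match(s, key "=[0-9]+"))
        return substr(s, RSTART + length(key) + 1, RLENGTH - length(key) - 1)
    return ""
}
BEGIN { print "Date,aId,classId,total ops,failed ops" }
{
    total = $(NF-3); sub(/,$/, "", total)   # drop the trailing comma from "31,"
    print $1 " " $2, value_of($8, "aId"), value_of($8, "classId"), total, $NF
}')
printf '%s\n' "$out"
```

Looking up each key separately also removes the requirement that aId appear before classId in the path.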

Answer 4

The parts of the regular expression in parentheses are captured as $1, then $2, and so on:

perl -lne 'BEGIN{print "Date,aId,classId,total ops,failed ops"}
print "$1,$2,$3,$4,$5" if /(\S+ \S+).+?aId=(\d+).+?classId=(\d+).+?total ops:\s*(\d+).+?failed ops:\s*(\d+)/' \
inputFile

As a side note, you can pipe the output (or that of one of the awk commands) like so:

| datamash -sHt , -g classId mean 'total ops' sum 'failed ops' | column -ts ,
GroupBy(classId)  mean(total ops)  sum(failed ops)
10                36               0

This extracts the kind of data you might otherwise be looking for in Excel. Datamash is available from most package managers (apt, pacman, etc.).
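If datamash is not installed, roughly the same grouping can be sketched in awk alone. This is an assumption-laden equivalent rather than a drop-in replacement: it reads the CSV produced earlier, prints comma-separated output, and does not sort the groups.

```shell
# The CSV from the expected output in the question.
csv='Date,aId,classId,total ops,failed ops
2018-02-19 15:55:50.070,6,10,31,0
2018-02-19 15:55:50.092,6,10,45,0
2018-02-19 15:55:50.204,6,10,32,0'

out=$(printf '%s\n' "$csv" | awk -F, -v OFS=, '
NR == 1 { next }                                 # skip the header row
{ n[$3]++; total[$3] += $4; failed[$3] += $5 }   # accumulate per classId
END {
    print "classId", "mean(total ops)", "sum(failed ops)"
    for (c in n) print c, total[c] / n[c], failed[c]
}')
printf '%s\n' "$out"
```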

Answer 5

The following answer illustrates the self-documenting effect of creating and passing variables to keep track of the fields:

awk -v OFS=',' -v date=1 -v time=2 -v url=8 -v url_aid=6 -v url_cid=8 -v total=15 -v failed=18 '
    NR == 1 { print "Date", "aId", "classId", "total ops", "failed ops" }
    {
        split($url, arr, /\//)
        aid = arr[url_aid]; sub(/[^=]+=/, "", aid)
        cid = arr[url_cid]; sub(/[^=]+=/, "", cid)
        sub(/,/, "", $total)
        print $date " " $time, aid, cid, $total, $failed
    }' twr.log

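A few comments...

The header fields are passed to print as separate arguments, so they stay correctly delimited if OFS is changed in the future.
$total is cleaned up before print for the same reason we separate the header fields, but also because, in the alternative version, something as subtle as the missing comma between $total and $failed is easily glossed over when reading.
The solution above is easy to work with, and easy to reason about should it need modifying in the future. For example...

If the link format is not fixed, or filtering is needed, a variant that matches the keys by name rather than by position is more appropriate.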
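As a rough sketch of what a variant for non-fixed links with filtering could look like (not the original answer's code), the script below collects every key=value path segment by name and then applies a threshold. The min_total variable, the filter itself, and the sample records with made-up host and file names are all assumptions.

```shell
# Sample records shaped like the log in the question (host and file names are hypothetical).
sample="2018-02-19 15:55:50.070 t.a.ApiUploader [INFO] zzz(1) uploaded file 'hdfs://host:4010/user/profile_export/aId=6/empId=4/classId=10/a.txt.gz' in 4 chunk(s) - total ops: 31, failed ops: 0
2018-02-19 15:55:50.092 t.a.ApiUploader [INFO] zzz(2) uploaded file 'hdfs://host:4010/user/profile_export/aId=6/empId=4/classId=10/b.txt.gz' in 5 chunk(s) - total ops: 45, failed ops: 0
2018-02-19 15:55:50.204 t.a.ApiUploader [INFO] zzz(3) uploaded file 'hdfs://host:4010/user/profile_export/aId=6/empId=4/classId=10/c.txt.gz' in 4 chunk(s) - total ops: 32, failed ops: 0"

out=$(printf '%s\n' "$sample" | awk -v OFS=',' -v date=1 -v time=2 -v url=8 -v min_total=40 '
NR == 1 { print "Date", "aId", "classId", "total ops", "failed ops" }
{
    # Collect every key=value path segment by name, wherever it sits in the URL.
    split("", kv)                       # clear the map for this record
    n = split($url, seg, "/")
    for (i = 1; i <= n; i++)
        if ((eq = index(seg[i], "=")) > 0)
            kv[substr(seg[i], 1, eq - 1)] = substr(seg[i], eq + 1)

    total = $(NF-3); sub(/,$/, "", total)
    failed = $NF

    # Example filter (an assumption): keep records with at least min_total ops.
    if (total + 0 >= min_total)
        print $date " " $time, kv["aId"], kv["classId"], total, failed
}')
printf '%s\n' "$out"
```

With min_total=40 only the 45-op record survives; passing -v min_total=0 prints every record.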

Answer 6

No need to whip out awk or perl for this little nugget. Here is a Bash-only solution:

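while read -r date1 date2 _ _ _ _ _ url _ _ _ _ _ _ total _ _ failed; do
  # Split the URL on "/": the 6th piece is aId=..., the 8th is classId=...
  IFS=/ read -r _ _ _ _ _ aid _ classId _ <<< "$url"
  # Strip the "key=" prefixes and the trailing comma on the total
  printf "%s,%s,%s,%s,%s\n" "$date1 $date2" "${aid#*=}" "${classId#*=}" "${total%,}" "$failed"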
done < file.log

Feels like Bash was made for parsing CSV, doesn't it? ;-)

Parse a log and print certain fields to a CSV file
