I want to aggregate the data in a CSV file by minute.
The expected output is the number of sessions (column 3) within each minute.
Input:
30/Jan/2018, 04:01:04, tRhmNvNQG2Ykjz5UrQopDwe
30/Jan/2018, 04:01:09, BKB4UlW2je-fM4vNi5dkO9E
30/Jan/2018, 04:01:19, fjD7lGCc48BBRkPsPNv_iOt
30/Jan/2018, 04:01:32, eqdgbdf54tBBRkPsPNv_iOt
30/Jan/2018, 04:01:46, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:01:51, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:02:07, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:02:17, WnjtqtPr6dqjHoG2YbOD1js
30/Jan/2018, 04:02:28, elz45MJQoPnAJUTQS8Lwkd8
30/Jan/2018, 04:02:38, TUJbbsUZd0txgADVd7PsJrd
30/Jan/2018, 04:02:48, WnjtqtPr6dqjHoG2YbOD1js
Expected output:
30/Jan/2018, 04:01, 6
30/Jan/2018, 04:02, 5
Answer 0 (score: 2)
Using awk:
$ awk -F":" '{a[$1 FS $2]++; next} END{for(i in a) print i", "a[i]}' file
30/Jan/2018, 04:01, 6
30/Jan/2018, 04:02, 5
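-F":" : sets the field separator to :
a[$1 FS $2]++ : builds an associative array whose key is fields 1 and 2 joined by the field separator (that is, the date, hour, and minute) and whose value is the running count
END{for(i in a) print i", "a[i]} : prints each key followed by its count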
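To make the key construction concrete, here is a small sketch that prints what awk sees in $1 and $2 for the first input line when the separator is ":". The variable names line and out are only for illustration.

```shell
# First line of the sample input from the question
line='30/Jan/2018, 04:01:04, tRhmNvNQG2Ykjz5UrQopDwe'

# Split on ":" and show the two fields plus the key built from them
out=$(printf '%s\n' "$line" |
  awk -F":" '{
    print "$1 = [" $1 "]"
    print "$2 = [" $2 "]"
    print "key = [" $1 FS $2 "]"
  }')
printf '%s\n' "$out"
```

Field 1 is everything up to the first colon (date plus hour), field 2 is the minute, so the key comes out as 30/Jan/2018, 04:01 and every line from the same minute increments the same array element.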
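Note: for (i in a) visits the keys in an unspecified order, so the output is not guaranteed to be sorted. If you want the results sorted by count in descending order, you can pipe them to sort: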
$ awk -F":" '{a[$1 FS $2]++; next} END{for(i in a) print i", "a[i]}' file | sort -t, -nrk3
30/Jan/2018, 04:01, 6
30/Jan/2018, 04:02, 5
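If the goal is to keep the minutes in the order they first appear in the file rather than sorting by count, one possible variant records each new key in a second array. This is a sketch, not part of the answer above; the names count, order, n, and the temporary file are assumptions made for the example.

```shell
# Temporary copy of the question's sample input (hypothetical file for the demo)
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
30/Jan/2018, 04:01:04, tRhmNvNQG2Ykjz5UrQopDwe
30/Jan/2018, 04:01:09, BKB4UlW2je-fM4vNi5dkO9E
30/Jan/2018, 04:01:19, fjD7lGCc48BBRkPsPNv_iOt
30/Jan/2018, 04:01:32, eqdgbdf54tBBRkPsPNv_iOt
30/Jan/2018, 04:01:46, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:01:51, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:02:07, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:02:17, WnjtqtPr6dqjHoG2YbOD1js
30/Jan/2018, 04:02:28, elz45MJQoPnAJUTQS8Lwkd8
30/Jan/2018, 04:02:38, TUJbbsUZd0txgADVd7PsJrd
30/Jan/2018, 04:02:48, WnjtqtPr6dqjHoG2YbOD1js
EOF

out=$(awk -F":" '
  {
    key = $1 FS $2                            # date, hour and minute
    if (!(key in count)) order[++n] = key     # remember first appearance
    count[key]++
  }
  END {
    # walk the keys in insertion order instead of using for (i in a)
    for (j = 1; j <= n; j++) print order[j] ", " count[order[j]]
  }
' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

Because the END loop runs over a numeric index, the output order follows the input file and does not depend on how a particular awk implementation iterates over array keys.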
Answer 1 (score: 1)
Assuming your Input_file looks like the sample shown, the following awk may help you.
awk -F'[/, :]' '{a[$1"/"$2"/"$3", "$5":"$6]++} END{for(i in a){print i,a[i]}}' Input_file
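The field numbers in that command ($5 and $6 for hour and minute, skipping $4) follow from the bracket-expression separator: each single /, comma, space, or colon ends a field, so the comma-plus-space pair leaves an empty field in between. A quick sketch to list the fields of the first sample line (line and out are illustrative names):

```shell
# First line of the sample input from the question
line='30/Jan/2018, 04:01:04, tRhmNvNQG2Ykjz5UrQopDwe'

# Print every field produced by the separator regex [/, :]
out=$(printf '%s\n' "$line" |
  awk -F'[/, :]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
printf '%s\n' "$out"
```

Fields 4 and 8 come out empty, which is why the key is rebuilt from $1, $2, $3, $5, and $6. Note also that print i,a[i] joins the key and the count with a space (awk's default output separator), so the output reads 30/Jan/2018, 04:01 6 rather than using ", " before the count.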
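Answer 2 (score: 1)
You can write a short bash script that takes the log file name and outputs the aggregated session counts. Essentially, the script loops over the entries while keeping a count, parses the time of each entry, compares its minute with that of the previous entry, and, whenever they differ, outputs the previous date, hour:minute, and count (the entries are assumed to be in chronological order, as in the sample):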
#!/bin/bash

fn="${1:-/dev/stdin}"       ## read from file "$1" or stdin

[ -r "$fn" ] || {           ## validate file readable
    printf "error: unable to read from filename or stdin\n" >&2
    exit 1
}

lastdt=                     ## declare last date, hour, min, count
lasthr=
lastmn=
declare -i cnt=0

while IFS+=',' read -r dt tm s; do      ## read each csv line (s, the session id, is unused)
    hr="${tm:0:2}"                      ## get hour and minute
    min="${tm:3:2}"
    ## have a previous entry, and its date, hour or minute differs from the current one?
    if [ -n "$lastdt" ] && [ "$dt $hr:$min" != "$lastdt $lasthr:$lastmn" ]; then
        printf "%s, %s:%s, %d\n" "$lastdt" "$lasthr" "$lastmn" "$cnt"
        cnt=0                           ## reset count
    fi
    lastdt="$dt"                        ## save last values
    lasthr="$hr"
    lastmn="$min"
    ((cnt++))                           ## increment count
done < "$fn"

## output final session count
printf "%s, %s:%s, %d\n" "$lastdt" "$lasthr" "$lastmn" "$cnt"
Example Use/Output
$ bash logsessions.sh log.csv
30/Jan/2018, 04:01, 6
30/Jan/2018, 04:02, 5
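To see how the read line in the script splits one entry, the sketch below runs the same parsing step on the first sample line. It is wrapped in an explicit bash invocation because IFS+=, the <<< here-string, and ${var:offset:length} are bash features; the variable names line and out are assumptions for the example.

```shell
# Run the parsing step under bash explicitly (it relies on bash-only syntax)
out=$(bash <<'EOF'
line='30/Jan/2018, 04:01:04, tRhmNvNQG2Ykjz5UrQopDwe'
# Append ',' to the default IFS (space, tab, newline) for this read only,
# so the comma and the space after it act together as one delimiter
IFS+=',' read -r dt tm s <<< "$line"
hr="${tm:0:2}"      # characters 0-1 of the time: the hour
min="${tm:3:2}"     # characters 3-4 of the time: the minute
printf 'dt=[%s] hr=[%s] min=[%s] s=[%s]\n' "$dt" "$hr" "$min" "$s"
EOF
)
printf '%s\n' "$out"
```

Appending to IFS rather than replacing it matters here: with IFS set to a comma alone, tm would keep its leading space and the substring offsets for the hour and minute would be off by one.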
Look things over and let me know if you have further questions.