Aggregate rows in a file by minute

Time: 2018-02-28 06:26:29

Tags: linux bash shell awk

I want to aggregate the data in a CSV file by minute.

The expected output is the number of sessions (column 3) logged in each minute.

Input:

30/Jan/2018, 04:01:04, tRhmNvNQG2Ykjz5UrQopDwe
30/Jan/2018, 04:01:09, BKB4UlW2je-fM4vNi5dkO9E
30/Jan/2018, 04:01:19, fjD7lGCc48BBRkPsPNv_iOt
30/Jan/2018, 04:01:32, eqdgbdf54tBBRkPsPNv_iOt
30/Jan/2018, 04:01:46, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:01:51, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:02:07, GhylG7J21i5t-974mGlElWO
30/Jan/2018, 04:02:17, WnjtqtPr6dqjHoG2YbOD1js
30/Jan/2018, 04:02:28, elz45MJQoPnAJUTQS8Lwkd8
30/Jan/2018, 04:02:38, TUJbbsUZd0txgADVd7PsJrd
30/Jan/2018, 04:02:48, WnjtqtPr6dqjHoG2YbOD1js

Expected output:

30/Jan/2018, 04:01, 6
30/Jan/2018, 04:02, 5

3 answers:

Answer 0: (score: 2)

Using awk

$ awk -F":" '{a[$1 FS $2]++; next} END{for(i in a) print i", "a[i]}' file
30/Jan/2018, 04:01, 6
30/Jan/2018, 04:02, 5

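-F":" sets the field separator to ":".

a[$1 FS $2]++ builds an associative array whose key is field 1 and field 2 joined by the separator (for example "30/Jan/2018, 04:01") and whose value is the running count for that minute.

END{for(i in a) print i", "a[i]} prints the desired result once all lines have been read: each key followed by its count.

Note: the order of the results is not guaranteed, because for(i in a) visits the keys in an unspecified order. If you want the results sorted by count in descending order, you can pipe the output to sort: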

$ awk -F":" '{a[$1 FS $2]++; next} END{for(i in a) print i", "a[i]}' file | sort -t, -nrk3
30/Jan/2018, 04:01, 6
30/Jan/2018, 04:02, 5
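
If you would rather keep the minutes in the order they first appear in the file instead of re-sorting afterwards, one option is to record each new key in a second, numerically indexed array. The following is only a minimal sketch of that idea; count_per_minute, cnt, and order are illustrative names that do not appear in the answer above.

```shell
#!/bin/bash
# Hypothetical helper: count rows per minute, printing keys in first-seen order.
count_per_minute() {
    awk -F':' '
        {
            key = $1 FS $2                          # e.g. "30/Jan/2018, 04:01"
            if (!(key in cnt)) order[++n] = key     # remember each new key once
            cnt[key]++                              # count rows for this minute
        }
        END {
            for (i = 1; i <= n; i++) print order[i] ", " cnt[order[i]]
        }
    ' "$@"                                          # reads stdin when no file is given
}

# Usage with three made-up rows:
printf '%s\n' \
    '30/Jan/2018, 04:01:04, a' \
    '30/Jan/2018, 04:01:09, b' \
    '30/Jan/2018, 04:02:07, c' | count_per_minute
# prints:
# 30/Jan/2018, 04:01, 2
# 30/Jan/2018, 04:02, 1
```

Because the keys are replayed from order[1..n], the output follows the input order, which for a chronological log is also chronological order.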

Answer 1: (score: 1)

Assuming your Input_file has the same format as the sample shown, the following awk may help you.

awk -F'[/, :]' '{a[$1"/"$2"/"$3", "$5":"$6]++} END{for(i in a){print i", "a[i]}}' Input_file
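
The key above uses $5 and $6 for the hour and minute, not $4 and $5. With a bracket expression as the separator, every single matching character splits the line, so the ", " after the date produces an empty field 4. A quick way to check the numbering is to print each field with its index; show_fields below is a hypothetical helper name used only for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: print every field with its index for -F'[/, :]'.
show_fields() {
    awk -F'[/, :]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

echo '30/Jan/2018, 04:01:04, tRhmNvNQG2Ykjz5UrQopDwe' | show_fields
# prints:
# 1=[30]
# 2=[Jan]
# 3=[2018]
# 4=[]
# 5=[04]
# 6=[01]
# 7=[04]
# 8=[]
# 9=[tRhmNvNQG2Ykjz5UrQopDwe]
```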

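Answer 2: (score: 1)

You can write a short bash script that takes the log filename (or reads stdin) and outputs the aggregated session counts. Essentially, the script loops over the entries while keeping a count, parses the hour and minute from each one, and compares the minute with that of the previous entry; when they differ, it outputs the previous date, hour:minute, and count. This relies on the entries being in chronological order, as they are in the sample: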

#!/bin/bash

fn="${1:-/dev/stdin}"   ## read from file "$1" or stdin

[ -r "$fn" ] || {       ## validate file readable
    printf "error: unable to read from filename or stdin\n" >&2
    exit 1
}

lastdt=     ## declare last date, hour, min, count
lasthr=
lastmn=
declare -i cnt=0

while IFS+=',' read -r dt tm s; do          ## read each csv
    hr="${tm:0:2}"                          ## get hour and minute
    min="${tm:3:2}"
    if [ -n "$lastdt" ]; then               ## do we have a lastdt?
        if [ "$dt $hr:$min" != "$lastdt $lasthr:$lastmn" ]; then    ## if date/hour/min changed
            printf "%s, %s:%s, %d\n" "$lastdt" "$lasthr" "$lastmn" $cnt
            cnt=0   ## reset count
        fi
    fi

    lastdt="$dt"    ## save last values
    lasthr="$hr"
    lastmn="$min"
    ((cnt++))       ## increment count

done < "$fn"

## output final session count
printf "%s, %s:%s, %d\n" "$lastdt" "$lasthr" "$lastmn" $cnt

Example Use/Output

$ bash logsessions.sh log.csv
30/Jan/2018, 04:01, 6
30/Jan/2018, 04:02, 5
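
For comparison, the same compare-with-the-previous-entry loop can be sketched in awk. This is only an illustration under the same assumption that the log is in chronological order; stream_count, key, last, and cnt are made-up names. It compares the whole "date, hour:minute" key, so two entries with the same minute in different hours are counted separately.

```shell
#!/bin/bash
# Hypothetical helper: streaming per-minute count for chronologically ordered input.
stream_count() {
    awk -F':' '
        { key = $1 FS $2 }                                      # "date, hour:minute"
        NR > 1 && key != last { print last ", " cnt; cnt = 0 }  # minute changed: flush
        { last = key; cnt++ }                                   # count the current row
        END { if (NR > 0) print last ", " cnt }                 # flush the final minute
    ' "$@"                                                      # reads stdin when no file is given
}

# Usage with three made-up rows:
printf '%s\n' \
    '30/Jan/2018, 04:01:04, a' \
    '30/Jan/2018, 04:01:09, b' \
    '30/Jan/2018, 05:01:07, c' | stream_count
# prints:
# 30/Jan/2018, 04:01, 2
# 30/Jan/2018, 05:01, 1
```

Only the current key and count are held in memory, so this form suits very large logs, at the cost of requiring ordered input.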

Look things over and let me know if you have further questions.