Aggregating results with the previous line

Time: 2018-02-09 12:52:32

Tags: bash python-2.7 shell unix scripting

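I want to aggregate the lines of a sorted file (about 200 MB in size).

Whenever a line's 'uri' value matches that of the previous line, I want to merge the two lines by averaging their time-taken values.

URIs that contain a special character (the ? in lines 3 and 4 of the input) should be merged under their common name, i.e. the part before that character.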

INPUT

date, time-taken, uri
10/Jan/2018, 0.0001, /files/web/images/favicon1.png
10/Jan/2018, 0.0002, /files/web/images/favicon2.png
10/Jan/2018, 0.004, /files/web/Login?jsessionid=32wew.jsp
10/Jan/2018, 0.002, /files/web/Login?jsessionid=78trq.jsp
10/Jan/2018, 0.001, /files/web/userManagement.jsp
10/Jan/2018, 0.003, /files/web/userManagement.jsp

EXPECTED OUTPUT

date, time-taken, uri
10/Jan/2018, 0.0001, /files/web/images/favicon1.png
10/Jan/2018, 0.0002, /files/web/images/favicon2.png
10/Jan/2018, 0.003, /files/web/Login.jsp
10/Jan/2018, 0.002, /files/web/userManagement.jsp

1 answer:

Answer 0: (score: 1)

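#!/usr/bin/env bash

# Map each URI (query string stripped) to a running "+(value)" expression.
declare -A list
while read -r line; do
    # Field 3 onward is the URI; it keeps the space that follows the comma.
    iter="$(cut -d, -f3- <<<"$line")"
    # Drop everything from the first '?' on, then append this line's time-taken.
    list["${iter%%\?*}"]+="+($(cut -d, -f2 <<<"$line"))"
done < input.txt

# Associative arrays are unordered, so the output order differs from the input.
for line in "${!list[@]}"; do
    # Sum the terms with bc, divide by the number of '+' signs (the count),
    # then strip trailing zeros.
    retn="$(bc -l <<<"(0 ${list["$line"]}) / $(tr -dc + <<<"${list["$line"]}" | wc -c)" |
            sed "s:0*$::")"
    echo "$retn $line"
done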

Contents of input.txt:

10/Jan/2018, 0.0001, /files/web/images/favicon1.png
10/Jan/2018, 0.0002, /files/web/images/favicon2.png
10/Jan/2018, 0.004, /files/web/Login?jsessionid=32wew.jsp
10/Jan/2018, 0.002, /files/web/Login?jsessionid=78trq.jsp
10/Jan/2018, 0.001, /files/web/userManagement.jsp
10/Jan/2018, 0.003, /files/web/userManagement.jsp

Output:

.002  /files/web/userManagement.jsp
.003  /files/web/Login
.0001  /files/web/images/favicon1.png
.0002  /files/web/images/favicon2.png
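
The `/files/web/Login` key in this output comes from the `${iter%%\?*}` parameter expansion, which removes the longest suffix that starts at a `?`, so the `.jsp` extension goes with it. The sketch below shows that expansion on one sample URI, next to a hypothetical `strip_query` helper (not part of the answer) that keeps the extension so the key reads `/files/web/Login.jsp`, as in the question's expected output. It assumes the extension is whatever follows the last dot after the `?`.

```shell
#!/usr/bin/env bash

# strip_query is a hypothetical helper, not part of the original answer.
# It removes the query string but keeps a trailing extension, if any.
strip_query() {
    local uri="$1"
    # No '?' in the URI: nothing to strip.
    [[ $uri == *\?* ]] || { printf '%s\n' "$uri"; return; }
    local base="${uri%%\?*}"   # everything before the first '?'
    local query="${uri#*\?}"   # everything after the first '?'
    # Assumption: the extension is the text after the last '.' in the query.
    if [[ $query == *.* ]]; then
        printf '%s.%s\n' "$base" "${query##*.}"
    else
        printf '%s\n' "$base"
    fi
}

uri='/files/web/Login?jsessionid=32wew.jsp'
echo "${uri%%\?*}"    # key as built by the answer: /files/web/Login
strip_query "$uri"    # key with the extension kept: /files/web/Login.jsp
```

Using `strip_query "$iter"` in place of `${iter%%\?*}` inside the loop would be one way to get keys that match the expected output, at the cost of a slightly longer loop body.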

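The dates are omitted because the question does not give enough information about how to handle them when they differ between instances of the same file name.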
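
Because the input is already sorted, equal keys sit on adjacent lines, so the averaging can also be done in one streaming pass, without keeping every key in memory or spawning `cut` for each line. The following is a minimal sketch of that idea using awk, not the method of the answer above. `average_runs` is a made-up name, keys are derived the same way as in the answer (everything from the `?` on is dropped), and keeping the first date of each run is an assumption that the question does not confirm. It also assumes that the URIs contain no commas.

```shell
#!/usr/bin/env bash

# average_runs is a hypothetical name. It reads "date, time-taken, uri" lines
# and prints one averaged line per run of consecutive lines with the same key.
# Usage: average_runs input.txt
average_runs() {
    awk -F', *' '
        # Print the finished run: first date seen, mean time-taken, key.
        function emit() {
            if (n > 0) printf "%s, %g, %s\n", date, sum / n, key
        }
        NR == 1 && $1 == "date" { print; next }   # pass a header line through
        {
            k = $3
            sub(/\?.*/, "", k)      # drop the query string to get the key
            if (k != key) {         # key changed: close the previous run
                emit()
                key = k; date = $1; sum = 0; n = 0
            }
            sum += $2; n++
        }
        END { emit() }
    ' "$@"
}
```

The output keeps the order of the input, and memory use stays constant regardless of file size, which may matter for a file of about 200 MB. Note that `%g` trims trailing zeros but switches to exponent notation for very small averages.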