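I want to aggregate the lines of a sorted file (about 200 MB).
If a "uri" value matches the previous one, I want to merge those lines by averaging their numeric values.
URIs that contain a special character (the ? in lines 3 and 4 of the input) should be merged under the common name that precedes that character.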
INPUT
date, time-taken, uri
10/Jan/2018, 0.0001, /files/web/images/favicon1.png
10/Jan/2018, 0.0002, /files/web/images/favicon2.png
10/Jan/2018, 0.004, /files/web/Login?jsessionid=32wew.jsp
10/Jan/2018, 0.002, /files/web/Login?jsessionid=78trq.jsp
10/Jan/2018, 0.001, /files/web/userManagement.jsp
10/Jan/2018, 0.003, /files/web/userManagement.jsp
EXPECTED OUTPUT
date, time-taken, uri
10/Jan/2018, 0.0001, /files/web/images/favicon1.png
10/Jan/2018, 0.0002, /files/web/images/favicon2.png
10/Jan/2018, 0.003, /files/web/Login.jsp
10/Jan/2018, 0.002, /files/web/userManagement.jsp
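Because the file is sorted, the question's "matches the previous value" rule can be applied in a single streaming pass that keeps only the current group in memory, which suits a 200 MB file. The following is a minimal awk sketch, not a definitive implementation. It assumes that the first line is the header, that fields are separated by a comma plus optional spaces, and that rows with the same normalised URI are adjacent. The normalisation removes everything from the ? up to the next ".", so Login?jsessionid=32wew.jsp becomes Login.jsp as in the expected output. Keeping the date of the first row in each group is an assumption, and the function name avg_sorted is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: average time-taken over runs of adjacent rows that share a URI.
# Assumes a "date, time-taken, uri" header and that equal (normalised) URIs
# are adjacent. Keeps the date of the first row in each run (an assumption).
avg_sorted() {  # hypothetical name
  awk -F', *' '
    NR == 1 { print; next }               # pass the header line through
    {
      key = $3
      sub(/[?][^.]*/, "", key)            # Login?jsessionid=32wew.jsp -> Login.jsp
      if (n > 0 && key != prev) {         # URI changed: emit the previous run
        printf "%s, %g, %s\n", date, sum / n, prev
        n = 0
      }
      if (n == 0) { prev = key; date = $1; sum = 0 }  # start a new run
      sum += $2
      n++
    }
    END { if (n > 0) printf "%s, %g, %s\n", date, sum / n, prev }
  ' "$@"
}
```

Usage: avg_sorted input.txt > output.txt. On the sample input above, this prints the expected output. If the sort order does not keep the normalised URIs together (for example, when Login.jsp and Login?... rows are separated by other rows), the runs split and the same URI can appear more than once.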
Answer 0 (score: 1)
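#!/usr/bin/env bash
# list maps each URI key to a string of "+(time-taken)" terms
declare -A list
while read -r line; do
    iter="$(cut -d, -f3- <<<"$line")"   # third field onward: the URI
    iter="${iter# }"                    # drop the space that follows the comma
    # Key: the URI up to the first "?"; value: append "+(time-taken)"
    list["${iter%%\?*}"]+="+($(cut -d, -f2 <<<"$line"))"
done < input.txt
for line in "${!list[@]}"; do
    # Mean = (0 + t1 + t2 ...) / number of "+" signs, computed by bc;
    # sed then trims the trailing zeros from bc's output
    retn="$(bc -l <<<"(0 ${list["$line"]}) / $(tr -dc + <<<"${list["$line"]}" | wc -c)" |
        sed "s:0*$::")"
    echo "$retn $line"
done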
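In the loop above, ${iter%%\?*} cuts the key at the ?, so Login?jsessionid=32wew.jsp becomes /files/web/Login rather than the /files/web/Login.jsp the question expects. A hypothetical helper like the one below keeps the extension. It assumes that the extension is whatever follows the last "." after the ?.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip a "?..." query part from a URI but keep the
# extension that follows it, e.g. Login?jsessionid=32wew.jsp -> Login.jsp.
# Assumes the extension is whatever follows the last "." after the "?".
normalise_uri() {
  local uri=$1 rest
  if [[ $uri == *\?* ]]; then
    rest=${uri#*\?}                 # e.g. jsessionid=32wew.jsp
    uri=${uri%%\?*}                 # e.g. /files/web/Login
    if [[ $rest == *.* ]]; then
      uri+=".${rest##*.}"           # re-append the extension, e.g. .jsp
    fi
  fi
  printf '%s\n' "$uri"
}
```

In the answer's loop, list["$(normalise_uri "$iter")"] could replace list["${iter%%\?*}"]. Note that the command substitution adds one more subshell per input line.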
Contents of input.txt:
10/Jan/2018, 0.0001, /files/web/images/favicon1.png
10/Jan/2018, 0.0002, /files/web/images/favicon2.png
10/Jan/2018, 0.004, /files/web/Login?jsessionid=32wew.jsp
10/Jan/2018, 0.002, /files/web/Login?jsessionid=78trq.jsp
10/Jan/2018, 0.001, /files/web/userManagement.jsp
10/Jan/2018, 0.003, /files/web/userManagement.jsp
Output:
.002 /files/web/userManagement.jsp
.003 /files/web/Login
.0001 /files/web/images/favicon1.png
.0002 /files/web/images/favicon2.png
The dates are omitted: when they differ between instances of the same file name, the question doesn't say how they should be handled.
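If dates can differ, one option is to make the date part of the grouping key, so the same URI on different dates is averaged separately. This is an assumption about what the asker wants, not something the question specifies. The sketch below does this in a single awk process, which also avoids starting cut and bc for every input line as the bash loop does. On a 200 MB file, that per-line process creation is likely to dominate the run time. The function name avg_by_date_uri is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: average time-taken per (date, URI) pair, so the same URI on
# different dates is averaged separately. awk arrays are hash maps, so the
# input need not be sorted, but the output order is unspecified.
avg_by_date_uri() {  # hypothetical name
  awk -F', *' '
    NR == 1 && $1 == "date" { next }      # skip the header line if present
    {
      uri = $3
      sub(/[?][^.]*/, "", uri)            # Login?jsessionid=32wew.jsp -> Login.jsp
      key = $1 SUBSEP uri                 # group by date and URI together
      sum[key] += $2
      cnt[key]++
    }
    END {
      for (key in sum) {
        split(key, part, SUBSEP)          # part[1] = date, part[2] = URI
        printf "%s, %g, %s\n", part[1], sum[key] / cnt[key], part[2]
      }
    }
  ' "$@"
}
```

Usage: avg_by_date_uri input.txt | sort. Pipe the result through sort if a stable order matters.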