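I am trying to look up 607,526 integer entries (held in an array) across multiple files, add up the values found for each entry, and store the totals in a file. Processing 32,470 entries has taken 1 hour 45 minutes and the script still has not finished. Could you help me improve it? The script is below: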
#!/bin/bash
my_array=( `grep Curr a.txt | sed -e 's/Time:\(.*\).Num.*/\1/'` )
my_array_length=${#my_array[@]}
echo $my_array_length
rm -rf output
touch output
for element in "${my_array[@]}"
do
# echo "${element}"
toggles=`grep -w "time: ${element}" file_* | awk '{ sum += $6}; END {print sum }'`
echo "Time:"${element}".Num - "$toggles >> output
done
The input and output are:
a.txt
Curr Time:0.Num - 6274
Curr Time:500.Num - 2
Curr Time:1500.Num - 62
Curr Time:2000.Num - 3
Curr Time:2500.Num - 2
Curr Time:3000.Num - 214
Curr Time:3500.Num - 205
Curr Time:4500.Num - 2
Curr Time:5000.Num - 211
Curr Time:5500.Num - 231
file_0
time: 0 count: 517
time: 2000 count: 9
time: 2500 count: 30
time: 4500 count: 14
time: 5000 count: 2
file_1
time: 0 count: 1500
time: 500 count: 10
time: 1500 count: 25
time: 2500 count: 39
time: 4500 count: 26
time: 5500 count: 154
output
Curr Time:0.NumToggles - 2017
Curr Time:500.NumToggles - 11
Curr Time:1500.NumToggles - 25
Curr Time:2000.NumToggles - 9
Curr Time:2500.NumToggles - 69
Curr Time:3000.NumToggles - 0
Curr Time:3500.NumToggles - 0
Curr Time:4500.NumToggles - 40
Curr Time:5000.NumToggles - 2
Curr Time:5500.NumToggles - 154
If needed, an image is available at https://i.stack.imgur.com/kFxt8.jpg.
Answer 0 (score: 1)
This works in my Git Bash emulation. Let me know if it chokes on the full dataset.
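awk -v keyfile=a.txt '
{ sum[$2] += $4 }    # data lines look like "time: 0 count: 517"
END {
    # Read the key file line by line; "> 0" stops at EOF (0) and on a read error (-1).
    while ((getline line < keyfile) > 0) {
        # The three-argument match() is a gawk extension; key[1] holds the captured time.
        if (match(line, /^Curr Time:(.*)\.Num/, key))
            printf "Curr Time:%d.NumToggles - %d\n", key[1], sum[key[1]]
    }
    close(keyfile)
}
' file_*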
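Logic: read through all the data files once, summing the values for each key. Then make one pass over the main file to get the complete set of keys, printing the sum for each key. This runs a single main process that reads each file once, instead of two processes for the initial load followed by two more per key to scan all the data files again, which adds up to hundreds of thousands of passes over the files.
Questions are welcome.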
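The same one-pass idea can also be sketched without the three-argument match(), which is a gawk extension and may not exist in other awk implementations. The version below is only a sketch under a few assumptions: `sum_toggles` is a hypothetical helper name, data lines always start with `time:`, and key lines always start with `Curr Time:` as in the samples in the question. The data files are passed before the key file so that every sum is complete by the time the keys are read.

```shell
#!/bin/sh
# Hypothetical helper: first argument is the key file, the rest are data files.
sum_toggles() {
    keyfile=$1
    shift
    # Data files come first, the key file last, so sums are final when keys are printed.
    awk '
        $1 == "time:" {                # data line: "time: 0 count: 517"
            sum[$2] += $4
            next
        }
        /^Curr Time:/ {                # key line: "Curr Time:0.Num - 6274"
            key = $2                   # second field, e.g. "Time:0.Num"
            sub(/^Time:/, "", key)     # strip the prefix, leaving "0.Num"
            sub(/\.Num.*$/, "", key)   # strip the suffix, leaving "0"
            # A key with no data has an empty sum, which %d prints as 0.
            printf "Curr Time:%s.NumToggles - %d\n", key, sum[key]
        }
    ' "$@" "$keyfile"
}
```

With the files from the question, it could be called as `sum_toggles a.txt file_* > output`. Each file is still read exactly once by a single awk process, so the run time should grow with the total size of the input rather than with the number of keys.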