I have a large (4 GB) semicolon-delimited file (1.txt):
- "3321";"<a href='/files/goods/edit/647/'><u>[ID 647]</u></a> Шорты";"2015-09-06 18:39:17";"1590";"1";"500";"";"Лейла";"878785";"Да";"80.140.1.38"
- "2780";"<a href='/files/goods/edit/647/'><u>[ID 647]</u></a> Шорты";"2015-09-06 18:42:51";"1590";"1";"500";"";"Мара";"8664456";"Да";"46.00.00.2"
- "3352";"<a href='/files/goods/edit/698/'><u>[ID 698]</u></a> Deck";"2015-09-06 19:05:42";"990";"1";"400";"";"Ed";"456452";"Нет";"80.26.00.00"
- "3764";"<a href='/files/goods/edit/669/'><u>[ID 669]</u></a> Fish";"2015-09-06 18:36:18";"1390";"1";"530";"";"Ann";"545566";"Нет";"80.00.35.90"
- "3323";"<a href='/files/goods/edit/669/'><u>[ID 669]</u></a> Fish";"2015-09-06 18:54:18";"1390";"1";"530";"";"юрий";"99393";"Да";"85.141.00.100"
- "32763";"<a href='/files/goods/edit/430/'><u>[ID 430]</u></a> Radio";"2015-09-06
I need to sort 1.txt by the second column and write the rows to separate files named after the ID in the second column.
This is what I do:
sed -r -i -e 's#"<a href=\x27\/files\/goods\/edit\/##g' 1.txt | sed -r -i -e 's#\/\x27>#;#g' 1.txt | sort --field-separator=';' --key=2 1.txt
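But how do I now split 1.txt so that all rows with the same ID (second column) go into their own file, with the number of records in that file as part of its name? The files should be called 647_count.txt, 698_count.txt, 669_count.txt, 430_count.txt, and so on.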
Answer 0 (score: 2)
Try the following awk script (let's call it parser.awk):
BEGIN { FS = ";" }                      # field separator
{
    if (match($2, /[0-9]+/)) {          # first run of digits in column 2 is the ID
        m = substr($2, RSTART, RLENGTH)
        a[m]++                          # count the lines seen for each ID
        print > (m "_count.txt")        # write the line to that ID's file
    }
}
END {
    for (i in a) {
        # emit one rename command per ID, putting the real count in the name
        print "mv " i "_count.txt " i "_" a[i] ".txt"
    }
}
Usage:
awk -f parser.awk 1.txt | sh
For the input fragment posted in your question, I got the following list of files:
430_1.txt
647_2.txt
669_2.txt
698_1.txt
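If the real file contains many distinct IDs, some awk implementations may run out of simultaneously open output files. The following is only a sketch of one way around that, not part of the script above: it appends to each file and closes it again after every write, which trades speed for a bounded number of open files. The scratch directory, the file name sample.txt, and its three shortened rows are made up for illustration.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three shortened sample rows (hypothetical data, same layout as 1.txt).
cat > sample.txt <<'EOF'
"3321";"<a href='/files/goods/edit/647/'><u>[ID 647]</u></a> Shorts";"2015-09-06 18:39:17"
"3352";"<a href='/files/goods/edit/698/'><u>[ID 698]</u></a> Deck";"2015-09-06 19:05:42"
"2780";"<a href='/files/goods/edit/647/'><u>[ID 647]</u></a> Shorts";"2015-09-06 18:42:51"
EOF

awk -F';' '
match($2, /[0-9]+/) {
    id = substr($2, RSTART, RLENGTH)   # first run of digits in column 2
    out = id "_count.txt"
    print >> out                       # append, so reopening never truncates
    close(out)                         # keep at most one output file open
    count[id]++
}
END {
    # emit one rename command per ID; the shell at the end of the pipe runs them
    for (id in count)
        printf "mv %s_count.txt %s_%d.txt\n", id, id, count[id]
}' sample.txt | sh

ls
```

Run in an empty directory, this leaves 647_2.txt and 698_1.txt next to sample.txt. Because the files are opened in append mode, leftovers from an earlier run would be added to rather than replaced, so start from a clean directory.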
Answer 1 (score: 2)
Bash:
err() { echo "$@" >&2; return 1; }

# split the lines read from stdin into one file per ID
re='^[^;]*;[^;]*ID ([0-9][0-9]*)'
n=0
while IFS= read -r line
do
    n=$((n + 1))
    if [[ "$line" =~ $re ]]
    then
        printf '%s\n' "$line" >> "${BASH_REMATCH[1]}_COUNT.csv"
    else
        err "$n-th line [$line] doesn't match"
    fi
done

# rename each ID_COUNT.csv so that COUNT becomes its real number of lines
shopt -s nullglob
for file in [0-9]*_COUNT.csv
do
    mv -n "$file" "${file//_COUNT/_$(grep -c '^' "$file")}"
done
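The script reads its input from standard input, so it would be run with 1.txt redirected into it. Its two key steps are the regex capture of the ID and the substitution that renames the file. As a small illustration only, here they are applied to a single shortened row; the sample row, the variables id and renamed, and the fixed count of 1 are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Same pattern as in the script: skip column 1, then capture the digits after "ID ".
re='^[^;]*;[^;]*ID ([0-9][0-9]*)'

# One shortened sample row, read verbatim through a quoted here-document.
IFS= read -r line <<'EOF'
"3352";"<a href='/files/goods/edit/698/'><u>[ID 698]</u></a> Deck";"2015-09-06 19:05:42"
EOF

if [[ $line =~ $re ]]; then
    id=${BASH_REMATCH[1]}          # first capture group: the numeric ID
fi

file="${id}_COUNT.csv"
renamed=${file//_COUNT/_1}         # replace the placeholder with a line count of 1

echo "$id $renamed"
```

This prints `698 698_1.csv`. In the full script the count comes from `grep -c '^'`, which counts the lines of the file.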