I have a file that looks like this (note: A*, B*, C* are placeholders). The file is delimited by ;
AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;
AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;
AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;
I'm trying to write a small script that counts the number of occurrences of the delimiter ;
and, if the count is less than or greater than 5, outputs that line to a text file.
delim=";"
while read line
do
n_of_occ=$(grep -o "$delim" <<< "$line" | wc -l)
if [[ $n_of_occ < 5 ]] || [[ $n_of_occ > 5 ]]
then
echo $line >> outfile
fi
done
For some reason, this doesn't seem to work and my output is garbled. Could someone assist or provide a different way to tackle this? Perhaps with Perl instead of bash?
Answer 0 (score: 3)
Using awk:
awk -F\; 'NF!=6' file > outfile
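To see why the test is on 6 rather than 5: five delimiters split a line into six fields, and a trailing ; still counts because awk treats the empty text after it as a field. Here is a small sketch with made-up sample lines (not the data from the question) that prints NF next to each line and then applies the same filter:

```shell
# Hypothetical sample: the lines contain 5, 6 and 2 semicolons respectively
sample=$(printf '%s\n' 'A;B;C;D;E;' 'A;B;C;D;E;F;' 'A;B;C')

# Print the field count awk sees for each line (delimiters + 1)
fields=$(printf '%s\n' "$sample" | awk -F';' '{ print NF ": " $0 }')
echo "$fields"

# Keep only the lines that do NOT have exactly 6 fields (i.e. 5 delimiters)
result=$(printf '%s\n' "$sample" | awk -F';' 'NF != 6')
echo "$result"
```

The first command should report 6, 7 and 3 fields, so only the second and third lines pass the filter.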
Answer 1 (score: 1)
I would go with this one-liner:
awk '{x=$0}gsub(";","",x)!=5' file
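This works because gsub returns the number of substitutions it made, and it is run on a copy (x) so the original line is printed untouched. As a quick check, here is a sketch that prints that count for the three sample lines from the question:

```shell
# Sample lines copied from the question; each one ends with a trailing ';'
counts=$(printf '%s\n' \
    'AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;' \
    'AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;' \
    'AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;' |
    awk '{ x = $0; print gsub(";", "", x) }')   # gsub returns the match count
echo "$counts"
```

Each line reports 6, so with a threshold of 5 every sample line would be written to the output.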
Answer 2 (score: 1)
It's easy with Perl:
perl -ne 'print if tr/;// != 5' input_file > output_file
-n reads the input line by line, and the tr operator returns the number of matches.
Answer 3 (score: 1)
With sed, you can do this:
sed '/^\([^;]*;\)\{5\}$/d' file > outfile
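This deletes the lines that consist of exactly five fields, each terminated by a semicolon (;), and sends the remaining lines to outfile.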
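In your own script, replace done with done <file so that the loop actually reads the file, and replace [[ with (( and ]] with )), i.e. use ((...)) instead of [[...]], because < and > inside [[...]] compare strings rather than numbers.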
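A minimal sketch of the loop with those changes applied (input redirected into the loop, ((...)) for the numeric test). The temporary directory, the file names and the sample lines are made up for illustration; IFS= read -r and the quoted "$line" are extra precautions not mentioned above, added so that whitespace and backslashes in a line survive unchanged:

```shell
#!/usr/bin/env bash
# Sketch only: tmpdir, infile and outfile are hypothetical names
delim=";"
expected=5

tmpdir=$(mktemp -d)
infile="$tmpdir/sample.txt"
outfile="$tmpdir/outfile"

# Made-up input: the lines contain 5, 6 and 2 delimiters respectively
printf '%s\n' 'A;B;C;D;E;' 'A;B;C;D;E;F;' 'A;B;C' > "$infile"

: > "$outfile"                      # start with an empty output file
while IFS= read -r line
do
    # grep -o prints every match on its own line, so wc -l counts them
    n_of_occ=$(grep -o "$delim" <<< "$line" | wc -l)
    # (( ... )) compares numerically
    if (( n_of_occ != expected ))
    then
        printf '%s\n' "$line" >> "$outfile"
    fi
done < "$infile"                    # feed the file to the loop

cat "$outfile"
```

Only the second and third lines should end up in the output file, since the first one has exactly five delimiters.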
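Answer 4 (score: 1)
Unfortunately, every line in your sample data has six semicolons, which means they should all be printed. Still, here is a one-line Perl solution: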
$ perl -ne'print if tr/;// != 5' aaa.csv
AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;
AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;
AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;