Counting number of character occurrences per line

Date: 2016-05-17 11:20:05

Tags: bash perl find-occurrences

I have a file that looks like this (note: A*, B*, C* are placeholders). The fields are delimited by ;

AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;
AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;
AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;

I'm trying to write a small script that counts the number of occurrences of the delimiter ; in each line and, if the count is less than or greater than 5, writes that line to a text file.

delim=";"

while read line
do  
    n_of_occ=$(grep -o "$delim" <<< "$line" | wc -l)

    if [[ $n_of_occ < 5 ]] || [[ $n_of_occ > 5 ]]
    then
        echo $line >> outfile
    fi
done

For some reason, this doesn't seem to work and my output is garbled. Could someone assist or provide a different way to tackle this? Perhaps with Perl instead of bash?

5 answers:

Answer 0 (score: 3)

This is easy with awk:

awk -F\; 'NF!=6' file > outfile
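As a rough illustration of why the test is NF!=6: with ; as the field separator, a line containing n delimiters splits into n+1 fields, so NF - 1 is the delimiter count. The sketch below assumes bash and a standard awk; delim_counts is a hypothetical helper name and the two input lines are made up for the demo.

```shell
#!/usr/bin/env bash
# delim_counts (hypothetical name): prefix every input line with its ';' count.
delim_counts() {
    # With -F';' a line holding n semicolons has n+1 fields, so NF - 1 = n.
    awk -F';' '{ print NF - 1, $0 }'
}

# Made-up demo input: the first line has 5 semicolons, the second has 6.
printf '%s\n' 'a;b;c;d;e;f' 'a;b;c;d;e;f;' | delim_counts
```

Only the second demo line would be selected by NF!=6, because its trailing ; produces a seventh (empty) field.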

Answer 1 (score: 1)

I would go with this one-liner:

awk '{x=$0}gsub(";","",x)!=5' file

Answer 2 (score: 1)

This is easy with Perl:

perl -ne 'print if tr/;// != 5' input_file > output_file
  • -n reads the input line by line
  • the tr operator returns the number of characters it matched

Answer 3 (score: 1)

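With sed, you can do this:

sed '/^\([^;]*;\)\{5\}[^;]*$/d' file > outfile

This deletes the lines that contain exactly 5 semicolons (;), whether or not anything follows the last one, and sends the remaining lines to outfile.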

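Alternatively, if you want your own code to work, make the following changes:

  1. Replace done with done <file, so that the loop reads from the file rather than from standard input.
  2. Replace [[ with (( and ]] with )), i.e. use ((...)) instead of [[...]]. Inside [[...]], < and > compare strings lexicographically rather than numerically.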
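The two changes above, applied to the question's loop, might look like the sketch below. It assumes bash; filter_lines is a hypothetical wrapper name, and the demo input is made up. The quoting, read -r, and the single != test are extra tidying that the answer does not ask for.

```shell
#!/usr/bin/env bash
# filter_lines (hypothetical name): print the lines from standard input
# whose number of ';' delimiters is not exactly 5.
filter_lines() {
    local delim=";" line n_of_occ
    # IFS= and -r keep leading whitespace and backslashes intact.
    while IFS= read -r line; do
        # grep -o prints each match on its own line; wc -l counts those lines.
        n_of_occ=$(grep -o "$delim" <<< "$line" | wc -l)
        # (( )) compares numerically, unlike < and > inside [[ ]].
        if (( n_of_occ != 5 )); then
            printf '%s\n' "$line"
        fi
    done
}

# Made-up demo input: only the second line (6 semicolons) is printed.
printf '%s\n' 'a;b;c;d;e;f' 'a;b;c;d;e;f;' | filter_lines
```

For the original use case the call would be filter_lines < file > outfile, which redirects the whole loop once instead of appending with >> on every iteration.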

Answer 4 (score: 1)

Unfortunately, every line in your sample data has six semicolons, which means they should all be printed. Still, here is a one-line Perl solution:

$ perl -ne'print if tr/;// != 5' aaa.csv
AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;
AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;
AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;
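The six-semicolon count can be checked with a small pure-bash sketch, shown here as an assumption-laden illustration rather than part of the answer. count_delims is a hypothetical helper name; it uses bash parameter expansion to delete every character that is not ; and then takes the length of what remains.

```shell
#!/usr/bin/env bash
# count_delims (hypothetical name): print how many ';' the argument contains.
count_delims() {
    local line=$1
    # Remove every character other than ';', leaving only the delimiters.
    local only_delims=${line//[^;]/}
    printf '%s\n' "${#only_delims}"
}

# Print the delimiter count followed by each sample line from the question.
while IFS= read -r line; do
    printf '%s %s\n' "$(count_delims "$line")" "$line"
done <<'EOF'
AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;
AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;
AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;
EOF
```

Each sample line ends with a trailing ;, which is why the count comes out as 6 rather than 5.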