假设我们有以下记录{(XXX1),(XXX2)},whatever
我想要的是,根据以下规则提取信息,最好使用'grep':如果{}包含少于或等于两个UNIQUE元素,()中的那些,然后保留(两者),否则删除整行。作为进一步的步骤,我想提取()中的值,最后以下面的形式写出剩余的行:XXX1,XXX2,whatever
更新
对于以下输入:
{(XXX1),(XXX2)},whatever,unique=2
{(XXX1),(XXX1),(XXX1),(XXX2)},whatever,unique=2
{(XXX1)},whatever,unique=1
{},whatever,unique=0
{(XXX1),(XXX2),(XXX3),(XXX4)},whatever
我应该得到以下输出:
XXX1,XXX2,whatever,unique=2
XXX1,whatever,unique=1
答案 0 :(得分:1)
awk可以做到,检查这个单行:
awk -F'[}{]' '{split($2,a,",");delete(b);for(x in a)b[a[x]]}length(b)<=2' file
让我们做一个小测试:
kent$ cat file
ok,{(XXX1),(XXX2)},whatever,unique=2
ok,{(XXX1),(XXX1),(XXX1),(XXX2)},whatever,unique=2
ok,{(XXX1)},whatever,unique=1
ok,{},whatever,unique=0
nok,{(XXX1),(XXX2),(XXX3),(XXX4)},whatever
kent$ awk -F'[}{]' '{split($2,a,",");delete(b);for(x in a)b[a[x]]}length(b)<=2' file
ok,{(XXX1),(XXX2)},whatever,unique=2
ok,{(XXX1),(XXX1),(XXX1),(XXX2)},whatever,unique=2
ok,{(XXX1)},whatever,unique=1
ok,{},whatever,unique=0
您可以看到,nok
行已被删除
修改强>
awk -F'[}{]' '{gsub(/[()]/,"");split($2,a,",");delete(b);for(x in a)b[a[x]];l=length(b)}l<=2&&l>0{s="";for(x in b)s=s""x",";sub(/,$/,"",s);y[s]=s $3}END{for(x in y)print y[x]}' file
测试
kent$ cat file
{(XXX1),(XXX2)},whatever,unique=2
{(XXX1),(XXX1),(XXX1),(XXX2)},whatever,unique=2
{(XXX1)},whatever,unique=1
{},whatever,unique=0
{(XXX1),(XXX2),(XXX3),(XXX4)},whatever
kent$ awk -F'[}{]' '{gsub(/[()]/,"");split($2,a,",");delete(b);for(x in a)b[a[x]];l=length(b)}l<=2&&l>0{s="";for(x in b)s=s""x",";sub(/,$/,"",s);y[s]=s $3}END{for(x in y)print y[x]}' file
XXX1,XXX2,whatever,unique=2
XXX1,whatever,unique=1