awk for text processing of CSV files

Asked: 2015-11-08 10:51:31

Tags: awk text-processing

I have many large *.cvs text files that look like this:

    Word,Tag,Lemma
    Off,aa,off
    short,aa,short
    and,sfg3eþ,and
    tall,sþghen,tall
    deers,aþ,deer
    in,never,in
    Africa,nc,Africa
    frv.,aa,frv.
    ---,ta,---
    ,,
    All,nhfn,all
    allowed,lhfnsf,allow
    personell,c,personell
    aggr.,lheþsf,aggr.
    with,aþ,with
    23,ta,23
    as.,nvfn,as.
    sillable.,lheþsf,sillable.
    ,,
    Á,aþ,á 

I need to process this file so that the first column ends up in a list like this:

    {[Off short and tall deers in Africa frv],[All allowed personnel aggr. with 23 as syllable.],[Á......],...n]}

The closing ]} at the end is required.

I tried:

    awk 'BEGIN {FS=",";print"{["} /",,"/ {print"],["} END {print"]}"}' 079.cvs

This only prints: {[ ]}

I also found this:

    cat 080.csv | cut -d ',' -f3 >>D.txt

This is actually quite useful; it produces:

    Off
    short
    and 
    tall
    ....

However, the result is a "deep" file (one word per line) and lacks the list delimiters.

1 answer:

Answer 0 (score: 0)

Update

    awk -F, 'NR==1{printf "{["; next} /^--/||!$1{if(a)printf "],["; a=0; next} {printf "%s ",$1; a=1} END{printf "]}"}' file

Output:

    {[Off short and tall deers in Africa frv. ],[All allowed personell aggr. with 23 as. sillable. ],[Á ]}
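For readers who want to see the logic spelled out, here is a commented variant of the one-liner. It is only a sketch, not the answerer's exact command. The attempt in the question printed just the braces because `/",,"/` looks for literal double-quote characters, which never occur in the data, and because no rule printed `$1`. The function name `first_column_groups` and the variables `line` and `sep` are made up for illustration. This variant differs from the one-liner in three ways: it tests `$1 == ""` instead of `!$1`, so a word that is literally `0` is kept; it leaves no trailing space inside the brackets; and it ends the output with a newline.

```shell
#!/bin/sh
# first_column_groups is a hypothetical helper name used only for this sketch.
# It reads CSV from the named files, or from standard input if none are given.
first_column_groups() {
    awk -F, '
        BEGIN    { printf "{" }           # open the outer brace
        FNR == 1 { next }                 # skip the header row of each input file
        /^--/ || $1 == "" {               # separator row: "---,ta,---" or ",,"
            if (line != "") {             # close the group collected so far
                printf "%s[%s]", sep, line
                sep = ","; line = ""
            }
            next
        }
        { line = (line == "" ? $1 : line " " $1) }   # append the word to the group
        END {
            if (line != "") printf "%s[%s]", sep, line   # flush the last group
            print "}"
        }
    ' "$@"
}
```

Assuming Unix line endings, calling `first_column_groups 079.cvs` on the sample data from the question should print `{[Off short and tall deers in Africa frv.],[All allowed personell aggr. with 23 as. sillable.],[Á]}`. An input with no data rows yields `{}` rather than `{[]}`.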