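Unable to process data with awk

Time: 2015-09-17 01:50:19

Tags: regex linux awk

I am trying to process data with awk, but I can't get the correct result. Please let me know if I am doing something wrong.

Data (test.txt):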

"A","B","ls",,"This,is,the,test",T,
"k",O,"mv",,"This,is,the,2nd test","L",
"C",J,"cd",,"This,is,the,3rd test",,

awk  'BEGIN { FS=","; OFS="|" }  { nf=0; delete f; while ( match($0,/([^,]+)|(\"[^\"]+\")/) ) { f[++nf] = substr($0,RSTART,RLENGTH); $0 = substr($0,RSTART+RLENGTH); };  print f[2],f[3],f[4],f[5] }' test.txt 

Output:

"B"|"ls"|"This,is,the,test"|T
O|"mv"|"This,is,the,2nd test"|"L"
J|"cd"|"This,is,the,3rd test"|

But the output should look like this:

"B"|"ls"||"This,is,the,test"|T
O|"mv"||"This,is,the,2nd test"|"L"
J|"cd"||"This,is,the,3rd test"|

3 Answers:

Answer 0 (score: 1)

awk -F\" '{q="\""; print q$4q"|"q$6q"||"q$8q}'

Answer 1 (score: 1)

awk -vFPAT='"[^"]*"' '{$0=$2"|"$3"||"$4}1' FILE

This uses FPAT, which is a gawk extension.

Answer 2 (score: 0)

With your new input and any awk:

$ cat tst.awk
BEGIN { FS=","; OFS="|" }
{
    # 1) Replace all FSs inside quotes with the value of RS
    #    since we know that RS cannot be present in any record:
    head = ""
    tail = $0
    while( match(tail,/"[^"]+"/) ) {
        trgt = substr(tail,RSTART,RLENGTH)
        gsub(FS,RS,trgt)
        head = head substr(tail,1,RSTART-1) trgt
        tail = substr(tail,RSTART+RLENGTH)
    }
    $0 = head tail

    # 2) re-compile the record to replace FSs with OFSs:
    $1 = $1

    # 3) restore the RSs within quoted fields to FSs:
    gsub(RS,FS)

    # 4) remove the first and last fields:
    gsub("^[^" OFS "]*[" OFS "]|[" OFS "][^" OFS "]*$","")

    print
}

$ awk -f tst.awk file
"B"|"ls"||"This,is,the,test"|T
O|"mv"||"This,is,the,2nd test"|"L"
J|"cd"||"This,is,the,3rd test"|
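
For comparison, here is a rough alternative sketch that is not part of the answer above. The loop in the question matches fields with [^,]+, which needs at least one character, so the empty field between two consecutive commas is never stored and everything after it shifts left by one. One way around that is to consume the record one field at a time and store the field even when it is empty. This assumes quoted fields never contain escaped quotes and are always followed directly by a comma or the end of the line. The function name csv_fields_2_to_6 is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read CSV lines on stdin and print fields 2..6
# joined by "|", keeping empty fields in place.
csv_fields_2_to_6() {
    awk '
    BEGIN { OFS = "|" }
    {
        split("", f)            # portable way to clear the array
        nf = 0
        rest = $0
        last = 0
        # Consume one field per pass, including empty ones.
        while (!last) {
            if (match(rest, /^"[^"]*"/)) {
                # Quoted field: runs up to the closing quote.
                len = RLENGTH
            } else {
                # Unquoted field: runs up to the next comma (may be empty).
                i = index(rest, ",")
                len = (i ? i - 1 : length(rest))
            }
            f[++nf] = substr(rest, 1, len)
            if (substr(rest, len + 1, 1) == ",")
                rest = substr(rest, len + 2)   # skip the separator
            else
                last = 1                       # no separator left
        }
        print f[2], f[3], f[4], f[5], f[6]
    }'
}

# Usage with the sample data from the question:
csv_fields_2_to_6 <<'EOF'
"A","B","ls",,"This,is,the,test",T,
"k",O,"mv",,"This,is,the,2nd test","L",
"C",J,"cd",,"This,is,the,3rd test",,
EOF
```

The sketch sticks to match, index and substr, so it should not need gawk. The field range 2..6 is hard-coded to mirror the expected output in the question.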