Unable to process a set of records with an awk script

Time: 2015-09-18 08:34:48

Tags: linux awk

I'm trying to process a set of records, but I can't get the expected output: my code doesn't print column 12, which is empty. Data in test.txt:

"B64NN2",163934,"ALLPMR",22193625,G,"XYX, Test Surgery","31 Orwell Road","TTP","","IP11 7DD","IP11 7DD",,"DMB0406C","2011-09-12","2011-11-02"
"B6PPL1",215969,"ALLPMR",22192331,G,"KBC Medical Test","Open Close","JJK  Cardiff","South Glamorgan","CF15 8DZ","CF15 8DZ",,"DMB4001B","2011-09-12","2013-08-01"


awk  'BEGIN { FS=","; OFS="," }  { nf=0; delete f; while ( match($0,/([^,]+)|(\"[^\"]+\")/) ) { f[++nf] = substr($0,RSTART,RLENGTH); $0 = substr($0,RSTART+RLENGTH); };  print f[1],f[2],f[3],f[4],f[5],f[6],f[7],f[8],f[9],f[11],f[12],f[13],f[14],f[15] }' test.txt 

Output:

"B64NN2",163934,"ALLPMR",22193625,G,"XYX, Test Surgery","31 Orwell Road","TTP","","IP11 7DD","DMB0406C","2011-09-12","2011-11-02"
"B6PPL1",215969,"ALLPMR",22192331,G,"KBC Medical Test","Open Close","JJK  Cardiff","South Glamorgan","CF15 8DZ","DMB4001B","2011-09-12","2013-08-01"

But the output should look like this:

"B64NN2",163934,"ALLPMR",22193625,G,"XYX, Test Surgery","31 Orwell Road","TTP","","IP11 7DD",,"DMB0406C","2011-09-12","2011-11-02"
"B6PPL1",215969,"ALLPMR",22192331,G,"KBC Medical Test","Open Close","JJK  Cardiff","South Glamorgan","CF15 8DZ",,"DMB4001B","2011-09-12","2013-08-01"

Any ideas?
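Why the awk loop loses the column: `[^,]+` needs at least one character, so the empty field between `,,` never matches, and every later field shifts one slot to the left. The sketch below replays the same `while (match(...))` loop with Python's `re` module on the first record of test.txt. It is an illustration under one assumption: awk's regex engine prefers the longest match, while Python's `re` takes the first alternative that matches, so the quoted alternative is listed first to get the same tokens for this data. `awk_style_fields` is a hypothetical helper name.

```python
import re

# Same two alternatives as the awk regex /([^,]+)|(\"[^\"]+\")/.
# awk prefers the longest match; Python's re takes the first alternative that
# matches, so the quoted alternative is listed first to get the same tokens here.
AWK_LIKE = re.compile(r'"[^"]+"|[^,]+')


def awk_style_fields(line):
    """Replay the question's while (match(...)) loop: keep each non-empty match."""
    return [m.group(0) for m in AWK_LIKE.finditer(line)]


line = ('"B64NN2",163934,"ALLPMR",22193625,G,"XYX, Test Surgery",'
        '"31 Orwell Road","TTP","","IP11 7DD","IP11 7DD",,'
        '"DMB0406C","2011-09-12","2011-11-02"')

f = awk_style_fields(line)
print(len(f))  # 14, not 15: nothing between ",," can match [^,]+
print(f[11])   # "DMB0406C": awk's f[12] (1-based) already holds field 13
```

The quoted field `""` (column 9) survives only because `[^,]+` happens to match the two quote characters; the bare empty column 12 has no characters at all, so no pattern built on `+` can ever capture it.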

1 Answer:

Answer 0 (score: 0)

I'd suggest using a real CSV parser instead of counting double quotes and commas yourself. Here is an example in Python 3:

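import csv
import sys

# csv.reader understands quoted fields, so "XYX, Test Surgery" stays one
# field and the empty field between ",," is kept as ''.
with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    # QUOTE_ALL quotes every field on output, including numbers and empties.
    csvwriter = csv.writer(sys.stdout, quoting=csv.QUOTE_ALL)
    for row in csvreader:
        # Keep fields 1-10 and 12 onward, i.e. drop the 11th field (index 10).
        newrow = row[0:10]
        newrow.extend(row[11:])
        csvwriter.writerow(newrow)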

You can run it like this:

python3 script.py infile

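It drops the 11th field and keeps the empty field. Because of csv.QUOTE_ALL, every field, including the numbers and the empty one, comes out quoted: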

"B64NN2","163934","ALLPMR","22193625","G","XYX, Test Surgery","31 Orwell Road","TTP","","IP11 7DD","","DMB0406C","2011-09-12","2011-11-02"
"B6PPL1","215969","ALLPMR","22192331","G","KBC Medical Test","Open Close","JJK  Cardiff","South Glamorgan","CF15 8DZ","","DMB4001B","2011-09-12","2013-08-01"
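The output above differs from the expected output in the question, which mixes quoted strings, bare numbers and a bare empty field. Since csv.reader turns both `""` and a bare empty field into `''`, that original quoting cannot be recovered from the parsed rows. If the exact text matters, one option is to split each line into raw fields and rejoin them untouched. This is a minimal sketch, not a general CSV parser: it assumes well-formed lines whose quoted fields never contain escaped `""` pairs or embedded newlines, and `split_raw` and `drop_field` are hypothetical names. The key change from the awk regex is `[^,]*` instead of `[^,]+`, so an empty field matches the empty string instead of being skipped.

```python
import re
import sys

# One field: a double-quoted string (assumed to contain no escaped "" pairs)
# or a run of non-comma characters. [^,]* may match nothing, so ",," yields
# an empty field instead of being skipped as with the awk regex [^,]+.
FIELD = re.compile(r'"[^"]*"|[^,]*')


def split_raw(line):
    """Split one CSV line into fields, keeping each field's original text."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)  # always matches: [^,]* accepts ''
        fields.append(m.group(0))
        pos = m.end()
        # Assumes well-formed input: a field is followed by ',' or end of line.
        if pos >= len(line) or line[pos] != ',':
            return fields
        pos += 1  # step over the separating comma


def drop_field(line, index):
    """Remove the field at a 0-based index and rejoin the rest unchanged."""
    fields = split_raw(line)
    del fields[index]
    return ','.join(fields)


if __name__ == '__main__' and len(sys.argv) > 1:
    # Hypothetical usage: python3 drop_field.py test.txt
    with open(sys.argv[1], newline='') as fh:
        for line in fh:
            # index 10 is the 11th field, the same one the answer drops
            print(drop_field(line.rstrip('\r\n'), 10))
```

Run on test.txt, this should print the two lines of the expected output from the question, with the original quoting intact.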