I am trying to process a set of records, but I cannot get the expected output: this code fails to print column 12 (which is empty).
Data (test.txt):
"B64NN2",163934,"ALLPMR",22193625,G,"XYX, Test Surgery","31 Orwell Road","TTP","","IP11 7DD","IP11 7DD",,"DMB0406C","2011-09-12","2011-11-02"
"B6PPL1",215969,"ALLPMR",22192331,G,"KBC Medical Test","Open Close","JJK Cardiff","South Glamorgan","CF15 8DZ","CF15 8DZ",,"DMB4001B","2011-09-12","2013-08-01"
awk 'BEGIN { FS=","; OFS="," } { nf=0; delete f; while ( match($0,/([^,]+)|(\"[^\"]+\")/) ) { f[++nf] = substr($0,RSTART,RLENGTH); $0 = substr($0,RSTART+RLENGTH); }; print f[1],f[2],f[3],f[4],f[5],f[6],f[7],f[8],f[9],f[11],f[12],f[13],f[14],f[15] }' test.txt
Output:
"B64NN2",163934,"ALLPMR",22193625,G,"XYX, Test Surgery","31 Orwell Road","TTP","","IP11 7DD","DMB0406C","2011-09-12","2011-11-02"
"B6PPL1",215969,"ALLPMR",22192331,G,"KBC Medical Test","Open Close","JJK Cardiff","South Glamorgan","CF15 8DZ","DMB4001B","2011-09-12","2013-08-01"
But the output should look like this:
"B64NN2",163934,"ALLPMR",22193625,G,"XYX, Test Surgery","31 Orwell Road","TTP","","IP11 7DD",,"DMB0406C","2011-09-12","2011-11-02"
"B6PPL1",215969,"ALLPMR",22192331,G,"KBC Medical Test","Open Close","JJK Cardiff","South Glamorgan","CF15 8DZ",,"DMB4001B","2011-09-12","2013-08-01"
Any ideas?
Answer 0 (score: 0)
I suggest using a CSV parser to parse the CSV rather than counting double quotes and commas. Here is an example in Python:
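import csv
import sys

# Read the CSV file named on the command line; newline='' lets the csv module handle line endings.
with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    csvwriter = csv.writer(sys.stdout, quoting=csv.QUOTE_ALL)
    for row in csvreader:
        # Keep fields 1-10, skip field 11, then keep the rest (empty fields included).
        newrow = row[0:10]
        newrow.extend(row[11:])
        csvwriter.writerow(newrow)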
You can run it like this:
python3 script.py infile
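This removes the 11th field and keeps the empty field. Because of csv.QUOTE_ALL, every field in the output is quoted, including the numeric ones: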
"B64NN2","163934","ALLPMR","22193625","G","XYX, Test Surgery","31 Orwell Road","TTP","","IP11 7DD","","DMB0406C","2011-09-12","2011-11-02"
"B6PPL1","215969","ALLPMR","22192331","G","KBC Medical Test","Open Close","JJK Cardiff","South Glamorgan","CF15 8DZ","","DMB4001B","2011-09-12","2013-08-01"
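If the original quoting has to survive unchanged (unquoted numbers, a bare empty field), one possible alternative is to split each line on commas that fall outside double quotes and keep the raw text of every field. This also shows why the awk loop in the question loses the empty column: its pattern `[^,]+` needs at least one character, so an empty field is never stored and later fields shift down by one. The sketch below is only an illustration under stated assumptions: the helper names `split_raw_fields` and `drop_field` are hypothetical, and it assumes no field contains an embedded newline.

```python
def split_raw_fields(line):
    """Split one CSV line on commas outside double quotes.

    Each field keeps its raw text, including any surrounding quotes.
    Assumption: fields never contain embedded newlines.
    """
    fields, current, in_quotes = [], [], False
    for ch in line:
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ',' and not in_quotes:
            # Field boundary: store the field even when it is empty.
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))
    return fields


def drop_field(line, index):
    """Return the line with the field at the zero-based index removed."""
    fields = split_raw_fields(line)
    return ','.join(fields[:index] + fields[index + 1:])


# First record from test.txt in the question.
sample = ('"B64NN2",163934,"ALLPMR",22193625,G,"XYX, Test Surgery",'
          '"31 Orwell Road","TTP","","IP11 7DD","IP11 7DD",,'
          '"DMB0406C","2011-09-12","2011-11-02"')

# Index 10 is the 11th field (the duplicated postcode).
print(drop_field(sample, 10))
```

For the sample record this prints the first line of the expected output from the question, with the empty field still in place. For input with more complicated quoting, the csv module approach above is probably the safer choice.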