I have a problem with my source file. Suppose the file contains the following data:
"dfjsdlfkj,fsdkfj,werkj",234234,234234,,"dfsd,etwetr"
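Here the delimiter is a comma, but some fields contain commas as part of their data. Those fields are enclosed in double quotes. I want to extract a few columns from the file.
If I use cut -d "," -f 1,3, my output looks like this: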
"dfjsdlfkj,werkj"
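The behaviour above can be reproduced with a plain comma split, which is roughly what cut does with -d ",". The following is only a minimal sketch in Python; the variable names (line, pieces, cut_like) are illustrative and not part of the original problem.

```python
line = '"dfjsdlfkj,fsdkfj,werkj",234234,234234,,"dfsd,etwetr"'

# Split on every comma and ignore the quotes (this mimics cut -d ",").
pieces = line.split(",")

# Fields 1 and 3 in cut's one-based numbering are indices 0 and 2 here.
cut_like = ",".join([pieces[0], pieces[2]])
print(cut_like)
```

The quoted first field is broken into three pieces, so "field 3" ends up being the tail of the first column rather than the third column.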
Answer 0: (score: 1)
I suggest using a CSV parser. Python, for example, ships with one in its standard library, so you only need to import it:
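import sys
import csv

# newline='' lets the csv module handle line endings itself
with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    csvwriter = csv.writer(sys.stdout)
    for row in csvreader:
        # keep the 1st and 3rd columns (zero-based indices 0 and 2)
        csvwriter.writerow([row[e] for e in (0, 2)])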
Assuming your sample line is in an input file named infile, run the script as:
python3 script.py infile
Output:
"dfjsdlfkj,fsdkfj,werkj",234234
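If you need different columns at different times, the same idea can be wrapped in a small helper. This is only a sketch: select_columns is a hypothetical name, and it assumes every row has at least as many fields as the largest index requested.

```python
import csv
import io

def select_columns(lines, indices):
    """Return CSV text containing only the given zero-based columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(lines):
        # Fields that contain commas are re-quoted by the writer.
        writer.writerow([row[i] for i in indices])
    return out.getvalue()

sample = ['"dfjsdlfkj,fsdkfj,werkj",234234,234234,,"dfsd,etwetr"']
result = select_columns(sample, (0, 4))
print(result, end="")
```

Here the first and fifth columns are selected, so both quoted fields come back intact. csv.reader accepts any iterable of lines, so an open file object could be passed in place of the sample list.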
Answer 1: (score: 0)
You can try the following:
awk -f getFields.awk input.txt
where input.txt is your input file and getFields.awk is:
{
    split("",a)        # clear the field array for each line
    splitLine()
    print a[1],a[3]
}
function splitLine(s,indq,t,r,len) {
    # Assumptions:
    #  * spaces before or after commas are ignored
    #  * spaces at the beginning or end of a line are ignored
    # Definition of a quoted parameter:
    #  - starts with (^ and $ are regexp characters):
    #      a) ^"
    #      b) ,"
    #  - ends with:
    #      a) "$
    #      b) ",
    s=$0; k=1
    s=removeBlanks(s)
    while (s) {
        if (substr(s,1,1)=="\"")
            indq=2
        else {
            sub(/[[:blank:]]*,[[:blank:]]*"/,",\"",s)
            indq=index(s,",\"")
            if (indq) {
                t=substr(s,1,indq-1)
                splitCommaString(t)
                indq=indq+2
            }
        }
        if (indq) {
            s=substr(s,indq)
            sub(/"[[:blank:]]*,/,"\",",s)
            len=index(s,"\",") #find closing quote
            if (!len) {
                if (match(s,/"$/)) {
                    len=RSTART-1
                }
                else
                    len=length(s)
                r=substr(s,1,len)
                s=""
            } else {
                r=substr(s,1,len-1)
                s=substr(s,len+2)
            }
            a[k++]=r
        } else {
            splitCommaString(s)
            s=""
        }
    }
    k=k-1
}
function splitCommaString(t,b,i) {
    n=split(t,b,",")
    for (i=1; i<=n; i++)
        a[k++]=removeBlanks(b[i])
}
function removeBlanks(r) {
    sub(/^[[:blank:]]*/,"",r)
    sub(/[[:blank:]]*$/,"",r)
    return r
}
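For comparison, the "ignore spaces around commas" assumption can be approximated with Python's csv module. This is a rough sketch rather than a translation of the awk script: parse_trimmed is a hypothetical name, and unlike the awk version it also trims leading and trailing whitespace inside quoted fields.

```python
import csv

def parse_trimmed(line):
    """Parse one CSV line, tolerating spaces around the commas."""
    # skipinitialspace=True ignores spaces right after a delimiter,
    # so a quote that follows ", " is still recognised as a quote.
    reader = csv.reader([line], skipinitialspace=True)
    # strip() removes whatever whitespace is left before a delimiter.
    return [field.strip() for field in next(reader)]

fields = parse_trimmed('"dfjsdlfkj,fsdkfj,werkj" , 234234, 234234,, "dfsd,etwetr"')
print(fields[0], fields[2])
```

As with print a[1],a[3] in the awk script, the selected fields are printed without their surrounding quotes.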