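I am trying to parse a CSV that is quoted with "' (a double quote followed by a single quote).
So basically the file looks like this:

    "'test1'","'test2'","'test3'","'test4'"
    "'value1'","'value2'",,"'value4'"

I tried parsing it with the following:

    import csv
    from pprint import pprint

    inputCsv = "test.csv"
    with open(inputCsv, 'r', newline='') as csvfile:
        dictReader = csv.DictReader(csvfile, quotechar='"', delimiter=',',
                                    quoting=csv.QUOTE_ALL, doublequote=True)
        for line in dictReader:
            pprint(line)
            # print(line["'test1'"])  # works, but only with "'test1'", not "test1" or 'test1'; also result is 'value1' not value1

I want the key to be test1 (instead of 'test1') and its value to be value1, so that I can access it with line["test1"] instead of line["'test1'"], without the additional quotes.

Is this possible without iterating over the whole dictionary and stripping the quotes from each element after parsing?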
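Answer 0 (score: 3)
You can define your own reader to take care of the problem during iteration (warning: untested code, but it should at least get you started):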
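    import csv

    class MyReader:
        # csv.reader is a factory function, not a class, so it cannot be
        # subclassed; wrap the reader object it returns instead.
        def __init__(self, f, dialect="excel", *args, **kwds):
            self._reader = csv.reader(f, dialect, *args, **kwds)

        def __next__(self):
            row = next(self._reader)
            return [value.strip("'") for value in row]

        @property
        def line_num(self):
            # DictReader reads this attribute from its underlying reader.
            return self._reader.line_num

    class MyDictReader(csv.DictReader):
        def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                     dialect="excel", *args, **kwds):
            super().__init__(f, fieldnames, restkey, restval, dialect, *args, **kwds)
            # Replace the default reader with the quote-stripping one.
            self.reader = MyReader(f, dialect, *args, **kwds)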
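The same idea, stripping the single quotes while rows are being read, can also be sketched as a plain generator around csv.reader, with no subclassing involved. This is only a minimal sketch: the names iter_dequoted_dicts and strip_chars are made up for illustration, and the sample data is the two-line file from the question held in memory instead of test.csv.

```python
import csv
from io import StringIO

def iter_dequoted_dicts(f, strip_chars="'"):
    """Yield one dict per data row, stripping strip_chars from keys and values.

    Hypothetical helper, not part of the csv module.
    """
    reader = csv.reader(f, quotechar='"', delimiter=',')
    try:
        # The first row is the header; strip the inner quotes from each name.
        header = [name.strip(strip_chars) for name in next(reader)]
    except StopIteration:
        return  # empty input: nothing to yield
    for row in reader:
        if not row:
            continue  # skip blank lines, as csv.DictReader does
        yield dict(zip(header, (value.strip(strip_chars) for value in row)))

# The file from the question, held in memory for the example.
sample = StringIO(
    '"\'test1\'","\'test2\'","\'test3\'","\'test4\'"\n'
    '"\'value1\'","\'value2\'",,"\'value4\'"\n'
)
rows = list(iter_dequoted_dicts(sample))
print(rows[0]["test1"])  # value1
```

Note that str.strip removes every leading and trailing single quote, so a value that legitimately starts or ends with one would lose it as well.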
Answer 1 (score: 1)
This is a bit roundabout, but if we parse the file as CSV twice, we get what we want:
    import csv
    from pprint import pprint
    from io import StringIO

    inputCsv = "test.csv"
    with open(inputCsv, 'r', newline='') as csvfile:
        csvReader = csv.reader(csvfile, quotechar='"', delimiter=',')
        dequotedStringIO = StringIO()
        csvWriter = csv.writer(dequotedStringIO, quoting=csv.QUOTE_NONE)
        csvWriter.writerows(csvReader)
        dequotedLines = dequotedStringIO.getvalue().splitlines()

    dictReader = csv.DictReader(dequotedLines, quotechar="'")
    for line in dictReader:
        print(line['test1'])
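So first we have a straightforward csv.reader to parse the outer quotes; then we send all the data back out through a straightforward csv.writer, telling it not to quote anything. In effect, this strips the outer double quotes in a way that respects CSV semantics, and you are left with a conforming CSV file containing only single quotes, which you can pass to csv.DictReader to get the desired end result.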
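To try the two-pass approach without a file on disk, here is a minimal sketch that runs it on in-memory text. The function name two_pass_dicts and the sample strings are invented for illustration. It also shows a limitation worth knowing: with quoting=csv.QUOTE_NONE and no escapechar set, csv.writer raises csv.Error as soon as a field contains the delimiter, so the intermediate step only works when the inner values are free of commas.

```python
import csv
from io import StringIO

def two_pass_dicts(text):
    """Strip the outer double quotes, then parse the single-quoted CSV.

    Hypothetical helper that mirrors the two-pass steps on a string.
    """
    # Pass 1: parse the outer double quotes.
    outer = csv.reader(StringIO(text), quotechar='"', delimiter=',')
    buffer = StringIO()
    # Write the fields back out with no quoting at all.
    csv.writer(buffer, quoting=csv.QUOTE_NONE).writerows(outer)
    # Pass 2: parse the result, treating the single quote as the quote character.
    return list(csv.DictReader(buffer.getvalue().splitlines(), quotechar="'"))

sample = (
    '"\'test1\'","\'test2\'","\'test3\'","\'test4\'"\n'
    '"\'value1\'","\'value2\'",,"\'value4\'"\n'
)
rows = two_pass_dicts(sample)
print(rows[0]["test1"])  # value1

# A comma inside an inner value cannot be written with QUOTE_NONE
# unless an escapechar is configured, so this input fails.
try:
    two_pass_dicts('"\'k\'"\n"\'a,b\'"\n')
except csv.Error as exc:
    print("csv.Error:", exc)
```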