I have a .csv file in which every element is quoted, for example:
"one","two","three","here comes "complex," column
with newlines and "\"quotes\""","five"
"six","seven",eight","nine","ten"
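This is quite complex and messy. I want to remove the quotes from every element except the complex column, which is always column #4. Removing all the quotes would be even better, but I have found that hard to do, because removing the quotes from column 4 usually leaves the .csv file misaligned.
The output should look like this: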
one,two,three,"here comes "complex," column
with newlines and "quotes"",five
six,seven,eight,nine,ten
I am looking for a way to do this. I tried:
import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('example.csv', newline='') as csvfile:
    rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
    for row in rowreader:
        print(row)
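but that is not really what I want.
Answer 0 (score: 0)
This isn't an answer, just an attempt to help the OP understand the problem with his input format, since he has so far asked half a dozen questions about how to parse it and none of them has received an answer.
Given that you have this input (I changed ,eight" from your question to ,"eight" to fix/simplify the example; it doesn't affect the problem):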
"one","two","three","here comes "complex," column
with newlines and "\"quotes\""","five"
"six","seven","eight","nine","ten"
The fourth field of the record is enclosed in " and can contain ", commas, and newlines. So how could any tool determine that the above means this:
Record 1:
Field 1: "one"
Field 2: "two"
Field 3: "three"
Field 4: "here comes "complex," column
with newlines and "\"quotes\"""
Field 5: "five"
Record 2:
Field 1: "six"
Field 2: "seven"
Field 3: "eight"
Field 4: "nine"
Field 5: "ten"
instead of this (or something else):
Record 1:
Field 1: "one"
Field 2: "two"
Field 3: "three"
Field 4: "here comes "complex," column
with newlines and "\"quotes\""","five"
"six","seven","eight","nine"
Field 5: "ten"
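In both of the cases above, field 4 is enclosed in quotes and contains quotes, commas, and newlines. Given what you have told us so far about the input format, there is no way to tell programmatically which of these interpretations of the data is correct.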
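In valid CSV (see, for example, https://tools.ietf.org/html/rfc4180 or Excel's output), a double-quoted field can contain commas and/or newlines without any problem, but every double quote inside it must be escaped (as \" or "") so that the CSV is unambiguous and can be parsed by tools.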
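As a rough illustration of why escaping matters, here is a minimal sketch (not taken from the question) that writes rows with Python's csv module and parses the text back. The helper name round_trip and the sample rows are assumptions made for the example; the point is only that a field with doubled quotes comes back unchanged.

```python
import csv
import io


def round_trip(rows):
    """Write rows as fully quoted CSV text, then parse that text back."""
    buf = io.StringIO(newline='')
    # QUOTE_ALL quotes every field; embedded quotes are doubled ("").
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text, newline='')))
    return text, parsed


rows = [
    ['one', 'two', 'three',
     'here comes "complex," column\nwith newlines and "quotes"', 'five'],
    ['six', 'seven', 'eight', 'nine', 'ten'],
]
text, parsed = round_trip(rows)
print(text)
print(parsed == rows)  # the escaped form parses back to the same records
```

Because every embedded quote is doubled, the parser can tell where field 4 ends, which is exactly what the unescaped input cannot guarantee.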
Answer 1 (score: 0)
Assuming you have a well-formed .csv in which every field is quoted, for example:
"one","two","three","here comes ""complex,"" column
with newlines and ""quotes""","five","six","seven","eight","nine","ten"
then the default csv.reader will read it correctly, and the default csv.writer configuration (QUOTE_MINIMAL) will rewrite the CSV as needed:
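import csv

# newline='' is required so the csv module controls line endings
with open('example.csv','r',newline='') as fin:
    with open('rewrite.csv','w',newline='') as fout:
        r = csv.reader(fin)
        w = csv.writer(fout)
        for line in r:
            # Show each parsed field, numbered from 1
            for i,col in enumerate(line,1):
                print(f'Field {i}: {col}')
            # QUOTE_MINIMAL only quotes fields that need it
            w.writerow(line)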
Output:
Field 1: one
Field 2: two
Field 3: three
Field 4: here comes "complex," column
with newlines and "quotes"
Field 5: five
Field 6: six
Field 7: seven
Field 8: eight
Field 9: nine
Field 10: ten
rewrite.csv:
one,two,three,"here comes ""complex,"" column
with newlines and ""quotes""",five,six,seven,eight,nine,ten
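QUOTE_MINIMAL quotes field 4 only when its content requires it. If column 4 has to stay quoted on every row, as the question asks, one option is to format the rows by hand. The following is only a sketch under that assumption; format_row, quote, and the always_quote parameter are hypothetical names, not part of the csv module.

```python
# Characters that force a field to be quoted in CSV output.
SPECIAL = set(',"\r\n')


def quote(field):
    """Wrap a field in double quotes, doubling any embedded quotes."""
    return '"' + field.replace('"', '""') + '"'


def format_row(row, always_quote=3):
    """Join fields with commas; quote index `always_quote` unconditionally
    and any other field only when it contains a special character."""
    return ','.join(
        quote(f) if i == always_quote or SPECIAL & set(f) else f
        for i, f in enumerate(row)
    )


print(format_row(['six', 'seven', 'eight', 'nine', 'ten']))
print(format_row(['one', 'two', 'three', 'say "hi", then\nleave', 'five']))
```

The rows passed to format_row would come from csv.reader as above, and each formatted line still needs a line terminator when written to the output file.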
If your double quotes are escaped with backslashes instead, you can configure csv.reader like this:
r = csv.reader(fin,doublequote=False,escapechar="\\")
This would read input such as:
"one","two","three","here comes \"complex,\" column
with newlines and \"quotes\"","five","six","seven","eight","nine","ten"
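To see those reader settings in action without touching a file, here is a small self-contained sketch. The in-memory sample text (shortened to two records of five fields) and the variable names are assumptions for the example.

```python
import csv
import io

# Sample input with backslash-escaped quotes inside field 4.
raw = (
    '"one","two","three","here comes \\"complex,\\" column\n'
    'with newlines and \\"quotes\\"","five"\n'
    '"six","seven","eight","nine","ten"\n'
)

# doublequote=False plus escapechar tells the reader that \" is a literal quote.
reader = csv.reader(io.StringIO(raw, newline=''),
                    doublequote=False, escapechar='\\')
rows = list(reader)

# Rewrite with the default writer, which quotes only where needed
# and escapes embedded quotes by doubling them.
out = io.StringIO(newline='')
csv.writer(out).writerows(rows)

for row in rows:
    print(row)
print(out.getvalue())
```

The rewritten text uses doubled quotes rather than backslashes, so it can be read back with the default csv.reader.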
For more information, see Dialects and Formatting Parameters in the csv documentation.