Question

I have a .csv file in which every element is quoted, for example:

"one","two","three","here comes "complex," column
with newlines and "\"quotes\""","five"
"six","seven",eight","nine","ten"

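This is complicated and messy. I want to remove the quotes from every element except the complex column, which is always column #4. Removing all the quotes would be great, but I have found that hard to do, because stripping the quotes from column 4 usually produces a misaligned .csv file.

The output should look like this: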

one,two,three,"here comes "complex," column
with newlines and "quotes"",five
six,seven,eight,nine,ten

I am looking to:

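Remove all quotes except those in column 4
Column 4 contains newlines, quotes, and commas, and should be left as it is
Keep the formatting unchanged and not replace any quote with another character; I want the quotes deleted, not replaced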

I tried:

import csv

with open('example.csv', 'rb') as csvfile:
    rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
    for row in rowreader:
        print row

But that is not really what I want.

Answer 1

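This is not an answer, just an attempt to help the OP understand the problem with his input format, since he has so far asked half a dozen questions about how to parse it and none of them has received an answer.

Given that you have this input (I changed ,eight" to ,"eight" from your question to fix/simplify the example; it does not affect the issue):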

"one","two","three","here comes "complex," column
with newlines and "\"quotes\""","five"
"six","seven","eight","nine","ten"

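The fourth field of a record is enclosed in " and can itself contain ", commas, and newlines. How is any tool supposed to determine that the above means this: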

Record 1:
    Field 1: "one"
    Field 2: "two"
    Field 3: "three"
    Field 4: "here comes "complex," column
             with newlines and "\"quotes\"""
    Field 5: "five"

Record 2:
    Field 1: "six"
    Field 2: "seven"
    Field 3: "eight"
    Field 4: "nine"
    Field 5: "ten"

rather than this (or something else):

Record 1:
    Field 1: "one"
    Field 2: "two"
    Field 3: "three"
    Field 4: "here comes "complex," column
             with newlines and "\"quotes\""","five"
             "six","seven","eight","nine"
    Field 5: "ten"

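In both interpretations above, field 4 is enclosed in quotes and contains quotes, commas, and newlines. Given what you have told us so far about the input format, there is no way to tell programmatically which of these interpretations of the data is the correct one.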

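In valid CSV (see, for example, https://tools.ietf.org/html/rfc4180 or the output of Excel), a double-quoted field can contain commas and/or newlines without any problem, but any double quote inside it must be escaped (as \" or "") so that the CSV is unambiguous and can be parsed by tools.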
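To make the difference concrete, here is a minimal sketch that feeds both forms of the sample to Python's default csv.reader. The `repaired` string is a hypothetical fix of the sample (embedded quotes doubled, backslashes dropped), and `parse` and `intended` are names invented for this illustration. How a parser splits the malformed text is an implementation detail and should not be relied on; the only point is that it does not recover the intended field.

```python
import csv
import io


def parse(text):
    """Parse CSV text with the default csv.reader settings."""
    return list(csv.reader(io.StringIO(text, newline="")))


# The sample as posted: quotes inside field 4 are not escaped for CSV.
malformed = (
    '"one","two","three","here comes "complex," column\n'
    'with newlines and "\\"quotes\\""","five"\n'
    '"six","seven","eight","nine","ten"\n'
)

# Hypothetical repaired sample: every embedded double quote is doubled.
repaired = (
    '"one","two","three","here comes ""complex,"" column\n'
    'with newlines and ""quotes""","five"\n'
    '"six","seven","eight","nine","ten"\n'
)

# The value field 4 is meant to hold (assumed for this sketch).
intended = 'here comes "complex," column\nwith newlines and "quotes"'

bad_rows = parse(malformed)
good_rows = parse(repaired)

print("malformed, record 1:", bad_rows[0])
print("repaired,  record 1:", good_rows[0])
print("repaired field 4 matches intent:", good_rows[0][3] == intended)
```

With the doubled quotes the parser has exactly one way to read the data, which is the property the ambiguous input lacks.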

Answer 2

Assuming you have a properly formatted .csv in which every field is quoted, for example:

"one","two","three","here comes ""complex,"" column
with newlines and ""quotes""","five","six","seven","eight","nine","ten"

then the default csv.reader will read it correctly, and the default csv.writer configuration (QUOTE_MINIMAL) will rewrite the CSV as required:

import csv

with open('example.csv','r',newline='') as fin:
    with open('rewrite.csv','w',newline='') as fout:
        r = csv.reader(fin)
        w = csv.writer(fout)
        for line in r:
            for i,col in enumerate(line,1):
                print(f'Field {i}: {col}')
            w.writerow(line)

Output:

Field 1: one
Field 2: two
Field 3: three
Field 4: here comes "complex," column
with newlines and "quotes"
Field 5: five
Field 6: six
Field 7: seven
Field 8: eight
Field 9: nine
Field 10: ten

rewrite.csv:

one,two,three,"here comes ""complex,"" column
with newlines and ""quotes""",five,six,seven,eight,nine,ten
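The same round trip can be tried without touching the file system. The following is a sketch only: `requote` is a helper name made up for this example, the three-line sample is a correctly escaped variant of the question's data, and `lineterminator="\n"` is a choice made here (the csv.writer default is "\r\n").

```python
import csv
import io


def requote(text: str) -> str:
    """Re-emit CSV text with quotes only where the csv module needs them."""
    source = io.StringIO(text, newline="")
    target = io.StringIO(newline="")
    # QUOTE_MINIMAL is already the default; it is spelled out for clarity.
    writer = csv.writer(target, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in csv.reader(source):
        writer.writerow(row)
    return target.getvalue()


sample = (
    '"one","two","three","here comes ""complex,"" column\n'
    'with newlines and ""quotes""","five"\n'
    '"six","seven","eight","nine","ten"\n'
)

result = requote(sample)
print(result, end="")
```

Note that QUOTE_MINIMAL decides per field, not per column: a fourth field with no comma, quote, or newline (such as nine) is written without quotes, which matches the output requested in the question.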

If your double quotes are escaped with backslashes instead, the following csv.reader can help:

r = csv.reader(fin,doublequote=False,escapechar="\\")

That will read input such as:

"one","two","three","here comes \"complex,\" column
with newlines and \"quotes\"","five","six","seven","eight","nine","ten"

For more information, see Dialects and Formatting Parameters in the csv documentation.
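As a rough end-to-end illustration of the backslash case, the sketch below reads a shortened, backslash-escaped record with those reader settings and writes it back out with the default writer, which doubles the embedded quotes. The variable names and the five-field sample are assumptions made for this example.

```python
import csv
import io

# Shortened backslash-escaped sample (raw strings keep the backslashes).
escaped = (
    r'"one","two","three","here comes \"complex,\" column' "\n"
    r'with newlines and \"quotes\"","five"' "\n"
)

reader = csv.reader(
    io.StringIO(escaped, newline=""),
    doublequote=False,   # a doubled quote is not treated as an escape
    escapechar="\\",     # a backslash escapes the following character
)
fields = next(reader)

# Write the record back using standard CSV quoting (embedded quotes doubled).
out = io.StringIO(newline="")
csv.writer(out, lineterminator="\n").writerow(fields)
standard = out.getvalue()

for number, field in enumerate(fields, 1):
    print(f"Field {number}: {field}")
print(standard, end="")
```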

Remove quotes from all but one column of a CSV file using Python