Remove quotes from all but one column of a CSV file using Python

Date: 2019-07-02 18:10:28

Tags: python csv

I have a .csv file in which every element is wrapped in quotes, for example:

"one","two","three","here comes "complex," column
with newlines and "\"quotes\""","five"
"six","seven",eight","nine","ten"

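This is very complex and messy. I want to remove all the quotes from every element except the complex column, which is always column #4. Removing every quote would be great, but I find that hard to do, because removing the quotes from column 4 usually produces a skewed .csv file.

The output should look like this: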

one,two,three,"here comes "complex," column
with newlines and "quotes"",five
six,seven,eight,nine,ten

I am looking to:

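  1. Remove all the quotes except those in column 4
  2. Leave column 4 as it is, including its newlines, quotes, and commas
  3. Keep the format unchanged and not replace any quote with another character; I want the quotes removed, not replaced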

I tried:

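import csv

# Python 3: open in text mode with newline='' as the csv module recommends.
with open('example.csv', 'r', newline='') as csvfile:
    # quotechar='`' makes the reader treat " as ordinary data.
    rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
    for row in rowreader:
        print(row)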

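but that is not really what I want.

2 answers:

Answer 0: (score: 0)

This is not an answer, just an attempt to help the OP understand the problem with his input format, since he has asked about half a dozen questions so far on how to parse it and has received no answers at all.

Given that you have this input (I changed ,eight" from your question to ,"eight" to fix/simplify the example; it does not affect the problem):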

"one","two","three","here comes "complex," column
with newlines and "\"quotes\""","five"
"six","seven","eight","nine","ten"

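The fourth field of a record is enclosed in double quotes and can itself contain double quotes, commas, and newlines. How can any tool determine that the above means this: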

Record 1:
    Field 1: "one"
    Field 2: "two"
    Field 3: "three"
    Field 4: "here comes "complex," column
             with newlines and "\"quotes\"""
    Field 5: "five"

Record 2:
    Field 1: "six"
    Field 2: "seven"
    Field 3: "eight"
    Field 4: "nine"
    Field 5: "ten"

instead of this (or something else):

Record 1:
    Field 1: "one"
    Field 2: "two"
    Field 3: "three"
    Field 4: "here comes "complex," column
             with newlines and "\"quotes\""","five"
             "six","seven","eight","nine"
    Field 5: "ten"

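In both of the above interpretations, field 4 is enclosed in quotes and contains quotes, commas, and newlines. Given what you have told us so far about your input format, there is no way to tell programmatically which interpretation of the data is correct.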

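In valid CSV (see, for example, https://tools.ietf.org/html/rfc4180 or the output of Excel), a double-quoted field can contain commas and/or newlines without any problem, but any double quote inside it must be escaped (as \" or "") so that the CSV is unambiguous and can be parsed by tools.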
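
To illustrate the escaping rule, here is a minimal sketch using Python's standard csv module. The field values are taken from the question; the in-memory buffer and the variable names are only assumptions for illustration. Once the embedded quotes are doubled, the record round-trips without ambiguity:

```python
import csv
import io

# One record whose fourth field contains quotes, a comma, and a newline.
record = ["one", "two", "three",
          'here comes "complex," column\nwith newlines and "quotes"',
          "five"]

# Write it with every field quoted; the writer doubles embedded quotes.
buf = io.StringIO(newline="")
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(record)
encoded = buf.getvalue()
print(encoded)

# Reading it back yields exactly one record with the original five fields.
rows = list(csv.reader(io.StringIO(encoded, newline="")))
print(rows == [record])
```

The same input without the doubled quotes is what makes the question's file impossible to split reliably.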

Answer 1: (score: 0)

Assuming you have a properly formatted .csv in which every field is quoted, for example:

"one","two","three","here comes ""complex,"" column
with newlines and ""quotes""","five","six","seven","eight","nine","ten"

then the default csv.reader will read it correctly, and the default csv.writer configuration (QUOTE_MINIMAL) will rewrite the CSV the way you want:

import csv

with open('example.csv','r',newline='') as fin:
    with open('rewrite.csv','w',newline='') as fout:
        r = csv.reader(fin)
        w = csv.writer(fout)
        for line in r:
            for i,col in enumerate(line,1):
                print(f'Field {i}: {col}')
            w.writerow(line)

Output:

Field 1: one
Field 2: two
Field 3: three
Field 4: here comes "complex," column
with newlines and "quotes"
Field 5: five
Field 6: six
Field 7: seven
Field 8: eight
Field 9: nine
Field 10: ten

rewrite.csv:

one,two,three,"here comes ""complex,"" column
with newlines and ""quotes""",five,six,seven,eight,nine,ten

If your double quotes are escaped with backslashes instead, you can help csv.reader with the following:

r = csv.reader(fin,doublequote=False,escapechar="\\")

This would read input such as:

"one","two","three","here comes \"complex,\" column
with newlines and \"quotes\"","five","six","seven","eight","nine","ten"
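
As a rough, self-contained sketch of that variant (the sample is shortened to five fields, and the in-memory buffers and variable names are assumptions for illustration, not part of the original files), the backslash-escaped input can be parsed and then rewritten with the default writer:

```python
import csv
import io

# Sample input with backslash-escaped quotes (shortened to five fields).
raw = ('"one","two","three","here comes \\"complex,\\" column\n'
       'with newlines and \\"quotes\\"","five"\n')

# Tell the reader that a backslash, not a doubled quote, escapes a quote.
reader = csv.reader(io.StringIO(raw, newline=""),
                    doublequote=False, escapechar="\\")
rows = list(reader)

# Rewrite with the default dialect: only fields that need quoting get quotes.
out = io.StringIO(newline="")
csv.writer(out).writerows(rows)
print(out.getvalue())
```

The rewritten text uses doubled quotes inside the fourth field, so other CSV tools that follow RFC 4180 should be able to read it.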

For more information, see Dialects and Formatting Parameters in the csv documentation.