How can I use Python to escape all double quotes in a specific .csv column?

Time: 2018-09-27 23:17:38

Tags: python python-2.7 csv

  • Using Python 2.7.6
  • The solution must not require the Pandas library

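My .csv file has a specific (text) column whose cells occasionally contain double quotes ("). When the file is converted to a shapefile in ArcMap, these double quotes cause an incorrect conversion. They have to be "escaped".

I need a script that edits the .csv so that it:

  1. Replaces all instances of " with "".
  2. Wraps every cell in double quotes.

My script: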

import csv

with open(Source_CSV, 'r') as file1, open('OUTPUT2.csv','w') as file2:
    reader = csv.reader(file1)  

    # Write column headers without quotes
    headers = reader.next()
    str1 = ''.join(headers)
    writer = csv.writer(file2)
    writer.writerow(headers)

    # Write all other rows with quotes
    writer = csv.writer(file2, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)

This script successfully does both of the above in ALL columns.

For example, the original .csv:

Column 1, Column 2, Column 3, Column 4 
Fred, Flintstone, 5'10", black hair 
Wilma, Flintstone, five feet seven inches, red hair 
Barney, Rubble, 5 feet 2" inches, blond hair 
Betty, Rubble, 5 foot 7, black hair

becomes this:

Column 1, Column 2, Column 3, Column 4
"Fred"," Flintstone"," 5'10"""," black hair"
"Wilma"," Flintstone"," five feet seven inches"," red hair"
"Barney"," Rubble"," 5 feet 2"" inches"," blond hair"
"Betty"," Rubble"," 5 foot 7"," black hair"

But what if I only want to do this in column 3 (the one that will sometimes actually contain double quotes)?

In other words, how can I get this...?

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone," 5'10""", black hair
Wilma, Flintstone," five feet seven inches", red hair
Barney, Rubble," 5 feet 2"" inches", blond hair
Betty, Rubble," 5 foot 7", black hair

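2 answers:

Answer 0: (score: 1)

Is it sufficient to quote only the fields that have double quotes in them? If so, the default behavior of the csv module will work, although I added skipinitialspace=True when parsing the input file so that the space after each comma is not treated as part of the field.

Also, per the csv module documentation, I opened the files in binary mode.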

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)  
    writer = csv.writer(file2)

    for row in reader:
        writer.writerow(row)

Input:

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,five feet seven inches,red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,5 foot 7,black hair
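
The code above targets Python 2.7, as the question requires. For readers on Python 3 (an assumption, not part of the question), a minimal sketch of the same default-quoting approach might look like the following. The function name requote_minimal and the in-memory sample are hypothetical, used only for illustration. With real files on Python 3, the csv documentation recommends text mode with newline='' rather than binary mode.

```python
import csv
import io

def requote_minimal(src, dst):
    """Copy CSV rows from src to dst, letting csv.writer quote only where needed."""
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst)  # default quoting is csv.QUOTE_MINIMAL
    for row in reader:
        writer.writerow(row)

# In-memory stand-ins for the files (hypothetical sample data). With real files
# on Python 3 one would use open('input.csv', newline='') and
# open('output.csv', 'w', newline='') instead.
src = io.StringIO('Column 1, Column 2, Column 3, Column 4\n'
                  'Fred, Flintstone, 5\'10", black hair\n'
                  'Betty, Rubble, 5 foot 7, black hair\n')
dst = io.StringIO()
requote_minimal(src, dst)
print(dst.getvalue())
```

Only the field that contains a double quote ends up quoted, with the embedded quote doubled.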

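If you need column 3 quoted in every row, you can do the quoting manually. I've set the csv module to do no quoting and set the quote character to an unprintable control character that should not occur in the input: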

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)
    writer = csv.writer(file2,quoting=csv.QUOTE_NONE,quotechar='\x01')

    # Write column headers without quotes
    headers = reader.next()
    writer.writerow(headers)

    # Write 3rd column with quotes
    for row in reader:
        row[2] = '"' + row[2].replace('"','""') + '"'
        writer.writerow(row)

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,"five feet seven inches",red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,"5 foot 7",black hair
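
One caveat with csv.QUOTE_NONE: if any field contains a comma and no escapechar is set, the writer raises csv.Error. As a hedged alternative (a sketch, not the answerer's method), each line can be assembled by hand so that the chosen column is always quoted and the other columns are quoted only when they need it. The helper names format_field and quote_column are hypothetical, and the sketch assumes the input has a header row and uses a comma delimiter.

```python
import csv
import io

def format_field(value, force_quote=False):
    """Return value as one CSV field, doubling embedded double quotes when quoted."""
    if force_quote or any(ch in value for ch in ',"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value

def quote_column(src, dst, column=2):
    """Copy rows from src to dst, always quoting the given column except in the header."""
    reader = csv.reader(src, skipinitialspace=True)
    header = next(reader)  # assumes a header row; next() works on Python 2.6+ and 3
    dst.write(','.join(format_field(h) for h in header) + '\r\n')
    for row in reader:
        fields = [format_field(v, force_quote=(i == column))
                  for i, v in enumerate(row)]
        dst.write(','.join(fields) + '\r\n')

# Hypothetical in-memory sample (Python 3); real code would pass open file objects.
src = io.StringIO('Column 1, Column 2, Column 3, Column 4\n'
                  'Fred, Flintstone, 5\'10", black hair\n'
                  'Betty, Rubble, 5 foot 7, black hair\n')
dst = io.StringIO()
quote_column(src, dst, column=2)
print(dst.getvalue())
```

The column index is zero-based, so column=2 selects the third column.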

Answer 1: (score: 0)

You can try this:

import csv
with open("file.csv", "rU") as fin:
    words = fin.readlines()

with open("cleaned.csv", "w") as fout:
    writer = csv.writer(fout, quoting=csv.QUOTE_ALL, quotechar = '"', doublequote = True)
    for row in words:
        row = row.replace("\n", "")
        newrow = []
        for word in row.split(","): 
            newrow.append(word.strip())
        writer.writerow(newrow)

First we open the file and read it as a plain text file, to get around the malformed csv. Then we write it out to a csv file as usual.
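
To make the behavior of this approach concrete, here is a small in-memory sketch for Python 3 (an assumption; the function name quote_all_from_text and the sample text are hypothetical). Note that it quotes every column, including the header row, and that splitting on "," by hand would break any field that itself contains a comma.

```python
import csv
import io

def quote_all_from_text(text):
    """Split each line on commas by hand, strip the pieces, and re-write with QUOTE_ALL."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    for line in text.splitlines():
        writer.writerow([piece.strip() for piece in line.split(',')])
    return out.getvalue()

# Hypothetical two-column sample with an embedded double quote.
sample = 'Column 1, Column 3\nFred, 5\'10"\n'
print(quote_all_from_text(sample))
```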