Question

Using Python 2.7.6
A solution that does not use the Pandas library

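My .csv file has a particular (text) column whose cells occasionally contain a double quote ("). When the file is converted to a shapefile in ArcMap, these lone double quotes cause an incorrect conversion. They need to be "escaped".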

I need a script that edits the .csv so that it:

Replaces every instance of " with "".
Wraps every cell in double quotes.
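Concretely, the two requirements amount to one rule per cell: double every embedded quote, then wrap the result in quotes. A minimal sketch of that rule follows; the helper name `escape_cell` is hypothetical and not part of the csv module. It runs unchanged on Python 2.7 and 3.

```python
def escape_cell(value):
    """Double every embedded double quote, then wrap the whole value in quotes."""
    return '"' + value.replace('"', '""') + '"'

print(escape_cell('5\'10"'))             # "5'10"""
print(escape_cell('5 feet 2" inches'))   # "5 feet 2"" inches"
```

This is the same escaping the csv module applies to a quoted field when `doublequote=True` (the default).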

My script:

import csv

with open(Source_CSV, 'r') as file1, open('OUTPUT2.csv','w') as file2:
    reader = csv.reader(file1)  

    # Write column headers without quotes
    headers = reader.next()
    writer = csv.writer(file2)
    writer.writerow(headers)

    # Write all other rows with quotes
    writer = csv.writer(file2, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)

This script successfully does both of the above in ALL columns.

For example, the original .csv:

Column 1, Column 2, Column 3, Column 4 
Fred, Flintstone, 5'10", black hair 
Wilma, Flintstone, five feet seven inches, red hair 
Barney, Rubble, 5 feet 2" inches, blond hair 
Betty, Rubble, 5 foot 7, black hair

becomes this:

Column 1, Column 2, Column 3, Column 4
"Fred"," Flintstone"," 5'10"""," black hair"
"Wilma"," Flintstone"," five feet seven inches"," red hair"
"Barney"," Rubble"," 5 feet 2"" inches"," blond hair"
"Betty"," Rubble"," 5 foot 7"," black hair"
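For anyone reproducing this on Python 3 rather than 2.7.6, here is a sketch of the same script run on an in-memory string. Assumptions: `next(reader)` replaces `reader.next()`, `io.StringIO` stands in for the two files, `lineterminator='\n'` is chosen only to keep the output readable, and `quote_body_rows` is a hypothetical name. As in the output above, the spaces after each comma are kept and end up inside the quotes.

```python
import csv
import io

def quote_body_rows(text):
    """Copy CSV text: header row with default (minimal) quoting, all other rows fully quoted."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()

    # Header row: QUOTE_MINIMAL is the default, so plain headers come out unquoted
    headers = next(reader)  # Python 3 spelling of reader.next()
    csv.writer(out, lineterminator='\n').writerow(headers)

    # Every other row: quote every field, doubling embedded quotes
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

print(quote_body_rows('Column 1,Column 2\nFred, 5\'10"\n'))
```

With real files on Python 3, the csv documentation recommends opening both with `newline=''`, e.g. `open(src, newline='')` and `open(dst, 'w', newline='')`.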

But what if I only want to do this for Column 3 (the one that actually contains double quotes from time to time)?

In other words, how can I get this...?

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone," 5'10""", black hair
Wilma, Flintstone," five feet seven inches", red hair
Barney, Rubble," 5 feet 2"" inches", blond hair
Betty, Rubble," 5 foot 7", black hair

Answer 1

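Is it enough to quote only the fields that contain double quotes? If so, the csv module's default behavior will work. I did add skipinitialspace=True when parsing the input file, so that the space after each comma is not treated as part of the field.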

Also, per the csv module documentation, I opened the files in binary mode.

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)  
    writer = csv.writer(file2)

    for row in reader:
        writer.writerow(row)

Input:

Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,five feet seven inches,red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,5 foot 7,black hair
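The same default-quoting approach, sketched for Python 3 on an in-memory string. `requote_minimal` is a hypothetical name, and `io.StringIO` replaces the binary-mode files, since Python 3's csv module expects text rather than bytes. It should reproduce the output above, quoting only the fields that contain a double quote:

```python
import csv
import io

def requote_minimal(text):
    """Rewrite CSV text so only fields that need it (e.g. ones containing a quote) are quoted."""
    # skipinitialspace=True drops the space after each comma while parsing
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')  # QUOTE_MINIMAL is the default
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

sample = ('Column 1, Column 2, Column 3, Column 4\n'
          'Fred, Flintstone, 5\'10", black hair\n'
          'Wilma, Flintstone, five feet seven inches, red hair\n')
print(requote_minimal(sample))
```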

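If you need every row of column 3 quoted, you can do the quoting by hand. I set the csv module to not quote anything, and set its quote character to a non-printable control character that should never appear in the input: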

import csv

with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)
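    # QUOTE_NONE plus an unlikely quotechar: the writer adds no quotes of its own.
    # Caveat: with no escapechar set, writerow() raises csv.Error if a field contains a comma.
    writer = csv.writer(file2,quoting=csv.QUOTE_NONE,quotechar='\x01')

    # Write column headers without quotes
    headers = reader.next()
    writer.writerow(headers)

    # Quote the 3rd column by hand, doubling any embedded double quotes
    for row in reader:
        row[2] = '"' + row[2].replace('"','""') + '"'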
        writer.writerow(row)

Output:

Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,"five feet seven inches",red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,"5 foot 7",black hair
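One caveat with the QUOTE_NONE trick: with no `escapechar` set, `csv.writer` raises `csv.Error` when a field contains the delimiter, so a comma inside any field (including the hand-quoted column 3) would stop the script. A different technique, sketched here for Python 3, is to let the csv module quote each field individually, using QUOTE_ALL for the target columns and QUOTE_MINIMAL for the rest. `format_field` and `quote_columns` are hypothetical helper names, and columns are 0-based, so `{2}` means Column 3.

```python
import csv
import io

def format_field(value, quote_all):
    """Render a single field using the csv module's own quoting rules."""
    if value == '' and not quote_all:
        return ''  # csv.writer would emit "" for a lone empty field
    buf = io.StringIO()
    mode = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    csv.writer(buf, quoting=mode, lineterminator='\n').writerow([value])
    return buf.getvalue()[:-1]  # drop the trailing '\n' terminator

def quote_columns(text, columns):
    """Fully quote the given 0-based columns of every data row; other fields stay minimal."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    lines = [','.join(format_field(v, False) for v in next(reader))]  # header row
    for row in reader:
        lines.append(','.join(format_field(v, i in columns) for i, v in enumerate(row)))
    return '\n'.join(lines) + '\n'

sample = ('Column 1, Column 2, Column 3, Column 4\n'
          'Fred, Flintstone, 5\'10", black hair\n'
          'Wilma, Flintstone, five feet seven inches, red hair\n')
print(quote_columns(sample, {2}))
```

Creating a writer per field is slower than one writer per file, but it keeps all escaping inside the csv module, so commas, quotes, and newlines in any column are handled consistently.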

Answer 2

You could try the following:

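import csv

# "rU" = universal-newline text mode (Python 2)
with open("file.csv", "rU") as fin:
    words = fin.readlines()

# Binary mode, as the Python 2 csv docs recommend for csv.writer
with open("cleaned.csv", "wb") as fout:
    writer = csv.writer(fout, quoting=csv.QUOTE_ALL, quotechar='"', doublequote=True)
    for row in words:
        row = row.replace("\n", "")
        newrow = []
        # Split on every comma and trim surrounding whitespace from each field
        for word in row.split(","):
            newrow.append(word.strip())
        writer.writerow(newrow)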

This first opens the file and reads it as a plain text file, which gets around the malformed csv. Then it writes the result out to a csv file as usual.
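For comparison, a Python 3 sketch of this read-as-text approach on an in-memory string (`quote_all_split` is a hypothetical name). Because it splits every line on commas by hand, it does not understand CSV quoting: a field that was already quoted and contains a comma is cut in two, and its original quote characters are kept and doubled.

```python
import csv
import io

def quote_all_split(text):
    """Split each line on commas by hand, then write every field fully quoted."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for line in text.splitlines():
        # Naive split: a comma inside a quoted field is treated as a separator too
        writer.writerow([word.strip() for word in line.split(',')])
    return out.getvalue()

print(quote_all_split('Fred, 5\'10"\n'))   # "Fred","5'10"""
```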

How do I escape all lone double quotes in a specific .csv column using Python?
