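My .csv file has a particular (text) column whose cells occasionally contain a double quote ("). When the file is converted to a shapefile in ArcMap, these lone double quotes cause an incorrect conversion. They have to be "escaped".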
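I need a script that edits the .csv so that it wraps the cells in double quotes and escapes any double quote inside a cell by doubling it.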
My script:
import csv
with open(Source_CSV, 'r') as file1, open('OUTPUT2.csv','w') as file2:
    reader = csv.reader(file1)
    # Write column headers without quotes
    headers = reader.next()
    str1 = ''.join(headers)
    writer = csv.writer(file2)
    writer.writerow(headers)
    # Write all other rows with quotes
    writer = csv.writer(file2, quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)
This script successfully does both of the above for ALL columns.
For example, the original .csv:
Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair
becomes this:
Column 1, Column 2, Column 3, Column 4
"Fred"," Flintstone"," 5'10"""," black hair"
"Wilma"," Flintstone"," five feet seven inches"," red hair"
"Barney"," Rubble"," 5 feet 2"" inches"," blond hair"
"Betty"," Rubble"," 5 foot 7"," black hair"
But what if I only want to do this for column 3 (the one that will actually sometimes contain double quotes)?
In other words, how can I get this...?
Column 1, Column 2, Column 3, Column 4
Fred, Flintstone," 5'10""", black hair
Wilma, Flintstone," five feet seven inches", red hair
Barney, Rubble," 5 feet 2"" inches", blond hair
Betty, Rubble," 5 foot 7", black hair
Answer 0: (score: 1)
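Is it enough to quote only the fields that contain a double quote? If so, the default behavior of the csv module will work, although I added skipinitialspace=True when parsing the input file so that the spaces after the commas are not treated as significant.
Also, following the csv module documentation, I opened the files in binary mode.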
import csv
with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)
    writer = csv.writer(file2)
    for row in reader:
        writer.writerow(row)
Input:
Column 1, Column 2, Column 3, Column 4
Fred, Flintstone, 5'10", black hair
Wilma, Flintstone, five feet seven inches, red hair
Barney, Rubble, 5 feet 2" inches, blond hair
Betty, Rubble, 5 foot 7, black hair
Output:
Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,five feet seven inches,red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,5 foot 7,black hair
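The code above targets Python 2, where the csv module expects files opened in binary mode. On Python 3 the module works on text, so the files would be opened in text mode with newline='' instead. Below is a minimal sketch of the same idea under that assumption. It uses in-memory io.StringIO buffers rather than real files so that it runs as is, and the helper name requote_minimal is made up for illustration.

```python
import csv
import io

SAMPLE = (
    'Column 1, Column 2, Column 3, Column 4\n'
    'Fred, Flintstone, 5\'10", black hair\n'
    'Wilma, Flintstone, five feet seven inches, red hair\n'
    'Barney, Rubble, 5 feet 2" inches, blond hair\n'
    'Betty, Rubble, 5 foot 7, black hair\n'
)


def requote_minimal(text):
    """Rewrite CSV text so that only fields needing quotes are quoted.

    Hypothetical helper: parses with skipinitialspace=True and writes with
    the csv module's default quoting policy (csv.QUOTE_MINIMAL).
    """
    # newline='' keeps line endings untouched, as the csv docs recommend.
    reader = csv.reader(io.StringIO(text, newline=''), skipinitialspace=True)
    out = io.StringIO(newline='')
    writer = csv.writer(out, lineterminator='\n')
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == '__main__':
    print(requote_minimal(SAMPLE), end='')
```

With real files, the equivalent calls would presumably be open('input.csv', newline='') and open('output.csv', 'w', newline='').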
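If you need the third column quoted in every row, you can do the quoting manually. I set the csv module to do no quoting and set the quote character to a non-printable control character that should not appear in the input: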
import csv
with open('input.csv','rb') as file1, open('output.csv','wb') as file2:
    reader = csv.reader(file1,skipinitialspace=True)
    writer = csv.writer(file2,quoting=csv.QUOTE_NONE,quotechar='\x01')
    # Write column headers without quotes
    headers = reader.next()
    writer.writerow(headers)
    # Write 3rd column with quotes
    for row in reader:
        row[2] = '"' + row[2].replace('"','""') + '"'
        writer.writerow(row)
Output:
Column 1,Column 2,Column 3,Column 4
Fred,Flintstone,"5'10""",black hair
Wilma,Flintstone,"five feet seven inches",red hair
Barney,Rubble,"5 feet 2"" inches",blond hair
Betty,Rubble,"5 foot 7",black hair
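The manual-quoting trick can be wrapped in a small function that takes the column index as a parameter. This is only a sketch for Python 3, not a drop-in replacement: the name quote_column is hypothetical, it works on strings instead of files, and it assumes that no field contains a comma, a line break, or the control character 0x01, because QUOTE_NONE without an escapechar cannot write those.

```python
import csv
import io

SAMPLE = (
    'Column 1, Column 2, Column 3, Column 4\n'
    'Fred, Flintstone, 5\'10", black hair\n'
    'Betty, Rubble, 5 foot 7, black hair\n'
)


def quote_column(text, index):
    """Quote every data cell of one column; leave everything else bare.

    Hypothetical helper. The header row is copied through unquoted.
    """
    reader = csv.reader(io.StringIO(text, newline=''), skipinitialspace=True)
    out = io.StringIO(newline='')
    # Disable the writer's own quoting; the unused control character as
    # quotechar means a literal '"' is written through unchanged.
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, quotechar='\x01',
                        lineterminator='\n')
    header = next(reader, None)
    if header is None:
        return ''  # empty input, nothing to write
    writer.writerow(header)
    for row in reader:
        # Double embedded quotes, then wrap the cell in quotes by hand.
        row[index] = '"' + row[index].replace('"', '""') + '"'
        writer.writerow(row)
    return out.getvalue()


if __name__ == '__main__':
    print(quote_column(SAMPLE, 2), end='')
```

Passing the index keeps the column choice out of the loop body, so the same function could be pointed at a different column if the data layout changes.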
Answer 1: (score: 0)
You can try the following:
import csv
with open("file.csv", "rU") as fin:
    words = fin.readlines()
with open("cleaned.csv", "w") as fout:
    writer = csv.writer(fout, quoting=csv.QUOTE_ALL, quotechar = '"', doublequote = True)
    for row in words:
        row = row.replace("\n", "")
        newrow = []
        for word in row.split(","):
            newrow.append(word.strip())
        writer.writerow(newrow)
First we open the file and read it as a plain text file, which gets around the malformed CSV. Then we write it out as a CSV file in the usual way.
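For reference, the read-as-text approach can be sketched for Python 3 as follows. The function name quote_all_naive is made up, and strings stand in for the files. Two things are worth keeping in mind: QUOTE_ALL quotes every field, header row included, and splitting on "," assumes that no field ever contains a comma of its own.

```python
import csv
import io


def quote_all_naive(text):
    """Split each line on commas, strip the cells, and quote every field.

    Hypothetical helper. It never runs the input through csv.reader, so
    stray double quotes in the input cannot confuse the parser.
    """
    out = io.StringIO(newline='')
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for line in text.splitlines():
        # Naive split: breaks if a field legitimately contains a comma.
        writer.writerow([cell.strip() for cell in line.split(',')])
    return out.getvalue()


if __name__ == '__main__':
    print(quote_all_naive('Fred, Flintstone, 5\'10", black hair\n'), end='')
```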