I have a CSV file in which one cell contains a strange arrow character.
No matter what I do, I can't seem to remove it so that the file processes correctly.
import csv, time, string, os, requests, json
from datetime import datetime
import unidecode, unicodedata

def clean(text):
    # attempt 1
    if (type(text) is str):
        text = unicodedata.normalize('NFD', unicode(text)).encode('ascii', 'ignore')
        return text
    else:
        return text
    # attempt 2 (never reached while attempt 1 is in place, since both branches above return)
    if text is not None:
        try:
            text.decode('ascii')
        except UnicodeDecodeError:
            print("not unicode")
            return ''
        else:
            print("unicode")
            return text

with open(input_file) as infile, open("c:\\upload\\output_file.csv", "wb") as outfile:
    r = csv.DictReader(infile, delimiter=",", skipinitialspace=True)
    w = csv.DictWriter(outfile, inv_fields, extrasaction="ignore")
    r = [{k: clean(v) for k, v in row.items()} for row in r]
    wtr = csv.writer(outfile)
    wtr.writerow(["a", "b", "c", "f"])
    rows_processed = 0
    for i, row in enumerate(r, start=new_start_row):
        rows_processed += 1
        print("Starting row " + str(rows_processed))
        row['id'] = i
        w.writerow(row)
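If I delete the arrow character from the cell, save, and rerun the script, it works fine, so I know this character is what makes the script fail. My first idea was to run a filter over every cell with this line: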
r = [{k: clean(v) for k, v in row.items()} for row in r]
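But I can't seem to write anything that catches this arrow character. On screen I see Starting row 1/2/3/4/5, and then at row 6, where this cell is, the script stops running. I don't get any error or traceback; it just finishes.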
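One diagnostic I could try, as a rough sketch rather than something verified against this file: read the raw bytes in binary mode and list every control or non-ASCII byte, since I don't actually know which byte the arrow is. The helper names `find_suspect_bytes` and `strip_bytes` are made up for illustration, and the 0x1A byte in the sample data is only an assumption about what the arrow might be.

```python
def find_suspect_bytes(data):
    """Return (offset, byte_value) for each control or non-ASCII byte in data.

    Tab, line feed, and carriage return are treated as normal CSV content.
    """
    allowed = {0x09, 0x0A, 0x0D}
    suspects = []
    for offset, value in enumerate(bytearray(data)):
        if value in allowed:
            continue
        if value < 0x20 or value >= 0x7F:
            suspects.append((offset, value))
    return suspects


def strip_bytes(data, unwanted):
    """Return a copy of data with every byte value in `unwanted` removed."""
    return bytes(bytearray(b for b in bytearray(data) if b not in unwanted))


# Hypothetical sample standing in for the real file contents;
# 0x1A is an assumed value for the arrow, not a confirmed one.
sample = b"a,b\r\n1,x\x1ay\r\n"
for offset, value in find_suspect_bytes(sample):
    print("offset %d: 0x%02X" % (offset, value))
```

On the real file the input would come from something like `open(input_file, "rb").read()`, so that no text-mode translation happens before the bytes are inspected. If a single odd byte shows up at the position of the problem cell, `strip_bytes` could remove it before the data reaches `csv.DictReader`.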
Any thoughts?