Special character causing a Python CSV script to fail

Time: 2016-11-07 17:39:38

Tags: python csv

I have a CSV file, and one of its cells contains this strange arrow character:

[image: screenshot of the arrow character]

No matter what I do, I can't seem to remove it so that the file processes correctly.

import csv, time, string, os, requests, json
from datetime import datetime
import unidecode, unicodedata

def clean(text):
    #attempt 1
    if (type(text) is str):
        text = unicodedata.normalize('NFD',unicode(text)).encode('ascii','ignore')
        return text
    else:
        return text

    #attempt 2
    if text is not None:
        try:
            text.decode('ascii')
        except UnicodeDecodeError:
            print("not unicode")
            return ''
        else:
            print("unicode")
            return text

with open(input_file) as infile, open("c:\\upload\\output_file.csv", "wb") as outfile:
    r = csv.DictReader(infile, delimiter=",", skipinitialspace=True)
    w = csv.DictWriter(outfile, inv_fields, extrasaction="ignore")
    r = [{k: clean(v) for k, v in row.items()} for row in r]

    wtr = csv.writer(outfile)
    wtr.writerow(["a", "b", "c", "f"])
    rows_processed = 0
    for i, row in enumerate(r, start=new_start_row):
        rows_processed += 1
        print("Starting row " + str(rows_processed))
        row['id'] = i
        w.writerow(row)

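If I delete the arrow character from the cell, save, and rerun the script, it works fine, so I know this character is what makes the script fail. My initial idea was to run a filter over every cell with this line of code: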

r = [{k: clean(v) for k, v in row.items()} for row in r]

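But I can't seem to write anything that will catch this bad character. On screen I see Starting row 1/2/3/4/5, and then at row 6, where this cell is, the script stops running. I don't get any error or traceback; it just finishes.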
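A loop that ends quietly at the row holding the character, with no traceback, is consistent with a control byte such as 0x1A: Python 2 on Windows can treat that byte as end-of-file when a file is opened in text mode. That is an assumption, since the actual character is not shown. A minimal sketch for checking it reads the file in binary mode and reports every unusual control byte; `find_control_bytes` is a hypothetical helper name, not part of the script above.

```python
import io


def find_control_bytes(path):
    """Return (line_number, column, byte_value) for each suspicious byte.

    Hypothetical helper: "suspicious" here means an ASCII control byte
    other than tab, line feed, or carriage return.
    """
    allowed = {0x09, 0x0A, 0x0D}  # tab, LF, CR are normal in CSV files
    hits = []
    # Binary mode reads every byte as-is, so no byte can be
    # interpreted as an end-of-file marker.
    with io.open(path, "rb") as f:
        for line_no, line in enumerate(f, start=1):
            # bytearray yields integers on both Python 2 and Python 3
            for col, byte in enumerate(bytearray(line), start=1):
                if (byte < 0x20 and byte not in allowed) or byte == 0x7F:
                    hits.append((line_no, col, byte))
    return hits
```

Calling `find_control_bytes(input_file)` and printing the result shows where the byte sits and what its value is. An empty list would suggest the arrow is an ordinary printable character and the cause lies elsewhere.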

Any thoughts?
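For the per-cell filter idea, one possible shape is to drop Unicode control characters (category `Cc`) rather than testing whether the text is ASCII. This is only a sketch under stated assumptions: `strip_control_chars` is a hypothetical name, the cell text is assumed to be already decoded to Unicode, and the filter can only help if the reader gets as far as the affected row in the first place.

```python
import unicodedata


def strip_control_chars(text):
    """Remove Unicode control characters (category Cc) from text.

    Tab, newline, and carriage return are kept because they can be
    legitimate inside a quoted CSV cell. Assumes text is Unicode.
    """
    keep = u"\t\n\r"
    return u"".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )
```

A printable arrow such as U+2192 is not a control character, so this filter leaves it untouched; it only targets bytes like 0x1A that editors often display as an arrow glyph.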

0 answers:

No answers