Question

I am reading a CSV file and it mostly works, but some of the strings look like this:

u'Egg'

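When I try to convert such a value to a string, I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)

I have read various similar questions, but the solutions offered there led to the same error.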

Strangely, while debugging you can see in the picture that the variable CITY has the correct value. It still crashes, though.

My function is below:

import math

import pandas as pd


def readData(filename, delimiter=";"):
    """
    Read in our data from a CSV file and create a dictionary of records,
    where the key is a unique record ID and each value is dict
    """
    data = pd.read_csv(filename, delimiter=delimiter, encoding="UTF-8")
    data.set_index("TRNUID")
    returnValue = {}
    for index, row in data.iterrows():
        if index == 0:
            print row["CITY"]
        else:
            if math.isnan(row["DUNS"]) == True:
                DUNS = ""
            else:
                DUNS = str((int(row["DUNS"])))[:-2]
            NAME = str(row["NAME"]).encode("utf-8")
            STREET = str(row["STREET"]).encode("utf-8")
            CITY = row["CITY"]
            POSTAL = str(row["POSTAL"]).encode("utf-8")
            returnValue[row["TRNUID"]] = {
                "DUNS": DUNS,
                "NAME": NAME,
                "STREET": STREET,
                "CITY": CITY,
                "POSTAL": POSTAL
            }
    return returnValue

Answer 1

You are trying to convert strings to ASCII that cannot be represented in ASCII.

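If you look up the Unicode character for \xfc, it is a "u" with an umlaut. Indeed, your screenshot of the variables shows "Egg a.d.Guntz" with an umlaut over the "u". So the problem is not with the "Egg" but with the rest of the string.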
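To make this concrete, here is a minimal sketch (not the asker's code) that looks up the character and reproduces the encoding failure. In Python 2, calling str() on a unicode value performs this ASCII encoding implicitly, which is presumably where the traceback comes from; the sketch calls encode("ascii") explicitly so that it behaves the same way on Python 2 and 3. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
import unicodedata

# The character reported in the traceback.
ch = u"\xfc"
char_name = unicodedata.name(ch)
print(char_name)

# ASCII only covers code points 0-127, so this character cannot be encoded.
try:
    ch.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii failed: %s" % exc)

# UTF-8 can represent it, using two bytes.
utf8_bytes = ch.encode("utf-8")
print(repr(utf8_bytes))
```

If the goal is simply to keep the value, leaving it as a unicode object (as the function already does for CITY) or encoding it directly with .encode("utf-8"), without wrapping it in str() first, avoids the implicit ASCII step.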

You can work around this by stripping all umlauts from your characters (as explained in this question), but you will lose information.
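As a rough illustration of that trade-off, the sketch below strips diacritics with the standard unicodedata module. The helper name strip_diacritics and the sample city string (spelled here as the screenshot description suggests) are assumptions for the example, not taken from the linked question.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks (umlauts, accents) removed."""
    # NFKD splits u-umlaut into a plain "u" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    return u"".join(c for c in decomposed if not unicodedata.combining(c))


# Assumed sample value, based on the screenshot description.
city = u"Egg a.d.G\xfcntz"

stripped = strip_diacritics(city)
print(stripped)  # the umlaut is gone, so the original spelling is lost

# Encoding to UTF-8 instead keeps the full value and can be decoded back.
round_trip = city.encode("utf-8").decode("utf-8")
```

The stripped form is safe to encode as ASCII, but it can no longer be told apart from a city genuinely spelled without the umlaut, which is the information loss mentioned above.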
