Question

Python 2.6; upgrading is not an option.

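The script is meant to pull the fields from an ArcGIS geodatabase table and write them out as Oracle INSERT statements in a text file that can be used later. There are 7,500 records; after 3,000 of them the script errors out and reports that the problem is in this line: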

fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')

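I have tried various combinations of unicode() and encoding. I'm new to Python and really just need someone experienced to look over my code and see where the problem is.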

import arcpy

#Where the GDB Table is located
fc = "D:\GIS Data\eMaps\ALDOT\ALDOT_eMaps_SignInventory.gdb/SignLocation"

fields = arcpy.ListFields(fc)
cursor = arcpy.SearchCursor(fc)

#text file output
outFile = open(r"C:\Users\kkieliszek\Desktop\transfer.text", "w")


#insert values into table billboard.sign_inventory
for row in cursor:
    outFile.write("Insert into billboard.sign_inventory() Values (")
    for field in fields:

        fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')

        if row.isNull(field.name) or fieldValue.strip() == "" : #if field is Null or an Empty String print NULL
            value = "NULL"
            outFile.write('"' + value + '",')
        else: #print the value in the field
            value = str(row.getValue(field.name))
            outFile.write('"' + value + '",')

    outFile.write("); \n\n ")


outFile.close()    # This closes the text file

Error output:

Traceback (most recent call last):
  File "tablemig.py", line 25, in <module>
    fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 76: ordinal not in range(128)

Answer 1

Never call str() on a unicode object:

>>> str(u'\u2019')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)

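To write rows that contain Unicode strings in CSV format, use the UnicodeWriter recipe (from the examples in the Python 2 csv module documentation) instead of formatting the fields manually. That should fix several problems at once.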
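The advice above can be sketched as a small helper that turns whatever row.getValue() returns into Unicode text without ever passing a unicode object through str(), so the ASCII codec is never involved. The name to_text is hypothetical, and so is the assumption that any byte strings coming out of the geodatabase are UTF-8. The sketch is written to run on Python 2.6 as well as Python 3.

```python
# Pick the Unicode text type for the running interpreter.
try:
    text_type = unicode  # Python 2: unicode is the text type
except NameError:
    text_type = str      # Python 3: str is already Unicode


def to_text(value, encoding='utf-8'):
    """Convert a field value to Unicode text without using the ASCII codec.

    Hypothetical helper: assumes byte strings are in `encoding` (UTF-8 by default).
    """
    if value is None:
        return None                     # keep NULLs distinguishable
    if isinstance(value, text_type):
        return value                    # already Unicode: no conversion needed
    if isinstance(value, bytes):
        return value.decode(encoding)   # byte string: decode explicitly
    return text_type(value)             # numbers, dates, ...: format directly as text
```

In the question's loop, fieldValue = to_text(row.getValue(field.name)) would take the place of the unicode(str(...)) line. On Python 2 the result still has to go to a file opened with an explicit encoding, because writing non-ASCII unicode to a plain open() file triggers the same implicit ASCII encode.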

Answer 2

Text file objects (the io.TextIOWrapper that io.open() returns in text mode) don't need manual encoding/decoding.

Assuming the values in the row are Unicode, just use io.open() and set its encoding argument to the encoding you want.

For example:

import io

with io.open(r"C:\Users\kkieliszek\Desktop\transfer.text", "w", encoding='utf-8') as my_file:
    my_file.write(my_unicode)  # my_unicode must be a unicode object, not a byte string
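Applied to the question's use case, this could look like the sketch below, which writes one Oracle INSERT statement per row through io.open(). The helper names sql_literal and write_inserts are hypothetical, and the sketch assumes each row is a list of values in the table's column order that are already Unicode text, numbers, or None (not byte strings). It uses single quotes because Oracle treats double-quoted text as an identifier rather than a string literal; embedded single quotes are escaped by doubling them.

```python
import io
import numbers

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def sql_literal(value):
    """Render one value as an Oracle SQL literal (hypothetical helper)."""
    if value is None:
        return u'NULL'
    if isinstance(value, numbers.Number):
        return text_type(value)                       # numbers are written unquoted
    text = value if isinstance(value, text_type) else text_type(value)
    if text.strip() == u'':
        return u'NULL'                                # blank strings become NULL, as in the question
    return u"'" + text.replace(u"'", u"''") + u"'"    # single-quote and escape quotes


def write_inserts(path, table, rows):
    """Write one INSERT statement per row to a UTF-8 encoded text file."""
    with io.open(path, 'w', encoding='utf-8') as out:
        for row in rows:
            values = u', '.join(sql_literal(v) for v in row)
            out.write(u'INSERT INTO %s VALUES (%s);\n' % (table, values))
```

With arcpy, each row could be built as [row.getValue(f.name) for f in fields], after converting any byte strings to unicode first.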

Answer 3

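The problem is that you need to decode/encode unicode/byte strings instead of just calling str() on them. If you have a byte string, call decode() on it to turn it into a unicode object, ignoring anything that isn't ASCII; if you have a unicode object, call encode() on it to turn it back into a byte string, again ignoring anything that isn't ASCII. So just use this function: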

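import re

# Matches every character outside printable ASCII (0x20-0x7E), including control characters
remove_ctrl_chars_regex = re.compile(r'[^\x20-\x7e]')

def remove_unicode(string_data):
    """ (str|unicode|other) -> str

    recovers ascii content from string_data (Python 2 only)
    """
    if string_data is None:
        return string_data

    if isinstance(string_data, str):
        # byte string: decode, dropping non-ASCII bytes, then back to str
        string_data = str(string_data.decode('ascii', 'ignore'))
    elif isinstance(string_data, unicode):
        # unicode: encode to ASCII bytes, dropping non-ASCII characters
        string_data = string_data.encode('ascii', 'ignore')
    else:
        # numbers and dates have no encode(); their str() form is plain ASCII
        string_data = str(string_data)

    return remove_ctrl_chars_regex.sub('', string_data)

fieldValue = remove_unicode(row.getValue(field.name))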

That should fix the problem.
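For comparison, here is a sketch of the same strip-to-ASCII idea that also runs on Python 3 (the name ascii_only is hypothetical). It makes the trade-off visible: the U+2019 right single quotation mark from the traceback is deleted rather than replaced, so u'It\u2019s' comes back as u'Its'.

```python
import re

# Anything outside printable ASCII (space through "~"), including tabs and newlines.
NOT_PRINTABLE_ASCII = re.compile(r'[^\x20-\x7e]')


def ascii_only(value):
    """Return value as text with every non-printable-ASCII character removed."""
    if value is None:
        return None
    if isinstance(value, bytes):
        text = value.decode('ascii', 'ignore')   # drop bytes >= 0x80
    else:
        text = u'%s' % (value,)                  # unicode, numbers, dates, ...
    return NOT_PRINTABLE_ASCII.sub(u'', text)
```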

ascii codec can't encode character u'\u2019': ordinal not in range(128)