Question

I have a CSV file with special characters, as shown below:

location_id          location_name
7099395                SUPER CAFÉ

I first read the file into a DataFrame, then write the DataFrame to a database table using SQLAlchemy.

Because of the special characters, I get the following error:

'ascii' codec can't encode characters in position 33-39: ordinal not in range(128)
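
This kind of error can be reproduced in isolation: it is raised whenever text containing a non-ASCII character is encoded with the ASCII codec. A minimal sketch, assuming Python 3 and a sample value such as SUPER CAFÉ (the variable names are illustrative only):

```python
# Encoding non-ASCII text with the ASCII codec raises UnicodeEncodeError.
name = "SUPER CAFÉ"

try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    message = str(exc)  # begins with "'ascii' codec can't encode character"

# UTF-8 can represent every Unicode character, so this call succeeds.
utf8_bytes = name.encode("utf-8")
```

So the data itself is not necessarily at fault; some step in the pipeline is encoding it as ASCII.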

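To work around this, I used the unidecode module inside a function, but that converts the df to a string. When I then use StringIO to turn the string back into a df, the tabular form gets distorted. I'm happy to paste my code here for reference if needed.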
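
One way to transliterate without leaving the tabular form is to apply the conversion to individual text columns rather than to the whole frame. The sketch below is only an illustration under assumptions: it uses the standard library's `unicodedata` in place of unidecode, the helper `to_ascii` is hypothetical, and the inline CSV text stands in for the real file.

```python
import io
import unicodedata

import pandas as pd


def to_ascii(value):
    """Fold accented characters to plain ASCII; return non-strings unchanged."""
    if not isinstance(value, str):
        return value
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Inline stand-in for the CSV file (hypothetical sample data).
csv_text = "location_id,location_name\n7099395,SUPER CAFÉ\n"
df = pd.read_csv(io.StringIO(csv_text))

# Transliterate one text column in place; the frame keeps its rows and columns.
df["location_name"] = df["location_name"].map(to_ascii)
```

Note that `unicodedata` simply drops characters that have no ASCII decomposition, whereas unidecode attempts a transliteration, so unidecode may be the better fit for string cells if it is already installed.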

Edit: here is my code:

from unidecode import unidecode
import pandas as pd
from pandas.compat import StringIO
from sqlalchemy import create_engine

def unicodize(item):
    def _get_int_if_int(x):
        try:
            if not abs(int(x) - float(x)) > 0:
                return int(x)
            else:
                return x
        except (ValueError, TypeError):
            return x

    try:
        if item.__contains__("_"):
            _item = item
        elif item.startswith("0") and len(item) > 1:
            _item = item
        else:
            _item = _get_int_if_int(item)
    except AttributeError:
        _item = _get_int_if_int(item)

    try:
        try:
            return unidecode(unicode(_item))
        except NameError:
            return unidecode(str(_item))
        except UnicodeDecodeError:
            try:
                return unidecode(_item.decode('utf-8'))
            except UnicodeDecodeError:
                return unidecode(_item.decode('latin-1'))
    except AttributeError:
        return _item

input = pd.read_csv('my_file.csv')
output = (unicodize(input))

df = pd.read_csv(StringIO(output), sep='\t')

output_df_dict = {}
output_df_dict['my_file'] = df
engine = create_engine('postgres://XX:YY@ZZ:5432/AA')
schema = "scenario_3"
for table_name, df in output_df_dict.items():
     df['jqgrid_id'] = df.index
     df.to_sql(table_name, con=engine, schema=schema, index=False, if_exists='replace')
     print("Data Transfer done!")

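The output does get written to the database table, but its form is completely distorted, and the table also shows the counts. (Snapshot attached.)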
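
One likely explanation, offered as a reading of the code above rather than a confirmed diagnosis: `unicodize` receives the whole DataFrame, so the `str(_item)` fallback returns the frame's printed representation. That text includes the index (the "counts") and is aligned with spaces rather than tabs, so parsing it with `sep='\t'` yields a single wide column. The sketch below reproduces this on hypothetical sample data, assuming Python 3, and for contrast writes the frame itself to an in-memory SQLite database, which is used here only as a stand-in for the PostgreSQL engine.

```python
import io
import sqlite3

import pandas as pd

# Inline stand-in for my_file.csv (hypothetical sample data).
csv_text = "location_id,location_name\n7099395,SUPER CAFÉ\n"
df = pd.read_csv(io.StringIO(csv_text))

# str(df) is the printed view: index included, columns padded with spaces.
printed = str(df)

# The printed view contains no tabs, so each line is parsed as one column.
round_tripped = pd.read_csv(io.StringIO(printed), sep="\t")

# Writing the frame itself keeps both columns and the non-ASCII text.
conn = sqlite3.connect(":memory:")
df.to_sql("my_file", con=conn, index=False, if_exists="replace")
rows = conn.execute("SELECT location_id, location_name FROM my_file").fetchall()
conn.close()
```

If this reading is right, skipping the string round trip and passing the DataFrame straight to `to_sql` should preserve the table layout; whether the accented text survives on PostgreSQL would still depend on the Python version and the connection's encoding settings.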

Writing a CSV with special characters to a database table

0 answers: