I have a CSV file that contains special characters, as shown below:
location_id location_name
7099395 SUPER CAFÉ
I first read the file into a dataframe and then use sqlalchemy to write the dataframe to a database table.
Because of the special characters, I get the following error:
'ascii' codec can't encode characters in position 33-39: ordinal not in range(128)
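For reference, the same class of error can be reproduced in isolation. The snippet below is only a minimal sketch, assuming Python 3 and using the sample value from the file above; the variable names are illustrative, and the real traceback in my case comes from somewhere inside the database write rather than from a direct `encode` call.

```python
# Minimal reproduction of the error class, assuming Python 3.
# "name" is the sample value from the CSV; the other names are illustrative.
name = "SUPER CAFÉ"

try:
    # ASCII only covers code points 0-127, so "É" cannot be encoded.
    name.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Encoding the same text as UTF-8 works and round-trips without loss.
utf8_bytes = name.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")

print(error_message)
print(round_tripped)
```

This suggests the text itself is fine and that some layer between the dataframe and the database is falling back to ASCII.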
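To work around this, I used the unidecode module inside a function, but that converts the df to a string. When I then use the StringIO module to convert that string back into a df, the tabular form gets distorted. I am happy to paste my code here for reference if needed.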
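To make the shape problem concrete: converting a whole dataframe to one string gives its printed representation, not CSV, so it cannot be parsed back cleanly. The sketch below shows a cell-by-cell conversion that keeps the rows and columns intact. It is only an illustration under assumptions: `to_ascii` is a hypothetical helper that uses the standard-library `unicodedata` module as a stand-in for unidecode, and the dataframe is built by hand from the sample row instead of being read from my file.

```python
import unicodedata

import pandas as pd


def to_ascii(value):
    """Hypothetical helper: strip accents from one cell (stand-in for unidecode)."""
    if not isinstance(value, str):
        return value  # leave numbers and missing values untouched
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks that have no ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hand-built stand-in for the dataframe read from the CSV file.
df = pd.DataFrame({"location_id": [7099395], "location_name": ["SUPER CAFÉ"]})

# Convert each cell individually so the table keeps its shape.
for column in df.columns:
    df[column] = df[column].map(to_ascii)

print(df)
```

Note that this throws away the accents, just as unidecode does, so it is a workaround and not a way to store the original text.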
Edit: here is my code:
from unidecode import unidecode
import pandas as pd
from pandas.compat import StringIO
from sqlalchemy import create_engine
def unicodize(item):
    def _get_int_if_int(x):
        # Return x as an int when it is a whole number, otherwise unchanged.
        try:
            if not abs(int(x) - float(x)) > 0:
                return int(x)
            else:
                return x
        except (ValueError, TypeError):
            return x

    # Keep identifiers with underscores and zero-padded strings as they are.
    try:
        if item.__contains__("_"):
            _item = item
        elif item.startswith("0") and len(item) > 1:
            _item = item
        else:
            _item = _get_int_if_int(item)
    except AttributeError:
        _item = _get_int_if_int(item)

    # Transliterate to ASCII; unicode() exists only on Python 2.
    try:
        try:
            return unidecode(unicode(_item))
        except NameError:
            return unidecode(str(_item))
    except UnicodeDecodeError:
        try:
            return unidecode(_item.decode('utf-8'))
        except UnicodeDecodeError:
            return unidecode(_item.decode('latin-1'))
    except AttributeError:
        return _item

input = pd.read_csv('my_file.csv')
output = (unicodize(input))
df = pd.read_csv(StringIO(output), sep='\t')
output_df_dict = {}
output_df_dict['my_file'] = df
engine = create_engine('postgres://XX:YY@ZZ:5432/AA')
schema = "scenario_3"
for table_name, df in output_df_dict.items():
    df['jqgrid_id'] = df.index
    df.to_sql(table_name, con=engine, schema=schema, index=False, if_exists='replace')
print("Data Transfer done!")