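I am using a Python script to open a .csv file and import its data into a database. A few Latin characters cause errors, so I tried reading the file as UTF-8 with errors='replace' so that the troublesome characters would be replaced with question marks. Even so, I still get the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u010d' in position 2: character maps to <undefined>
I am using Python 3.7.4. Here is my current code: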
import csv
import cx_Oracle
import io

localfile = 'C:/User/Documents/Upload/data.csv'
connection = cx_Oracle.connect()  # connection arguments omitted in the post
cursor = connection.cursor()

# Read the CSV as UTF-8, replacing any undecodable bytes
with io.open(localfile, 'r', encoding='utf-8', errors='replace') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        cursor.execute(
            "INSERT INTO database.my_table (Column_1, Column_2, Column_3) values (:1, :2, :3)",
            [row[0], row[1], row[2]])
connection.commit()
Edit:
Here is the full traceback:
Traceback (most recent call last):
File "c:\User\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\ptvsd_launcher.py", line 43, in <module>
main(ptvsdArgs)
File "c:\User\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\__main__.py", line 432, in main
run()
File "c:\User\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\__main__.py", line 316, in run_file
runpy.run_path(target, run_name='__main__')
File "C:\User\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\User\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\User\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\User\Documents\Python_Projects\python_sftp_remote_server_edition.py", line 116, in <module>
insert(localfile, c)
File "c:\User\Documents\Python_Projects\python_sftp_remote_server_edition.py", line 28, in insert
row[0], row[1], row[2]])
File "C:\Users\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character '\u010d' in position 2: character maps to <undefined>
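Answer 0 (score: 1)
As the traceback shows, the database expects to receive its input in Windows code page 1252. You can try converting to that encoding with errors='replace' and then converting back: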
item = item.encode('cp1252', errors='replace').decode('cp1252')
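To spell it out: we take a Unicode string, convert it to CP1252 and back to Unicode, replacing every character that cannot survive the round trip, only to pass the result to an interface that will convert it to CP1252 yet again. It could be argued that this is not elegant at all.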
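As a minimal sketch of what that round trip does, the snippet below wraps the one-liner in a helper and applies it to a couple of sample strings. The helper name to_cp1252_safe and the sample values are made up for illustration; only the encode/decode call comes from the answer.

```python
def to_cp1252_safe(text: str) -> str:
    """Round-trip text through CP1252; characters CP1252 lacks become '?'."""
    return text.encode('cp1252', errors='replace').decode('cp1252')


# Hypothetical sample values (written with escapes so the source stays ASCII).
kept = to_cp1252_safe('caf\u00e9')      # U+00E9 exists in CP1252, so it survives
lost = to_cp1252_safe('Ko\u010di')      # U+010D does not, so it becomes '?'

print(kept == 'caf\u00e9')  # True
print(lost)                 # Ko?i
```

Applied to each cell before the insert, this should avoid the UnicodeEncodeError, at the cost of silently turning every unsupported character into a question mark.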
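A better strategy is to switch to a database that handles Unicode properly. Using errors='replace' is basically asking the computer to destroy any data that the limited legacy target character encoding cannot handle.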
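If you do end up replacing characters, it may help to know first what would be destroyed. Here is a rough sketch that lists the characters in a string that cannot be encoded in a given target encoding. The function name unencodable_chars and the sample string are hypothetical, not part of any library.

```python
def unencodable_chars(text: str, encoding: str = 'cp1252'):
    """Return (position, character) pairs that `encoding` cannot represent."""
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            problems.append((index, char))
    return problems


# Hypothetical cell value containing U+010D, the character from the error message.
problems = unencodable_chars('Ko\u010di')
print([(pos, hex(ord(ch))) for pos, ch in problems])  # [(2, '0x10d')]
```

Running something like this over each row before inserting would let you log or review the affected values instead of losing them without notice.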
Answer 1 (score: 0)
Try explicitly encoding and decoding the cell values. I created a sample .csv file in French, and this code worked for me:
import csv
import cx_Oracle
import io

localfile = 'C:/User/Documents/Upload/data.csv'
connection = cx_Oracle.connect()  # connection arguments omitted in the post
cursor = connection.cursor()

with io.open(localfile, 'r', encoding='utf-8', errors='replace') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Explicitly encode and decode each cell value before binding it
        cursor.execute(
            "INSERT INTO database.my_table (Column_1, Column_2, Column_3) values (:1, :2, :3)",
            [str(cell).encode('utf-8').decode('utf-8') for cell in row[:3]])
connection.commit()
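Answer 2 (score: 0)
Thanks, everyone, for the help. Although none of the answers solved the problem directly, I was able to use them to do more research and eventually arrived at a solution: the script now runs without errors and inserts the data into the Oracle database. I just wanted to post my (working) code below in case anyone else runs into the same difficulty.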
import csv
import cx_Oracle
import io

localfile = 'C:/User/Documents/Upload/data.csv'
# conn_str is the Oracle connection string (not shown in the post);
# encoding="utf-8" makes the client send strings as UTF-8
connection = cx_Oracle.connect(conn_str, encoding="utf-8")
cursor = connection.cursor()

with io.open(localfile, 'r', encoding='utf-8', errors='backslashreplace') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        cursor.execute(
            "INSERT INTO database.my_table (Column_1, Column_2, Column_3) values (:1, :2, :3)",
            [row[0], row[1], row[2]])
connection.commit()
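
For anyone wondering why the errors= argument on open() made no difference in the original script, here is a small sketch of my understanding. The file is decoded from UTF-8 without any problem, so the error handler never fires; the failure happens later, when the string is encoded again for the database (with cp1252, judging by the traceback). The sample value is made up, and plain str/bytes calls stand in for the file and the driver.

```python
# Bytes as they would sit in a UTF-8 encoded CSV file (hypothetical value).
raw = 'Ko\u010di'.encode('utf-8')

# Step 1: decoding, which is what open(..., encoding='utf-8') does.
# The bytes are valid UTF-8, so errors='replace' has nothing to replace.
decoded = raw.decode('utf-8', errors='replace')
print(decoded == 'Ko\u010di')  # True

# Step 2: encoding for the database client.
try:
    decoded.encode('cp1252')   # the codec named in the traceback
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
print(cp1252_ok)               # False

# With a UTF-8 client encoding the same string encodes without loss.
print(decoded.encode('utf-8') == raw)  # True
```

That would explain why passing encoding="utf-8" to the connection is what fixed it, rather than the change of error handler on the file.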