Saving text file contents to DB: "Incorrect string value: '\xEF\xBB\xBF# W...' for column 'contents' at row 1"

Asked: 2011-05-18 22:21:53

Tags: python mysql django file character-encoding

In my Django application, I upload a text file, read its contents with file.read(), and then save them to the database (using Django's .save() method).

I get the following error:

Environment:

Request Method: POST
Request URL: http://localhost:8000/
Django Version: 1.2.5
Python Version: 2.7.1
Installed Applications:
['django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.sites',
 'django.contrib.messages',
 'django.contrib.admin',
 'django.contrib.markup',
 'files']
Installed Middleware:
('django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware')


Traceback:
File "/usr/lib/pymodules/python2.7/django/core/handlers/base.py" in get_response
  100.                     response = callback(request, *callback_args, **callback_kwargs)
File "/home/mcrittenden/Dropbox/Code/dropdo-django/dropdo/files/views.py" in index
  31.                 return handle_upload(request.FILES['file'])
File "/home/mcrittenden/Dropbox/Code/dropdo-django/dropdo/files/views.py" in handle_upload
  60.     file.save()
File "/usr/lib/pymodules/python2.7/django/db/models/base.py" in save
  458.         self.save_base(using=using, force_insert=force_insert, force_update=force_update)
File "/usr/lib/pymodules/python2.7/django/db/models/base.py" in save_base
  551.                     result = manager._insert(values, return_id=update_pk, using=using)
File "/usr/lib/pymodules/python2.7/django/db/models/manager.py" in _insert
  195.         return insert_query(self.model, values, **kwargs)
File "/usr/lib/pymodules/python2.7/django/db/models/query.py" in insert_query
  1524.     return query.get_compiler(using=using).execute_sql(return_id)
File "/usr/lib/pymodules/python2.7/django/db/models/sql/compiler.py" in execute_sql
  788.         cursor = super(SQLInsertCompiler, self).execute_sql(None)
File "/usr/lib/pymodules/python2.7/django/db/models/sql/compiler.py" in execute_sql
  732.         cursor.execute(sql, params)
File "/usr/lib/pymodules/python2.7/django/db/backends/util.py" in execute
  15.             return self.cursor.execute(sql, params)
File "/usr/lib/pymodules/python2.7/django/db/backends/mysql/base.py" in execute
  86.             return self.cursor.execute(query, args)
File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py" in execute
  168.         if not self._defer_warnings: self._warning_check()
File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py" in _warning_check
  82.                     warn(w[-1], self.Warning, 3)

Exception Type: Warning at /
Exception Value: Incorrect string value: '\xEF\xBB\xBF# W...' for column 'contents' at row 1

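I assume (since EF BB BF is the UTF-8 BOM) that this is caused by a character set mismatch between the DB and the file. Does that sound right? If so, how do I fix it?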

2 answers:

Answer 0: (score: 2)

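You are on the right track. Check your database's character set (is it UTF-8?). If it is not, and you want to use UTF-8, use this SQL command to change the charset:

alter table yourTableName DEFAULT CHARACTER SET utf8;

Note that in MySQL this changes only the table's default for columns added later. To convert the existing columns as well, use:

alter table yourTableName CONVERT TO CHARACTER SET utf8;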

If you want to convert UTF-8 strings back, read this great tutorial on working with UTF-8 in Python.

You can strip the BOM with this command:

  

import codecs

# Remove the BOM from the start of a Unicode string, if present (Python 2)
u = u.lstrip(unicode(codecs.BOM_UTF8, "utf8"))
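As an alternative, here is a minimal sketch that decodes the raw upload with the standard utf-8-sig codec, which drops one leading BOM if it is there and otherwise behaves like utf-8. It assumes the uploaded file really is UTF-8. The helper name decode_upload is hypothetical, and the sample is written for Python 3, although the codec also exists in Python 2.7.

```python
def decode_upload(raw_bytes):
    """Decode uploaded bytes as UTF-8, discarding a leading BOM if present."""
    # 'utf-8-sig' is a standard codec: it skips one leading EF BB BF sequence
    # and otherwise decodes exactly like 'utf-8'.
    return raw_bytes.decode("utf-8-sig")


# Illustrative data only: the same text with and without a BOM.
with_bom = b"\xef\xbb\xbf# Welcome"
without_bom = b"# Welcome"
print(decode_upload(with_bom) == decode_upload(without_bom))
```

Decoding at the point where the file is read means the rest of the code only ever sees text without the BOM.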

Answer 1: (score: 0)

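You are right: the file you are reading has the BOM bytes inserted at its start. You have to check for those bytes and remove them before passing the data on. The rest of the file will be UTF-8 characters.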
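The check-and-remove step could look roughly like this when done on the raw bytes, before any decoding. This is only a sketch: the function name strip_utf8_bom is hypothetical, and the usage comment assumes the upload is read as in the question's traceback.

```python
import codecs


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 BOM; data must be a byte string."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical usage inside an upload handler:
#     contents = strip_utf8_bom(request.FILES['file'].read())
sample = codecs.BOM_UTF8 + b"# Welcome"
cleaned = strip_utf8_bom(sample)
```

Only an exact prefix match is removed, so files that do not start with a BOM pass through unchanged.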

I don't know how to tell which character set the database expects.
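One way to find out, sketched under assumptions: MySQL reports each column's collation through SHOW FULL COLUMNS, and the collation name starts with the character set name (for example, latin1_swedish_ci belongs to latin1). The helper below accepts any DB-API cursor connected to MySQL, such as the one returned by Django's connection.cursor(). The function name and the table name in the usage comment are hypothetical. The table name is interpolated directly into the statement, so it must come from trusted code.

```python
def column_charsets(cursor, table_name):
    """Map each text column of a MySQL table to its character set.

    cursor is any DB-API cursor connected to MySQL. table_name is
    interpolated into the statement, so pass only trusted identifiers.
    """
    cursor.execute("SHOW FULL COLUMNS FROM `%s`" % table_name)
    charsets = {}
    for row in cursor.fetchall():
        # Result columns start with: Field, Type, Collation, ...
        field, collation = row[0], row[2]
        if collation:  # Collation is NULL for non-text columns
            charsets[field] = collation.split("_")[0]
    return charsets


# Hypothetical usage with Django (the table name is an assumption):
#     from django.db import connection
#     print(column_charsets(connection.cursor(), "files_file"))
```

If the column from the error message turns out not to use a UTF-8 character set, that would be consistent with the "Incorrect string value" warning in the question.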