Django: sqlite encoding of filenames

Asked: 2014-02-26 10:36:45

Tags: python django sqlite encoding

I am writing a command (run via manage.py importfiles) that imports a given directory structure from the real file system into my self-written Django file repository.

def _handle_directory(self, directory_path, directory):
    for root, subFolders, files in os.walk(directory_path):
        for filename in files:
            path = os.path.join(root, filename)
            # Binary mode: text mode on Windows would corrupt PDFs and other binary files.
            with open(path, 'rb') as f:
                file_wrapper = FileWrapper(f)
                self.cnt_files += 1
                new_file = File(directory=directory, filename=filename,
                                file=file_wrapper, uploader=self.uploader)
                new_file.save()

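The full model can be found at GitHub. The full command is currently available on gist.github.com.

If you don't want to look at the model: the attribute file of my File class is a FileField.

Copying the files seems to work, thanks to pajton. However, I now get a new exception, and I think there is a problem with the sqlite encoding, but I don't know how to solve it. The value of sys.getfilesystemencoding() is mbcs.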

Traceback (most recent call last):
  File ".\manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\Python27\lib\site-packages\django\core\management\base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "C:\Python27\lib\site-packages\django\core\management\base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "D:\Development\github\Palco\engine\filestorage\management\commands\importfiles.py", line 63, in handle
    self._handle_directory(args[0], root)
  File "D:\Development\github\Palco\engine\filestorage\management\commands\importfiles.py", line 75, in _handle_directory
    new_file.save()
  File "D:\Development\github\Palco\engine\filestorage\models.py", line 155, in save
    super(File, self).save(*args, **kwargs)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 545, in save
    force_update=force_update, update_fields=update_fields)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 573, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 635, in _save_table
    forced_update)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 679, in _do_update
    return filtered._update(values) > 0
  File "C:\Python27\lib\site-packages\django\db\models\query.py", line 507, in _update
    return query.get_compiler(self.db).execute_sql(None)
  File "C:\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 976, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "C:\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 782, in execute_sql
    cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 69, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\sqlite3\base.py", line 450, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

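I changed filename in several ways, but it always failed. I tried different combinations of .encode(), .decode() and unidecode, and I also tried values such as 'foo' and u'foo'. :(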

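I am quite sure that filename is the problem. I printed the current value of the filename, and the exception occurs whenever the filename contains non-ASCII characters.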
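For context: on Python 2, os.walk returns names of the same type as its argument, so a byte-string path yields byte-string filenames and a unicode path yields unicode filenames. A minimal sketch, assuming the directory argument may arrive as bytes (as command-line arguments do on Python 2); walk_text_filenames is a hypothetical helper, not part of the original command:

```python
import os
import sys


def walk_text_filenames(directory_path):
    """Yield (path, filename) pairs as text (unicode) strings.

    Hypothetical helper: a byte-string argument is decoded with the
    filesystem encoding first, so os.walk hands back text names.
    """
    if isinstance(directory_path, bytes):
        encoding = sys.getfilesystemencoding() or 'utf-8'
        directory_path = directory_path.decode(encoding)
    for root, _subfolders, files in os.walk(directory_path):
        for filename in files:
            yield os.path.join(root, filename), filename
```

With names that are already unicode, no later .encode()/.decode() guessing should be needed before the value reaches the model.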

Update 1: I followed pajton's advice and logged the SQL queries. This is the result (the first line is the output of print filename). D:\Temp\prak-gdv-abgabe is my argument to the command.

Eigene L÷sung.pdf
(0.000) QUERY = u'BEGIN' - PARAMS = (); args=None
(0.000) QUERY = u'INSERT INTO "filestorage_file" ("directory_id", "filename", "file", "size", "content_type", "uploader_id", "datetime", "sha512") VALUES (%s, %s, %s, %s, %s, %s, %s, %s)' - PARAMS = (164, u'Eigene L\xf6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', None, None, 8, u'2014-02-26 23:21:17.735000', None); args=[164, 'Eigene L\xc3\xb6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', None, None, 8, u'2014-02-26 23:21:17.735000', None]
(0.000) QUERY = u'BEGIN' - PARAMS = (); args=None
(0.000) QUERY = u'UPDATE "filestorage_file" SET "directory_id" = %s, "filename" = %s, "file" = %s, "size" = NULL, "content_type" = %s, "uploader_id" = %s, "datetime" = %s, "sha512" = NULL WHERE "filestorage_file"."id" = %s ' - PARAMS = (164, u'D:\\Temp\\prak-gdv-abgabe\\Protokoll\\Eigene L\ufffdsung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', u'application/pdf', 8, u'2014-02-26 23:21:17.735000', 156); args=(164, 'D:\\Temp\\prak-gdv-abgabe\\Protokoll\\Eigene L\xf6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', 'application/pdf', 8, u'2014-02-26 23:21:17.735000', 156)
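The two args values in the log hint at what happens to the name: the INSERT receives UTF-8 bytes (\xc3\xb6), while the UPDATE receives a byte string with a single \xf6, which is not valid UTF-8. The small sketch below reproduces that difference. It assumes the mbcs code page on this machine is cp1252, which the post does not state:

```python
# Bytes as they appear in the logged UPDATE args (ANSI code page, assumed cp1252).
raw_name = b'Eigene L\xf6sung.pdf'

# Decoding with the matching code page recovers the real name.
decoded_ok = raw_name.decode('cp1252')

# Decoding as UTF-8 with replacement yields U+FFFD, as in the logged PARAMS.
decoded_lossy = raw_name.decode('utf-8', 'replace')

# Strict UTF-8 decoding of the same bytes fails outright.
try:
    raw_name.decode('utf-8')
    strict_utf8_fails = False
except UnicodeDecodeError:
    strict_utf8_fails = True

# The INSERT args hold the UTF-8 encoding of the correctly decoded name.
utf8_bytes = decoded_ok.encode('utf-8')
```

If this reading is right, the second save is handed a byte string in the filesystem encoding, whereas the first one got a proper unicode value.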

Update 2 (2014-02-27 11:10 UTC): The encoding of my sqlite database is UTF-8, verified with PRAGMA encoding;.
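The same check can be scripted with Python's standard sqlite3 module. The sketch below uses an in-memory database and a hypothetical one-column table rather than the project's real schema, and it stores the name as a text (unicode) string, which is what the error message in the traceback recommends:

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Text encoding used by this database (UTF-8 is SQLite's default).
db_encoding = conn.execute('PRAGMA encoding').fetchone()[0]

# Hypothetical table, only for this illustration.
conn.execute('CREATE TABLE demo_file (filename TEXT)')
conn.execute('INSERT INTO demo_file (filename) VALUES (?)',
             (u'Eigene L\xf6sung.pdf',))
stored_name = conn.execute('SELECT filename FROM demo_file').fetchone()[0]
conn.close()
```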

I checked the records in the database.

   Id   |   filename                                        |   sha512      |   size
    1   |   D:\Temp\prak-gdv-abgabe\Liesmich.html           |   ffeb8c3d5   |   5927
    2   |   D:\Temp\prak-gdv-abgabe\Liesmich.md             |   d206d241f   |   407
    3   |   D:\Temp\prak-gdv-abgabe\Liesmich.txt            |   d206d241f   |   407
    4   |   D:\Temp\prak-gdv-abgabe\Linux\GDV_Praktikum.bin |   5fc5749ee   |   166925
    5   |   Eigene Lösung.pdf                               |               |

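Very interesting: the failing entry (id 5) has the expected filename, but no sha512 or size value set. The other entries have the expected sha512 and size values, but not the expected filename. It seems the custom save() method of my File class is part of my problem ... but I don't understand why these strange things happen.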

1 Answer:

Answer 0 (score: 0)

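Well, I found a ... solution. I just improved the custom .save() method of my File model. It no longer fires up to three additional saves, only one. And, this is the important change, it now updates only the three fields that I check in the custom save method. My save method now looks like this: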

import hashlib
import mimetypes

def save(self, *args, **kwargs):
    # The first save stores the row (and the uploaded file), so self.file is usable below.
    super(File, self).save(*args, **kwargs)
    do_update = False
    if not self.content_type:
        self.content_type = mimetypes.guess_type(self.file.name)[0]
        do_update = True
    if not self.sha512:
        self.sha512 = hashlib.sha512(self.file.read()).hexdigest()
        do_update = True
    if not self.size:
        self.size = self.file.size
        do_update = True

    if do_update:
        # The second save writes only the derived fields; filename is not sent again.
        super(File, self).save(update_fields=['content_type', 'sha512', 'size'], *args, **kwargs)

Now the files are imported as expected!
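The three derived fields could also be computed from the file object before anything is saved. This is only a sketch under stated assumptions: file_metadata is a hypothetical helper, not part of the model in this answer, and it hashes in chunks rather than with the single read() used in save(), so large files are not loaded into memory at once.

```python
import hashlib
import mimetypes


def file_metadata(fileobj, name, chunk_size=64 * 1024):
    """Return content type, SHA-512 hex digest and size for a binary file object."""
    digest = hashlib.sha512()
    size = 0
    # Read fixed-size chunks until read() returns an empty byte string.
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
        size += len(chunk)
    return {
        'content_type': mimetypes.guess_type(name)[0],
        'sha512': digest.hexdigest(),
        'size': size,
    }
```

The returned values could then be assigned to the model instance before a single save() call; whether that fits the real File model depends on details the post does not show.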