Question

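I am moving as much data as possible off a legacy database, and while doing so I have the opportunity to upgrade all the websites to Unicode. It has not been easy, though, and I have now run into a few mysteries.

I cobbled together a Python script (parts of which I found on Stack Overflow :)) to convert files from ANSI to UTF-8, but the result keeps producing errors. This is the script I used: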

#!/usr/bin/env python
import os
import codecs
from sys import argv

script, directory = argv

def convert_to_utf8(filename):
    # Decode the file as Windows-1252; undecodable bytes are silently dropped
    with codecs.open(filename, 'r', 'cp1252', errors='ignore') as f:
        u = f.read()   # now the contents have been transformed to a Unicode string
    # Re-encode as UTF-8 (no BOM is written) and close the file so it is flushed
    with codecs.open(filename, 'w', 'utf-8', errors='ignore') as out:
        out.write(u)   # and now the contents have been output as UTF-8


# Walk the directory tree and convert every matching source file in place
for root, dirs, files in os.walk(directory):
    for name in files:
        if name.endswith((".asp", ".aspx", ".php")):
            filename = os.path.join(root, name)
            print(filename)
            convert_to_utf8(filename)

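The script above converts Windows-1252 files to UTF-8. However, even though I have changed all the headers from windows-1252 to utf-8, I still get SQL errors when accessing the old database, which has Scandinavian letters in its field names.

This is a problem I will mostly solve when I move the data to a MySQL database, but even then some connections to the old program will remain, and the problem will reappear there.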

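When I open and inspect the files in Notepad++, it seems to think they are UTF-8. Converting them to UTF-8 without BOM changes nothing, and the same error persists (the SQL queries stop working). However, when I convert them to UTF-8 with BOM, the pages start working.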
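To confirm from a script what Notepad++ reports, one option is to look at the first three bytes of each file. This is only a minimal sketch: `has_utf8_bom` is a hypothetical helper name, not part of the script above, and it only tells whether the BOM bytes are present, not whether the rest of the file is valid UTF-8.

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes (EF BB BF)."""
    # Read in binary mode so no decoding happens before the comparison
    with open(path, 'rb') as f:
        return f.read(3) == codecs.BOM_UTF8
```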

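These are the mysteries I am facing:
- Why doesn't the Python script conversion work?
- Why doesn't UTF-8 without BOM work?
- Why does it work with a BOM?

I was hoping to batch-process all the files instead of converting each one by hand in Notepad++, and I would also like to know why it wants UTF-8 with BOM.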
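If the BOM really is what the pages need, the batch conversion could presumably write it directly with Python's `utf-8-sig` codec, which prepends the BOM on encoding. The following is a sketch under that assumption, not a verified fix for the SQL errors; `convert_to_utf8_with_bom` and its `source_encoding` parameter are hypothetical names introduced here.

```python
import io

def convert_to_utf8_with_bom(filename, source_encoding='cp1252'):
    """Rewrite a file in place as UTF-8 with a leading BOM."""
    # newline='' keeps the original line endings untouched on read and write
    with io.open(filename, 'r', encoding=source_encoding, newline='') as src:
        text = src.read()
    # 'utf-8-sig' emits the bytes EF BB BF before the encoded text
    with io.open(filename, 'w', encoding='utf-8-sig', newline='') as dst:
        dst.write(text)
```

This version decodes strictly, so a byte that is undefined in Windows-1252 raises an error instead of being dropped. It should only be run on files that are still in the source encoding: running it on a file that is already UTF-8 would decode the UTF-8 bytes as Windows-1252 and garble every non-ASCII character.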

Converting to UTF-8, Notepad++ and Python

0 answers: