Question

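I have about 600,000 files encoded in ANSI and I want to convert them to UTF-8. I can do this one file at a time in Notepad++, but that is not feasible for 600,000 files. Can I do this in R or Python?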

I found this link, but the Python script there does not run: notepad++ converting ansi encoded file to utf-8

Answer 1

Why not read the file and write it back out as UTF-8? You can do that in Python.

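# codecs.open lets us name the encoding explicitly
import codecs

path = "your file.txt"  # placeholder: the file to convert in place

# read the input file using its original ANSI code page
# (cp1252 is assumed here; on Windows, "mbcs" selects the system ANSI code page)
with codecs.open(path, 'r', encoding='cp1252') as file:
    text = file.read()

# write the same text back out as UTF-8
with codecs.open(path, 'w', encoding='utf8') as file:
    file.write(text)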
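
For the 600,000 files in the question, the same read-then-write idea can be wrapped in a loop over a directory tree. This is only a rough sketch: the function name `convert_tree`, the `*.txt` pattern, and the `cp1252` source encoding are assumptions, so substitute whatever code page your files were really saved in. Files that fail to decode are left untouched and reported back.

```python
from pathlib import Path


def convert_tree(root, source_encoding="cp1252", pattern="*.txt"):
    """Re-encode every matching file under root to UTF-8, in place.

    Returns a list of (path, error) pairs for files that could not be decoded.
    """
    failures = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        try:
            # Work on raw bytes so line endings are preserved exactly.
            text = path.read_bytes().decode(source_encoding)
        except UnicodeDecodeError as exc:
            # Leave the file as it is and remember why it was skipped.
            failures.append((path, exc))
            continue
        path.write_bytes(text.encode("utf-8"))
    return failures
```

Try it on a copy of the data first. Running it twice over the same files would decode the new UTF-8 bytes as the old code page and mangle any non-ASCII characters.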

Answer 2

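I appreciate that this is an old question, but I recently solved a similar problem and thought I would share my solution.

I had a program preparing a file that I needed to import into an sqlite3 database, but the text file was always "ANSI" and sqlite3 requires UTF-8.

Python recognises the ANSI encoding as "mbcs" (on Windows), so the code I used, adapted from something else I found, is: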

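import codecs

blockSize = 1048576  # read roughly 1M characters at a time to bound memory use
# "mbcs" is only available on Windows
with codecs.open("your ANSI source file.txt", "r", encoding="mbcs") as sourceFile:
    with codecs.open("Your UTF-8 output file.txt", "w", encoding="UTF-8") as targetFile:
        while True:
            contents = sourceFile.read(blockSize)
            if not contents:  # an empty string means end of file
                break
            targetFile.write(contents)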
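
On Python 3 the same block-by-block copy can probably be written with the built-in `open` and `shutil.copyfileobj`. The sketch below is an assumption-laden variant, not the code I actually ran: `ansi_codec` and `convert_file` are hypothetical helper names, and the `cp1252` fallback for non-Windows systems is a guess that only holds if the files came from a Western European Windows machine.

```python
import shutil
import sys


def ansi_codec():
    # "mbcs" exists only on Windows; cp1252 is an assumed fallback elsewhere.
    return "mbcs" if sys.platform == "win32" else "cp1252"


def convert_file(src, dst, source_encoding=None, block_size=1024 * 1024):
    """Copy src to dst, decoding from the ANSI code page and encoding as UTF-8."""
    source_encoding = source_encoding or ansi_codec()
    # newline="" turns off newline translation, so line endings pass through unchanged.
    with open(src, "r", encoding=source_encoding, newline="") as source_file, \
         open(dst, "w", encoding="utf-8", newline="") as target_file:
        # copyfileobj reads and writes in chunks of block_size characters.
        shutil.copyfileobj(source_file, target_file, block_size)
```

Something like `convert_file("your ANSI source file.txt", "Your UTF-8 output file.txt")` would then do the same job as the loop above.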

The link below has some information on the encoding types that I found during my research:

https://docs.python.org/2.4/lib/standard-encodings.html
