Question

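I suspect that some data was saved as ANSI (on a Windows machine). As a result, the original Hebrew characters were lost, and we see things like ùéôåãé äòéø.

Is the information lost, or is it possible to map the characters back, given that the original text was Hebrew?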

Answer 1

The information is probably not lost, or at least not all of it. If you want to use Python:

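BLOCKSIZE = 1048576  # characters per chunk (or some other size); avoids loading huge files at once

# Decode as windows-1255 (the Hebrew "ANSI" code page) and re-encode as UTF-8.
# newline="" leaves the original line endings (e.g. Windows \r\n) untouched.
with open("input.txt", "r", encoding="windows-1255", newline="") as sourceFile:
    with open("output.txt", "w", encoding="utf-8", newline="") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break  # end of file
            targetFile.write(contents)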

Stolen and adapted from How to convert a file to utf-8 in Python?
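To see why the text should be recoverable, here is a small sketch that decodes the same bytes with two code pages. It assumes the sample in the question is windows-1255 bytes displayed with the Western windows-1252 code page; the byte string below is derived from that sample, and the function name `two_readings` is only for illustration (`cp1252`/`cp1255` are Python's names for windows-1252/windows-1255).

```python
def two_readings(raw):
    """Decode the same bytes as Western (cp1252) and as Hebrew (cp1255)."""
    # Assumption: raw holds text saved by a Hebrew-locale Windows "ANSI" program
    return raw.decode("cp1252"), raw.decode("cp1255")


# Byte values behind the garbled sample in the question
raw = b"\xf9\xe9\xf4\xe5\xe3\xe9 \xe4\xf2\xe9\xf8"
western, hebrew = two_readings(raw)
print(western)  # ùéôåãé äòéø
print(hebrew)   # שיפודי העיר
```

Because the bytes themselves are unchanged, decoding the file with the right code page (as the Python code and the iconv command do) is enough to get the Hebrew back.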

You can also use an external tool, such as iconv:

iconv -f windows-1255 -t utf-8 input.txt > output.txt

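iconv is available on most Linux distributions, on Cygwin, and on other platforms.

If the file was double-encoded (the windows-1255 bytes were misread as windows-1252 and then saved again as UTF-8), you may need to do something like this: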

iconv -f utf-8 -t windows-1252 input.txt > tmp.txt
iconv -f windows-1255 -t utf-8 tmp.txt > output.txt

But that is quite unlikely.
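For the double-encoded case, the two iconv steps can be sketched in Python as well. This assumes the damaged file is valid UTF-8 whose characters are windows-1252 mojibake of windows-1255 bytes; `undo_double_encoding` and `fix_file` are hypothetical helper names.

```python
def undo_double_encoding(text):
    """Reverse windows-1255 text that was misread as windows-1252 and then saved as UTF-8."""
    # Step 1, like `iconv -f utf-8 -t windows-1252`: recover the original byte values
    original_bytes = text.encode("cp1252")
    # Step 2, like `iconv -f windows-1255 -t utf-8`: read those bytes as Hebrew
    return original_bytes.decode("cp1255")


def fix_file(src, dst):
    # newline="" keeps the file's line endings (e.g. Windows \r\n) unchanged
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(undo_double_encoding(text))


if __name__ == "__main__":
    print(undo_double_encoding("ùéôåãé äòéø"))  # שיפודי העיר
```

Usage: `fix_file("input.txt", "output.txt")`. If the file already contains real Hebrew characters (so it was not double-encoded), step 1 raises `UnicodeEncodeError`, because windows-1252 has no Hebrew letters.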
