How do I write a check in Python to see if a file is valid UTF-8?

Time: 2010-07-16 22:33:51

Tags: python utf-8

As the title says, I would like to check whether a given file object (opened as a binary stream) is a valid UTF-8 file.

Anyone?

Thanks

2 answers:

Answer 0: (score: 21)

def try_utf8(data):
    "Returns a Unicode object on success, or None on failure"
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        return None

data = f.read()  # f is the file object opened in binary mode
udata = try_utf8(data)
if udata is None:
    # Not UTF-8.  Do something else
    pass
else:
    # Handle unicode data
    pass
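
Reading the whole file with f.read() can be expensive for large files. As a rough sketch, the same strict-decoding check can be done chunk by chunk with an incremental decoder from the standard codecs module. The function name is_valid_utf8 and the chunk_size default are assumptions made for illustration, not part of the answer above.

```python
import codecs
import io


def is_valid_utf8(stream, chunk_size=64 * 1024):
    """Return True if the binary stream decodes as strict UTF-8.

    Hypothetical helper: `stream` is any file object opened in binary mode.
    """
    decoder = codecs.getincrementaldecoder('utf-8')(errors='strict')
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            # A multi-byte sequence split across chunks is buffered,
            # not reported as an error.
            decoder.decode(chunk)
        # final=True turns a truncated sequence at end of file into an error.
        decoder.decode(b'', final=True)
    except UnicodeDecodeError:
        return False
    return True


# Usage with in-memory streams standing in for real files
print(is_valid_utf8(io.BytesIO(u'h\u00e9llo'.encode('utf-8'))))
print(is_valid_utf8(io.BytesIO(b'\xff\xfe')))
```

This only answers yes or no; if the decoded text is needed as well, the try_utf8 approach above is simpler.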

Answer 1: (score: 8)

You can do something like this:

import codecs
try:
    # filename is the path of the file to check
    f = codecs.open(filename, encoding='utf-8', errors='strict')
    for line in f:
        pass
    f.close()
    print("Valid utf-8")
except UnicodeDecodeError:
    print("invalid utf-8")