How to check if a string contains only UTF-8 characters

Posted: 2018-03-25 19:16:10

Tags: python string python-3.x encoding utf-8

So far I am doing something like this:

def is_utf8(s):
    try:
        x=bytes(s,'utf-8').decode('utf-8', 'strict')
        print(x)
        return 1
    except:
        return 0

The only problem is that I don't want it to print anything. When I delete the print(x) line, the function stops working correctly. For example, with print(is_utf8("H�tst")), the function returns 0 while the print is in it, but returns 1 once I remove the print. Am I approaching the problem the wrong way?
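One likely explanation for the behavior above: the bare except: also catches any error raised by print(x) itself (for example a UnicodeEncodeError when the console encoding cannot display a character), so the function returns 0 only while the print is present. Also note that encoding a Python 3 str to UTF-8 and decoding it straight back can only fail at the encode step, and only for lone surrogate code points (U+D800 to U+DFFF). A minimal sketch under those assumptions, using a hypothetical function name can_encode_utf8, that drops the print and catches only the relevant exception:

```python
def can_encode_utf8(s: str) -> bool:
    """Return True if the str s can be encoded as UTF-8 (hypothetical helper)."""
    try:
        # errors='strict' is the default; this raises only for lone surrogates
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(can_encode_utf8("H\u00e9tst"))  # accented letter: encodable
print(can_encode_utf8("H\udce9tst"))  # lone surrogate: not encodable
```

Catching UnicodeEncodeError instead of everything means an unrelated failure (such as a print error) surfaces as an exception rather than being silently reported as "not UTF-8".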

1 answer:

Answer 0 (score: 2)

You could use the chardet module to detect an unknown encoding. For example, if a is a bytes object, you can guess its encoding like this:

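import chardet

a = "H\u00e9llo w\u00f6rld".encode("utf-8")  # example input: raw bytes of unknown encoding
b = chardet.detect(a)  # returns a dict including 'encoding' and 'confidence' keys
print(b["encoding"])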
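chardet guesses an encoding from byte statistics and reports a confidence value, so it cannot prove that data is valid UTF-8, and on short inputs its guess may be unreliable. If the actual question is whether a bytes object is well-formed UTF-8, a strict decode is a direct check. A minimal standard-library sketch; the function name is_valid_utf8 is hypothetical:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        # errors='strict' is the default: any malformed sequence raises
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("H\u00e9tst".encode("utf-8")))    # True
print(is_valid_utf8("H\u00e9tst".encode("latin-1")))  # False: lone 0xE9 byte
```

Unlike detection, this gives a yes/no answer for UTF-8 specifically; it says nothing about which other encoding the bytes might be in.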