Print a string with UTF-8 encoded characters, e.g. "\u00c5\u009b"

Date: 2018-07-11 20:04:45

Tags: python python-3.x python-unicode

I want to print a string encoded like this: "Cze\u00c5\u009b\u00c4\u0087", but I don't know how. The example string should print as "Cześć".
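Each code point in that string holds one byte of the UTF-8 encoding of "Cześć", which is what you get when UTF-8 bytes are decoded as Latin-1. The question does not say where the string came from, so treating a wrong Latin-1 decode as the cause is an assumption; the sketch below just reproduces the same string from "Cześć":

```python
# Assumption: the garbled string came from decoding UTF-8 bytes as Latin-1
target = "Cześć"
utf8_bytes = target.encode("utf-8")
print(utf8_bytes)  # b'Cze\xc5\x9b\xc4\x87'

# Latin-1 maps every byte 0x00-0xFF to the code point with the same value
garbled = utf8_bytes.decode("latin-1")
print(garbled == "Cze\u00c5\u009b\u00c4\u0087")  # True
```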

What I tried:

str = "Cze\u00c5\u009b\u00c4\u0087"
print(str) 
#gives: CzeÅÄ

str_bytes = str.encode("unicode_escape")
print(str_bytes) 
#gives: b'Cze\\xc5\\x9b\\xc4\\x87'

str = str_bytes.decode("utf8")
print(str) 
#gives: Cze\xc5\x9b\xc4\x87

Whereas

print(b"Cze\xc5\x9b\xc4\x87".decode("utf8"))

gives "Cześć", but I don't know how to convert the "Cze\xc5\x9b\xc4\x87" string into the bytes b"Cze\xc5\x9b\xc4\x87".

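I also know the problem is the extra backslashes that appear in the byte representation after encoding the original string with "unicode_escape", but I don't know how to get rid of them: str_bytes.replace(b'\\\\', b'\\') does not work.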
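The backslashes are not a display artifact: "unicode_escape" writes each non-ASCII character such as U+009B as four literal ASCII bytes (backslash, x, 9, b), so there is no single "extra" backslash for replace() to strip. A minimal sketch showing this, and one way to recover the raw bytes by undoing the escaping and then encoding with Latin-1 (a swapped-in approach, not taken from the question):

```python
s = "Cze\u00c5\u009b\u00c4\u0087"

# unicode_escape writes each non-ASCII character as the four ASCII bytes \xNN
escaped = s.encode("unicode_escape")
print(len(s), len(escaped))  # 7 19
print(escaped[3:7])          # b'\\xc5'

# Undo the escaping (giving back the same str), then map each code point
# below U+0100 to a single byte with latin-1 to get the UTF-8 bytes
recovered = escaped.decode("unicode_escape").encode("latin-1")
print(recovered)             # b'Cze\xc5\x9b\xc4\x87'
print(recovered.decode("utf-8"))
```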

1 answer:

Answer 0 (score: 5)

Use raw_unicode_escape.
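The "raw_unicode_escape" codec encodes every code point below U+0100 as the single byte with that value (only higher code points become \uXXXX escapes), so for this string it yields exactly the UTF-8 bytes, which can then be decoded. A minimal sketch of that approach, assuming the input is exactly the string from the question:

```python
s = "Cze\u00c5\u009b\u00c4\u0087"

# Code points < U+0100 become raw bytes, giving back the UTF-8 byte sequence
raw = s.encode("raw_unicode_escape")
print(raw)    # b'Cze\xc5\x9b\xc4\x87'

fixed = raw.decode("utf-8")
print(fixed)  # Cześć
```

Because every code point here is below U+0100, s.encode("latin-1") produces the same bytes. The difference shows up only for characters at U+0100 or above: "latin-1" raises UnicodeEncodeError, while "raw_unicode_escape" writes them out as \uXXXX text.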