Question

I'm trying to decode the strings in the list below. They are all encoded in utf-8.

_str = ['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

Expected output:

['.The vicar', ':--In the', 'cathedral']

My attempts

>>> for x in _str:
    x.decode('string_escape')
    print x


'."\n\nThe vicar\''
."

The vicar'
':--\n\nIn the'
:--

In the
'cathedral'
cathedral
>>> print [x.decode('string_escape') for x in _str]
['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

Both attempts failed. Any ideas?
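A short Python 3 check (the question's own code is Python 2) suggests why neither attempt changes anything: inside an ordinary string literal, \n is already a single newline character, so there is no backslash escape left for a decoding step to act on. Python 3 no longer has the string_escape codec; unicode_escape is used here only as a stand-in to show what escape decoding does when a literal backslash is actually present.

```python
# The list from the question, written as Python 3 string literals
_str = ['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

first = _str[0]
# Inside a normal literal, \n becomes one real newline character
print('\n' in first)   # True
# ...so no backslash remains for an escape-decoding codec to convert
print('\\' in first)   # False

# For contrast: decoding only changes text that still holds a backslash followed by 'n'
escaped = b'In the\\ncathedral'
print(repr(escaped.decode('unicode_escape')))  # 'In the\ncathedral'
```

In other words, the goal is not decoding but removing characters, which is what the answer below does.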

Answer 1

So you want to strip some characters from each string in the list. You can do that with the simple regex below:

import re
print([re.sub(r'[."\'\n]', '', x) for x in _str])

This regex removes every period (.), double quote ("), single quote (') and newline (\n), so the result is:

['The vicar', ':--In the', 'cathedral']

Hope this helps.

How to decode a string stored in utf-8 format
