Python: Fixing strings with cp1252 characters

Time: 2016-12-16 22:17:44

Tags: python python-3.x encoding

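Short version: How can I write a function that takes a string containing string representations of characters (e.g. "This is a character \u200a") and replaces them with the characters they represent (e.g. "This is a character  ")?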
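If a string really does contain literal backslash escapes (a backslash followed by the text `u200a`, rather than the hair-space character itself), a common standard-library idiom is to round-trip it through the `unicode_escape` codec. The sketch below assumes that situation; the helper name `decode_literal_escapes` is hypothetical. Encoding with the `backslashreplace` error handler first turns any non-ASCII characters into escapes as well, so characters such as the em dash survive instead of being misread by `unicode_escape`, which treats raw bytes as Latin-1. Every backslash in the input is interpreted, so this only suits text where each backslash starts a valid escape.

```python
def decode_literal_escapes(text: str) -> str:
    """Hypothetical helper: turn literal sequences such as \\u200a or \\'
    into the characters they represent.

    Assumes the backslash escapes are stored as plain text. Non-ASCII
    characters are first rewritten as \\uXXXX escapes so that the
    unicode_escape codec (which reads raw bytes as Latin-1) cannot corrupt them.
    """
    escaped = text.encode("ascii", "backslashreplace")
    return escaped.decode("unicode_escape")


# Literal backslash followed by "u200a", not the hair-space character itself.
sample = "This is a character \\u200a"
print(repr(decode_literal_escapes(sample)))  # 'This is a character \u200a'
```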

Longer version: I need to write a function that takes a string like this:

<p>"â€Š\'These things matter\' a lotâ€”in Virginia, Florida, North Carolina, and Ohioâ€”just to name a few states."</p>

and converts it into this:

<p>" 'These things matter' a lot—in Virginia, Florida, North Carolina, and Ohio—just to name a few states."</p>

The best approach I have come up with so far is to encode it as cp1252 and then decode it as UTF-8:

>>> x.encode("cp1252").decode("utf-8")
'<p>"\u200a\'These things matter\' a lot—in Virginia, Florida, North Carolina, and Ohio—just to name a few states."</p>'
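The encode/decode step above can be wrapped in a small helper. This is a sketch: the name `fix_cp1252_mojibake` is hypothetical, and returning the input unchanged when the round trip fails is an assumption about how you want to treat text that was never mis-decoded. For example, a string that already contains a correct em dash (U+2014) encodes under cp1252 to the single byte 0x97, which is not valid UTF-8.

```python
def fix_cp1252_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were wrongly decoded as cp1252.

    Re-encodes with cp1252 to recover the original bytes, then decodes them
    as UTF-8. If either step fails, the text is assumed not to be mojibake
    and is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# "\u00e2\u20ac\u0160" is the mojibake for HAIR SPACE,
# "\u00e2\u20ac\u201d" is the mojibake for EM DASH.
broken = "\u00e2\u20ac\u0160'These things matter' a lot\u00e2\u20ac\u201din Ohio"
print(repr(fix_cp1252_mojibake(broken)))  # "\u200a'These things matter' a lot—in Ohio"
```

Python's cp1252 codec leaves the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so mojibake that would need those bytes cannot be repaired by this round trip and falls back to the unchanged input.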

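Is there a function I could write, or a library, that would handle the last step for me: dealing with the \' characters and finding that pesky "hair space" character?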
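To locate a character like the hair space, the standard `unicodedata` module can name every non-ASCII character in a string. The helper name `describe_non_ascii` is hypothetical. The optional NFKC step is one way to turn U+200A into an ordinary space: Unicode gives HAIR SPACE a compatibility mapping to U+0020, while the em dash has none and is left alone.

```python
import unicodedata


def describe_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (index, code point, Unicode name) for
    every non-ASCII character in text."""
    found = []
    for index, char in enumerate(text):
        if ord(char) > 127:
            name = unicodedata.name(char, "<unnamed>")
            found.append((index, f"U+{ord(char):04X}", name))
    return found


fixed = "\u200a'These things matter' a lot\u2014in Ohio"
for entry in describe_non_ascii(fixed):
    print(entry)
# (0, 'U+200A', 'HAIR SPACE')
# (28, 'U+2014', 'EM DASH')

# NFKC replaces compatibility characters such as HAIR SPACE with U+0020;
# the em dash has no compatibility mapping and is kept.
print(repr(unicodedata.normalize("NFKC", fixed)))
```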

Thanks!

1 answer:

Answer 0 (score: 2)

Your text is already correct; use print() to see it properly.

>>> x = "â€Š\'These things matter\' a lotâ€”in Virginia, Florida, North Carolina, and Ohioâ€”just to name a few states."

>>> print(x.encode("cp1252").decode("utf-8"))


 'These things matter' a lot—in Virginia, Florida, North Carolina, and Ohio—just to name a few states.

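The Python shell automatically displays the result of each expression you enter as if with print(repr(...)). When you are testing code, that gives more useful information than a plain print(...):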
>>> print(repr(x.encode("cp1252").decode("utf-8")))

"\u200a'These things matter' a lot—in Virginia, Florida, North Carolina, and Ohio—just to name a few states."
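A quick way to confirm that the backslashes exist only in the displayed representation, and not in the string itself, is to compare the string with its `repr()`. The variable names `text` and `shown` are illustrative.

```python
text = "\u200a'These things matter'"

# The string itself: one hair-space character and no backslashes at all.
print(len(text))            # 22
print("\\" in text)         # False
print(text[0] == "\u200a")  # True

# repr() (what the interactive shell echoes) spells the hair space as the
# six characters \u200a, so a backslash appears only in this display form.
shown = repr(text)
print(shown)                # "\u200a'These things matter'"
print("\\" in shown)        # True
```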