Question

缩短版本： 如何编写一个函数，它将包含一个包含字符串表示形式的字符串（例如“This is a character \ u200a”）并用它们代表的字符替换它们（例如“This is a character”）

更长的版本 我需要编写一个函数，它将采用以下字符串：

<p>"â€Š\'These things matter\' a lotâ€”in Virginia, Florida, North Carolina, and Ohioâ€”just to name a few states."</p>

并将其转换为

<p>" 'These things matter' a lot—in Virginia, Florida, North Carolina, and Ohio—just to name a few states."</p>

到目前为止，我提出的最好的方法是将其编码为cp1252，然后将其解码为utf-8

>>> x.encode("cp1252").decode("utf-8")
<p>"\u200a\'These things matter\' a lot—in Virginia, Florida, North Carolina, and Ohio—just to name a few states."</p>

是否有一个我可以编写的函数或一个库，它可以让我在使用\'字符并找到那个糟糕的“毛球空间”编码字符方面的最后一步？

谢谢！

Answer 1

您的文字是正确的，但请使用print()查看。

>>> x = "â€Š\'These things matter\' a lotâ€”in Virginia, Florida, North Carolina, and Ohioâ€”just to name a few states."

>>> print(x.encode("cp1252").decode("utf-8"))


 'These things matter' a lot—in Virginia, Florida, North Carolina, and Ohio—just to name a few states.

Python Shell通常使用print(repr(...))自动显示每行代码的结果。它提供了比普通print(...)

更有用的信息（当您测试代码时）

>>> print(repr(x.encode("cp1252").decode("utf-8")))

"\u200a\'These things matter\' a lot—in Virginia, Florida, North Carolina, and Ohio—just to name a few states."

Python：使用cp1252字符修复字符串

1 个答案: