I have a long string that contains the text Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,
I want to replace '\xe2\x82\xac' with 'EUR' in Python 3.6.
If I print the string, I can see it is prefixed with b, i.e. it looks like a bytes literal:
b'<div dir="ltr"><br ...' etc.
I can't encode it (html = html.encode('UTF-8')) because I get a bytes-like object is required, not 'str', and I can't decode it either ('str' object has no attribute 'decode').
I have tried:
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")
None of these work.
html.decode("utf-8")
gives me the error 'str' object has no attribute 'decode'.
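The b prefix is the clue: calling str() on a bytes object does not decode it, it returns the object's repr, i.e. the text b'...' with literal backslash escapes. A minimal sketch, where the payload value is a made-up stand-in for what part.get_payload(decode=True) returns:

```python
# Made-up stand-in for part.get_payload(decode=True), which returns bytes.
payload = b"\xe2\x82\xac17.50"

html = str(payload)          # repr of the bytes object, NOT a decode
print(type(html).__name__)   # str
print(html)                  # b'\xe2\x82\xac17.50'
print(len(html))             # 20: "b'" + 12 escape characters + "17.50" + "'"

# str has no .decode(); the bytes object does.
print(hasattr(html, "decode"), hasattr(payload, "decode"))  # False True

text = payload.decode("utf-8")   # the real decode step: bytes -> str
print(text == "\u20ac17.50")     # True
```

Once the bytes are decoded instead of passed through str(), the string holds the single character € and an ordinary replace on it works.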
For context, the string is produced by reading the contents of emails with the mailbox library:
for message in mbox:
    for part in message.walk():
        html = str(part.get_payload(decode=True))
Answer 0 (score: 2)
You should use:
html = html.replace(r"\xe2\x82\xac", "EUR")
so that the literal text \xe2\x82\xac is replaced with EUR. This assumes the backslashes really are in your html.
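To see why the raw-string pattern is needed, here is a small sketch that builds html the same way the question does (the HTML snippet itself is invented):

```python
# Invented HTML snippet; str() on bytes mirrors how the question builds html.
html = str(b"<p>\xe2\x82\xac17.50</p>\n")

# A non-raw pattern is three real characters, which never occur in this repr text.
print(html.replace("\xe2\x82\xac", "EUR") == html)  # True: nothing was replaced

# A raw pattern matches the twelve literal characters \, x, e, 2, \, x, 8, 2, ...
fixed = html.replace(r"\xe2\x82\xac", "EUR")
print(fixed)  # b'<p>EUR17.50</p>\n'
```

The result still carries the b'...' wrapper and other escapes such as \n as literal text, so this replacement patches the symptom rather than decoding the payload.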
Otherwise, you would use
html = html.replace('\u20ac', 'EUR')
but that is not your case, because you said the version with the Unicode symbol did not work.
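Don't assume Python stores strings as UTF-8 (internally it does not). A decoded str holds Unicode code points; CPython 3.3+ stores them as Latin-1, UCS-2 or UCS-4 depending on the widest character in the string (PEP 393). So Python will never write \xe2\x82\xac for a euro sign in a decoded string: it holds the single character € (U+20AC). Either the backslashes are literally in the text, or some output step produced them.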
Answer 1 (score: 1)
import unicodedata
jil = """Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"""
data = unicodedata.normalize("NFKD", jil)
print(data)
# Output:
# Your Sunday evening order with Uber Eats
# To: test@email.com
#
#
# [image: map]
#
# [image: Uber logo]
# â¬17.50
# Thanks for choosing Uber,
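Normalization does not actually recover the euro sign here. Because the literal is not raw, \xe2\x82\xac is three characters (â, the control character U+0082, and ¬), and NFKD only rewrites composed and compatibility forms. Those three characters are the UTF-8 bytes of € read as Latin-1, so a different technique, re-encoding as Latin-1 and decoding as UTF-8, reverses the mix-up. A sketch; note this repair only applies when the string holds those three characters, not the literal backslash text that str(bytes) produces:

```python
import unicodedata

# Not a raw literal, so \xe2\x82\xac becomes three characters:
# U+00E2 'â', U+0082 (a control character) and U+00AC '¬'.
garbled = "[image: Uber logo]\n\xe2\x82\xac17.50"

# Normalization does not rebuild the euro sign; NFKD even splits 'â' into 'a' + U+0302.
normalized = unicodedata.normalize("NFKD", garbled)
print("\u20ac" in normalized)  # False

# Undo the misreading: back to the original bytes, then decode them as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired.endswith("\u20ac17.50"))  # True
print(repaired.replace("\u20ac", "EUR"))
```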
Answer 2 (score: 0)
Running the same replacements on a plain str literal:
html="Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")
html = html.encode("utf-8", "strict")
print("Encoded String: " + str(html))
print("Decoded String: " + html.decode("utf-8",'strict'))
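The replacements in this snippet match because the non-raw literal turns \xe2\x82\xac into the same three characters on both sides of the replace call, whereas the text in the question came from str(bytes) and contains twelve characters of escape text. If those three characters were left in place, encoding to UTF-8 would turn them into six bytes rather than the euro sign's three. A short sketch of both points:

```python
escaped = "\xe2\x82\xac"    # non-raw literal: three characters
raw_text = r"\xe2\x82\xac"  # raw literal: twelve characters, backslashes kept

print(len(escaped), len(raw_text))  # 3 12

# Encoding the three characters as UTF-8 does not give back the euro sign's bytes.
print(escaped.encode("utf-8"))  # b'\xc3\xa2\xc2\x82\xc2\xac'
print("\u20ac".encode("utf-8"))  # b'\xe2\x82\xac'
```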