Question

The input string is as follows:

“ hello world ” with double quotes

I am using parse from lxml:

Htmlpage = parse(htmlwebpage)

The output string I get is:

' â\x80\x9c hello world \xa0 '

instead of

'"Hello world"'

I am on Windows. Thanks.

Answer 1

I finally found a solution.

I looked up the web page's declared encoding with:

webpage.headers.get_content_charset()
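To illustrate what that call returns, here is a minimal sketch in Python 3. It assumes webpage is a urllib response, whose headers object is an http.client.HTTPMessage (a subclass of email.message.Message). The helper name detect_charset and the fallback to 'utf-8' are illustrative assumptions, not part of the original answer.

```python
from email.message import Message


def detect_charset(headers, default="utf-8"):
    """Return the charset declared in Content-Type, or `default` if none is declared.

    `detect_charset` and `default` are hypothetical names used for illustration.
    """
    # get_content_charset() returns None when no charset parameter is present.
    return headers.get_content_charset() or default


# Stand-in for webpage.headers, so the sketch runs without a network request.
headers = Message()
headers["Content-Type"] = "text/html; charset=UTF-8"
declared = detect_charset(headers)

# A response that does not declare a charset falls back to the default.
bare = Message()
bare["Content-Type"] = "text/html"
fallback = detect_charset(bare, default="iso-8859-1")
```

The value is returned in lowercase, so it can be passed straight to the parser's encoding argument.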

I then created a parser with that encoding to pass to the parse function:

EncodeFormat=lxml.html.HTMLParser(encoding='utf-8')

Then:

Htmlpage=parse(htmlwebpage,EncodeFormat)

I still had a \xa0 (non-breaking space) in the string, which I removed with:

string = string.replace('\xa0', '')
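For anyone wondering where the 'â\x80\x9c' came from, the following Python 3 sketch reproduces the effect with the standard library only. It assumes the page bytes were UTF-8 and were decoded one byte per character (Latin-1 is used here as a stand-in for whatever default was applied). The sample bytes and the helper name strip_nbsp are illustrative, not taken from the original page.

```python
# UTF-8 bytes for: left curly quote, " hello world", non-breaking space, right curly quote.
page_bytes = "\u201c hello world\u00a0\u201d".encode("utf-8")

# Decoding with the wrong charset turns each byte into its own character,
# so the three bytes of the curly quote (E2 80 9C) become 'â\x80\x9c'.
wrong = page_bytes.decode("latin-1")

# Decoding with the charset the page actually uses restores the text.
right = page_bytes.decode("utf-8")


def strip_nbsp(text, replacement=""):
    """Return a copy of `text` with non-breaking spaces replaced (hypothetical helper)."""
    # str.replace returns a new string, so the result must be kept.
    return text.replace("\xa0", replacement)


cleaned = strip_nbsp(right)
```

Passing replacement=" " instead keeps the word spacing when the non-breaking space sits between two words.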