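I am using requests to fetch a page. The task is simple, but I have an encoding problem. The page contains non-ASCII Turkish characters, but in the HTML source they appear as follows: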
ÇINARTEPE # What it looks like
ÇINARTEPE # What it is like in HTML source
As a result, the following operations do not return what I expect:
# What I have tried as encoding
req.encoding = "utf-8"
req.encoding = "iso-8859-9"
req.encoding = "iso-8859-1"
# The operations
"ÇINARTEPE" in req.text # False, it must return True
bytes("ÇINARTEPE", "utf-8") in req.content # False
bytes("ÇINARTEPE", "iso-8859-9") in req.content # False
bytes("ÇINARTEPE", "iso-8859-1") in req.content # False
What I want is to find out whether the string "ÇINARTEPE" is present in the HTML source.
An example:
import requests
req = requests.get("http://www.eshot.gov.tr/tr/OtobusumNerede/290")
"ÇINARTEPE" in req.text # False
req.encoding = "iso-8859-1"
"ÇINARTEPE" in req.text # False
req.encoding = "iso-8859-9"
"ÇINARTEPE" in req.text # False
# Supposed to return True
Answer 0 (score: 3)
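One possible explanation, offered as an assumption rather than something confirmed against the page: if the server writes the Ç as an HTML character reference (for example `&#199;` or `&Ccedil;`), then no choice of `req.encoding` will make a literal search match, because the raw markup never contains the character itself. A minimal sketch of decoding the references before searching follows. The helper name `contains_text` and the sample fragment are hypothetical; `html.unescape` is from the standard library.

```python
import html


def contains_text(needle: str, raw_html: str) -> bool:
    """Return True if needle appears in raw_html after character references are decoded."""
    return needle in html.unescape(raw_html)


# Hypothetical fragment, assuming the page escapes Ç as a numeric reference.
sample = "<td>&#199;INARTEPE</td>"

print("ÇINARTEPE" in sample)               # literal search on the raw markup
print(contains_text("ÇINARTEPE", sample))  # search after decoding references
```

If that assumption holds for the real page, the same check would be `contains_text("ÇINARTEPE", req.text)`. If it does not, inspecting a slice of `req.content` around the expected text should show how the character is actually represented.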