I want to find all elements that contain a given string using Beautiful Soup in Python.
It works when I use non-Persian characters, but not when I use Persian characters.
import urllib2
from bs4 import BeautifulSoup
QUERY = 'رشته فارسی'
URL = 'http://www.example.com'
headers = {
'User-Agent': "Mozilla/5.0 . . . "
}
request = urllib2.Request(URL, headers=headers)
response = urllib2.urlopen(request)
response_content = response.read().decode('utf8')
soup = BeautifulSoup(response_content, 'html.parser')
fetched = soup.find_all(text=QUERY)
print(fetched)
For the code above, the output is [], but if I use ASCII in the query, it works.
Is there any UTF-8 conversion or workaround? :)
Answer 0 (score: 1)
#-*- coding: utf-8 -*-
import urllib2
from bs4 import BeautifulSoup
QUERY = 'خدمات'
URL = 'https://bayan.ir/service/bayan/'
headers = {
'User-Agent': "Mozilla/5.0 . . . "
}
request = urllib2.Request(URL, headers=headers)
response = urllib2.urlopen(request)
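# Pass the raw bytes and let BeautifulSoup detect the encoding itself
response_content = response.read()
soup = BeautifulSoup(response_content, 'html.parser')
# string= is the current name for the older text= argument
fetched = soup.find_all(string=QUERY)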
print(fetched)
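This works! Compared with the code in the question, it adds a source encoding declaration, drops the manual .decode('utf8') call, and uses string= instead of text=.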
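
One more thing that may be worth checking: string=QUERY only matches text nodes that are exactly equal to the query, so a Persian phrase that appears inside a longer sentence would still give []. The sketch below is a minimal illustration of the difference, not the code from the answer. It assumes Python 3 (where string literals are already Unicode), and the inline HTML is a made-up stand-in for the fetched page so the example runs without network access.

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Hypothetical inline page standing in for the downloaded response
HTML = u"""
<html><body>
  <h1>خدمات</h1>
  <p>فهرست خدمات بیان</p>
  <p>services</p>
</body></html>
"""

QUERY = u"خدمات"

soup = BeautifulSoup(HTML, "html.parser")

# Exact match: returns only text nodes that are equal to QUERY
exact = soup.find_all(string=QUERY)

# Substring match: the function is called with each text node
containing = soup.find_all(string=lambda s: s is not None and QUERY in s)

# Each result is a text node; .parent gives the element that holds it
tags = [node.parent.name for node in containing]

print(len(exact), len(containing), tags)
```

If the goal is the enclosing elements rather than the text itself, node.parent (as used for tags above) is one way to get from each matched string back to its tag.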