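When I run my scraper, I can see that it fetches nothing from yell.com. As far as I can tell, the XPaths are fine, and I can't work out whether I've made a mistake somewhere. Any fix or workaround would be appreciated. This is the code I tried: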
import requests
from lxml import html

url="https://www.yell.com/ucs/UcsSearchAction.do?keywords=pizza&location=all+states&scrambleSeed=821749505"

def Startpoint(address):
    response = requests.get(address)
    tree = html.fromstring(response.text)
    titles = tree.xpath('//div[contains(@class,"col-sm-24")]')
    for title in titles:
        try:
            Name=title.xpath('.//h2[@itemprop="name"]/text()')[0]
            print(Name)
        except exception as e:
            print(e.message)
            continue

Startpoint(url)
Answer 0 (score: 1)
You need to specify a User-Agent string that makes the request look like it comes from a real browser:
response = requests.get(address, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36'})
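By default, requests identifies itself with a User-Agent that begins with python-requests/, which is a likely reason for a site to withhold the real page. As a rough sketch of the same idea, the header can be set once on a session and the HTTP status checked before parsing. The helper names build_session and fetch, and the 10-second timeout, are assumptions made for illustration and are not part of the original answer.

```python
import requests

# Same browser-like string as in the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36"
)


def build_session(user_agent=BROWSER_UA):
    """Hypothetical helper: a session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(session, address):
    """Hypothetical helper: return the page source, raising on 4xx/5xx responses."""
    response = session.get(address, timeout=10)  # timeout value is an arbitrary choice
    response.raise_for_status()
    return response.text


session = build_session()
print(requests.utils.default_user_agent())  # what requests would send by default
print(session.headers["User-Agent"])        # what this session sends instead
```

Calling fetch(session, url) with the URL from the question would then return the page source, or raise requests.HTTPError if the server still rejects the request.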
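A few other notes:
Exception starts with a capital letter, so the handler should read except Exception as e.
You should not use the col-sm-24 class in your locator. Bootstrap classes like this one are layout-specific and carry no information about the kind of data the container holds. Use the businessCapsule class instead: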
titles = tree.xpath("//div[contains(concat(' ', @class, ' '), ' businessCapsule ')]")
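The concat(' ', @class, ' ') wrapper matters because a plain contains(@class, ...) is a substring test and also matches longer class names that merely contain the word. The sketch below shows the difference on made-up markup; the HTML and the businessCapsule--footer class name are hypothetical and are not copied from yell.com.

```python
from lxml import html

# Hypothetical markup for illustration only.
SAMPLE = """
<div>
  <div class="row businessCapsule">A</div>
  <div class="businessCapsule--footer">B</div>
  <div class="col-sm-24">C</div>
</div>
""".strip()

tree = html.fromstring(SAMPLE)

# Substring test: matches any class attribute that contains the text.
loose = tree.xpath('//div[contains(@class, "businessCapsule")]')

# Whole-token test: pads the attribute with spaces so only the exact class name matches.
exact = tree.xpath("//div[contains(concat(' ', @class, ' '), ' businessCapsule ')]")

loose_text = [div.text for div in loose]
exact_text = [div.text for div in exact]
print(loose_text)  # ['A', 'B']
print(exact_text)  # ['A']
```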
You can use the findtext() method to get the result title:
results = tree.xpath("//div[contains(concat(' ', @class, ' '), ' businessCapsule ')]")
for result in results:
    name = result.findtext('.//h2[@itemprop="name"]')
    print(name)
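Putting the notes together, one possible shape for the whole scraper is sketched below. Parsing is kept separate from the network call so it can be tried on a local string first. The function names extract_names and scrape, the timeout, and the sample markup are assumptions made for illustration; the real page structure may differ. Note also that in Python 3 exceptions have no message attribute, so the sketch prints the exception itself.

```python
import requests
from lxml import html

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36"
    )
}
CAPSULE_XPATH = "//div[contains(concat(' ', @class, ' '), ' businessCapsule ')]"


def extract_names(page_source):
    """Hypothetical helper: return the business names found in the page source."""
    tree = html.fromstring(page_source)
    names = []
    for result in tree.xpath(CAPSULE_XPATH):
        # findtext() returns None when no matching h2 exists, so no IndexError handling is needed.
        name = result.findtext('.//h2[@itemprop="name"]')
        if name and name.strip():
            names.append(name.strip())
    return names


def scrape(address):
    """Hypothetical helper: fetch the page with a browser-like User-Agent and parse it."""
    try:
        response = requests.get(address, headers=HEADERS, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(e)  # print the exception itself; e.message does not exist in Python 3
        return []
    return extract_names(response.text)


# Made-up markup that imitates the structure assumed above.
SAMPLE = """
<div>
  <div class="row businessCapsule"><h2 itemprop="name">Pizza One</h2></div>
  <div class="row businessCapsule"><h2 itemprop="name"> Pizza Two </h2></div>
  <div class="row businessCapsule"><p>no heading here</p></div>
</div>
""".strip()

sample_names = extract_names(SAMPLE)
print(sample_names)  # ['Pizza One', 'Pizza Two']
```

Calling scrape(url) with the URL from the question would run the same parsing against the live page. One caveat: findtext() returns only the text directly inside the h2, so if the heading wraps its text in a nested element, text_content() on the matched element may be a better fit.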