I want to scrape an href link using Python 3.
My current code is below (I have also considered re.findall()):
import lxml.html
import requests
# TripAdvisor search results page for "the grilled cheese truck"
url = 'https://www.tripadvisor.co.uk/Search?singleSearchBox=true&geo=191&pid=3825&redirect=&startTime=1576072392277&uiOrigin=MASTHEAD&q=the%20grilled%20cheese%20truck&supportedSearchTypes=find_near_stand_alone_query&enableNearPage=true&returnTo=https%253A__2F____2F__www__2E__tripadvisor__2E__co__2E__uk__2F__&searchSessionId=AF4BFA0308CF336B90FD9602FA122CD11576072382852ssid&social_typeahead_2018_feature=true&sid=AF4BFA0308CF336B90FD9602FA122CD11576072410521&blockRedirect=true&ssrc=a&rf=1'
# Download the page and parse the raw HTML returned by the server
dom = lxml.html.fromstring(requests.get(url).content)
# Select the href attribute of every <a class="review_count"> element
result = dom.xpath("//a[@class='review_count']/@href")
print(result)
With my current code, the printed result is empty.
I found the link I want in the page:
<a class="review_count" href="/Restaurant_Review-g54774-d10073153-Reviews-The_Grilled_Cheese_Truck-Rapid_City_South_Dakota.html#REVIEWS" onclick="return false;" data-clicksource="ReviewCount">3 reviews</a>
So I need help with this. Ideally, in this case, I would extract the locationId and selectedId and print them.
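Once the href is available, the IDs can be pulled out of it with a regular expression (the question already mentions re). A minimal sketch, assuming that in TripAdvisor review URLs the g<digits> segment is the geographic ID and the d<digits> segment is the location ID; the question does not say which value corresponds to selectedId, and extract_ids is a hypothetical helper name.

```python
import re

# Assumption: "g<digits>" is the geo ID and "d<digits>" is the location ID
# in TripAdvisor restaurant review paths.
ID_PATTERN = re.compile(r"Restaurant_Review-g(\d+)-d(\d+)-")


def extract_ids(href):
    """Return (geo_id, location_id) as strings, or None if href does not match."""
    match = ID_PATTERN.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


href = "/Restaurant_Review-g54774-d10073153-Reviews-The_Grilled_Cheese_Truck-Rapid_City_South_Dakota.html#REVIEWS"
geo_id, location_id = extract_ids(href)
print(geo_id, location_id)  # → 54774 10073153
```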
Any ideas?
Answer 0 (score: 0)
The problem you are seeing is because the data is loaded by JavaScript. Try viewing the page with JavaScript disabled.
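One way to check this diagnosis offline is to run the same selection on two inputs: the snippet copied from the browser, and HTML that lacks it. The sketch below swaps lxml for the standard library's html.parser so it has no dependencies; the class and function names are hypothetical, and unlike the XPath test @class='review_count' it accepts review_count as one of several class tokens.

```python
from html.parser import HTMLParser


class ReviewCountLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class list contains 'review_count'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Valueless attributes come through as None, so default to "".
        classes = (attr_map.get("class") or "").split()
        if "review_count" in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def review_count_hrefs(html_text):
    parser = ReviewCountLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


# Snippet copied from the browser's rendered page (from the question).
rendered = '<a class="review_count" href="/Restaurant_Review-g54774-d10073153-Reviews-The_Grilled_Cheese_Truck-Rapid_City_South_Dakota.html#REVIEWS" onclick="return false;" data-clicksource="ReviewCount">3 reviews</a>'
# Hypothetical stand-in for server HTML whose results are filled in later by JavaScript.
unrendered = '<div id="search_results"></div><script src="app.js"></script>'

print(review_count_hrefs(rendered))
print(review_count_hrefs(unrendered))  # → []
```

If the selection finds the link in the copied snippet but returns an empty list on the fetched page, the element is not in the HTML the server sends, which is consistent with it being added by JavaScript.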
You could try a tool that executes JavaScript, for example Selenium: https://selenium-python.readthedocs.io/
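A minimal Selenium sketch, assuming Selenium 4 and a local Chrome installation; fetch_review_links is a hypothetical helper name. It waits for JavaScript to insert the a.review_count links before reading them, and Selenium raises TimeoutException if none appear within timeout seconds. The imports sit inside the function so the snippet can be loaded without Selenium installed.

```python
def fetch_review_links(url, timeout=15):
    """Load url in a real browser and return the hrefs of a.review_count links.

    Assumes Selenium 4 and a local Chrome installation.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until JavaScript has added at least one matching link.
        links = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a.review_count"))
        )
        return [link.get_attribute("href") for link in links]
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Call it as fetch_review_links(url) with the search URL from the question. Note that get_attribute("href") typically returns the absolute URL resolved by the browser rather than the relative path in the markup.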
Alternatively, try to find where the JavaScript loads the data from, and request that URL directly with Python requests.
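When tracking the data down, a quick first check is to search the raw text of a response (the page HTML from requests, or a response found in the browser's developer tools Network tab) for review-page paths. A sketch, assuming the paths follow the Restaurant_Review-g<digits>-d<digits>-<name>.html pattern seen in the question; find_review_paths is a hypothetical name, and the pattern omits the leading slash so it also matches JSON where "/" is escaped as "\/".

```python
import re

# No leading slash, so escaped JSON paths like "\/Restaurant_Review-..." still match.
REVIEW_PATH = re.compile(r"Restaurant_Review-g\d+-d\d+-[\w-]+\.html")


def find_review_paths(text):
    """Return unique review-page paths found anywhere in text, in order of appearance."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(REVIEW_PATH.findall(text)))


sample = (
    '{"url":"\\/Restaurant_Review-g54774-d10073153-Reviews-The_Grilled_Cheese_Truck-Rapid_City_South_Dakota.html"}'
    '<a href="/Restaurant_Review-g54774-d10073153-Reviews-The_Grilled_Cheese_Truck-Rapid_City_South_Dakota.html#REVIEWS">'
)
print(find_review_paths(sample))
```

For example, find_review_paths(requests.get(url).text). An empty list suggests the links are not in that response and probably come from a separate request made by the page's JavaScript.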