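I am trying to use BeautifulSoup (Python 3.7) to select a specific link inside a block. How can I select a specific link within the selected block?
Here is what I have so far. I have used Selenium before, but I don't think it is necessary here.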
from bs4 import BeautifulSoup
import requests
base_url = 'http://www.shop.pr'
shop_urls = {'econo': '/econo/shoppers',
             'pueblo': '/pueblo/shoppers',
             'costco': '/costco/shoppers'}
selected_shop = 'econo'
append_to_url = shop_urls.get(selected_shop)
url = base_url + append_to_url
page = requests.get(url)
soup = BeautifulSoup(page.text , 'html.parser')
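# prettify is a method, so it has to be called to get the formatted HTML
toString = soup.prettify()
# the with block closes the file once the write is done
with open('page.txt', 'w+') as file:
    file.write(toString)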
wrapper = soup.find("div", {"class": "wrapper"})
sub_wrapper = wrapper.find('div' , {'class' : 'breadcrumb-holder' })
print(sub_wrapper)
After digging into the code, this is what I get:
<div class="breadcrumb-holder">
<div data-react-class="SliderPageLink" data-react-props='{"baseLink":"/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view","page":1,"linkText":"VER PRODUCTOS","sliderSelector":"#shopper-terminal .catalog-view .slider","show":true,"back":false}'></div>
<ul class="breadcrumb">
<li>
<a href="/">Shoppers</a>
</li>
<li>
<a href="/econo/shoppers?clientid=1"><strong>Econo</strong>
</a></li>
</ul>
</div>
I then tried to get:
"/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view"
but it returns None.
Answer 0 (score: 0)
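The value of data-react-props looks like a valid Python dictionary. If so, I suggest using ast.literal_eval to convert it to a dictionary and then reading what you need from it.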
import ast

# Your code here

drp = wrapper.find('div', {'data-react-class': 'SliderPageLink'})['data-react-props']
drp_dict = ast.literal_eval(drp.replace(':true', ':True').replace(':false', ':False'))
base_link = drp_dict['baseLink']  # Your link here
Using ast.literal_eval appears to be safe, as its documentation states:
Help on function literal_eval in module ast:

literal_eval(node_or_string)
    Safely evaluate an expression node or a string containing a Python
    expression. The string or node provided may only consist of the following
    Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
    and None.
However, the string may need a few changes first, because true, for example, is not a Python expression.
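To illustrate the point about true, here is a minimal sketch that uses only the standard library. The props string is a shortened copy of the attribute value from the question (three keys only, which is an assumption made for brevity). It shows that ast.literal_eval rejects the raw string, accepts it after the replacement, and that json.loads reads the raw string directly.

```python
import ast
import json

# Shortened copy of the data-react-props value from the question (assumption: only three keys kept).
props = '{"baseLink":"/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view","page":1,"show":true}'

# The raw string fails because the lowercase name true is not a Python literal.
try:
    ast.literal_eval(props)
    raw_parse_failed = False
except (ValueError, SyntaxError):
    raw_parse_failed = True

# After swapping in Python's spelling, literal_eval accepts the string.
patched = props.replace(':true', ':True').replace(':false', ':False')
via_ast = ast.literal_eval(patched)

# json.loads understands true/false natively, so it needs no patching.
via_json = json.loads(props)

print(raw_parse_failed)
print(via_ast == via_json)
print(via_ast['baseLink'])
```

One caveat with the replace approach: it edits the whole string, so a text value that happens to contain ":true" would be changed as well, and a JSON null would still fail. Since the attribute holds JSON, json.loads is probably the more direct choice.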
Answer 1 (score: 0)
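If I understand correctly what you are looking for, this should work.
First:
import json
Then add the following to the wrapper part of your code, after sub_wrapper is defined: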
target = sub_wrapper.find('div')
td = json.loads(target['data-react-props'])
print(td['baseLink'])
Output:
'/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view'
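The steps above can be put together as a self-contained sketch that runs without a network request. The HTML constant is a trimmed copy of the markup from the question, and extract_base_link is a hypothetical helper name, not something from the original code. It uses a CSS attribute selector instead of a bare find('div'), and returns None instead of raising when the element or attribute is missing.

```python
import json
from typing import Optional

from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (assumption: unrelated keys and list items removed).
HTML = '''<div class="wrapper"><div class="breadcrumb-holder">
<div data-react-class="SliderPageLink" data-react-props='{"baseLink":"/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view","page":1,"show":true,"back":false}'></div>
<ul class="breadcrumb"><li><a href="/">Shoppers</a></li></ul>
</div></div>'''


def extract_base_link(html: str) -> Optional[str]:
    """Return the baseLink stored in data-react-props, or None if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # Match the div by its data-react-class attribute inside the breadcrumb holder.
    node = soup.select_one('div.breadcrumb-holder div[data-react-class="SliderPageLink"]')
    if node is None or not node.has_attr('data-react-props'):
        return None
    # The attribute value is a JSON object, so json.loads turns it into a dict.
    return json.loads(node['data-react-props']).get('baseLink')


print(extract_base_link(HTML))
```

Selecting by data-react-class keeps the lookup working even if another div is later added ahead of it inside breadcrumb-holder, which a plain find('div') would pick up instead.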