Unable to select an HTML element with a BeautifulSoup CSS selector, but able to get the element in JS with the same CSS selector

Time: 2019-04-19 16:01:22

Tags: python-3.x beautifulsoup css-selectors

I am using Python and the BeautifulSoup HTML parser to select an HTML element. However, I cannot get it to work.

import requests
from bs4 import BeautifulSoup

# requests_session (a requests.Session), login_url, headers and data_credentials are defined elsewhere
response = requests_session.post(login_url, headers=headers, data=data_credentials)  # log in to the requests Session so that you can reuse it

search_url = 'https://www.website.com/search.php'
p_id = '342953'

response = requests_session.get(search_url, headers=headers, params={'query': p_id, 'type': 'p'})
redirected_urls = response.url
th_soup = BeautifulSoup(response.content, 'html.parser')
trx_ht = th_soup.select("body > table > tbody > tr > td > table > tbody > tr:nth-child(2) > td:nth-child(2) > div:nth-child(3) > table > tbody > tr:nth-child(11) > td > table > tbody > tr:nth-child(4) > td:nth-child(5) > input[type='hidden']:nth-child(1)")

2 answers:

Answer 0: (score: 1)

From the HTML you provided in the pastebin, the hidden input can be located using a .find_all() call with specific attributes. If the field you want always starts with qtyb-, you can use a regular expression with BeautifulSoup to find all matching elements, as follows:

from bs4 import BeautifulSoup
import re

# Read the HTML in from a file (normally requests is used)
with open('sm7iXcUq.html', encoding='utf-8') as f_html:
    html = f_html.read()

soup = BeautifulSoup(html, 'html.parser')

for i in soup.find_all('input', attrs={'type': 'hidden', 'name': re.compile('qtyb-.*')}):
    print(i)

For the HTML you provided, this returns one element, as follows:

<input name="qtyb-52843099" type="hidden" value="1"/>

An attribute's value can then be read from each matched element by subscripting it with the attribute name, for example i['value'].

This approach gives you every element whose name matches.
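Since the question is about CSS selectors, the same lookup can also be sketched with .select() and an attribute prefix selector. This is only an illustration: the HTML below is a hypothetical stand-in for the pastebin page (which is not reproduced here), and it assumes a BeautifulSoup version whose .select() supports the ^= attribute operator.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the pastebin HTML; only the first input
# is both hidden and named with the "qtyb-" prefix.
html = """
<table>
  <tr>
    <td>
      <input type="hidden" name="qtyb-52843099" value="1"/>
      <input type="hidden" name="other-field" value="2"/>
      <input type="text" name="qtyb-visible" value="3"/>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# [name^="qtyb-"] matches only names that START with "qtyb-".
matches = soup.select('input[type="hidden"][name^="qtyb-"]')

# Attributes are read by subscripting the tag with the attribute name.
names = [tag["name"] for tag in matches]
values = [tag["value"] for tag in matches]
print(names, values)
```

One difference worth noting: find_all() applies a compiled pattern with a search, so re.compile('qtyb-.*') also matches names that merely contain qtyb-, whereas ^= anchors the match at the start of the attribute value.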

Answer 1: (score: 0)

Could you also use the following? This assumes the value (1) is constant across all sources:

input[value="1"][name]
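A minimal sketch of how that selector might be used from BeautifulSoup is shown below. The HTML is a made-up example, not the asker's page. The attribute value is quoted because an unquoted attribute value must be a CSS identifier, and identifiers cannot begin with a digit.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration: only the first input has both
# value="1" and a name attribute.
html = """
<form>
  <input type="hidden" name="qtyb-52843099" value="1"/>
  <input type="hidden" value="1"/>
  <input type="hidden" name="token" value="abc"/>
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# [name] with no operator only requires that the attribute is present.
hits = soup.select('input[value="1"][name]')
hit_names = [tag["name"] for tag in hits]

# select_one() returns the first match, or None if nothing matches.
first = soup.select_one('input[value="1"][name]')
print(hit_names, first["value"])
```

Because this selector keys on the value rather than the name, it would also match any other named input whose value happens to be 1, so it is only as reliable as the assumption stated above.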