I am using Python and the BeautifulSoup HTML parser to select an HTML element. However, I cannot get it to work.
# Log in with the requests Session so that it can be reused
response = requests_session.post(login_url, headers=headers, data=data_credentials)
search_url = 'https://www.website.com/search.php'
p_id = '342953'
response = requests_session.get(search_url, headers=headers, params={'query': p_id, 'type': 'p'})
redirected_urls = response.url
th_soup = BeautifulSoup(response.content, 'html.parser')
trx_ht = th_soup.select("body > table > tbody > tr > td > table > tbody > tr:nth-child(2) > td:nth-child(2) > div:nth-child(3) > table > tbody > tr:nth-child(11) > td > table > tbody > tr:nth-child(4) > td:nth-child(5) > input[type='hidden']:nth-child(1)")
Answer 0 (score: 1)
From the HTML you provided on pastebin, the hidden input can be located with a .find_all() call that filters on specific attributes. If the fields you want always start with qtyb-, you can use a regular expression with BeautifulSoup to find all matching elements, as follows:

from bs4 import BeautifulSoup
import re

# Read the HTML in from a file (normally requests is used)
with open('sm7iXcUq.html', encoding='utf-8') as f_html:
    html = f_html.read()

soup = BeautifulSoup(html, 'html.parser')

# Match hidden inputs whose name attribute contains the qtyb- prefix
for i in soup.find_all('input', attrs={'type' : 'hidden', 'name' : re.compile('qtyb-.*')}):
    print(i)

For the HTML you provided, this returns one element, as follows:

<input name="qtyb-52843099" type="hidden" value="1"/>

The value of [...] can be obtained using: [...]

This approach gives you all elements that have a matching name.
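As a minimal sketch of the regular-expression lookup described above, the snippet below runs against a small inline document instead of the pastebin file. The sample markup and the variable names `matches` and `pairs` are assumptions made for illustration; only the `qtyb-52843099` input comes from the thread. The pattern is anchored with `^` here because BeautifulSoup applies compiled patterns with a search, so an unanchored pattern would also match names that merely contain `qtyb-`.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the pastebin page.
html = """
<form>
  <input name="qtyb-52843099" type="hidden" value="1"/>
  <input name="other-1" type="hidden" value="9"/>
  <input name="qtyb-777" type="text" value="3"/>
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only hidden inputs whose name starts with "qtyb-".
matches = soup.find_all(
    "input",
    attrs={"type": "hidden", "name": re.compile(r"^qtyb-")},
)

# Attributes of a matched tag are read like dictionary entries.
pairs = [(tag["name"], tag["value"]) for tag in matches]
print(pairs)
```

Reading attributes with `tag["name"]` raises `KeyError` when the attribute is absent; `tag.get("name")` returns `None` instead, which may be preferable when the markup is not uniform.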
Answer 1 (score: 0)
Could you also use the following? Assuming [...] is constant across all sources:

input[value=1][name]
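A rough sketch of how an attribute selector like this could be passed to BeautifulSoup's `.select()`. The sample markup and the names `hits` and `names` are assumptions for illustration. The attribute value is quoted here (`value="1"`), since an unquoted value that begins with a digit is not a valid CSS identifier and the selector engine in recent BeautifulSoup releases may reject it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first input mirrors the thread.
html = """
<form>
  <input name="qtyb-52843099" type="hidden" value="1"/>
  <input type="hidden" value="1"/>
  <input name="qtyb-2" type="hidden" value="5"/>
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# Select inputs whose value is exactly "1" and that carry a name attribute.
hits = soup.select('input[value="1"][name]')

names = [tag["name"] for tag in hits]
print(names)
```

This selector only works if the value really is the same on every page; if it varies, matching on the name prefix is likely the more robust choice.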