我需要获取以下html代码中的文本(显示为日期)。代码a1v2v3
随页面而变化,因此我不能将其用作参考或使用CSS选择器。
相关HTML:
<div class="mvp-collapse-content-box">
<div data-v-a1v2v3="" class="mvp-row"><div data-v-a1v2v3="" class="mvp-tag mvp-tag-default mvp-tag-checked" style="margin-left: -16px; visibility: hidden;">
<!----> <span class="mvp-tag-text">LIVE</span>
<!----></div><span data-v-a1v2v3="" style="display: inline-block;">
2019.06.12 17:09
<br data-v-a1v2v3="">
完整HTML:
<div class="mvp-collapse-content-box">
<div data-v-a1v2v3="">
<div data-v-a1v2v3="" class="mvp-collapse mvp-collapse-simple">
<div data-v-a1v2v3="" class="mvp-collapse-item mvp-collapse-item-active" style="padding-left: 6px;">
<div class="mvp-collapse-header">
<!----> <div data-v-a1v2v3="" class="mvp-row-flex mvp-row-flex-middle">
<div data-v-a1v2v3="" class="mvp-col mvp-col-span-18 mvp-col-span-xs-16 mvp-col-span-sm-18 mvp-col-span-md-18 mvp-col-span-lg-18"><div data-v-a1v2v3="" class="mvp-tag mvp-tag-blue mvp-tag-checked" style="margin-left: -16px;">
<!----> <span class="mvp-tag-text mvp-tag-color-white">LIVE</span>
<!----></div><div data-v-a1v2v3="" class="versionAndMemo">
<span data-v-a1v2v3="" style="display: inline-block; line-height: 26px; vertical-align: middle; margin: 0px 1px; font-weight: bold; font-size: 14px;">1.2.3.44</span>
<!----></div></div>
<div data-v-a1v2v3="" class="mvp-col mvp-col-span-6 mvp-col-span-xs-8 mvp-col-span-sm-6 mvp-col-span-md-6 mvp-col-span-lg-6"><div data-v-a1v2v3="" style="display: inline-block; float: right; margin-right: 6px;"><i data-v-a1v2v3="" class="lal la-download" style="font-size: 1.8em; margin-top: 8px; display: table-cell; vertical-align: middle;"></i></div>
<div data-v-a1v2v3="" style="float: right; margin-right: 22px;"><i data-v-a1v2v3="" class="lal la-link" style="font-size: 1.8em; margin-top: 8px; display: table-cell; vertical-align: middle;"></i></div></div></div></div> <div class="mvp-collapse-content" style="" data-old-padding-top="" data-old-padding-bottom="" data-old-overflow="">
<div class="mvp-collapse-content-box">
<div data-v-a1v2v3="" class="mvp-row"><div data-v-a1v2v3="" class="mvp-tag mvp-tag-default mvp-tag-checked" style="margin-left: -16px; visibility: hidden;">
<!----> <span class="mvp-tag-text">LIVE</span>
<!----></div><span data-v-a1v2v3="" style="display: inline-block;">
2019.06.12 17:09
<br data-v-a1v2v3="">
这是我到目前为止所拥有的:
page = requests.get(app, headers=headers, cookies=cookies).text
soup = BeautifulSoup(page, 'html.parser')
for spantime in soup.findAll("div", {"class": "mvp-collapse-content-box"}):
print(spantime)
但是没有打印任何内容。我也尝试添加以下内容:
page = requests.get(app, headers=headers, cookies=cookies).text
soup = BeautifulSoup(page, 'html.parser')
for spantime in soup.findAll("div", {"class": "mvp-collapse-content-box"}):
print(spantime.text)
for span in spantime.find_all('span', recursive=True):
print(span.text)
但是它们都不打印任何内容。我觉得这可能与我使用的mvp-collapse-content-box
类有关-某些类的div标签不一定具有span标签,如Full HTML所示。
答案 0 :(得分:1)
使用find_next()查找span标签并使用text属性。
{
"dependencies": {
"@nestjs/common": "^6.0.0",
"@nestjs/core": "^6.0.0",
"@nestjs/mongoose": "^6.1.2",
"@nestjs/platform-express": "^6.0.0",
"mongoose": "^5.5.14",
"reflect-metadata": "^0.1.12",
"rimraf": "^2.6.2",
"rxjs": "^6.3.3"
},
"devDependencies": {
"@nestjs/testing": "^6.0.0",
"@types/express": "^4.16.0",
"@types/jest": "^23.3.13",
"@types/mongoose": "^5.5.6",
"@types/node": "^10.12.18",
"@types/supertest": "^2.0.7",
"concurrently": "^4.1.0",
"jest": "^23.6.0",
"nodemon": "^1.18.9",
"prettier": "^1.15.3",
"supertest": "^3.4.1",
"ts-jest": "24.0.2",
"ts-node": "8.1.0",
"tsconfig-paths": "3.8.0",
"tslint": "5.16.0",
"typescript": "3.4.3",
"wait-on": "^3.2.0"
},
}
from bs4 import BeautifulSoup
html='''<div class="mvp-collapse-content-box">
<div data-v-a1v2v3="" class="mvp-row"><div data-v-a1v2v3="" class="mvp-tag mvp-tag-default mvp-tag-checked" style="margin-left: -16px; visibility: hidden;">
<!----> <span class="mvp-tag-text">LIVE</span>
<!----></div><span data-v-a1v2v3="" style="display: inline-block;">
2019.06.12 17:09
<br data-v-a1v2v3="">'''
soup=BeautifulSoup(html,'html.parser')
div=soup.find('div',class_="mvp-collapse-content-box")
print(div.find_next('span').find_next('span').text.strip())
答案 1 :(得分:0)
您始终可以选择一个子标签,如下所示:
div = soup.find("div", { "class" : "mvp-collapse-content-box" })
spans = div.findChildren("span" , recursive=False)
for span in spans:
print span