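I have some HTML with many similar lines. First, I want to get all the IDs from the li tags. That works. Then I want to get the IDs only from the div tags whose class contains "grouped-listing" (the last few lines). That does not work.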
HTML:
<ul id="resultListItems">
<li data-id="102292896">
<div>
<article data-item="result" id="result-102292896" data-obid="102292896">
<div class="result-list-entry__grouped-listings">
<a href="/expose/102292896" id="result-102292896" data-go-to-expose-id="102292896" data-go-to-expose-referrer="RESULT_LIST_GROUPED">...</a>
<div class="slick-initialized slick-slider">
<div class="slick-list draggable">
<a href="/expose/102292896" id="result-102292896" data-go-to-expose-id="102292896" data-go-to-expose-referrer="RESULT_LIST_GROUPED">...</a>
<div class="slick-track" style="opacity: 1; width: 712px; transform: translate3d(0px, 0px, 0px);">
<div class="grouped-listing slick-slide slick-current slick-active grouped-listing--active" style="width: 162px;" data-slick-index="0" aria-hidden="false">
<a href="/expose/104436157" id="result-104436157" data-go-to-expose-id="104436157" data-go-to-expose-referrer="RESULT_LIST_GROUPED">
</a>
<div>
</div>
</div>
<div class="grouped-listing slick-slide slick-active" style="width: 162px;" data-slick-index="1" aria-hidden="false">
<a href="/expose/104435708" id="result-104435708" data-go-to-expose-id="104435708" data-go-to-expose-referrer="RESULT_LIST_GROUPED">
</a>
<div>
</div>
</div>
<div class="grouped-listing slick-slide slick-active" style="width: 162px;" data-slick-index="2" aria-hidden="false">
<a href="/Suche/controller/exposeNavigation/goToExpose.go?exposeId=104434267&searchUrl=%2FSuche%2FS-T%2FHaus-Kauf%2FBrandenburg%2FPotsdam&referrer=RESULT_LIST_GROUPED" id="result-104434267" data-go-to-expose-id="104434267" data-go-to-expose-referrer="RESULT_LIST_GROUPED">
</a>
<div>
</div>
</div>
<div class="grouped-listing slick-slide slick-active" style="width: 162px;" data-slick-index="3" aria-hidden="false">
<a href="/expose/104418108" id="result-104418108" data-go-to-expose-id="104418108" data-go-to-expose-referrer="RESULT_LIST_GROUPED">
</a>
<div>
</div>
</div>
</div>
The script that works:
# `soup` is presumably an alias for bs4.BeautifulSoup, and `url` holds the fetched HTML
get_id = soup(url, "html.parser")
for biglist in get_id.find_all("li", {"data-id": True}):
    if (biglist.parent.get("id") == "resultListItems"):
        my_url = "https://www.abc.de/"+biglist.get("data-id")+"#/"
        print(my_url)
This one does not work:
get_id = soup(url, "html.parser")
for biglist in get_id.find_all("a", {"data-go-to-expose-id": True}):
    if (biglist.parent.get("class") == "grouped-listing"):
        my_url = "https://www.abc.de/"+biglist.get("data-id")+"#/"
        print(my_url)
Any ideas?
Edit: My results are shown here: The web page contains more results: https://www.immobilienscout24.de/Suche/S-T/Haus-Kauf/Brandenburg/Potsdam
Answer 0 (score: 0)
Is this what you are looking for?
get_id = BeautifulSoup(url, "html.parser")
for biglist in get_id.find_all("a", {"data-go-to-expose-id": True}):
    x = len(biglist.parent.get("class"))
    y = biglist.parent.get("class")
    for i in range(x):
        if (y[i] == "grouped-listing"):
            my_url = "https://www.abc.de/"+biglist.get("data-go-to-expose-id")+"#/"
            print(my_url)
Output:
https://www.abc.de/102292896#/
https://www.abc.de/102292896#/
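For context, the comparison in the question most likely fails for two reasons: BeautifulSoup treats `class` as a multi-valued attribute, so `get("class")` returns a list of class names rather than a single string, and the `<a>` tags carry `data-go-to-expose-id`, not `data-id`. The sketch below illustrates both points on a shortened, hand-written excerpt of the markup (the `html` string and the `grouped_ids` name are assumptions for illustration only) and uses a membership test instead of the index loop.

```python
from bs4 import BeautifulSoup

# Shortened excerpt of the question's markup (assumed for illustration)
html = """
<div class="slick-list draggable">
  <a href="/expose/102292896" data-go-to-expose-id="102292896">...</a>
  <div class="grouped-listing slick-slide slick-active">
    <a href="/expose/104436157" data-go-to-expose-id="104436157"></a>
  </div>
  <div class="grouped-listing slick-slide">
    <a href="/expose/104435708" data-go-to-expose-id="104435708"></a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

grouped_ids = []
for a in soup.find_all("a", {"data-go-to-expose-id": True}):
    # "class" is multi-valued, so this is a list of names (or None if absent)
    parent_classes = a.parent.get("class") or []
    # A list never equals a plain string, which is why the original test failed
    assert parent_classes != "grouped-listing"
    # The <a> tags have no data-id attribute, so .get() returns None here
    assert a.get("data-id") is None
    if "grouped-listing" in parent_classes:
        grouped_ids.append(a["data-go-to-expose-id"])

print(grouped_ids)  # ['104436157', '104435708']
```

The `or []` fallback keeps the loop from raising when a parent tag has no `class` attribute at all.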
Answer 1 (score: 0)
To find all <a> tags that have the attribute "data-go-to-expose-id" and whose parent is a <div> with the class "grouped-listing", you can use the CSS selector 'div.grouped-listing a[data-go-to-expose-id]', like this:
data = """
<ul id="resultListItems">
<li data-id="102292896">
<div>
<article data-item="result" id="result-102292896" data-obid="102292896">
<div class="result-list-entry__grouped-listings">
<a href="/expose/102292896" id="result-102292896" data-go-to-expose-id="102292896" data-go-to-expose-referrer="RESULT_LIST_GROUPED">...</a>
<div class="slick-initialized slick-slider">
<div class="slick-list draggable">
<a href="/expose/102292896" id="result-102292896" data-go-to-expose-id="102292896" data-go-to-expose-referrer="RESULT_LIST_GROUPED">...</a>
<div class="slick-track" style="opacity: 1; width: 712px; transform: translate3d(0px, 0px, 0px);">
<div class="grouped-listing slick-slide slick-current slick-active grouped-listing--active" style="width: 162px;" data-slick-index="0" aria-hidden="false">
<a href="/expose/104436157" id="result-104436157" data-go-to-expose-id="104436157" data-go-to-expose-referrer="RESULT_LIST_GROUPED">
</a>
<div>
</div>
</div>
<div class="grouped-listing slick-slide slick-active" style="width: 162px;" data-slick-index="1" aria-hidden="false">
<a href="/expose/104435708" id="result-104435708" data-go-to-expose-id="104435708" data-go-to-expose-referrer="RESULT_LIST_GROUPED">
</a>
<div>
</div>
</div>
<div class="grouped-listing slick-slide slick-active" style="width: 162px;" data-slick-index="2" aria-hidden="false">
<a href="/Suche/controller/exposeNavigation/goToExpose.go?exposeId=104434267&searchUrl=%2FSuche%2FS-T%2FHaus-Kauf%2FBrandenburg%2FPotsdam&referrer=RESULT_LIST_GROUPED" id="result-104434267" data-go-to-expose-id="104434267" data-go-to-expose-referrer="RESULT_LIST_GROUPED">
</a>
<div>
</div>
</div>
<div class="grouped-listing slick-slide slick-active" style="width: 162px;" data-slick-index="3" aria-hidden="false">
<a href="/expose/104418108" id="result-104418108" data-go-to-expose-id="104418108" data-go-to-expose-referrer="RESULT_LIST_GROUPED">
</a>
<div>
</div>
</div>
</div>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
for a in soup.select('div.grouped-listing a[data-go-to-expose-id]'):
    my_url2 = "https://www.abc.de/"+a['data-go-to-expose-id']+"#/"
    print(my_url2)
This prints:
https://www.abc.de/104436157#/
https://www.abc.de/104435708#/
https://www.abc.de/104434267#/
https://www.abc.de/104418108#/
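One detail worth knowing about that selector: a space in CSS is the descendant combinator, so it matches `<a>` tags at any depth inside `div.grouped-listing`, not only direct children. If only direct children are wanted, the child combinator `>` can be used instead. The following is a small sketch of the difference; the markup is a made-up reduction of the question's HTML, and the nested link with ID `999` is purely hypothetical.

```python
from bs4 import BeautifulSoup

# Reduced, hypothetical markup: the nested <a> with ID 999 is not in the
# original page and exists only to show the difference between combinators.
html = """
<div class="slick-track">
  <a data-go-to-expose-id="102292896">...</a>
  <div class="grouped-listing slick-slide">
    <a data-go-to-expose-id="104436157"></a>
    <div><a data-go-to-expose-id="999"></a></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (space): any depth below div.grouped-listing
descendant = [a["data-go-to-expose-id"]
              for a in soup.select("div.grouped-listing a[data-go-to-expose-id]")]

# Child combinator (>): only <a> tags directly inside div.grouped-listing
direct = [a["data-go-to-expose-id"]
          for a in soup.select("div.grouped-listing > a[data-go-to-expose-id]")]

print(descendant)  # ['104436157', '999']
print(direct)      # ['104436157']
```

For the markup in the question both selectors should give the same four IDs, since each link sits directly inside its `grouped-listing` div.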