How to select the previous element in an HTML page that meets a condition, using Python

Time: 2019-04-08 17:44:44

Tags: python beautifulsoup

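Hello, I'm trying to scrape some data from a website. On each run I need to find the last element I processed on the previous run, and then select the closest element before it that meets a condition. Please look at my code; I explain in more detail using this example: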

This is the sample HTML code:

<div class="post" id="7517049">
    <div class="p-head">
        <div class="p-c p-c-time"><span class="p-time" data="1554741054" title="2019-04-08 @ 21:00:54 ( Your Time )"><span class="t-n-m">45</span> <span class="t-u">mins</span></span>
        </div>
        <div class="p-c p-c-cat"><span class="p-cat c-5 c-7 "><a href="http://predb.me?cats=tv" class="c-adult">TV</a><a href="http://predb.me?cats=tv-hd" class="c-child">HD</a></span></div>
        <div class="p-c p-c-title">
            <h2><a class="p-title" href="http://predb.me?post=7517049">The.Repair.Shop.S04E02.720p.WEBRip.x264-LiGATE</a></h2>
            <a rel="nofollow" href="http://predb.me?post=7517049" class="tb tb-perma" title="Visit the permanent page for this release."></a>
        </div>
    </div>
</div>

<div class="post" id="7517048">
    <div class="p-head">
        <div class="p-c p-c-time"><span class="p-time" data="1554740951" title="2019-04-08 @ 20:59:11 ( Your Time )"><span class="t-n-m">47</span> <span class="t-u">mins</span></span>
        </div>
        <div class="p-c p-c-cat"><span class="p-cat c-24 c-25 "><a href="http://predb.me?cats=books" class="c-adult">Books</a><a href="http://predb.me?cats=books-ebooks" class="c-child">eBooks</a></span></div>
        <div class="p-c p-c-title">
            <h2><a class="p-title" href="http://predb.me?post=7517048">John.Bell.Young.Puccini.A.Listeners.Guide.Dover.Books.on.Music.and.Music.History.2016.RETAiL.ePub.eBook-VENTOLiN</a></h2>
            <a rel="nofollow" href="http://predb.me?post=7517048" class="tb tb-perma" title="Visit the permanent page for this release."></a>
        </div>
    </div>
</div>

<div class="post" id="7517047">
    <div class="p-head">
        <div class="p-c p-c-time"><span class="p-time" data="1554740927" title="2019-04-08 @ 20:58:47 ( Your Time )"><span class="t-n-m">48</span> <span class="t-u">mins</span></span>
        </div>
        <div class="p-c p-c-cat"><span class="p-cat c-5 c-6 "><a href="http://predb.me?cats=tv" class="c-adult">TV</a><a href="http://predb.me?cats=tv-sd" class="c-child">SD</a></span></div>
        <div class="p-c p-c-title">
            <h2><a class="p-title" href="http://predb.me?post=7517047">The.Repair.Shop.S04E01.WEB.h264-LiGATE</a></h2>
            <a rel="nofollow" href="http://predb.me?post=7517047" class="tb tb-perma" title="Visit the permanent page for this release."></a>
        </div>
    </div>
</div>

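Above there are three main divs (class post), each containing nested divs. Suppose I have saved the text of the <a> tag in the third main div, which is The.Repair.Shop.S04E01.WEB.h264-LiGATE. The next time my script reloads the page, I want it to find The.Repair.Shop.S04E01.WEB.h264-LiGATE on the page and then select the closest previous div whose <span> contains an <a> with the value TV. In the sample HTML, the first div has the TV value and the second div does not, so the first div is the one I need. Any ideas?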

The Python code I tried:

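from bs4 import BeautifulSoup as Wsoup  # assumed: Wsoup is an alias for BeautifulSoup

# my_driver is assumed to hold the page's HTML source
my_soup = Wsoup(my_driver, "html.parser")

last_rls = input("Please Insert starter Release From Predb.me ::::")

# Find the title link whose text equals the release name
previous_rls = my_soup.find("a", text=last_rls)
print(previous_rls)

# Climb from <a> through h2, p-c-title and p-head to the enclosing div.post
Entry = previous_rls.parent.parent.parent.parent

# The closest preceding div.post, whatever its category
previous_rls_parent = Entry.find_previous_sibling("div", {"class": "post"})
print(previous_rls_parent)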

This code does return the previous element, but I need the previous element that contains an <a> tag with the value TV.

1 answer:

Answer 0 (score: 0)

If you want to display the text of the 3 <div> elements inside the post you searched for, you can try the following:

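from bs4 import BeautifulSoup

search = "The.Repair.Shop.S04E01.WEB.h264-LiGATE"
# my_driver is the page's HTML source, as in the question
soup = BeautifulSoup(my_driver, "html.parser")

# Locate the title link, then walk back to the p-head div that encloses it
rls = soup.find("a", text=search)
div_parent = rls.find_previous('div', class_='p-head')

# Print the text of each nested div: time, category, title
for div in div_parent.find_all('div'):
    print(div.get_text(strip=True))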

This displays the following 3 items:

48mins
TVSD
The.Repair.Shop.S04E01.WEB.h264-LiGATE
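
The snippet above prints the post that was searched for, not the earlier TV post the question asks about. One possible way to get that is to walk back over the preceding div.post siblings and stop at the first one whose category link reads TV. This is only a sketch under some assumptions: the helper name previous_post_in_category is made up for illustration, the top-level category is assumed to be the text of the a.c-adult link as in the sample HTML, and SAMPLE_HTML is a trimmed copy of the question's markup rather than the live page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample HTML from the question (assumption: only the
# category links and title links matter for this lookup).
SAMPLE_HTML = """
<div class="post" id="7517049">
  <span class="p-cat c-5 c-7 "><a class="c-adult">TV</a><a class="c-child">HD</a></span>
  <h2><a class="p-title">The.Repair.Shop.S04E02.720p.WEBRip.x264-LiGATE</a></h2>
</div>
<div class="post" id="7517048">
  <span class="p-cat c-24 c-25 "><a class="c-adult">Books</a><a class="c-child">eBooks</a></span>
  <h2><a class="p-title">John.Bell.Young.Puccini.A.Listeners.Guide.Dover.Books.on.Music.and.Music.History.2016.RETAiL.ePub.eBook-VENTOLiN</a></h2>
</div>
<div class="post" id="7517047">
  <span class="p-cat c-5 c-6 "><a class="c-adult">TV</a><a class="c-child">SD</a></span>
  <h2><a class="p-title">The.Repair.Shop.S04E01.WEB.h264-LiGATE</a></h2>
</div>
"""


def previous_post_in_category(soup, release_name, category="TV"):
    """Hypothetical helper: return the closest div.post placed before the
    post titled release_name whose a.c-adult text equals category, or None."""
    title_link = soup.find("a", class_="p-title", string=release_name)
    if title_link is None:
        return None  # the release is not on the page
    post = title_link.find_parent("div", class_="post")
    if post is None:
        return None
    # find_previous_siblings returns the nearest sibling first
    for candidate in post.find_previous_siblings("div", class_="post"):
        cat_link = candidate.find("a", class_="c-adult")
        if cat_link is not None and cat_link.get_text(strip=True) == category:
            return candidate
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
found = previous_post_in_category(soup, "The.Repair.Shop.S04E01.WEB.h264-LiGATE")
if found is not None:
    print(found["id"])
    print(found.find("a", class_="p-title").get_text(strip=True))
```

With the sample above, the Books post (7517048) is skipped and post 7517049 is returned. On the real page the same function should work on a soup built from my_driver, provided the posts are siblings as they are in the sample.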