How can I scrape the content of certain <div> tags under their parent <div> tag using Python and Selenium?

Asked: 2018-06-22 00:12:56

Tags: html python-3.x selenium-webdriver web-scraping

HTML code:

<div class="ComparablePlanList__hot-offer-section">
    <div class="ComparablePlanList__header">Hot offers</div>
    <div class="ComparablePlanList__content">
        <div class="ComparablePlanList__content-container ComparablePlanList__clearfix">
            <div class="ComparablePlanList__content-text">
                <div class="ComparablePlanList__offer-title">*Get 14GB bonus data</div>
                <div class="ComparablePlanList__offer-content">
                    <div>*Get 14GB/mth bonus data to use in Oz when you sign up to this plan. Offer ends 10/07/18. T&amp;C apply.</div>
                </div>
            </div>
        </div>
        <div class="ComparablePlanList__content-container ComparablePlanList__clearfix">
            <div class="ComparablePlanList__content-text">
                <div class="ComparablePlanList__offer-title">Offer for current customers</div>
                <div class="ComparablePlanList__offer-content">
                    <div>Get $5 off plan fees each month when you sign up to this plan as an additional service to your current plan of equal or lesser value. Not available with Student offer. Offer ends 10/07/18. T&amp;C apply.</div>
                </div>
            </div>
        </div>
    </div>
</div>

The web page contains several similar blocks of this markup.

I want to extract the content of each <div class="ComparablePlanList__offer-title"> that sits under its parent div tag <div class="ComparablePlanList__hot-offer-section">.

If I run the following code:

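from selenium import webdriver

driver = webdriver.Chrome(executable_path='')
# Select every offer title nested anywhere inside a hot-offer section.
titles = driver.find_elements_by_css_selector(
    ".ComparablePlanList__hot-offer-section .ComparablePlanList__offer-title")
# textContent returns the text even for elements that are not visible.
offer_titles = [t.get_attribute("textContent") for t in titles]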

the offer_titles list will be:

['*Get 9GB bonus data', 'Offer for current customers']

Is there any way to group the values by their respective parent div tags, so that the list looks like this:

[['*Get 9GB bonus data', 'Offer for current customers'], [value1, value2]]

Or like this:

['*Get 9GB bonus data \n Offer for current customers', 'value1 \n value2']

Thanks!
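
One possible way to get the grouping described above is to locate each parent section first and then search for titles only inside that section, rather than using a single page-wide selector. The following is a minimal sketch, not a tested solution for this particular site. It assumes the driver has already loaded the page, and the function names group_offer_titles and join_groups are hypothetical helpers introduced here for illustration. The string "css selector" is the value of Selenium's By.CSS_SELECTOR constant, written out so the helpers work with any object that exposes find_elements(by, value).

```python
SECTION_CSS = ".ComparablePlanList__hot-offer-section"
TITLE_CSS = ".ComparablePlanList__offer-title"
CSS = "css selector"  # string value of selenium's By.CSS_SELECTOR


def group_offer_titles(driver):
    """Return one list of offer titles per hot-offer section (hypothetical helper)."""
    grouped = []
    # Find each parent section first, then search only inside that section,
    # so titles from different sections never end up in the same list.
    for section in driver.find_elements(CSS, SECTION_CSS):
        titles = section.find_elements(CSS, TITLE_CSS)
        # strip() removes whitespace that textContent may carry from the markup.
        grouped.append([t.get_attribute("textContent").strip() for t in titles])
    return grouped


def join_groups(grouped, sep=" \n "):
    """Collapse each group into one string per section (hypothetical helper)."""
    return [sep.join(group) for group in grouped]
```

Calling group_offer_titles(driver) should produce the nested-list form, and passing that result to join_groups should produce the one-string-per-section form. Sections are returned in document order, so the groups follow the order in which the sections appear on the page.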

0 answers:

No answers yet.