Extract <a> content in Python, Selenium Webdriver

Time: 2017-03-05 12:22:28

Tags: python selenium selenium-webdriver webdriver

I am writing a script that checks an auction portal for new auctions that interest me. So far the script selects the item name, category and time added, and builds a list of auctions. This is where my problem starts. My code:

#List of auctions
time.sleep(2)
lists= driver.find_elements_by_class_name("vela__item__1FnoI")
print ("Found " + str(len(lists)) + " auctions")

for link in driver.find_elements_by_xpath('//div[@class="vela__item__1FnoI"]//a'):
    print (link.get_attribute('href') + "-" + link.text)

Right now the output looks horrible:

<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="dae57d0d-9570-4693-bb7f-8aa31ab24699", element="49e4afcd-f6c3-4b62-bba0-a3b21e08c78d")>
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="dae57d0d-9570-4693-bb7f-8aa31ab24699", element="3f2a9f43-26b8-40f6-a4b6-497d46e41598")> etc
Please help me achieve output like this:

http://allegro.pl/doris-wozek-dla-lalek-3f-nosidlo-torba-posciel-15k-i6735944795.html - DORIS WÓZEK DLA LALEK 3F NOSIDŁO TORBA POŚCIEL 15K

http://allegro.pl/sukienka-ubranko-dla-lalki-barbie-de-lux-i6739976160.html - Sukienka ubranko dla lalki Barbie! DE LUX!

HTML of the search results:

<article class="item__item__2lO83 ">
                    <div class="vela__item__1FnoI">
                        <div class="vela__item__details__1di9R">
                            <div class="photo__thumbnail__1SaYl ">
                                <noscript>
                                    <i><img src="https://1.allegroimg.com/s128/0166b6/964534be46848305f499770a74f1" alt="DORIS WÓZEK DLA LALEK 3F NOSIDŁO TORBA POŚCIEL 15K" /></i>
                                </noscript>
                            </div>
                            <h2 class="header__title__2RWO4">
                                <a href="http://allegro.pl/doris-wozek-dla-lalek-3f-nosidlo-torba-posciel-15k-i6735944795.html">DORIS WÓZEK DLA LALEK 3F NOSIDŁO TORBA POŚCIEL 15K</a>
                            </h2>
                        </div>
                    </div>
                </article><article class="item__item__2lO83 ">
                    <div class="vela__item__1FnoI">
                        <div class="vela__item__details__1di9R">
                            <div class="photo__thumbnail__1SaYl ">
                                <noscript>
                                    <i><img src="https://e.allegroimg.com/s128/0129ef/ec0ceef742ce9cdecbe3465a67fe" alt="Sukienka ubranko dla lalki Barbie! DE LUX!" /></i>
                                </noscript>
                            </div>
                            <h2 class="header__title__2RWO4">
                                <a href="http://allegro.pl/sukienka-ubranko-dla-lalki-barbie-de-lux-i6739976160.html">Sukienka ubranko dla lalki Barbie! DE LUX!</a>
                            </h2>
                        </div>
                    </div>
                </article>

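2 answers:

Answer 0 (score: 1)

In print (item) you are printing the WebElement object itself, so you get its string representation (the equivalent of a to_string() method) rather than its text. To print the text, use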

print (item.text)
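
To illustrate the difference, here is a minimal sketch that does not need a browser. FakeElement is a hypothetical stand-in written for this example, not Selenium's real WebElement class; it only imitates the two behaviours that matter here (an object representation and a text attribute).

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, element_id, text):
        self.id = element_id
        self.text = text

    def __repr__(self):
        # Describes the object, not the page content, much like the
        # <...FirefoxWebElement (session=..., element=...)> lines in the question.
        return f'<FakeElement (element="{self.id}")>'


item = FakeElement("49e4afcd", "Sukienka ubranko dla lalki Barbie! DE LUX!")

object_output = str(item)  # what print(item) would show
text_output = item.text    # what print(item.text) would show

print(object_output)
print(text_output)
```

The first print shows the object description, the second shows the visible text, which is what the question asks for.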

Answer 1 (score: 0)

You can use the following code to extract the links and the link text:

for link in driver.find_elements_by_xpath('//div[@class="vela__item__1FnoI"]//a'):
    print(link.get_attribute('href') + "-" + link.text)
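
A slightly more defensive variant of the same idea is sketched below, under a few assumptions. The names LINK_XPATH, format_link and list_auction_links are made up for this example. It uses contains(@class, ...) so that stray whitespace in the class attribute does not break the match (at the cost of also matching any longer class name that contains the same string), and it calls driver.find_elements("xpath", ...) because the find_elements_by_* helpers were removed in newer Selenium 4 releases. The string "xpath" is the value of By.XPATH, so no Selenium import is needed in the snippet itself.

```python
# Assumption: the class name is taken from the HTML in the question and may
# change whenever the site is rebuilt.
LINK_XPATH = '//div[contains(@class, "vela__item__1FnoI")]//a'


def format_link(link):
    """Return 'href - text' for any object with get_attribute() and .text."""
    href = link.get_attribute("href")
    # Collapse runs of whitespace so the title prints on a single line.
    title = " ".join(link.text.split())
    return f"{href} - {title}"


def list_auction_links(driver):
    """Collect one formatted line per auction link found by the driver."""
    # "xpath" is the string value of selenium's By.XPATH locator strategy.
    links = driver.find_elements("xpath", LINK_XPATH)
    return [format_link(link) for link in links]


# Usage, assuming `driver` is an already-open WebDriver on the results page:
# for line in list_auction_links(driver):
#     print(line)
```

Using an f-string instead of + also avoids a TypeError if a link has no href, since get_attribute returns None in that case.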