Question

I want to get the href from the <p> tag using XPath.

I want to use the text in the <h1> tag ('Cable Stripe Knit L/S Polo') together with the text in the <p> tag ('White') to find the href inside the <p> tag.

Note: one item comes in several colors (there are multiple articles with different <p> tags but the same <h1> tag)!


HTML source

<article>
  <div class="inner-article">
    <a href="/shop/tops-sweaters/ix4leuczr/a1ykz7f2b" style="height:150px;"> </a>
    <h1>
      <a href="/shop/tops-sweaters/ix4leuczr/a1ykz7f2b" class="name-link">Cable Stripe Knit L/S Polo </a>
    </h1>
    <p>
      <a href="/shop/tops-sweaters/ix4leuczr/a1ykz7f2b" class="name-link">White</a>
    </p>
  </div>
</article>


I have tried this code, but it does not work:

specificProductColor = driver.find_element_by_xpath("//div[@class='inner-article' and contains(text(), 'White') and contains(text(), 'Cable')]/p")
driver.get(specificProductColor.get_attribute("href"))


Thank you very much for your replies!

Answer 1

Based on the HTML source, the XPath for getting the href would look like this:

specificProductColors = driver.find_elements_by_xpath("//div[@class='inner-article']//a[contains(text(), 'White') or contains(text(), 'Cable')]")

specificProductColors[0].get_attribute("href")

specificProductColors[1].get_attribute("href")

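Since there are two matching hyperlink (<a>) tags, you should use find_elements_by_xpath, which returns a list of elements. In this case it returns both <a> tags, and you can read the href of each one with the get_attribute method.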
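The attempt in the question likely fails for two reasons: contains(text(), ...) on the <div> only looks at the div's own first text node (which is whitespace here, since the words sit inside nested <a> tags), and the selected <p> element has no href attribute of its own. Since the same <h1> text appears in several articles, an "or" expression will also match links from other colors. A minimal sketch of a single XPath that requires both conditions inside the same inner-article block is shown below. The helper names build_color_link_xpath and open_color_page are hypothetical, the code assumes neither search string contains a single quote, and it has not been run against the real page.

```python
def build_color_link_xpath(product_keyword, color):
    """Build an XPath for the colour link of one product.

    Assumption: neither argument contains a single quote.
    """
    return (
        "//div[@class='inner-article']"
        # keep only blocks whose <h1> link text contains the keyword
        f"[h1/a[contains(normalize-space(.), '{product_keyword}')]]"
        # inside such a block, select the <p> link whose text contains the colour
        f"/p/a[contains(normalize-space(.), '{color}')]"
    )


def open_color_page(driver, product_keyword, color):
    """Find the colour link and navigate to it (driver is a Selenium WebDriver)."""
    xpath = build_color_link_xpath(product_keyword, color)
    # "xpath" is the string value of Selenium's By.XPATH locator constant
    link = driver.find_element("xpath", xpath)
    href = link.get_attribute("href")
    driver.get(href)
    return href


if __name__ == "__main__":
    print(build_color_link_xpath("Cable", "White"))
```

Note that contains() is a substring match, so 'White' would also match a color such as 'Off White'; if that matters, compare normalize-space(.) for equality instead.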

Answer 2

I have working code. It is not the fastest (this part takes ~550 ms), but it works. If anyone can simplify it, I would be very grateful :)

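It collects all products on the product page that contain the specified keyword (Cable), and all products on the product page that have the specified color (White). It then compares the href links to match the wanted product with the wanted color.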

I would also like to simplify the loop, so that the for loop stops as soon as the links match.

specificProduct = driver.find_elements_by_xpath("//div[@class='inner-article']//*[contains(text(), '"+ productKeyword[arrayCount] +"')]")
specificProductColor = driver.find_elements_by_xpath("//div[@class='inner-article']//*[contains(text(), '"+ desiredColor[arrayCount] +"')]")



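# Compare every color link with every product link by href
for colorLink in specificProductColor:
    specProductColor = colorLink.get_attribute("href")
    for productLink in specificProduct:
        specProduct = productLink.get_attribute("href")
        # Same href means the same article matched both keyword and color
        if specProductColor == specProduct:
            print(specProduct)
            wantedProduct = specProduct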


driver.get(wantedProduct)
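One possible way to simplify the nested loop and stop at the first match is to put the product hrefs into a set and return as soon as a color link's href is found in it. This is only a sketch: first_matching_href is a hypothetical helper, and FakeLink is a stand-in for a Selenium WebElement so the example runs without a browser.

```python
def first_matching_href(product_links, color_links):
    """Return the first href present in both lists, or None if there is none."""
    # Set membership avoids re-reading every product href for every color link
    product_hrefs = {link.get_attribute("href") for link in product_links}
    for link in color_links:
        href = link.get_attribute("href")
        if href is not None and href in product_hrefs:
            return href  # stop at the first match
    return None


class FakeLink:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


if __name__ == "__main__":
    products = [FakeLink("/shop/a/1"), FakeLink("/shop/a/2")]
    colors = [FakeLink("/shop/b/9"), FakeLink("/shop/a/2")]
    print(first_matching_href(products, colors))
```

With the variables from the answer, the call would be first_matching_href(specificProduct, specificProductColor). Checking the result for None before calling driver.get avoids an error when nothing matches.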

Find an href link using Python Selenium XPath
