Selenium WebDriver get_attribute returns a truncated href value when the value contains HTML entities

Time: 2018-02-22 09:08:26

Tags: python python-3.x selenium selenium-webdriver html-entities

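I am trying to get the href attribute value from an anchor tag on my application's page using Selenium WebDriver (Python), but part of the returned value has been stripped out.

Here is the HTML snippet: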

<a class="nla-row-text" href="/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0" data-reactid="790">

Here is the code I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Firefox()
driver.get("xxxx")

url_from_attr = driver.find_element(By.XPATH,"(//div[@class='nla-children mfr']/div/div/a)[1]").get_attribute("href")

url_from_attr_raw = "%r"%url_from_attr

print(" URL from attribute -->> " + url_from_attr)
print(" Raw string -->> " + url_from_attr_raw)

The output I get is:

/shopping/brands?search=kamera&page=0

instead of:

/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0 OR
/shopping/brands?search=kamera&nm=Canon&page=0

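Is this caused by the entity representation in the URL, given that the part around the entities is what gets stripped? Any help or pointers would be great.

1 Answer:

Answer 0: (score: 3)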

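Based on the given HTML, there is a problem with the locator strategy you tried. You combined the index [1] with find_element, which is error prone. An index can instead be applied to the list that find_elements returns. For this use case, a more specific expression would be: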

url_from_attr = driver.find_element(By.XPATH,"//div[@class='nla-children mfr']/div/div/a[@class='nla-row-text']").get_attribute("href")

The locator strategy can be optimized further as follows:

url_from_attr = driver.find_element(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']").get_attribute("href")

Update A

Based on your comment, if you still need to use an index, the locator strategy could be:

url_from_attr = driver.find_elements(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']")[0].get_attribute("href")
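
Where the index is placed changes what it means. The sketch below is only an illustration: it uses Python's built-in xml.etree.ElementTree instead of Selenium so that it runs without a browser, and the markup and the /first and /second URLs are made up. ElementTree supports only a subset of XPath, but for a step such as //a[1] it follows the same XPath 1.0 rule a browser does: the position is counted per parent element, not across the whole result.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question's snippet: two rows, each
# wrapping a single anchor in its own pair of <div> elements.
MARKUP = """
<div class="nla-children mfr">
  <div><div><a class="nla-row-text" href="/first">Canon</a></div></div>
  <div><div><a class="nla-row-text" href="/second">Nikon</a></div></div>
</div>
"""

root = ET.fromstring(MARKUP)

# A position predicate on the step is evaluated per parent, so every anchor
# that is the first <a> child of its own parent matches.
per_parent = root.findall(".//a[1]")

# Indexing the Python list instead picks the first match in document order.
all_anchors = root.findall(".//a[@class='nla-row-text']")
first_in_document = all_anchors[0]

print([a.get("href") for a in per_parent])
print(first_in_document.get("href"))
```

In this made-up document the per-parent predicate matches both anchors, while the list index yields exactly one. Note also that XPath positions start at 1, whereas Python list indexes start at 0.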

get_attribute(attribute_name)

According to the Python API source:

    def get_attribute(self, name):
        """Gets the given attribute or property of the element.

        This method will first try to return the value of a property with the
        given name. If a property with that name doesn't exist, it returns the
        value of the attribute with the same name. If there's no attribute with
        that name, ``None`` is returned.

        Values which are considered truthy, that is equals "true" or "false",
        are returned as booleans.  All other non-``None`` values are returned
        as strings.  For attributes or properties which do not exist, ``None``
        is returned.

        :Args:
            - name - Name of the attribute/property to retrieve.

        Example::

            # Check if the "active" CSS class is applied to an element.
            is_active = "active" in target_element.get_attribute("class")

        """

        attributeValue = ''
        if self._w3c:
            attributeValue = self.parent.execute_script(
                "return (%s).apply(null, arguments);" % getAttribute_js,
                self, name)
        else:
            resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
            attributeValue = resp.get('value')
            if attributeValue is not None:
                if name != 'value' and attributeValue.lower() in ('true', 'false'):
                    attributeValue = attributeValue.lower()
        return attributeValue
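
Whichever branch runs, the value is read from the parsed DOM, where each &amp; in the page source has already been decoded to a plain &. As a quick sanity check that uses only the Python standard library (not Selenium), the sketch below decodes the attribute value from the question's snippet by hand and parses its query string. It only shows what the decoded value should look like; it does not reproduce the truncation.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# Attribute value exactly as written in the question's HTML source.
raw_href = "/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0"

# An HTML parser turns each &amp; into a literal & when it builds the DOM.
decoded_href = unescape(raw_href)

# Split the decoded URL and parse its query string into a dict of lists.
query = parse_qs(urlsplit(decoded_href).query)

print(decoded_href)
print(query)
```

Entity decoding by itself keeps all three parameters, including nm=Canon, so a value without that parameter points to something other than the entities.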

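Update B

As you mentioned in your comment, the URL value returned by the method does not exist anywhere on the page, which means you are trying to access the href attribute too early. There are two possible solutions: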

  • Traverse the DOM tree and build a locator that uniquely identifies the element, induce WebDriverWait with expected_conditions set to element_to_be_clickable, and then extract the href attribute.

  • For debugging purposes, you can add time.sleep(10) to give the element time to render properly in the HTML DOM, and then try to extract the href attribute.
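
The first option could look roughly like the sketch below. The helper name href_when_clickable and the 10-second default timeout are assumptions made for illustration; WebDriverWait, expected_conditions.element_to_be_clickable and By.XPATH are the standard Selenium names. The XPath you pass in still has to identify the element uniquely.

```python
def href_when_clickable(driver, xpath, timeout=10):
    """Wait until the element located by `xpath` is clickable, then return its href."""
    # Imported inside the function so the helper can be defined even where
    # Selenium is not installed; move these to module level in real test code.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # until() polls the condition and raises TimeoutException if the element
    # has not become clickable within `timeout` seconds.
    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )
    return element.get_attribute("href")
```

With the driver from the question, a call might be href_when_clickable(driver, "//div[@class='nla-children mfr']//a[@class='nla-row-text']"). Unlike a fixed time.sleep(10), the wait returns as soon as the condition holds.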