How to extract the title from a given anchor tag

Time: 2019-06-22 01:13:04

Tags: python selenium xpath css-selectors webdriverwait

How do I get an XPath that extracts the title from this line of HTML?

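Locating the element by its class is of no use, because the CSS class changes over time and the code could break. Since both the href and the text of this tag contain the name I want to extract, I think an equality condition could be used.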

<a class="FPmhX notranslate nJAzx" title="ceorackz_adpp" href="/ceorackz_adpp/">ceorackz_adpp</a>

I am looking for Selenium API calls, or a plain regular expression that works from Python code, to get the title or text of this anchor tag.

4 answers:

Answer 0: (score: 0)

Use any of the XPaths in the following list:

//a[@title='ceorackz_adpp']

//a[text()='ceorackz_adpp']

//a[@title='ceorackz_adpp' and text()='ceorackz_adpp']
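To check predicates like these offline, without a browser, here is a minimal sketch using the standard library's xml.etree.ElementTree. It rests on a few assumptions: ElementTree implements only a subset of XPath (no text() and no "and"), so the text test is written as .='...' and the combined condition as two chained predicates; it also needs well-formed markup, which real pages often are not. The wrapping <div> and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <div> so the
# anchor is a descendant of the root and can be found with ".//a".
markup = (
    '<div><a class="FPmhX notranslate nJAzx" title="ceorackz_adpp" '
    'href="/ceorackz_adpp/">ceorackz_adpp</a></div>'
)
root = ET.fromstring(markup)

# Equivalent of //a[@title='ceorackz_adpp']
by_title = root.find(".//a[@title='ceorackz_adpp']")

# ElementTree spelling of //a[text()='ceorackz_adpp'] (needs Python 3.7+)
by_text = root.find(".//a[.='ceorackz_adpp']")

# Both conditions at once, as chained predicates instead of "and"
by_both = root.find(".//a[@title='ceorackz_adpp'][.='ceorackz_adpp']")

print(by_title.get("title"), by_text.text, by_both is by_title)
```

In Selenium itself the original XPath strings can be passed unchanged; the rewritten forms are only needed for ElementTree.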

Answer 1: (score: 0)

To extract the title, i.e. ceorackz_adpp, from the element, you have to induce WebDriverWait for visibility_of_element_located(). You can use any of the following solutions:

  • Using CSS_SELECTOR

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.notranslate[href='/ceorackz_adpp/']"))).get_attribute("title"))
    
  • Using LINK_TEXT

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.LINK_TEXT, "ceorackz_adpp"))).get_attribute("title"))
    
  • Using XPATH

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[contains(@class, 'notranslate') and @href='/ceorackz_adpp/']"))).get_attribute("title"))
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

Answer 2: (score: 0)

Right-click the element in the "Inspect" panel, go to its HTML, and then use this:

Copy > Copy XPath

Answer 3: (score: -1)

I'm not quite sure, but I'd guess that an expression like this:

title="([^"]+)"[^>]*>\s*([^<]+?)\s*<

might be a starting point.

Test

import re

# Group 1 captures the title attribute, group 2 the link text.
# [^>]* skips any attributes (such as href) between title and the end of the opening tag.
regex = r'title="([^"]+)"[^>]*>\s*([^<]+?)\s*<'

test_str = '<a class="FPmhX notranslate nJAzx" title="ceorackz_adpp" href="/ceorackz_adpp/">ceorackz_adpp</a>'

for match_num, match in enumerate(re.finditer(regex, test_str), start=1):
    print("Match {} was found at {}-{}: {}".format(match_num, match.start(), match.end(), match.group()))

    # Groups are numbered from 1; group 0 is the whole match.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(group_num, match.start(group_num), match.end(group_num), match.group(group_num)))
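
A regular expression depends on attribute order and quoting, so it can break when the markup shifts. As an alternative that is not part of the original answer, here is a minimal sketch using the standard library's html.parser to collect the title and text of every anchor. The class name AnchorCollector and its anchors list are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the title attribute and text of each <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []      # finished anchors, in document order
        self._current = None   # anchor being read, or None outside <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"title": dict(attrs).get("title"), "text": ""}

    def handle_data(self, data):
        # Accumulate text only while inside an <a> element.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.anchors.append(self._current)
            self._current = None


parser = AnchorCollector()
parser.feed(
    '<a class="FPmhX notranslate nJAzx" title="ceorackz_adpp" '
    'href="/ceorackz_adpp/">ceorackz_adpp</a>'
)
parser.close()
print(parser.anchors)
```

The same collector could be fed driver.page_source from Selenium, assuming the page has finished rendering by then.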