Question

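I have this page: https://www.punters.com.au/form-guide/2020-01-14/

It lists meeting names such as Spendthrift Australia Park, Dalby, and so on. I want to find a way to extract the races for a specific country. For example, my script should get the races held in Australia, or in any other country. I don't know how to write a correct XPath for these races, because the number of races is different every time. I only need the correct XPath.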

from selenium import webdriver

country = input('Enter country name (ex Australia, New Zealand..): ')
driver = webdriver.Chrome()
driver.get("https://www.punters.com.au/form-guide/2020-01-14/")
for i in driver.find_elements_by_xpath("//tr[./td/img[@title='{}']]//following-sibling::tr/td[@class='upcoming-race__td upcoming-race__meeting-name upcoming-races__show-pdfs']//following-sibling::td[1]/a".format(country)):
    print(i.text)

driver.close()

Answer 1

If you want to select the target nodes with a single "magic XPath", here it is:

from selenium import webdriver

country = 'South Africa'
driver = webdriver.Chrome()
driver.get("https://www.punters.com.au/form-guide/2020-01-14/")

xpath = f"//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='{country}']][position()<=(count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='{country}']])-count(//tr[preceding-sibling::tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='{country}'] and contains(@class, 'upcoming-race__row--country')][1]])-count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='{country}'] and contains(@class, 'upcoming-race__row--country')][1]))]/td[1]"
found_nodes = driver.find_elements_by_xpath(xpath)

driver.close()
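The long expression above is easier to check when it is assembled from named pieces. The following is a minimal sketch, not part of the original answer: the function name `build_country_xpath` and the variable names are hypothetical, and the pieces are chosen so that the result is character-for-character the same XPath as the f-string above.

```python
# Predicate that recognises a country header row (class name taken from the XPath above).
COUNTRY_ROW = "contains(@class, 'upcoming-race__row--country')"


def build_country_xpath(country):
    """Assemble the XPath from the answer above out of named parts (hypothetical helper)."""
    # "A preceding sibling is a country header row whose flag image has this title."
    after_country_header = (
        f"preceding-sibling::tr[{COUNTRY_ROW}]/td/img[@title='{country}']"
    )
    # Every row that comes after the header row of the requested country.
    target_xpath = f"//tr[{after_country_header}]"
    # The first country header row that follows the requested country's header.
    next_header = f"tr[{after_country_header} and {COUNTRY_ROW}][1]"

    count_totals = f"count({target_xpath})"
    count_after_trashy_header = f"count(//tr[preceding-sibling::{next_header}])"
    count_trashy_header = f"count(//{next_header})"

    last_position = (
        f"({count_totals}-{count_after_trashy_header}-{count_trashy_header})"
    )
    return f"{target_xpath}[position()<={last_position}]/td[1]"


xpath = build_country_xpath("South Africa")
print(xpath)
```

The returned string can be passed to `driver.find_elements_by_xpath` exactly as in the snippet above. Note that a country name containing a single quote would break the quoting in this sketch.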

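Let's describe what this XPath does, using New Zealand as the example.

I will give aliases to the blocks of the XPath so that the overall idea is easier to read.

1. The first part finds the starting point: all rows that follow the New Zealand header row (aliased as TARGET_XPATH):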

`//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand']]`

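2. Now we need to restrict the results to a single country. The best option I know of for this case is the position() function. We have to supply the position of the last useful element in the result, that is, the one just before the first "trashy" row (the next country header). Let's calculate it: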

`(count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand']])-count(//tr[preceding-sibling::tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand'] and contains(@class, 'upcoming-race__row--country')][1]])-count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand'] and contains(@class, 'upcoming-race__row--country')][1]))`

Here we count three kinds of elements:

a. The number of rows after the country header row (named COUNT_TOTALS):

count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand']])

b. The number of rows after the first "trashy" header row (named COUNT_AFTER_TRASHY_HEADER):

count(//tr[preceding-sibling::tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand'] and contains(@class, 'upcoming-race__row--country')][1]])

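c. We also have to check whether a "trashy" header row exists at all, because when we search for the last country in the table there is no following "trashy" header (named COUNT_TRASHY_HEADER):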

count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand'] and contains(@class, 'upcoming-race__row--country')][1])

3. Use our counts as the filter:

TARGET_XPATH[position()<=(COUNT_TOTALS - COUNT_AFTER_TRASHY_HEADER - COUNT_TRASHY_HEADER)]
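To make the arithmetic concrete, here is a small model of the same calculation in plain Python. It is only an illustration under stated assumptions: the table is reduced to a flat list, an `H:` prefix is a made-up marker for a country header row, and the row names other than Dalby and Corowa are placeholders.

```python
# Hypothetical flattened table: "H:<country>" is a country header row,
# every other entry stands for one meeting row.
rows = [
    "H:Australia", "Dalby", "Corowa",
    "H:New Zealand", "nz-1", "nz-2", "nz-3",
    "H:South Africa", "sa-1",
]


def last_useful_position(rows, country):
    """Return COUNT_TOTALS - COUNT_AFTER_TRASHY_HEADER - COUNT_TRASHY_HEADER."""
    start = rows.index(f"H:{country}")
    after_header = rows[start + 1:]  # what TARGET_XPATH would match
    count_totals = len(after_header)

    # Positions of later country headers inside the matched rows.
    later_headers = [i for i, r in enumerate(after_header) if r.startswith("H:")]
    if later_headers:
        count_trashy_header = 1
        count_after_trashy_header = len(after_header) - later_headers[0] - 1
    else:
        # Last country in the table: there is no following header to subtract.
        count_trashy_header = 0
        count_after_trashy_header = 0

    return count_totals - count_after_trashy_header - count_trashy_header


def meetings_for(rows, country):
    """Keep only the first `last_useful_position` rows after the header."""
    start = rows.index(f"H:{country}")
    return rows[start + 1:start + 1 + last_useful_position(rows, country)]


print(last_useful_position(rows, "New Zealand"))  # 5 - 1 - 1 = 3
print(meetings_for(rows, "New Zealand"))          # ['nz-1', 'nz-2', 'nz-3']
```

For the last country in the list both subtracted counts are zero, so every remaining row is kept, which is the case item c guards against.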

Answer 2

Let's try it this way (for Australia only):

from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://www.punters.com.au/form-guide/2020-01-14/")

tabs = driver.find_elements_by_xpath('//table')
rows = []
for i in tabs[0].find_elements_by_xpath("//tr[./td/img[@title='Australia']]/following-sibling::tr[position()<5]"):
    row = []
    for dat in i.find_elements_by_xpath('.//td'):        
        row.append(dat.text)
    rows.append(row)
pd.DataFrame(rows)

Output (pardon the formatting):

   0                           1       2        3      4      5       6        7       8      9       10
0  Spendthrift Australia Park  ABD     ABD      ABD    ABD    ABD     ABD      ABD     ABD
1  Dalby                       6,2     3,2      8,9,4  8,4,7  10,5,1  ABD      8,9,6   3,6,4  11,9,1  6,1,5
2  Corowa                      3,1,4   6,4,3    2,4    2,1,5  2,7,9   12,2,6   3,1,6
3  Scone                       14,9,6  10,1,18  5,3,1  7,2,6  12,6,8  12,2,10  12,7,2
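The `position()<5` filter above is tied to a country that has exactly four meetings. A different technique, swapped in here only as a sketch, is to walk the rows in Python and collect them until the next country header appears. The markup below is a simplified, hypothetical stand-in for the real page (its structure is assumed from the class name and XPaths used in the answers), parsed with the standard library instead of Selenium so that it runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the New Zealand meeting name is a placeholder.
SAMPLE = """
<table>
  <tr class="upcoming-race__row--country"><td><img title="Australia"/></td></tr>
  <tr><td>Dalby</td><td>6,2</td></tr>
  <tr><td>Corowa</td><td>3,1,4</td></tr>
  <tr class="upcoming-race__row--country"><td><img title="New Zealand"/></td></tr>
  <tr><td>Placeholder meeting</td><td>1,2,3</td></tr>
</table>
"""

COUNTRY_ROW_CLASS = "upcoming-race__row--country"


def rows_for_country(table, country):
    """Collect cell texts of the rows between `country`'s header and the next header."""
    rows = []
    collecting = False
    for tr in table.iter("tr"):
        if COUNTRY_ROW_CLASS in tr.get("class", ""):
            # A country header row: start collecting only if it is the requested one.
            img = tr.find("./td/img")
            collecting = img is not None and img.get("title") == country
            continue
        if collecting:
            rows.append([(td.text or "").strip() for td in tr.findall("td")])
    return rows


table = ET.fromstring(SAMPLE.strip())
print(rows_for_country(table, "Australia"))  # [['Dalby', '6,2'], ['Corowa', '3,1,4']]
```

With Selenium the same loop could presumably run over the `tr` elements of the table, using `get_attribute('class')` and `.text` in place of the ElementTree calls; that variant is untested against the live page.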
