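I have this page: https://www.punters.com.au/form-guide/2020-01-14/
It lists meeting names such as Spendthrift Australia Park, Dalby, and so on. I want a way to extract the races for a specific country. For example, my script should get the races in Australia, or in any other country. But I don't know how to write the right XPath for these races, because the number of races is different every time. I only need the correct XPath.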
from selenium import webdriver
from selenium.webdriver.common.by import By
country = input('Enter country name (ex Australia, New Zealand..): ')
driver = webdriver.Chrome()
driver.get("https://www.punters.com.au/form-guide/2020-01-14/")
# The country name is substituted into the XPath through the {} placeholder.
xpath = "//tr[./td/img[@title='{}']]//following-sibling::tr/td[@class='upcoming-race__td upcoming-race__meeting-name upcoming-races__show-pdfs']//following-sibling::td[1]/a".format(country)
for i in driver.find_elements(By.XPATH, xpath):
    print(i.text)
driver.close()
Answer 0 (score: 1)
If you want to select the target nodes with a single "magic XPath", here it is:
from selenium import webdriver
from selenium.webdriver.common.by import By
country = 'South Africa'
driver = webdriver.Chrome()
driver.get("https://www.punters.com.au/form-guide/2020-01-14/")
xpath = f"//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='{country}']][position()<=(count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='{country}']])-count(//tr[preceding-sibling::tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='{country}'] and contains(@class, 'upcoming-race__row--country')][1]])-count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='{country}'] and contains(@class, 'upcoming-race__row--country')][1]))]/td[1]"
found_nodes = driver.find_elements(By.XPATH, xpath)
# Read the text before closing the browser; the elements cannot be used afterwards.
for node in found_nodes:
    print(node.text)
driver.close()
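Let's describe what this XPath does, using New Zealand as the example.
I'll give aliases to the blocks of the XPath so that the overall idea is easier to read.
1. The first part finds the starting point: the rows that follow the New Zealand header row (aliased as TARGET_XPATH):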
`//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand']]`
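2. Now we need to limit the result to a single country only. The best option I know of for this case is the position() function. We have to give it the position of the last useful element in the result (the one just before the first "trashy" row). Let's calculate it: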
`(count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand']])-count(//tr[preceding-sibling::tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand'] and contains(@class, 'upcoming-race__row--country')][1]])-count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand'] and contains(@class, 'upcoming-race__row--country')][1]))`
We count three kinds of elements here:
a. The number of nodes after the country header node (named COUNT_TOTALS):
count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand']])
b. The number of nodes after the first "trashy" node, that is, the next country header (named COUNT_AFTER_TRASHY_HEADER):
count(//tr[preceding-sibling::tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand'] and contains(@class, 'upcoming-race__row--country')][1]])
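c. We also have to count the "trashy" header node itself (named COUNT_TRASHY_HEADER). It is counted separately because the last country in the table has no next "trashy" node, in which case this count is 0: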
count(//tr[preceding-sibling::tr[contains(@class, 'upcoming-race__row--country')]/td/img[@title='New Zealand'] and contains(@class, 'upcoming-race__row--country')][1])
3. Use our counts as the filter:
TARGET_XPATH[position()<=(COUNT_TOTALS - COUNT_AFTER_TRASHY_HEADER - COUNT_TRASHY_HEADER)]
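The three steps above can be sketched as plain string composition, which makes the one-line XPath easier to check piece by piece. This is only a sketch: `HEADER`, `follows_country`, `next_header` and `build_xpath` are hypothetical names introduced here for illustration, while the alias names come from the explanation above.

```python
# Predicate that marks a country header row (class name taken from the XPath above).
HEADER = "contains(@class, 'upcoming-race__row--country')"


def build_xpath(country):
    """Assemble the one-line XPath from its aliased pieces (hypothetical helper)."""
    # True for a row that has the requested country's header somewhere before it.
    follows_country = f"preceding-sibling::tr[{HEADER}]/td/img[@title='{country}']"
    # Step 1: every row after the requested country's header.
    target_xpath = f"//tr[{follows_country}]"
    # A "trashy" header: a country header row that follows the requested header.
    next_header = f"{follows_country} and {HEADER}"
    # Step 2: the three counts.
    count_totals = f"count({target_xpath})"
    count_after_trashy_header = f"count(//tr[preceding-sibling::tr[{next_header}][1]])"
    count_trashy_header = f"count(//tr[{next_header}][1])"
    # Step 3: keep only the rows up to the last useful position.
    bound = f"({count_totals}-{count_after_trashy_header}-{count_trashy_header})"
    return f"{target_xpath}[position()<={bound}]/td[1]"


print(build_xpath("New Zealand"))
```

The returned string can be passed to Selenium in place of the hand-written f-string; the sketch only rebuilds the same expression and does not change what it selects.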
Answer 1 (score: 0)
Let's try it this way (Australia only):
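from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
driver = webdriver.Chrome()
driver.get("https://www.punters.com.au/form-guide/2020-01-14/")
rows = []
# An XPath that starts with // searches the whole document, so query the driver directly.
# position()<5 keeps only the first four rows after the Australia header; the count is hardcoded for this page.
for i in driver.find_elements(By.XPATH, "//tr[./td/img[@title='Australia']]/following-sibling::tr[position()<5]"):
    row = []
    for dat in i.find_elements(By.XPATH, './/td'):
        row.append(dat.text)
    rows.append(row)
driver.close()
print(pd.DataFrame(rows))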
Output:
   0                           1       2        3      4      5       6        7       8      9       10
0  Spendthrift Australia Park  ABD     ABD      ABD    ABD    ABD     ABD      ABD     ABD
1  Dalby                       6,2     3,2      8,9,4  8,4,7  10,5,1  ABD      8,9,6   3,6,4  11,9,1  6,1,5
2  Corowa                      3,1,4   6,4,3    2,4    2,1,5  2,7,9   12,2,6   3,1,6
3  Scone                       14,9,6  10,1,18  5,3,1  7,2,6  12,6,8  12,2,10  12,7,2
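Since the question says the number of races changes every time, a fixed position()<5 will not carry over to other days or countries. One way around that, sketched here as a different technique from both answers, is to walk the rows in document order and remember which country header was seen last. The sample table below is a made-up, heavily simplified stand-in for the real page (its New Zealand row is invented), the markup it assumes (header rows carrying the upcoming-race__row--country class with an img title) is inferred from the XPaths above, and `meetings_for` is a hypothetical helper. The standard-library parser is used so the sketch runs without a browser.

```python
import xml.etree.ElementTree as ET

HEADER_CLASS = "upcoming-race__row--country"

# Hypothetical, simplified stand-in for the page's table.
SAMPLE = """
<table>
  <tr class="upcoming-race__row upcoming-race__row--country"><td><img title="Australia"/></td></tr>
  <tr><td>Dalby</td><td>6,2</td></tr>
  <tr><td>Corowa</td><td>3,1,4</td></tr>
  <tr class="upcoming-race__row upcoming-race__row--country"><td><img title="New Zealand"/></td></tr>
  <tr><td>Otaki</td><td>1,2,3</td></tr>
</table>
"""


def meetings_for(table, country):
    """Return the first-cell text of every row under the given country's header."""
    names = []
    current = None  # title of the most recent country header row
    for row in table.iter("tr"):
        if HEADER_CLASS in row.get("class", ""):
            img = row.find("./td/img")
            current = img.get("title") if img is not None else None
        elif current == country:
            first_cell = row.find("./td")
            if first_cell is not None:
                names.append("".join(first_cell.itertext()).strip())
    return names


table = ET.fromstring(SAMPLE.strip())
print(meetings_for(table, "Australia"))  # ['Dalby', 'Corowa']
```

With Selenium the same loop should work over driver.find_elements(By.XPATH, "//tr"), reading row.get_attribute("class") instead of row.get("class"), provided the live page really uses the markup assumed here.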