I'm new to web scraping. I need to get the PDF file from the second column of every second row of the following HTML table:
<table class="tablebg" width="100%">
<tbody>
<tr>
<th colspan="4" align="left">Nov 09, 2017</th></tr>
<tr>
<td style="word-wrap:break-word;width:450;">
<a class="link2" href="FS_Notification.aspx?Id=11162&fn=5&Mode=0">Risk Management and Inter-Bank Dealings – Simplified Hedging Facility</a>
</td>
<td nowrap="" colspan="3">
<a target="_blank" href="http://rbidocs.rbi.org.in/rdocs/notification/PDFs/APD118ED4C6E75FAC43A0BA5A738C21F8A8A7.PDF"><img src="../Images/pdf.gif" border="0" align="bsmiddle"></a>
97 kb
</td>
</tr>
I tried the code below, but it does not get the second column of the second row:
from selenium import webdriver
chrome_path = r"C:/chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
browser = driver.get("https://rbi.org.in/")
driver.find_element_by_xpath("""//*[@id="FEMA"]/a""").click()
driver.find_element_by_xpath("""//*[@id="FEMANotifications"]""").click()
result = driver.find_elements_by_xpath("//table//tr")
for rows in result:
    second_row = result.__getitem__(2)
    second_col = second_row.find_elements_by_partial_link_text("http://")
    print(second_col)
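Any suggestions would be appreciated.
Answer 0 (score: 1)
To print the PDF link in the second column of the second row of the HTML table, you can use the following line of code: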
print(driver.find_element_by_xpath("//table[@class='tablebg']//tr//td/a[contains(@href,'http://rbidocs.rbi.org.in/rdocs/notification/PDFs')]").get_attribute('href'))
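If you need every PDF link rather than just the first one, the same href-prefix filter can be applied to the whole page. The following is only a minimal sketch using Python's standard-library html.parser instead of Selenium, assuming the page HTML is already available as a string. The names PDF_PREFIX, PdfLinkCollector, extract_pdf_links and sample_html are hypothetical, and sample_html is just the snippet from the question with the table closed.

```python
from html.parser import HTMLParser

# Assumption: every PDF link on the page starts with this prefix,
# as it does in the snippet from the question.
PDF_PREFIX = "http://rbidocs.rbi.org.in/rdocs/notification/PDFs"


class PdfLinkCollector(HTMLParser):
    """Collects the href of every <a> tag whose href starts with PDF_PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(PDF_PREFIX):
            self.links.append(href)


def extract_pdf_links(html):
    """Return all PDF hrefs found in the given HTML string, in document order."""
    parser = PdfLinkCollector()
    parser.feed(html)
    return parser.links


# The table row from the question, used here as test input.
sample_html = """
<table class="tablebg" width="100%">
<tbody>
<tr>
<th colspan="4" align="left">Nov 09, 2017</th></tr>
<tr>
<td style="word-wrap:break-word;width:450;">
<a class="link2" href="FS_Notification.aspx?Id=11162&fn=5&Mode=0">Simplified Hedging Facility</a>
</td>
<td nowrap="" colspan="3">
<a target="_blank" href="http://rbidocs.rbi.org.in/rdocs/notification/PDFs/APD118ED4C6E75FAC43A0BA5A738C21F8A8A7.PDF"><img src="../Images/pdf.gif" border="0"></a>
97 kb
</td>
</tr>
</tbody>
</table>
"""

for link in extract_pdf_links(sample_html):
    print(link)
```

With the Selenium session from the question, the input string could presumably come from driver.page_source once the notifications page has loaded, i.e. extract_pdf_links(driver.page_source). The link to FS_Notification.aspx in the first column is skipped because its href does not start with the PDF prefix.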