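I am trying to parse an HTML table and click each hyperlink in the "View $" column individually (the links whose href starts with "javascript:showPayCheck"). There are plenty of posts showing how to parse a table, but I can't find anything that works for the table I am dealing with: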
<div class="screen-group-content">
<div class="checkview-checks">
<table cellpadding="2px" class="asureTable" cellspacing="0px" style="border-collapse: collapse;">
<tbody><tr class="trHeader">
<td style="font-weight: bold;">Payment Date</td>
<td style="font-weight: bold;">Payment Type</td>
<td style="font-weight: bold;">Check/ACH</td>
<td style="font-weight: bold;">View $</td>
</tr>
<tr>
<td style="cursor: default;">01/18/2019</td>
<td style="cursor: default;">Regular Check</td>
<td style="cursor: default;">ACH</td>
<td style="cursor: default;"><a href="javascript:showPayCheck(589, 3106, 'REG', 'D');" title="View Check Detail">$3,023.10</a></td>
</tr>
<tr>
<td style="cursor: default;">01/04/2019</td>
<td style="cursor: default;">Regular Check</td>
<td style="cursor: default;">ACH</td>
<td style="cursor: default;"><a href="javascript:showPayCheck(588, 3106, 'REG', 'D');" title="View Check Detail">$3,141.80</a></td>
</tr>
</tbody></table>
</div>
</div>
I tried using BeautifulSoup:
from bs4 import BeautifulSoup as bSoup
soup = bSoup(driver.page_source, "html.parser")
td_list = soup.findAll('td')
for td in td_list:
    print(td.text)
I also tried Selenium:
elems = driver.find_elements_by_name("td")
for elem in elems:
    print(elem.text)
    elem.click()
Neither attempt prints anything. The XPath of the table is:
//*[@id="form1"]/div[3]/div/div/table
I also tried to get the table via that XPath:
table=driver.find_element_by_xpath('//*[@id="form1"]/div[3]/div/div/table')
for elem in table:
    print(elem.text)
But I get this error:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="form1"]/div[3]/div/div/table"}
What am I doing wrong?
Answer 0 (score: 1)
Your XPath could be more specific. I suggest an incremental approach. First try:
driver.find_element_by_xpath('//*[@id="form1"]//div[@class="screen-group-content"]')
If the above returns an element (instead of raising NoSuchElementException), try:
driver.find_element_by_xpath('//*[@id="form1"]//div[@class="screen-group-content"]//table[@class="asureTable"]')
If that also succeeds, you can extend the XPath above to get the rows and cells by index. Also check the structure above the HTML snippet in your post for any frames.
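The "rows and cells by index" step can be sketched as plain string building. This is only an illustration: the base XPath is the one from this answer, while the helper names `cell_xpath` and `link_xpath` and the `//tr[row]/td[col]` suffix are assumptions made for the example. Positions in XPath start at 1, and row 1 of the posted table is the header.

```python
# Base XPath taken from the answer; the row/column suffix is an
# illustrative addition, not something the answer spells out.
TABLE_XPATH = (
    '//*[@id="form1"]'
    '//div[@class="screen-group-content"]'
    '//table[@class="asureTable"]'
)


def cell_xpath(row, col, table_xpath=TABLE_XPATH):
    """Return the XPath of one cell; row and col are 1-based, as in XPath."""
    if row < 1 or col < 1:
        raise ValueError("XPath positions start at 1")
    return f"{table_xpath}//tr[{row}]/td[{col}]"


def link_xpath(row, col=4, table_xpath=TABLE_XPATH):
    """Return the XPath of the <a> inside a cell (column 4 is "View $")."""
    return cell_xpath(row, col, table_xpath) + "/a"


# Row 1 is the header (class="trHeader"), so the data rows start at 2.
print(cell_xpath(2, 1))
print(link_xpath(2))
```

Each string could then be handed to driver.find_element_by_xpath(...) as in the calls above, provided the driver is looking at the document (or frame) that actually contains the table.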
Answer 1 (score: 0)
The table is inside an iframe, so you have to switch to it first. Following this, I edited my code as follows:
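from selenium.webdriver.common.by import By as wdBy
from selenium.webdriver.support import expected_conditions as eConds
from selenium.webdriver.support.ui import WebDriverWait
wait = WebDriverWait(driver, 10)
# Switch into the iframe that holds the table before looking for rows.
wait.until(eConds.frame_to_be_available_and_switch_to_it((wdBy.CSS_SELECTOR, "iframe[id='hr2oScreen']:nth-of-type(1)")))
# [1:] skips the header row.
for row in wait.until(eConds.presence_of_all_elements_located((wdBy.CSS_SELECTOR, "table tr")))[1:]:
    data = [item.text for item in row.find_elements(wdBy.CSS_SELECTOR, "th,td")]
    print(data)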
Thanks to Pooja for the hint about how to work out why the text could not be found.
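The code above prints the cell text, while the question also asks about following each showPayCheck link. As a rough sketch of one way to prepare for that, the href of each link can be split into the JavaScript call and its arguments. The helper name `parse_paycheck_href` is hypothetical, and the pattern assumes hrefs shaped exactly like the two in the posted table.

```python
import re

# Matches hrefs such as: javascript:showPayCheck(589, 3106, 'REG', 'D');
_HREF_RE = re.compile(r"^\s*javascript:\s*(showPayCheck\((.*)\))\s*;?\s*$")


def parse_paycheck_href(href):
    """Split a showPayCheck href into (script, args).

    Returns None when the href is not a showPayCheck call.
    """
    match = _HREF_RE.match(href or "")
    if match is None:
        return None
    script, raw_args = match.group(1), match.group(2)
    # Strip whitespace and the single quotes around string arguments.
    args = [part.strip().strip("'") for part in raw_args.split(",")]
    return script + ";", args


href = "javascript:showPayCheck(589, 3106, 'REG', 'D');"
print(parse_paycheck_href(href))
```

With a driver that has already switched into the iframe, the hrefs could be read with get_attribute("href") from the anchors matching the CSS selector a[href^='javascript:showPayCheck'], and each returned script could be passed to driver.execute_script(script). That assumes showPayCheck is a global function in the frame's page, which the post does not confirm; calling click() on each freshly located link is the other option.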
Answer 2 (score: -1)
Have you tried using a regular expression?
With Selenium:
import re
from selenium import webdriver
n = webdriver.Chrome()  # or webdriver.Firefox()
n.get(your_url)  # your_url is a placeholder for the page address
html_source_code = n.page_source
# Use a regular expression to collect the text of every
# "View Check Detail" link; the matches end up in
# the 'values' variable.
values = re.findall(r'title="View Check Detail">([^<]+)</a>', html_source_code)
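A pattern like this can be checked offline against the rows posted in the question, without a browser. The sketch below captures the link text up to the closing </a> and, as an extra step that is not part of the answer, converts the amounts to Decimal. The names `SAMPLE_HTML`, `AMOUNT_RE` and `check_amounts` are made up for the example.

```python
import re
from decimal import Decimal

# Two data rows copied from the table in the question.
SAMPLE_HTML = """
<tr>
<td style="cursor: default;">01/18/2019</td>
<td style="cursor: default;">Regular Check</td>
<td style="cursor: default;">ACH</td>
<td style="cursor: default;"><a href="javascript:showPayCheck(589, 3106, 'REG', 'D');" title="View Check Detail">$3,023.10</a></td>
</tr>
<tr>
<td style="cursor: default;">01/04/2019</td>
<td style="cursor: default;">Regular Check</td>
<td style="cursor: default;">ACH</td>
<td style="cursor: default;"><a href="javascript:showPayCheck(588, 3106, 'REG', 'D');" title="View Check Detail">$3,141.80</a></td>
</tr>
"""

# Capture only the link text, stopping at the first '<'.
AMOUNT_RE = re.compile(r'title="View Check Detail">([^<]+)</a>')


def check_amounts(html):
    """Return the amount shown in every "View Check Detail" link as a Decimal."""
    return [
        Decimal(text.replace("$", "").replace(",", ""))
        for text in AMOUNT_RE.findall(html)
    ]


print(AMOUNT_RE.findall(SAMPLE_HTML))  # ['$3,023.10', '$3,141.80']
print(check_amounts(SAMPLE_HTML))
```

Keep in mind that the pattern can only match what is in the string it is given: page_source reflects the document the driver is currently focused on, so if the table lives in an iframe the driver has to be inside that frame first.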
Update: if the content is inside an iframe, you can handle it with Selenium + ChromeDriver:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
o = Options()
o.add_argument("--headless")
n = webdriver.Chrome(options=o)
n.get(your_url)  # your_url is a placeholder for the page address
frames = n.find_elements(By.TAG_NAME, "iframe")
outer = [e.get_attribute("src") for e in frames]
# In the best case outer will be a list of strings,
# each holding the value of one iframe's src attribute.
# Work out which entry of outer points at the frame with the table
# (correct_outer_element is a placeholder for that entry).
n.get(correct_outer_element)
# The driver now holds the iframe's own HTML.
# Write a new XPath against it and fetch the data!