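I have a web page containing a table that only appears when I click "Inspect Element"; it is not visible in the "View Source" page. The table contains only two rows, each with several cells, and looks similar to this: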
<table class="datadisplaytable">
<tbody>
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</tbody>
</table>
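What I am trying to do is iterate over the rows and return the text contained in each cell. I can't seem to do it with Selenium. The elements have no IDs, and I'm not sure how else to get at them. I'm not very familiar with XPath and the like.
Here is a debugging attempt that returns a TypeError:
def check_grades(self):
    table = []
    for i in self.driver.find_element_by_class_name("dddefault"):
        table.append(i)
    print(table)
What is a simple way to get the text from the rows?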
Answer 0: (score: 9)
If you want to go row by row using XPath, you can use the following:
h = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""
from lxml import html
xml = html.fromstring(h)
# gets the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]
# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's from each row
    print([td.text for td in row.xpath(".//td[@class='dddefault'][text()]")])
Which outputs:
['16759', 'MATH', '123', '001', 'Calculus']
['16449', 'PHY', '456', '002', 'Physics']
Using td[text()] avoids getting None back for any td that contains no text.
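To see the effect of the predicate in isolation, here is a minimal sketch that runs the same row query with and without `[text()]`. The one-row table and the names `snippet`, `row`, `all_cells`, and `non_empty` are made up for illustration; only `html.fromstring` and `xpath` are taken from the code above.

```python
from lxml import html

# Hypothetical one-row table: two cells with text, one empty cell, one "dddead" cell.
snippet = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault"></td>
<td class="dddead"></td>
</tr>
</table>"""

row = html.fromstring(snippet).xpath(".//tr")[0]

# Without the predicate, the empty td is matched and its .text is None.
all_cells = [td.text for td in row.xpath(".//td[@class='dddefault']")]

# With [text()], only td elements that contain a text node are matched.
non_empty = [td.text for td in row.xpath(".//td[@class='dddefault'][text()]")]

print(all_cells)  # ['16759', 'MATH', None]
print(non_empty)  # ['16759', 'MATH']
```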
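So, to do the same with Selenium:
table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")
for row in table.find_elements_by_xpath(".//tr"):
    # [1] keeps only the first matching td in each row; use [text()] instead to get every non-empty cell
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][1]")])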
For multiple tables:
def get_row_data(table):
    for row in table.find_elements_by_xpath(".//tr"):
        yield [td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][text()]")]
for table in driver.find_elements_by_xpath("//table[@class='datadisplaytable']"):
    for data in get_row_data(table):
        print(data)  # use the data
Answer 1: (score: 1)
Another version (a post modified and corrected by Padraic Cunningham), tested with Python 3.x:
#!/usr/bin/python
h = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""
from lxml import html
xml = html.fromstring(h)
# gets the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]
# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's from each row
    print([td.text for td in row.xpath(".//td[@class='dddefault']")])
Answer 2: (score: 1)
XPath is brittle. It is better to use CSS selectors or class names:
mytable = driver.find_element_by_css_selector('table.datadisplaytable')
for row in mytable.find_elements_by_css_selector('tr'):
    for cell in row.find_elements_by_tag_name('td'):
        print(cell.text)
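Answer 3: (score: 0)
A correction to the Selenium part of @Padraic Cunningham's answer:
table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")
for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault']")])
Note: the snippet as originally posted was missing a closing parenthesis at the end; the [1] index has also been removed here to match the first XML example.
A further note: the example with the [1] index is still worth keeping, though, since it shows how to extract a single element.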
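As a rough illustration of what a `[1]` index selects, the sketch below uses lxml on a made-up two-row table (the names `snippet`, `first_per_row`, and `first_overall` are hypothetical). In XPath, a positional predicate on a step is evaluated per parent, so wrapping the path in parentheses changes the result.

```python
from lxml import html

# Hypothetical two-row table, trimmed down from the one in the question.
snippet = """<table class="datadisplaytable">
<tr><td class="dddefault">16759</td><td class="dddefault">MATH</td></tr>
<tr><td class="dddefault">16449</td><td class="dddefault">PHY</td></tr>
</table>"""

table = html.fromstring(snippet)

# [1] is evaluated per parent, so this yields the first matching td of every row.
first_per_row = [td.text for td in table.xpath(".//td[@class='dddefault'][1]")]

# Parentheses apply [1] to the whole result set, so exactly one td comes back.
first_overall = [td.text for td in table.xpath("(.//td[@class='dddefault'])[1]")]

print(first_per_row)  # ['16759', '16449']
print(first_overall)  # ['16759']
```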