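I'm working on a web scraping project using the Selenium library, in which I need to extract data from some tables. As part of the project, I need to iterate over the table rows and extract the author condition for each article, but it only works for the first row. It seems the variable keeps the first row's data and does not change from one iteration to the next. Here is part of my code: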
div_result = driver.find_element_by_class_name("result-body-paper")
papers = div_result.find_elements_by_tag_name("tr")
papers_information = []
for paper in papers:
    data = paper.find_elements_by_tag_name("td")
    result_title = data[1].text
    author = paper.find_element_by_xpath('//span[@data-paper-person="{id}"]'.format(id=person_id))
    try:
        first_author = author.find_element_by_tag_name("i").get_attribute("class")
    except:
        first_author = ""
    author_condition = "Helper"
    if first_author != "":
        if "pencil" in first_author:
            author_condition = "First Writer"
        if "asterisk" in first_author:
            author_condition = "Orginal Writer"
        if "star" in first_author:
            author_condition = "Orginal Worker"
    papers_information.append([author_condition, result_title])
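On every iteration, first_author and author are the same as for the first row of the table. The other parts work as expected, though.
Is that a bug or something?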
By the way, this is part of the HTML I'm trying to extract data from (just two table rows):
<tr class="zarEn selectable">
<td class="result row center" width="35">1</td>
<td class="result title "><a href="...">Hepatic insulin resistance, metabolic syndrome and cardiovascular disease</a></td>
<td class="result author zarsmallEn" width="200">
<span data-paper-person="98155">
<a href="...">
<img src="..." class="person-avatar-mini">
<i class="fa fa-fw fa-pencil crimson absolute"></i>
</a>
</span>
</td>
<td class="result source_title ">
<a href="...">Clinical Biochemistry</a>
</td>
<td class="result source_cs">
<a href="...">2.35</a>
</td>
<td class="result published_year center">2009</td>
<td class="result citation center">217</td>
</tr>
<tr class="zarEn selectable">
<td class="result row center">2</td>
<td class="result title "><a href="...">Molecular and cellular mechanisms linking inflammation to insulin resistance and β-cell dysfunction</a></td>
<td class="result author zarsmallEn">
<span data-paper-person="14144442">
<a href="...">
<img src="...">
<i class="fa fa-fw fa-pencil lightgray absolute"></i>
</a>
</span>
<span data-paper-person="14137800">
<a href="..."><img src="..."></a>
</span>
<span data-paper-person="98155">
<a href="...">
<img src="...">
<i class="fa fa-fw fa-asterisk crimson absolute"></i>
</a>
</span>
</td>
<td class="result source_title ">
<a href="...">Translational Research</a>
</td>
<td class="result source_cs">
<a href="...">4.26</a>
</td>
<td class="result published_year center">2016</td>
<td class="result citation center">71</td>
</tr>
first_author gets the value from the first row and never changes after that!
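A possibly relevant detail: an XPath that starts with // is evaluated from the document root even when it is called on a row element, whereas one that starts with .// is evaluated relative to that element. The following is only a minimal sketch of a row-relative lookup, not the Selenium code itself. It uses the standard library's xml.etree.ElementTree so that it runs without a browser. The trimmed markup, the shortened titles "Paper A" and "Paper B", and the helper name author_condition_for are illustrative assumptions. It also returns on the first matching icon, which differs slightly from the chained if statements above.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the two rows above (titles shortened,
# unrelated cells and attributes dropped).
HTML = """
<table>
  <tr>
    <td>1</td>
    <td>Paper A</td>
    <td>
      <span data-paper-person="98155"><a><i class="fa fa-fw fa-pencil crimson absolute"></i></a></span>
    </td>
  </tr>
  <tr>
    <td>2</td>
    <td>Paper B</td>
    <td>
      <span data-paper-person="14144442"><a><i class="fa fa-fw fa-pencil lightgray absolute"></i></a></span>
      <span data-paper-person="14137800"><a></a></span>
      <span data-paper-person="98155"><a><i class="fa fa-fw fa-asterisk crimson absolute"></i></a></span>
    </td>
  </tr>
</table>
"""


def author_condition_for(row, person_id):
    """Return the condition label for person_id, searching inside this row only."""
    # The leading "." keeps the search relative to `row`.
    span = row.find(".//span[@data-paper-person='{id}']".format(id=person_id))
    icon = span.find(".//i") if span is not None else None
    icon_class = icon.get("class", "") if icon is not None else ""
    if "pencil" in icon_class:
        return "First Writer"
    if "asterisk" in icon_class:
        return "Orginal Writer"
    if "star" in icon_class:
        return "Orginal Worker"
    return "Helper"  # no icon, or the person is not listed in this row


rows = ET.fromstring(HTML).findall(".//tr")
papers_information = [
    [author_condition_for(row, "98155"), row.findall("td")[1].text]
    for row in rows
]
print(papers_information)
```

In the Selenium loop above, the corresponding change would presumably be to start the XPath with a dot, i.e. './/span[@data-paper-person="{id}"]', so that each row is searched on its own.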