Selenium: variable doesn't change even after importing changes

Date: 2018-09-25 17:44:35

Tags: python html python-3.x selenium-webdriver html-table

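I'm working on a web-scraping project with the Selenium library, in which I need to extract some data from tables. As part of the project, I need to iterate over the table rows and extract the author's condition (role) for each article, but it only works for the first row. The variable seems to keep the first row's data and never changes, even after each iteration. Here is part of my code: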

from selenium.common.exceptions import NoSuchElementException

div_result = driver.find_element_by_class_name("result-body-paper")
papers = div_result.find_elements_by_tag_name("tr")
papers_information = []
for paper in papers:
    data = paper.find_elements_by_tag_name("td")
    result_title = data[1].text  # the second cell holds the paper title

    author = paper.find_element_by_xpath('//span[@data-paper-person="{id}"]'.format(id=person_id))
    try:
        # the role icon, if any, is an <i> element inside the author's <span>
        first_author = author.find_element_by_tag_name("i").get_attribute("class")
    except NoSuchElementException:
        first_author = ""  # no icon for this author
    author_condition = "Helper"
    if first_author != "":
        if "pencil" in first_author:
            author_condition = "First Writer"
        if "asterisk" in first_author:
            author_condition = "Original Writer"
        if "star" in first_author:
            author_condition = "Original Worker"
    papers_information.append([author_condition, result_title])
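The if-chain in the loop is essentially a mapping from the icon's class attribute to a role label. The sketch below pulls that mapping into a standalone function so it can be checked without a browser. The name author_condition_from_icon is hypothetical, the label strings mirror the question's code (spelled "Original"), and matching on whole class tokens (fa-pencil, fa-asterisk, fa-star) instead of substrings is an assumption of mine, chosen so that a class such as fa-star-o would not count as a star.

```python
def author_condition_from_icon(icon_class: str) -> str:
    """Map an <i> icon's class attribute to a role label.

    Hypothetical helper: labels mirror the question's code, and matching is
    done on whole class tokens rather than substrings (an assumption).
    """
    tokens = set(icon_class.split())
    condition = "Helper"  # default when no recognised icon is present
    # Later checks override earlier ones, as in the original if-chain.
    if "fa-pencil" in tokens:
        condition = "First Writer"
    if "fa-asterisk" in tokens:
        condition = "Original Writer"
    if "fa-star" in tokens:
        condition = "Original Worker"
    return condition


print(author_condition_from_icon("fa fa-fw fa-pencil crimson absolute"))    # First Writer
print(author_condition_from_icon("fa fa-fw fa-asterisk crimson absolute"))  # Original Writer
print(author_condition_from_icon(""))                                       # Helper
```

Keeping the mapping separate from the Selenium calls also makes it easy to tell whether a wrong label comes from the lookup or from the classification.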
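Contrary to what I expected, first_author and author are the same as in the first row of the table on every iteration. The other parts of the loop work correctly, though. Is this a bug, or something else? By the way, here is part of the HTML I'm trying to extract data from (only two table rows):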

<tr class="zarEn selectable">
  <td class="result row center" width="35">1</td>
  <td class="result title "><a href="...">Hepatic insulin resistance, metabolic syndrome and cardiovascular disease</a></td>
  <td class="result author zarsmallEn" width="200">
    <span data-paper-person="98155">
      <a href="...">
        <img src="..." class="person-avatar-mini">
          <i class="fa fa-fw fa-pencil crimson absolute"></i>
      </a>
    </span>
  </td>
  <td class="result source_title ">
    <a href="...">Clinical Biochemistry</a>
  </td>
  <td class="result source_cs">
    <a href="...">2.35</a>
  </td>
  <td class="result published_year center">2009</td>
  <td class="result citation center">217</td>
</tr>
<tr class="zarEn selectable">
  <td class="result row center">2</td>
  <td class="result title "><a href="...">Molecular and cellular mechanisms linking inflammation to insulin resistance and β-cell dysfunction</a></td>
  <td class="result author zarsmallEn">
    <span data-paper-person="14144442">
      <a href="...">
      <img src="...">
      <i class="fa fa-fw fa-pencil lightgray absolute"></i>
      </a>
    </span>
    <span data-paper-person="14137800">
      <a href="..."><img src="..."></a>
    </span>
    <span data-paper-person="98155">
      <a href="...">
      <img src="...">
      <i class="fa fa-fw fa-asterisk crimson absolute"></i>
      </a>
    </span>
  </td>
  <td class="result source_title ">
     <a href="...">Translational Research</a>
  </td>
  <td class="result source_cs">
    <a href="...">4.26</a>
  </td>
  <td class="result published_year center">2016</td>
  <td class="result citation center">71</td>
</tr>
As you can see, the class names of the two <i> elements are different, but first_author gets the first one and never changes!
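One likely explanation for this symptom: in Selenium, an XPath expression that starts with // is evaluated from the root of the whole document, even when find_element_by_xpath is called on a row element, so every iteration finds the first matching <span> on the page. Prefixing the expression with a dot (.//span[...]) makes it relative to the element it is called on. The sketch below reproduces the difference offline with Python's xml.etree.ElementTree on a simplified, well-formed copy of the two rows above (links and images trimmed; the helper icon_class is hypothetical). ElementTree does not accept absolute paths on an element, so searching from the <table> element stands in for Selenium's document-wide //span lookup.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the two <tr> rows from the question.
TABLE = """<table>
<tr>
  <td>1</td>
  <td>Hepatic insulin resistance, metabolic syndrome and cardiovascular disease</td>
  <td><span data-paper-person="98155"><a><i class="fa fa-fw fa-pencil crimson absolute"/></a></span></td>
</tr>
<tr>
  <td>2</td>
  <td>Molecular and cellular mechanisms linking inflammation to insulin resistance and beta-cell dysfunction</td>
  <td>
    <span data-paper-person="14144442"><a><i class="fa fa-fw fa-pencil lightgray absolute"/></a></span>
    <span data-paper-person="14137800"><a/></span>
    <span data-paper-person="98155"><a><i class="fa fa-fw fa-asterisk crimson absolute"/></a></span>
  </td>
</tr>
</table>"""


def icon_class(context, person_id):
    """Return the class of the <i> icon in this person's <span>, or "" if none.

    `context` plays the role of the element find_element is called on.
    """
    span = context.find(f".//span[@data-paper-person='{person_id}']")
    if span is None:
        return ""
    icon = span.find(".//i")
    return icon.get("class", "") if icon is not None else ""


table = ET.fromstring(TABLE)
rows = table.findall("tr")
person_id = "98155"

# Like '//span[...]': the search ignores the current row and starts at the top.
from_top = [icon_class(table, person_id) for _ in rows]
# Like './/span[...]': the search is scoped to the current row.
per_row = [icon_class(row, person_id) for row in rows]

print(from_top)  # both entries are the pencil icon from row 1
print(per_row)   # pencil icon for row 1, asterisk icon for row 2
```

In the question's loop, the corresponding change would likely be adding the leading dot: paper.find_element_by_xpath('.//span[@data-paper-person="{id}"]'.format(id=person_id)). The find_element_by_* helpers were removed in Selenium 4.3; with current Selenium the equivalent call is paper.find_element(By.XPATH, ...), with By imported from selenium.webdriver.common.by.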
