Selenium, Python: extracting specific data from a table with no id, class, etc.

Time: 2017-06-22 18:20:30

Tags: python html selenium html-table

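I have just started "learning Python" by playing a text-based browser game. I bought a lot of different weapons that I can sell, but first I want to know what each one is worth. I can't figure out how to get specific data out of the table, for example: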

My code:

from selenium import webdriver  
from bs4 import BeautifulSoup as bs  
browser = webdriver.Firefox()  
browser.get("http://www.mafiaway.nl/shop.php?p=sell")  
source = browser.page_source
soup = bs(source, "html.parser")  
for td in soup.findAll("td"):  
    print(td.text)

This gives me:

Mes

Mes Inclusief training en gratis herregistraties  
 Geld terug  
$5.700 >>> is what i want  
 Minpunten  
Power -30  
 Aantal  
Nog 8.758.118 >>> is what i want  
 Verkopen  
+ like 50 other tables like this

Basically my code prints out the whole table, but I only want the 2 values marked above.

HTML code:

<table align="center" width="100%">
   <form method="post">
   </form>
  </table>
  <table align="center" cellspacing="1" width="610">
   <tbody>
    <tr>
     <td class="subtitle" colspan="6">
      Mes
     </td>
    </tr>
    <tr>
     <td align="center" class="maintxt" rowspan="5" width="150">
      <img height="150" src="images/item-Knife.gif"/>
     </td>
     <td class="maintxt" colspan="5" width="450">
      <span class="tekstheader">
       Mes
      </span>
      <br/>
      <br/>
      <i>
       Inclusief training en gratis herregistraties
      </i>
     </td>
    </tr>
    <tr>
     <td class="maintxt" width="225">
      <img class="icon" src="images/icons/money.png"/>
      <b>
       Geld terug
      </b>
     </td>
     <td class="maintxt" colspan="4" width="225">
      $5.700
     </td>
    </tr>
    <tr>
     <td class="maintxt" width="225">
      <img class="icon" src="images/icons/chart_pie_add.png"/>
      <b>
       Minpunten
      </b>
     </td>
     <td class="maintxt" colspan="4" width="225">
      <span class="errorbold">
       <b>
        Power -30
       </b>
      </span>
     </td>
    </tr>
    <tr>
     <td class="maintxt" width="225">
      <img class="icon" src="images/icons/group.png"/>
      <b>
       Aantal
      </b>
     </td>
     <td class="maintxt" colspan="4" width="225">
      Nog 8.758.118
     </td>
    </tr>
    <tr>
     <td class="maintxt" width="225">
      <img class="icon" src="images/icons/basket.png"/>
      <b>
       Verkopen
      </b>
     </td>
     <td align="center" class="maintxt" colspan="4" width="225">
      <input name="num" size="4" type="text"/>
      <input name="koop" type="submit" value="Verkoop"/>
      <input maxlength="20" name="id" type="hidden" value="1"/>
     </td>
    </tr>
   </tbody>
  </table>
  <br/>
  <form method="post">
   <table align="center" cellspacing="1" width="610">
    <tbody>
     <tr>
      <td class="subtitle" colspan="6">
       Walter P99
      </td>
     </tr>
     <tr>
      <td align="center" class="maintxt" rowspan="5" width="150">
       <img height="150" src="images/item-Walter_P99.gif"/>
      </td>
      <td class="maintxt" colspan="5" width="450">
       <span class="tekstheader">
        Walter P99
       </span>
       <br/>
       <br/>
       <i>
        Inclusief training en gratis herregistraties
       </i>
      </td>
     </tr>
     <tr>
      <td class="maintxt" width="225">
       <img class="icon" src="images/icons/money.png"/>
       <b>
        Geld terug
       </b>
      </td>
      <td class="maintxt" colspan="4" width="225">
       $14.250
      </td>
     </tr>
     <tr>
      <td class="maintxt" width="225">
       <img class="icon" src="images/icons/chart_pie_add.png"/>
       <b>
        Minpunten
       </b>
      </td>
      <td class="maintxt" colspan="4" width="225">
       <span class="errorbold">
        <b>
         Power -75
        </b>
       </span>
      </td>
     </tr>
     <tr>
      <td class="maintxt" width="225">
       <img class="icon" src="images/icons/group.png"/>
       <b>
        Aantal
       </b>
      </td>
      <td class="maintxt" colspan="4" width="225">
       Nog 37.251
      </td>
     </tr>
     <tr>
      <td class="maintxt" width="225">
       <img class="icon" src="images/icons/basket.png"/>
       <b>
        Verkopen
       </b>
      </td>
      <td align="center" class="maintxt" colspan="4" width="225">
       <input name="num" size="4" type="text"/>
       <input name="koop" type="submit" value="Verkoop"/>
       <input maxlength="20" name="id" type="hidden" value="2"/>
      </td>
     </tr>
    </tbody>
   </table>

2 answers:

Answer 0: (score: 0)

I would find all cells in the table whose text contains '$' or 'Nog':

cells = browser.find_elements_by_xpath("//td[contains(text(), '$') or contains(text(), 'Nog')]")

for cell in cells:
    print(cell.text)
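
Since the goal is to know what each weapon is worth, the matched cell text can then be turned into a plain integer. This is only a minimal sketch: `parse_amount` is a hypothetical helper name, and it assumes the dot in values such as `$5.700` is a thousands separator (Dutch formatting) and never a decimal point.

```python
import re


def parse_amount(text):
    """Extract the integer from strings such as '$5.700' or 'Nog 8.758.118'.

    Assumption: '.' is a thousands separator, never a decimal point.
    Returns None when the text contains no digits.
    """
    match = re.search(r"\d[\d.]*", text)
    if match is None:
        return None
    # Drop the separators before converting to int.
    return int(match.group().replace(".", ""))


# Sample strings copied from the output shown in the question.
for cell_text in ["$5.700", "Nog 8.758.118"]:
    print(cell_text, "->", parse_amount(cell_text))
```

In the loop above, `parse_amount(cell.text)` could replace the bare `cell.text` if numbers are needed instead of strings.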

Answer 1: (score: 0)

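I think this should work for you. It finds all td elements whose text contains '$' (the money amounts) and all td elements whose text contains 'Nog' (the remaining counts):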

number_elements = browser.find_elements_by_xpath("//td[contains(text(), '$')]")

nog_number_elements = browser.find_elements_by_xpath("//td[contains(text(), 'Nog')]")

for number_element in number_elements:
    print(number_element.text)
for nog_number_element in nog_number_elements:
    print(nog_number_element.text)

Note that this is a pure Selenium approach; I did not use BeautifulSoup at all.
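
If you would rather keep the BeautifulSoup loop from the question, one possible alternative is to collect the cell texts into a list and pick the cell that follows each label cell ("Geld terug", "Aantal"). The sketch below is an assumption-laden illustration: `values_for_label` is a hypothetical helper, and it relies on the value cell always coming directly after its label cell, as in the HTML shown above.

```python
def values_for_label(cell_texts, label):
    """Return the text of every cell that directly follows a cell equal to `label`."""
    # Collapse newlines and repeated spaces so ' Geld terug ' matches 'Geld terug'.
    cleaned = [" ".join(text.split()) for text in cell_texts]
    return [
        cleaned[i + 1]
        for i in range(len(cleaned) - 1)
        if cleaned[i] == label
    ]


# Sample cell texts copied from the output in the question (first table only).
sample = [
    "Mes",
    "",
    "Mes Inclusief training en gratis herregistraties",
    " Geld terug ",
    "$5.700",
    " Minpunten ",
    "Power -30",
    " Aantal ",
    "Nog 8.758.118",
    " Verkopen ",
    "",
]

prices = values_for_label(sample, "Geld terug")
stock = values_for_label(sample, "Aantal")
print(prices)
print(stock)
```

With the question's code, the input list could be built as `[td.text for td in soup.findAll("td")]`, which would yield one price and one count per weapon table.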