Get data from this HTML table with Python

Time: 2017-04-26 12:20:46

Tags: python html selenium web driver

I want to extract the data showing currency exchange rates from this table.

Visit https://www.iceplc.com/travel-money/exchange-rates

I tried this approach, but it doesn't work:

    table_id = driver.find_element(By.ID, 'data_configuration_feeds_ct_fields_body0')
    rows = table_id.find_elements(By.TAG_NAME, "tr")  # get all of the rows in the table
    for row in rows:
        col = row.find_elements(By.TAG_NAME, "td")[1]  # note: index starts from 0, so 1 is column 2
        print(col.text)  # prints text from the element

Here is the HTML:

    </td>

                    <td valign="top" class="OuterProdCell test">

                                <table class="ProductCell">
                                    <tr>
                                    <td class="rateCountryFlag">
                                        <ul id="prodImages">
                                            <li>
                                                <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso" class="flags chilean-peso" ></a>
                                            </li>
                                        </ul>
                                    </td>

                                    <td class="ratesName">
                                    <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">
                                    Chilean Peso</a>
                                    </td>

                                    <td class="ratesClass">
                                    <a  class="orderText" href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">774.8540</a>
                                    </td>
                                    <td class="orderNow">                                           
                                        <ul id="prodImages">
                                            <li>
                                                <a class="reserveNow" href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">Order<br/>now</a>
                                            </li>
                                            <li>
                                                <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso" class="flags arrowGreen" ></a>
                                            </li>
                                        </ul>
                                    </td>
                                    </tr>
                                </table>

I also tried the Python Selenium approach below, but with it I can only get the exchange rate for each currency, not its name:

    driver.get("https://www.iceplc.com/travel-money/exchange-rates")
    rates = driver.find_elements_by_class_name("ratesClass")

    for rate in rates:
        print(rate.text)

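2 answers:

Answer 0: (score: 1)

If you just want to get exchange rates, you are better off using an API; see this question. Web scraping leaves your code liable to break whenever the target page changes.

If scraping is your goal, you just need to reuse your Selenium approach, but also search for the "ratesName" class.

For example: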

    driver.get("https://www.iceplc.com/travel-money/exchange-rates")
    # Collect the name cells and the rate cells, then pair them up by position
    names = driver.find_elements_by_class_name("ratesName")
    rates = driver.find_elements_by_class_name("ratesClass")

    for name, rate in zip(names, rates):
        print("Name: %s, Rate: %s" % (name.text, rate.text))
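
On newer Selenium releases, where the `find_elements_by_*` helpers are no longer available, the same pairing could be sketched with `find_elements` and a `By` locator. This is only a sketch: `collect_rates` is a hypothetical helper name, and the fallback constant exists solely so the pairing logic can be tried without Selenium installed.

```python
try:
    from selenium.webdriver.common.by import By
    CLASS_NAME = By.CLASS_NAME
except ImportError:
    # Assumption: fallback for machines without Selenium, so the pairing
    # logic can still be exercised. "class name" is the locator string
    # that By.CLASS_NAME stands for.
    CLASS_NAME = "class name"


def collect_rates(driver):
    """Return (name, rate) text pairs, matched by position on the page."""
    names = driver.find_elements(CLASS_NAME, "ratesName")
    rates = driver.find_elements(CLASS_NAME, "ratesClass")
    # zip stops at the shorter list, so a missing cell cannot raise here
    return [(n.text.strip(), r.text.strip()) for n, r in zip(names, rates)]
```

After `driver.get(...)` has loaded the page, `collect_rates(driver)` would return a list that can be printed or turned into a dict. Pairing by position assumes that every currency has exactly one name cell and one rate cell, in the same order.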

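Answer 1: (score: 1)

Looking at the structure of the page, it is clear that you have to go through it row by row and pick out the column elements you are interested in.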

For each row, use find_element_by_tag_name and find_element_by_class_name to extract the two elements you are interested in (documentation: http://selenium-python.readthedocs.io/locating-elements.html).

    driver.get("https://www.iceplc.com/travel-money/exchange-rates")
    rows = driver.find_elements_by_tag_name('tr')
    for row in rows:
        # Within each row, look up the name cell and the rate cell by class
        print(row.find_element_by_class_name('ratesName').text, row.find_element_by_class_name('ratesClass').text)

The output is:

US - Dollar 1.2536
Croatia - Kuna 8.3997
Canada - Dollar 1.7006
Australia - Dollar 1.6647
Euro - 1.1469
...
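
The row-by-row idea can also be tried offline, without a browser, against the HTML fragment from the question. Below is a minimal sketch using the standard library's `html.parser`. `RateRowParser` and `SAMPLE_HTML` are hypothetical names introduced for illustration, and the markup is a trimmed copy of the row shown in the question, not the live page.

```python
from html.parser import HTMLParser

# Trimmed copy of the row markup shown in the question.
SAMPLE_HTML = """
<table class="ProductCell">
  <tr>
    <td class="ratesName">
      <a href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">
      Chilean Peso</a>
    </td>
    <td class="ratesClass">
      <a class="orderText" href="/travel-money/buy-chilean-peso" title="Buy Chilean Peso">774.8540</a>
    </td>
  </tr>
</table>
"""


class RateRowParser(HTMLParser):
    """Collect one (name, rate) pair per <tr>, keyed on the <td> class."""

    WANTED = ("ratesName", "ratesClass")

    def __init__(self):
        super().__init__()
        self.rows = []     # finished (name, rate) tuples
        self._row = None   # text per wanted class for the <tr> being read
        self._cell = None  # class of the <td> being read, if it is wanted

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            classes = (dict(attrs).get("class") or "").split()
            self._cell = next((c for c in classes if c in self.WANTED), None)

    def handle_data(self, data):
        # Accumulate text only while inside a wanted cell of an open row
        if self._cell is not None and self._row is not None:
            self._row[self._cell] = self._row.get(self._cell, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Keep the row only if both the name and the rate were found
            if all(key in self._row for key in self.WANTED):
                self.rows.append(tuple(self._row[key].strip() for key in self.WANTED))
            self._row = None


parser = RateRowParser()
parser.feed(SAMPLE_HTML)
for name, rate in parser.rows:
    print(name, rate)
```

For the fragment above this prints `Chilean Peso 774.8540`. Rows that lack either cell are skipped rather than raising, which is one way to handle header or layout rows when walking every `tr`.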