Scraping data with Scrapy

Time: 2014-03-27 07:45:56

Tags: python html web-crawler scrapy

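I want to scrape the exchange rate shown on http://money.moneygram.com.au/ (the page at https://www.moneygram.com/wps/portal/moneygramonline/home/estimator?LC=en-GB would also do). When I fetch the page, the first drop-down is set to AUD (Australian dollar), so the rate I get is AUD to USD (US dollar). How can I select INR (Indian rupee) in the first drop-down instead? When I select it and then use this URL, AUD is still selected by default. The code I am using is: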

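from __future__ import absolute_import

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import MySQLdb  # imported in the original; not used in this snippet


class DmozSpider(BaseSpider):
    name = "moneygram"
    # start_urls is on moneygram.com.au, which is not a subdomain of moneygram.com
    allowed_domains = ["moneygram.com.au"]
    start_urls = ["http://money.moneygram.com.au/"]

    def parse(self, response):
        # Save the raw page body under the host name, e.g. "money.moneygram.com.au"
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)
        # Selector for XPath queries against the downloaded page
        hxs = HtmlXPathSelector(response)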

And the part of the HTML I want to select from is:

<div class="firstSelector">
     <select id="FromCurrency_dropDown" name="FromCurrency_dropDown" style="width:100%">
     <option value="AED">AED (UAE Dirham)</option>
     <option value="ARS">ARS (Argentine Peso)</option>
     <option selected="selected" value="AUD">AUD (Australian Dollar)</option>  
     <option value="BGN">BGN (Bulgarian Lev)</option>
     <option value="BND">BND (Brunei Dollar)</option>
     <option value="BRL">BRL (Brazilian Real)</option>
     <option value="CAD">CAD (Canadian Dollar)</option>
     <option value="CHF">CHF (Swiss Franc)</option>
     <option value="CLP">CLP (Chilean Peso)</option>
     <option value="CNH">CNH (Chinese Renminbi Off-Shore)</option>
     <option value="CNY">CNY (Chinese Yuan)</option>
     <option value="CZK">CZK (Czech Koruna)</option>
     <option value="DKK">DKK (Danish Kroner)</option>
     <option value="EGP">EGP (Egyptian Pound)</option>
     <option value="EUR">EUR (Euro)</option>
     <option value="FJD">FJD (Fiji Dollar)</option>
     <option value="GBP">GBP (British Pound)</option>
     <option value="HKD">HKD (Hong Kong Dollar)</option>
     <option value="HUF">HUF (Hungarian Forint)</option>
     <option value="IDR">IDR (Indonesian Rupiah)</option>
     <option value="ILS">ILS (Israeli New Shekel)</option>
     <option value="INR">INR (Indian Rupee)</option> <!-- I want this one to be selected -->
     <option value="ISK">ISK (Icelandic Krona)</option>
     <option value="JPY">JPY (Japanese Yen)</option>
     <option value="KRW">KRW (Korean Won)</option>
     <option value="KWD">KWD (Kuwaiti Dinar)</option>
     <option value="LKR">LKR (Sri Lanka Rupee)</option>
     <option value="MAD">MAD (Moroccan Dirham)</option>
     <option value="MGA">MGA (Malagasy Ariary)</option>
     <option value="MXN">MXN (Mexican Peso)</option>
     <option value="MYR">MYR (Malaysian Ringgit)</option>
     <option value="NOK">NOK (Norway Kroner)</option>
     <option value="NZD">NZD (New Zealand Dollar)</option>
     <option value="OMR">OMR (Omani Rial)</option>
     <option value="PEN">PEN (Peruvian Nuevo Sol)</option>
     <option value="PGK">PGK (Papua New Guinea Kina)</option>
     </select>
</div>

1 answer:

Answer 0 (score: 0)

This is an AJAX issue. The page's JavaScript posts the parameters to the server, and the server returns the data.

With Chrome's developer tools you can see the exact URL the data is posted to.

The URL is "http://money.moneygram.com.au/forex-tools/currency-converter-widget-part".

The POST form parameters are: "FromCurrency=AED&ToCurrency=VND&FromCurrency_dropDown=AED&ToCurrency_dropDown=VND&FromAmount=2561&ToAmount=&X-Requested-With=XMLHttpRequest".
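To make the captured form body easier to work with, it can be decoded into a dictionary and the source currency swapped before re-encoding. This is only a sketch using the standard library. That the server picks the source currency from FromCurrency and FromCurrency_dropDown is an assumption based on the field names, not something confirmed here.

```python
from urllib.parse import parse_qsl, urlencode

# Form body as captured in the browser's network panel (taken from the answer).
captured = (
    "FromCurrency=AED&ToCurrency=VND&FromCurrency_dropDown=AED"
    "&ToCurrency_dropDown=VND&FromAmount=2561&ToAmount="
    "&X-Requested-With=XMLHttpRequest"
)

# keep_blank_values=True preserves the empty ToAmount field.
form = dict(parse_qsl(captured, keep_blank_values=True))

# Assumption: these two fields carry the source currency.
form["FromCurrency"] = "INR"
form["FromCurrency_dropDown"] = "INR"

# Re-encode the modified fields as a form body.
encoded = urlencode(form)
print(encoded)
```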

So you can use Scrapy to POST these parameters to this URL, get the HTML data back, and parse it to extract what you want.