Scrapy: extracting information from a specific div

Time: 2014-11-22 18:53:28

Tags: python web scrapy web-crawler

This is my code:

def parse(self, response):
    return scrapy.FormRequest.from_response(
        response,
        formdata={'uuid': 'user', 'password': 'cons'},
        callback=self.after_login
    )

def after_login(self, response):
    # check that the login succeeded before going on
    if "authentication failed" in response.body:
        self.log("Login failed", level=log.ERROR)
    else:
        self.log('LOGGED')
        sel = Selector(response)
        sel.xpath("//div[@class='amount cSpringGreen']/text()").extract()

But when I run it, nothing shows up. The site is supposed to display this information once you have logged in. This is the HTML:

<h1 class="hide2"></h1>
<div id="vodaint-local" class="wrapper rhomb">
<div class="spring">
<script type="text/javascript">
<div class="mod mod-selectsizeheader vodaint-local">
<div id="mivf" class="content">
<div id="navigation-breadcrumb" class="belt">
<div class="belt">
<div class="miVFR">
<div class="mainMiVF cf">
<div class="headerMiVF cf">
<div class="bodyMiVF cf">
<div class="mainNav" style="height: auto;">
<div class="mainContent withHeader" style="height: 585px;">
<style>
<div id="contentSpinner" style="margin-bottom: 432px; display: none;">
<script>
<section>
<script type="text/javascript">
<div class="mainContentContainer home">
<div class="headerBanner">
<script type="text/javascript">
<div class="lineContainer ">
<h6 class="topHeading prepago"> </h6>
<div class="columnGroup cf">
<div class="column newPromo">
<div class="columnContent">
<p class="cTitle"> Tu saldo</p>
--THIS IS THE INFO I WANT TO SHOW--
<div class="amount cSpringGreen">
0,
<span> 96</span>
€
</div>

Thanks!

EDIT: The whole HTML file is in this pastebin: http://pastebin.com/B2HpACCw. What I want to display after logging in is "0'96". Thanks!

1 answer:

Answer 0 (score: 0)

Store it in an item. Edit items.py:

class TestItem(scrapy.Item):
    text = scrapy.Field()

Then, in the spider:

item=TestItem()
item['text'] = sel.xpath("//div[@class='amount cSpringGreen']/text()").extract()
print(item['text'])
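
One more thing to watch for with the HTML in the question: in XPath, /text() selects only the direct text nodes of the div, so the " 96" that sits inside the <span> is not part of that result. In Scrapy, the same div selected with //text() instead of /text() would return the descendant text nodes as well. The sketch below illustrates the difference on the fragment from the question. It is only an illustration under some assumptions: it uses xml.etree.ElementTree from the standard library instead of Scrapy's Selector so that it runs on its own, the fragment is closed by hand so that it parses, and extract_amount is a hypothetical helper name, not part of Scrapy.

```python
import xml.etree.ElementTree as ET

# Fragment taken from the question's HTML, with the tags closed by hand
# (an assumption) so that it parses as well-formed XML.
FRAGMENT = """
<div class="columnContent">
  <p class="cTitle"> Tu saldo</p>
  <div class="amount cSpringGreen">
  0,
  <span> 96</span>
  €
  </div>
</div>
"""


def extract_amount(markup):
    """Hypothetical helper: return the balance text without whitespace, or None."""
    root = ET.fromstring(markup)
    div = root.find(".//div[@class='amount cSpringGreen']")
    if div is None:
        return None
    # itertext() walks the div and all of its descendants, so the text
    # inside <span> is included; split()/join() drops all whitespace.
    return "".join("".join(div.itertext()).split())


root = ET.fromstring(FRAGMENT)
div = root.find(".//div[@class='amount cSpringGreen']")

# div.text is only the text before the first child element.
direct_text = div.text.strip()
full_text = extract_amount(FRAGMENT)

print(direct_text)  # 0,
print(full_text)    # 0,96€
```

If the value still comes back empty in the spider, it is worth checking whether the div is present in response.body at all, since the page in the question contains several script blocks and the amount may be filled in after the page loads.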