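How to write an XPath for a site with 2 columns

Time: 2016-05-15 09:27:31

Tags: python xpath scrapy

I am using Scrapy to scrape content. I have tried many ways to scrape this site, which has a 2-column layout. The site's code: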

<form>
  <input type="email" ng-model="user.email" />
  <input type="password" ng-model="user.password" />
  <button ng-click="login()">login</button>
</form>

$scope.user = {};
$scope.login = function () {
   // POST the credentials bound to the form above
   $http({
      url: 'http://localhost:3000/',
      method: 'POST',
      data: {
        email: $scope.user.email,
        password: $scope.user.password
      }
   });
};

My code:

router.post('/', function (req, res, next) {
  console.log(req.body);
  //custom authentication or use passport.js
});

Is my code wrong? I can't seem to get it to run. The spider just closes.

1 answer:

Answer 0: (score: 0)

You can scrape all blog posts from http://www.bebizzy.com/the-bebizzy-blog/ with the following spider:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from check_site.items import YourItem


class StackSpider(CrawlSpider):
    name = 'stack'
    allowed_domains = ['bebizzy.com']
    start_urls = ['http://www.bebizzy.com/the-bebizzy-blog/']

    rules = (
        # Follow each "read more" link and pass the post page to parse_item.
        Rule(LinkExtractor(restrict_css='a.more-link'), callback='parse_item', follow=True),
        # Follow pagination links. No callback is set: the Scrapy docs advise
        # against using parse as a rule callback in a CrawlSpider.
        Rule(LinkExtractor(restrict_css='div.pagination>div>a'), follow=True),
    )

    def parse_item(self, response):
        # Log every post URL that reaches this callback.
        self.logger.info(response.url)
        i = YourItem()
        #TODO: fill your item
        #i['title'] = ...
        return i

Log output from the spider:

2016-05-15 21:45:18 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-05-15 21:45:18 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-05-15 21:45:18 [scrapy] INFO: Enabled item pipelines: 
2016-05-15 21:45:18 [scrapy] INFO: Spider opened
2016-05-15 21:45:18 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-05-15 21:45:26 [stack] INFO: http://www.bebizzy.com/2016/04/12/learn-smartphone-features-spring/
2016-05-15 21:45:26 [stack] INFO: http://www.bebizzy.com/2016/03/04/why-you-need-a-responsive-website/
2016-05-15 21:45:26 [stack] INFO: http://www.bebizzy.com/2016/03/14/samsung-galaxy-s7-s7-edgereview/
2016-05-15 21:45:26 [stack] INFO: http://www.bebizzy.com/2016/03/10/marketing-your-business-online/
2016-05-15 21:45:26 [stack] INFO: http://www.bebizzy.com/2016/03/16/demographics-of-social-media-users/
2016-05-15 21:45:27 [stack] INFO: http://www.bebizzy.com/2016/03/02/websites-launched-creekside-farmstands-and-mandan-farmers-market/
2016-05-15 21:45:27 [stack] INFO: http://www.bebizzy.com/2016/03/01/what-is-wordpress/
2016-05-15 21:45:32 [stack] INFO: http://www.bebizzy.com/2016/03/18/mobile-friendly-sites-increase-seo-rank-google/
2016-05-15 21:45:33 [stack] INFO: http://www.bebizzy.com/2016/02/21/manage-multiple-wordpress-installations-with-managewp/
2016-05-15 21:45:33 [stack] INFO: http://www.bebizzy.com/2016/03/24/buy-laptop-tablet-2/
2016-05-15 21:45:33 [stack] INFO: http://www.bebizzy.com/2016/03/30/customizing-android-smartphone-screens/
2016-05-15 21:45:34 [stack] INFO: http://www.bebizzy.com/2015/09/18/vzwbuzz-recap-show-mobile-music/
2016-05-15 21:45:34 [stack] INFO: http://www.bebizzy.com/2015/09/03/choosing-a-new-logo/
2016-05-15 21:45:37 [stack] INFO: http://www.bebizzy.com/2015/10/16/best-android-apps-for-your-ghost-hunting-adventure/
2016-05-15 21:45:38 [stack] INFO: http://www.bebizzy.com/2015/10/21/samsung-note-5/
2016-05-15 21:45:39 [stack] INFO: http://www.bebizzy.com/2015/10/22/ue-roll-bluetooth-speaker/
2016-05-15 21:45:39 [stack] INFO: http://www.bebizzy.com/2015/11/17/best-apps-for-the-upcoming-election/
2016-05-15 21:45:39 [stack] INFO: http://www.bebizzy.com/2015/12/07/best-star-wars-android-apps/
2016-05-15 21:45:40 [stack] INFO: http://www.bebizzy.com/2016/02/19/using-microsoft-office-on-your-mobile-device/
2016-05-15 21:45:41 [stack] INFO: http://www.bebizzy.com/2016/01/08/best-android-business-apps-for-2016/
2016-05-15 21:45:41 [stack] INFO: http://www.bebizzy.com/2015/09/01/best-games-for-your-android-phone-essentialapps/
2016-05-15 21:45:44 [stack] INFO: http://www.bebizzy.com/2015/03/12/android-apps-for-your-spring-to-do-list/
2016-05-15 21:45:44 [stack] INFO: http://www.bebizzy.com/2015/02/02/mobile-technology-for-a-better-valentines-day/
2016-05-15 21:45:45 [stack] INFO: http://www.bebizzy.com/2015/03/18/logitech-k480-bluetooth-keyboard/
2016-05-15 21:45:45 [stack] INFO: http://www.bebizzy.com/2015/03/01/the-samsung-s6-and-the-htc-one-m9/
2016-05-15 21:45:47 [stack] INFO: http://www.bebizzy.com/2015/07/07/i-had-switchersremorse-once-once/
2016-05-15 21:45:47 [stack] INFO: http://www.bebizzy.com/2015/04/10/best-android-fishing-apps/
2016-05-15 21:45:48 [stack] INFO: http://www.bebizzy.com/2015/05/17/htcs-new-flagship-the-htc-one-m9/
2016-05-15 21:45:48 [stack] INFO: http://www.bebizzy.com/2015/07/28/windows10-twitter-stream/
2016-05-15 21:45:49 [stack] INFO: http://www.bebizzy.com/2015/01/06/my-3-words/

Just add your item-filling logic after the #TODO: comment.