Scrapy Web Scraping and Facebook

Date: 2015-08-23 05:09:29

Tags: python python-2.7 web-scraping scrapy

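Any ideas as to why I can't log in? I have been trying to log in to both Facebook and LinkedIn using the same approach, without success. I am using the latest version of Scrapy. I am trying to reach the messages page as a test, but I know the login is not working because I get redirected back to the login page. The same thing happens on LinkedIn.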

import logging

import scrapy
from scrapy.http import FormRequest, Request

from linkedIn.items import LinkedinItem
#from spider.settings import JsonWriterPipeline


# A plain Spider is used here: CrawlSpider relies on parse() internally,
# so overriding parse() in a CrawlSpider subclass breaks it.
class MySpider(scrapy.Spider):
    name = 'fb'
    allowed_domains = ['facebook.com']
    start_urls = ['https://login.facebook.com/login.php']

    def parse(self, response):
        # Submit the login form found on the page. from_response pre-populates
        # the request with the fields already present in the HTML form.
        return [FormRequest.from_response(response,
                    formname='login_form',
                    formdata={'email': 'my_email@example.com',
                              'pass': 'test!'},
                    callback=self.after_login)]

    def after_login(self, response):
        # Check that the login succeeded before going on.
        # response.body is a byte string, so compare against bytes.
        if b"the password you entered is incorrect" in response.body:
            self.log("Login failed", level=logging.ERROR)
            return
        self.log("Login was successful")
        self.log(response.body)
        return Request(url="https://facebook.com/messages",
                       callback=self.parse_items)

    def parse_items(self, response):
        items = []
        for title in response.xpath("//title"):
            item = LinkedinItem()
            # Use a path relative to the current node, not "//title".
            item['friendName'] = title.xpath("./text()").extract()
            #item['numberOffriends'] = title.xpath("some path here").extract().pop()
            items.append(item)
        return items

1 Answer:

Answer 0 (score: 1)

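Both Facebook and LinkedIn use CSRF tokens. You must first fetch the page that contains the login form, then parse the HTML to extract the CSRF token, and finally make a POST request that carries the username, the password, and the CSRF token.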
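
The parse-then-POST steps described above can be sketched roughly as follows. This is only an illustration using the Python 3 standard library, not the actual markup of either site: the sample HTML, the hidden field name `csrf_token`, and the helper names `HiddenInputCollector` and `build_login_payload` are all assumptions made up for the example. Real login pages use their own field names and may require additional cookies or headers.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            self.fields[name] = attributes.get("value") or ""


def build_login_payload(login_page_html, username, password):
    """Merge the page's hidden fields (e.g. a CSRF token) with credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_page_html)
    payload = dict(collector.fields)
    # "email" and "pass" mirror the field names used in the question.
    payload["email"] = username
    payload["pass"] = password
    return payload


# Hypothetical login page; the token name and value are invented.
SAMPLE_LOGIN_PAGE = """
<form id="login_form" method="post" action="/login">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="email">
  <input type="password" name="pass">
</form>
"""

payload = build_login_payload(SAMPLE_LOGIN_PAGE, "my_email@example.com", "test!")
print(payload)
```

The resulting dictionary is what would be sent as the body of the POST request, in the same session (with the same cookies) that fetched the login page.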