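Any ideas on why I am not able to log in? I have been trying to use the same method to log in to both Facebook and LinkedIn, with no success. I am using the most recent version of Scrapy. As a test, I am trying to reach the messages page, but I know it does not work because I get redirected back to the login page. The same thing happens on LinkedIn.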
import logging

import scrapy
from scrapy.http import FormRequest, Request
from scrapy.spiders import CrawlSpider

from linkedIn.items import LinkedinItem
# from spider.settings import JsonWriterPipeline


class MySpider(CrawlSpider):
    name = 'fb'
    allowed_domains = ['facebook.com']
    start_urls = ['https://login.facebook.com/login.php']

    def parse(self, response):
        # Fill in and submit the login form found on the start page.
        return [FormRequest.from_response(response,
                                          formname='login_form',
                                          formdata={'email': 'my_email@example.com',
                                                    'pass': 'test!'},
                                          callback=self.after_login)]

    def after_login(self, response):
        # Check that login succeeded before going on.
        if "the password you entered is incorrect" in response.text:
            self.log("\n\n\n\nLogin failed\n\n\n\n", level=logging.ERROR)
            return
        else:
            self.log("\n\n\n Login was successful!!!\n\n\n")
            self.log(response.text)
            return Request(url="https://facebook.com/messages",
                           callback=self.parse_items)

    def parse_items(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath("//title")
        items = []
        for title in titles:
            item = LinkedinItem()
            # Query relative to the current <title> node, not the whole document.
            item['friendName'] = title.xpath("text()").extract()
            # item['numberOffriends'] = title.xpath("some path here").extract().pop()
            items.append(item)
        return items
Answer 0 (score: 1)
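Both Facebook and LinkedIn use CSRF tokens. You first have to fetch the page that contains the login form, then parse the HTML and extract the CSRF token, and only then issue a POST request carrying the username, the password, and the CSRF token.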
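
The parse-and-post steps described above can be sketched roughly as follows, using only the Python standard library so the idea is visible outside Scrapy. This is a minimal illustration, not the sites' actual login flow: the sample HTML, the hidden field name `csrf_token`, and the helper names `HiddenInputParser` and `build_login_payload` are all assumptions made for the example. The `email` and `pass` field names are taken from the question's code. Fetching the login page and sending the POST (with the same session cookies) are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            self.hidden_fields[name] = attributes.get("value") or ""


def build_login_payload(login_page_html, username, password):
    """Merge the hidden form fields (including the CSRF token) with the credentials."""
    parser = HiddenInputParser()
    parser.feed(login_page_html)
    payload = dict(parser.hidden_fields)
    payload["email"] = username
    payload["pass"] = password
    return payload


# Made-up login page; a real page would be fetched with a GET request first.
SAMPLE_LOGIN_PAGE = """
<form id="login_form" action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="email">
  <input type="password" name="pass">
</form>
"""

payload = build_login_payload(SAMPLE_LOGIN_PAGE, "my_email@example.com", "test!")
# URL-encoded body that would be sent in the POST request.
body = urlencode(payload)
```

The token is only valid together with the cookies set when the login page was fetched, so the GET and the POST need to share one session.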