I'm having a hard time scraping data from a website. I've gone through a lot of YouTube videos and Stack Overflow posts, but they all seem tricky to me.
To access the information behind the login, I tried the following code:
import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'
    LOGIN_URL = 'https://nestiolistings.com/login/?next=/'
    URL = 'https://nestiolistings.com/company/listings/'
    # Start on the login page so that from_response() finds the login form there
    start_urls = [LOGIN_URL]

    def parse(self, response):
        # These keys must match the name attributes of the login form's <input>
        # elements; from_response() fills in the form's other fields (including
        # hidden ones) by itself
        my_data = {'user': 'MYUSERNAME', 'pass': 'MYPASSWORD', 'plant': '1', 'login': 'Login'}
        yield scrapy.FormRequest.from_response(
            response=response,
            formdata=my_data,
            callback=self.open_page,
        )

    def open_page(self, response):
        # Scrapy's cookie middleware (on by default) sends the cookies set at login
        yield scrapy.Request(url=self.URL, callback=self.scrap_page)

    def scrap_page(self, response):
        print(response.body)
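The keys in `my_data` ('user', 'pass', 'plant', 'login') only work if they match the `name` attributes of the inputs in the site's login form. One way to check is to list the form's inputs from the login page's HTML (for example `response.text` inside `parse`). This is a minimal sketch using only the standard library's `html.parser`; `FormFieldLister` and `list_form_fields` are hypothetical helper names, and the sample HTML in the test is made up, not the real Nestio form.

```python
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Collect (name, type, value) for every named <input> inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes (e.g. "required") map to None
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            self.fields.append(
                (attrs["name"], attrs.get("type") or "text", attrs.get("value") or "")
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def list_form_fields(html):
    """Return the named inputs of all forms in `html`, in document order."""
    parser = FormFieldLister()
    parser.feed(html)
    return parser.fields
```

Whatever names this prints for the username and password inputs are the keys `my_data` should use; hidden fields such as a CSRF token can be left to `FormRequest.from_response`.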
Then I wrote the following code:
from bs4 import BeautifulSoup
import requests

# A plain requests.get() sends no login cookies, so this only sees what an anonymous visitor sees
source = requests.get('https://nestiolistings.com/listings/?listing_type=10&min_price=2500&max_price=3000').text
soup = BeautifulSoup(source, 'lxml')
address = soup.find_all('span', class_='building-title-content')
print(address)
That is the information I'm trying to get.
Can anyone tell me how to get the data that sits behind the login?
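With requests, the usual approach is a `requests.Session`, which stores the cookies the server sets at login and sends them on later requests (a bare `requests.get` does not). The sketch below assumes a conventional HTML login form: it GETs the login page, copies any hidden inputs (such as a CSRF token), POSTs them together with the credentials, then fetches the target page through the same session. `fetch_behind_login` and `HiddenInputs` are hypothetical helpers, and the credential field names in the usage comment are assumptions to check against the real form. If the site logs in via JavaScript or a separate API call, this will not work as-is.

```python
from html.parser import HTMLParser


class HiddenInputs(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.values[attrs["name"]] = attrs.get("value") or ""


def fetch_behind_login(session, login_url, target_url, credentials):
    """Log in through `session`, then return the HTML of `target_url`.

    `session` is expected to behave like requests.Session, which keeps the
    cookies the server sets so later requests go out as the logged-in user.
    """
    # 1. GET the login page: picks up session cookies and hidden form fields
    page = session.get(login_url)
    page.raise_for_status()
    hidden = HiddenInputs()
    hidden.feed(page.text)

    # 2. POST hidden fields plus credentials; some frameworks (e.g. Django
    #    over HTTPS) also check the Referer header on login forms
    payload = {**hidden.values, **credentials}
    resp = session.post(login_url, data=payload, headers={"Referer": login_url})
    resp.raise_for_status()

    # 3. Same session object, so the login cookie is sent automatically
    target = session.get(target_url)
    target.raise_for_status()
    return target.text


# Example usage (needs network access and real credentials; the keys
# "username"/"password" are assumptions, use the real form's input names):
#
#     import requests
#     with requests.Session() as s:
#         html = fetch_behind_login(
#             s,
#             "https://nestiolistings.com/login/?next=/",
#             "https://nestiolistings.com/company/listings/",
#             {"username": "MYUSERNAME", "password": "MYPASSWORD"},
#         )
```

The returned HTML can then be passed to BeautifulSoup in the same way as `source` in the snippet above.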