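I can log in to GitHub with the code below. But when I try the same code on another website, it stays on the login page and does not log in. Am I missing something?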
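import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser


class GithubSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Value attribute of the first form input on the page, used here as the CSRF token
        token = response.css('form input::attr(value)').extract_first()
        # from_response() pre-fills the form's fields from the page; formdata overrides them
        return FormRequest.from_response(
            response,
            formdata={
                'csrf_token': token,
                'login': '*******',
                'password': '*******',
            },
            callback=self.start_scraping,
        )

    def start_scraping(self, response):
        # Open the response in a browser to check whether the login worked
        open_in_browser(response)
        print('yes')


class AirlineSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['hawaiianairlines.com']
    start_urls = ['https://www.hawaiianairlines.com/my-account/login/']

    def parse(self, response):
        # Same approach as above, with this site's field names
        token = response.css('form input::attr(value)').extract_first()
        return FormRequest.from_response(
            response,
            formdata={
                'csrf_token': token,
                'UserName': '*********',
                'Password': '*********',
            },
            callback=self.start_scraping,
        )

    def start_scraping(self, response):
        # Open the response in a browser to check whether the login worked
        open_in_browser(response)
        print('yes')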
Answer 0 (score: 1)
Looking at the page source of the site you linked, you can find the login form and its fields in the markup.
One part of that form markup in particular tells you that the browser will never submit the form to the "href" target through the regular mechanism, and (unless you tested the form with JavaScript disabled and it worked) the website will probably only work in a JavaScript-enabled browser.
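A quick way to check how a login form is meant to be submitted is to list each form's action and input names. The sketch below does this with Python's standard `html.parser`. The sample markup, the `FormInspector` class and the `needs_javascript` helper are made up for illustration and are not taken from the site in question. A `javascript:` action is only one common sign that a script, not the browser, performs the submission.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect each form's action, method and the names of its input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            })
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.forms[-1]["inputs"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def needs_javascript(form):
    """Heuristic only: a 'javascript:' action means a plain form POST goes nowhere."""
    return form["action"].strip().lower().startswith("javascript:")


# HYPOTHETICAL markup, used only to exercise the helper
SAMPLE_HTML = """
<form id="login" action="javascript:void(0);" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="UserName">
  <input type="password" name="Password">
</form>
"""

inspector = FormInspector()
inspector.feed(SAMPLE_HTML)
for form in inspector.forms:
    print(form["action"], form["inputs"], needs_javascript(form))
```

In a Scrapy callback the same check could be run on `response.text`. If the helper flags the form, `FormRequest.from_response` is unlikely to be enough on its own.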
To find out how the form is actually submitted via XHR, you need to look through the site's JavaScript code, work out what the function that submits the form does, and emulate it in your code.
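Emulating the XHR usually comes down to sending the same URL, headers and body that the browser sends. The sketch below assumes the site posts JSON. The endpoint `https://example.com/api/login`, the field names and the helper `build_login_request` are all hypothetical and must be replaced with what the browser's network tab actually shows.

```python
import json


def build_login_request(username, password, token):
    """Return keyword arguments describing a JSON POST that mimics a browser XHR."""
    # HYPOTHETICAL field names; copy the real ones from the recorded request
    payload = {
        "UserName": username,
        "Password": password,
        "csrf_token": token,
    }
    return {
        # HYPOTHETICAL endpoint; the real one is whatever URL the script calls
        "url": "https://example.com/api/login",
        "method": "POST",
        "headers": {
            "Content-Type": "application/json",
            # Many sites set this header on XHR calls; include it only if the browser does
            "X-Requested-With": "XMLHttpRequest",
        },
        "body": json.dumps(payload),
    }


kwargs = build_login_request("user", "secret", "abc123")
print(kwargs["method"], kwargs["url"])
```

Inside a spider callback, the result can be passed straight to Scrapy, for example `yield scrapy.Request(callback=self.start_scraping, **kwargs)`, since `url`, `method`, `headers` and `body` are all arguments that `scrapy.Request` accepts.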
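Another option is to use a JavaScript engine together with Scrapy so that it handles the scripts for you, but this has the drawbacks of being resource-hungry and potentially difficult to set up.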
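As one possible example of such an engine, the scrapy-playwright plugin routes requests through a headless browser. The answer does not name a specific tool, so this choice is an assumption. The fragment below shows the project settings the plugin expects in `settings.py`, and it requires the plugin and a browser to be installed separately.

```python
# settings.py fragment (assumes the scrapy-playwright plugin is installed)

# Send HTTP and HTTPS downloads through the Playwright-backed handler
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# The plugin needs Twisted's asyncio-based reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

With these settings, only requests that carry `meta={"playwright": True}` are rendered in the browser, which keeps the extra resource cost limited to the pages that need it.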