Python requests won't authenticate

Time: 2020-05-01 22:37:15

Tags: python web-scraping request

For some reason, my script fails to authenticate with the website, so it can't scrape the content behind the login page.

Here is my script:

import requests
from lxml import html


USERNAME = "bla@gmail.com"
PASSWORD = "somePass999"

LOGIN_URL = "https://login.com/incidents"
URL = "https://login.com/secretstuff"


def main():
    # A Session keeps cookies (e.g. the session cookie set at login) across requests
    with requests.Session() as session:
        # Fetch the login page and extract the CSRF token from the form
        result = session.get(LOGIN_URL)
        result.raise_for_status()
        tree = html.fromstring(result.text)
        tokens = tree.xpath("//input[@name='csrf']/@value")
        if not tokens:
            raise RuntimeError("No CSRF token found on the login page")
        authenticity_token = tokens[0]

        # Build the payload; the keys must match the login form's <input name="..."> attributes
        payload = {
            "username": USERNAME,
            "password": PASSWORD,
            "csrf": authenticity_token,
        }

        # Perform login
        result = session.post(LOGIN_URL, data=payload, headers={"Referer": LOGIN_URL})
        result.raise_for_status()

        # Scrape the protected page with the same (hopefully logged-in) session
        result = session.get(URL, headers={"Referer": URL})
        tree = html.fromstring(result.content)
        dump = tree.xpath("//div[@class='description-wrapper']")

        print(dump)


if __name__ == '__main__':
    main()

For some reason, it only prints the login page. What am I missing?
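One way to confirm that the login POST itself failed (rather than the scraping step) is to check whether the response to the POST still contains the login form. The sketch below is a heuristic that assumes the site's login page has an `<input type="password">` and the protected pages do not; `still_on_login_page` is a hypothetical helper name.

```python
from lxml import html


def still_on_login_page(page_html):
    """Heuristic (assumption about the site's markup): a page that still
    contains a password field is most likely the login form again, which
    means the credentials or CSRF token were not accepted."""
    doc = html.fromstring(page_html)
    return bool(doc.xpath("//input[@type='password']"))
```

Calling `still_on_login_page(result.text)` right after the `session.post(...)` line, and printing `result.status_code` and `result.url` when it returns `True`, shows whether the site redirected back to the login form.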

1 answer:

Answer 0 (score: 0)

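  • The target site may be failing to authenticate you for some reason:
    • possibly because the POST data doesn't match the format the site expects
    • or because the site (I think) runs Flask or Django on the backend, which typically reject a login POST whose CSRF token, cookies or Referer don't match what they expect
  • You may be loading the last page too early; try adding a delay before you start scraping. Maybe that makes a difference.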
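If the POST data doesn't match what the site expects, one option is to copy every field the login form already carries (hidden ones such as a CSRF token included) and submit them to the form's own `action` URL instead of assuming it is the page URL. A minimal sketch using lxml's form helpers (`forms`, `form_values()`, `action`); `build_login_request` is a hypothetical helper, and the `username`/`password` field names are assumptions taken from the question that should be checked against the real form.

```python
from urllib.parse import urljoin

from lxml import html


def build_login_request(page_html, page_url, username, password,
                        user_field="username", pass_field="password"):
    """Return (action_url, payload) for the first <form> on a login page.

    The field names are assumptions; check the real form's
    <input name="..."> attributes.
    """
    doc = html.fromstring(page_html)
    if not doc.forms:
        raise ValueError("no <form> found on the login page")
    form = doc.forms[0]
    # form_values() yields (name, value) pairs for the form's named controls
    # that have a value, including hidden inputs such as a CSRF token
    payload = dict(form.form_values())
    payload[user_field] = username
    payload[pass_field] = password
    # The form may submit to a different URL than the page it lives on
    action_url = urljoin(page_url, form.action or page_url)
    return action_url, payload
```

In the question's script this would replace the hand-built payload: fetch `LOGIN_URL`, call `build_login_request(result.text, result.url, USERNAME, PASSWORD)`, then `session.post(action_url, data=payload)`.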