Question

For some reason, my script can't authenticate with the website, so it can't scrape the content behind the login page.

Here is my script:

import requests
from lxml import html


USERNAME = "bla@gmail.com"
PASSWORD = "somePass999"

LOGIN_URL = "https://login.com/incidents"
URL = "https://login.com/secretstuff"


def main():
    session_requests = requests.Session()

    # Get the login page and read the CSRF token from its hidden input
    result = session_requests.get(LOGIN_URL)
    tree = html.fromstring(result.text)
    authenticity_token = tree.xpath("//input[@name='csrf']/@value")[0]

    # Create payload
    payload = {
        "username": USERNAME,
        "password": PASSWORD,
        "csrf": authenticity_token,
    }

    # Perform login
    result = session_requests.post(LOGIN_URL, data=payload, headers={"Referer": LOGIN_URL})

    # Scrape url
    result = session_requests.get(URL, headers={"Referer": URL})
    tree = html.fromstring(result.content)
    dump = tree.xpath("//div[@class='description-wrapper']")

    print(dump)


if __name__ == '__main__':
    main()

For some reason, it only prints the login portal page. I'm not sure what I'm missing.

Answer 1

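The target site may be refusing to authenticate you for one of these reasons:
- The POST data doesn't match what the site's login form expects (field names, hidden fields, or the URL the form actually posts to).
- The site (I think) uses Flask or Django on the backend, and their CSRF protection has its own conventions; for example, Django's hidden form field is named csrfmiddlewaretoken, not csrf.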
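One way to make the POST match what the site expects is to read the login form itself instead of guessing: take every input it contains (including hidden ones such as the CSRF token), fill in the credentials, and post to the form's own action URL. The sketch below is a minimal, standard-library-only version under some assumptions: the login form is the first form on the page that has a password input, and the credential fields are named username and password (both are parameters, so adjust them to the real names). LoginFormParser and build_login_payload are hypothetical helpers, and select elements, unchecked checkboxes and multiple submit buttons are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action and <input> fields of every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per finished <form>
        self._current = None   # the <form> being parsed, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "",
                             "fields": {}, "has_password": False}
        elif tag == "input" and self._current is not None:
            if (attrs.get("type") or "text").lower() == "password":
                self._current["has_password"] = True
            name = attrs.get("name")
            if name:
                # Hidden inputs (e.g. a CSRF token) keep the value the server put there.
                # Simplification: checkboxes/radios are included even when unchecked.
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def build_login_payload(page_html, page_url, username, password,
                        user_field="username", pass_field="password"):
    """Return (action_url, payload) for the first form that has a password input.

    user_field and pass_field are assumed names; check the real form's inputs.
    """
    parser = LoginFormParser()
    parser.feed(page_html)
    parser.close()
    form = next((f for f in parser.forms if f["has_password"]), None)
    if form is None:
        raise ValueError("no form with a password input found")
    payload = dict(form["fields"])
    payload[user_field] = username
    payload[pass_field] = password
    # An empty or missing action posts back to the page's own URL.
    return urljoin(page_url, form["action"]), payload
```

With requests, fetch LOGIN_URL through the same Session you use afterwards, pass result.text and result.url to build_login_payload, then post the returned payload to the returned URL; the Session keeps any CSRF cookie the site set so it is sent back with the login POST.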
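You may also be requesting the protected page too early; try adding a short delay before you start scraping. That might make a difference.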
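If you try the delay, it also helps to check whether the page you got back is still the login page, so a failed login shows up as an error instead of silently printing the login portal. Because each requests call returns only after the whole response has arrived, a delay alone rarely fixes a rejected login; this check at least tells you which case you are in. The sketch below is an assumption-laden helper, not anything the site provides: it treats any page containing a password input as the login page (a heuristic), and fetch_after_login and looks_like_login_page are hypothetical names.

```python
import re
import time

# Heuristic (an assumption): a page that still has a password <input> is the login page.
_PASSWORD_INPUT = re.compile(r"<input[^>]*type\s*=\s*[\"']?password", re.IGNORECASE)


def looks_like_login_page(page_html):
    """Return True if the HTML appears to contain a password input."""
    return bool(_PASSWORD_INPUT.search(page_html))


def fetch_after_login(fetch, url, attempts=3, delay=2.0, sleep=time.sleep):
    """Call fetch(url) until the result no longer looks like the login page.

    fetch is any callable returning page HTML, e.g. lambda u: session.get(u).text.
    Waits `delay` seconds between attempts and raises RuntimeError if every
    attempt still returns the login page (usually meaning the login failed).
    """
    for attempt in range(attempts):
        page_html = fetch(url)
        if not looks_like_login_page(page_html):
            return page_html
        if attempt < attempts - 1:
            sleep(delay)
    raise RuntimeError(f"still on the login page after {attempts} attempts: {url}")
```

For example, page = fetch_after_login(lambda u: session_requests.get(u).text, URL) either returns the protected page's HTML or raises, which makes it clear whether waiting helped or the login itself was rejected.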

Python requests won't authenticate
