Unable to select and upload a PDF file using requests

Time: 2020-03-08 14:38:11

Tags: python python-3.x web-scraping python-requests

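I'm trying to write a Python script that uses an HTTP POST request to upload this PDF file to a web page. I've tried the following, but unfortunately the script fails to upload the file.

This is the log-in link. Here are the username SmthShift_123 and the password 7/B!yzRd8wuK!N2 for your consideration. Now go to this page and click the last tab, Anhang, where you will find the upload option.

To help you visualize it, this is how the page looks.

This is my attempt so far: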

import requests
from bs4 import BeautifulSoup

login_url = 'https://jobs.commerzbank.com/index.php?ac=login'
application_link = 'https://jobs.commerzbank.com/index.php?ac=application&jobad_id=30670'
target_link = 'https://jobs.commerzbank.com/index.php?ac=application&page=6'
upload_link = 'https://jobs.commerzbank.com/inc/candidate_attachments.php'


with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
    res = s.get(login_url)
    sauce = BeautifulSoup(res.text,"lxml")
    elem = {i['name']:i.get('value','') for i in sauce.select('input[name]')}
    elem['username'] = 'SmthShift_123'
    elem['password'] = '7/B!yzRd8wuK!N2'

    s.post(login_url,data=elem)
    s.get(application_link)
    resp = s.get(target_link)

    soup = BeautifulSoup(resp.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['form-control'] = 'Anschreiben'
    payload['upload'] = 'Datei hochladen'
    payload['save'] = ''

    files = {
        'searchButton': open('CV.pdf','rb')
    }
    s.post(upload_link,files=files,data=payload)

When I execute the script above, it neither saves the file nor raises any error.
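One way to narrow this down without touching the site is to build the request locally and inspect what would be sent. The sketch below uses `requests.Request(...).prepare()`, which constructs the multipart body without sending anything. It shows that the key of the `files` dict (`'searchButton'` in the attempt above) becomes the form field name of the uploaded part. The helper name `preview_multipart` and the placeholder bytes are made up for illustration. Which field name the server actually expects cannot be determined from this alone.

```python
import io

import requests


def preview_multipart(url, field_name, filename, content, data):
    """Build (but do not send) a multipart POST so its body can be inspected."""
    files = {field_name: (filename, io.BytesIO(content), "application/pdf")}
    request = requests.Request("POST", url, files=files, data=data)
    return request.prepare()


prepared = preview_multipart(
    "https://jobs.commerzbank.com/inc/candidate_attachments.php",
    "searchButton",                   # field name used in the attempt above
    "CV.pdf",
    b"%PDF-1.4 placeholder bytes",    # stand-in for the real file content
    {"upload": "Datei hochladen"},
)

# The Content-Type header carries the multipart boundary.
print(prepared.headers["Content-Type"])
# The file part is labelled with the key of the files dict.
print(b'name="searchButton"; filename="CV.pdf"' in prepared.body)
```

Note also that requests does not raise an exception for HTTP error status codes unless `raise_for_status()` is called on the response, so a rejected upload can pass silently.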

I also tried it like this (using selenium only for the upload), but the script still fails to select and upload the file:

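from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

s.post(login_url,data=elem)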
s.get(application_link)
resp = s.get(target_link)

driver = webdriver.Chrome()
driver.get(resp.url)
driver.delete_all_cookies()

for cookie in s.cookies.items():
    driver.add_cookie({"name": cookie[0], "value": cookie[1]})

driver.get(resp.url)

select = Select(WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "select#upload_category"))))
select.select_by_visible_text("Lebenslauf")
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "input#upload_file"))).send_keys("C://Users/WCS/Desktop/CV.pdf")

How can I select and upload a PDF file using requests?

2 answers:

Answer 0: (score: 1)

I was able to upload it using selenium. This site is tricky: it has a hidden input that is only displayed when you hover over the upload button.

Try this:

input

I hope this works for you too. Good luck!

Answer 1: (score: 1)

Solution 1

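The js file has an attachFfwAjaxUpload() function, which is used to upload attachment files.

BeautifulSoup's find(name, attrs, recursive, text, **kwargs) returns the first matching object.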

For example:

soup.find("form", attrs={'action':'index.php'})
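As a dependency-free illustration of what that lookup has to retrieve (the first form whose action is index.php, and the application_token input inside it), here is a sketch that swaps BeautifulSoup for `html.parser` from the standard library. The class name `FormInputCollector` and the sample markup are invented for this example. The real page will look different.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect name/value pairs of <input> tags in the first form with a given action."""

    def __init__(self, form_action):
        super().__init__()
        self.form_action = form_action
        self.in_target_form = False
        self.done = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.done and attrs.get("action") == self.form_action:
            self.in_target_form = True
        elif tag == "input" and self.in_target_form and "name" in attrs:
            # Inputs without a value attribute are stored as an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.in_target_form:
            self.in_target_form = False
            self.done = True  # only the first matching form counts, like find()


# Invented sample markup, not taken from the real site.
SAMPLE_HTML = """
<form action="index.php" method="post">
  <input type="hidden" name="application_token" value="abc123">
  <input type="file" name="attachment">
</form>
<form action="index.php">
  <input type="hidden" name="application_token" value="second-form">
</form>
"""

collector = FormInputCollector("index.php")
collector.feed(SAMPLE_HTML)
print(collector.fields["application_token"])  # prints: abc123
```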

Solution 2 - upload the file via selenium

import requests
from bs4 import BeautifulSoup

login_url = 'https://jobs.commerzbank.com/index.php?ac=login'
application_link = 'https://jobs.commerzbank.com/index.php?ac=application&jobad_id=30670'
target_link = 'https://jobs.commerzbank.com/index.php?ac=application&page=6'
upload_link = 'https://jobs.commerzbank.com/inc/candidate_attachments.php'

with requests.Session() as sessionObj:
    sessionObj.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
    res = sessionObj.get(login_url)
    sauce = BeautifulSoup(res.text,"lxml")
    elem = {i['name']:i.get('value','') for i in sauce.select('input[name]')}
    elem['username'] = 'SmthShift_123'
    elem['password'] = '7/B!yzRd8wuK!N2'

    sessionObj.post(login_url,data=elem)
    sessionObj.get(application_link)
    resp = sessionObj.get(target_link)
    soup = BeautifulSoup(resp.text,"lxml")

    # Get the attachment form tag object.
    form = soup.find("form", attrs={'action':'index.php'})
    payload = dict()

    # Set the upload category. There are four category options,
    # with values 2, 1, 4 and 12; pick one of them.
    payload['category'] = '12'
    payload['application_token'] = form.find('input',
                        attrs={'name':'application_token'}).get('value','')
    payload['action'] = 'upload'

    # The attachFfwAjaxUpload() function that uploads the attachment can be seen in
    # the frontend.min.js file in the browser source tab, between lines 38878 and 38903.
    print(payload)

    with open('CV.pdf', 'rb') as f:
        file = {"attachment": f}
        attachment_response = sessionObj.post(upload_link, files=file, data=payload)

        # Print the POST response status code and body.
        print(attachment_response.status_code)
        print(attachment_response.text)
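The three form fields sent with the file can also be assembled by a small helper that fails early on bad input. This is only a sketch: the function name `build_upload_payload` and its checks are assumptions added for illustration. The category values are the four listed in the comments above, and the source does not say which label each value corresponds to.

```python
# Category values listed in the answer's comments; their labels are not given there.
ALLOWED_CATEGORIES = ("2", "1", "4", "12")


def build_upload_payload(category, application_token):
    """Return the form fields that the answer posts alongside the file."""
    category = str(category)
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(
            f"unknown category {category!r}; expected one of {ALLOWED_CATEGORIES}"
        )
    if not application_token:
        # An empty token suggests the form was not found on the fetched page.
        raise ValueError("application_token is empty")
    return {
        "category": category,
        "application_token": application_token,
        "action": "upload",
    }


payload = build_upload_payload(12, "token-from-page")
print(payload)
```

Failing early on an empty token makes it easier to tell a failed login or page fetch apart from a rejected upload.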