POST request including a file using Python

Time: 2018-02-08 13:10:51

Tags: python html python-requests multipartform-data

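I'm a beginner at web programming, so I apologize if this is very basic, but I couldn't find anything on Stack Overflow as specific as my problem. I have a lot of text files (10k) that I need to upload to this website, https://rostlab.org/services/nlsdb/, and then click "Evaluate NES / NLS". This triggers an SQL query and returns some information as a table. I then need to click the "CSV" button to download the file to my computer. Of course I don't want to upload each file by hand, so I am trying to generate the requests with Python, but I can't get it to work. I don't even get as far as the table in the site's response, so downloading the CSV is a challenge I haven't reached yet: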

import requests

url = 'https://rostlab.org/services/nlsdb/query'
files = {'file-upload': ('some.txt', open('C:\\some.txt', 'rb'), 'text/plain')}
data = {'_token':'', 'input-data':'', 'query-sig2':''}

r = requests.post('https://rostlab.org/services/nlsdb/query', files=files, data=data)

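In response I get a large amount of text, and from the HTML I can recover an error code 500, so I'm clearly doing something wrong here that I can't see. The POST request that the website sends when I submit a file looks like this: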

**General**
  Request URL:https://rostlab.org/services/nlsdb/query
  Request Method:POST
  Status Code:200 OK
  Remote Address:131.159.28.73:443
  Referrer Policy:no-referrer-when-downgrade

Response Headers
  Cache-Control:no-cache, private
  Connection:Keep-Alive
  Content-Encoding:gzip
  Content-Length:2231
  Content-Type:text/html; charset=UTF-8
  Date:Thu, 08 Feb 2018 12:39:30 GMT
  Keep-Alive:timeout=5, max=100
  Server:Apache
  Set-Cookie:nlsdb_session=eyJpdiI6IjZMRk03ZjRCNjBmU1JcL3Y0Vko4ZHFRPT0iLCJ2YWx1ZSI6Ikh2bHcyZHBuN25nNmx1QnRoOFlPMWhWU0RYdUpEdnAwbGtySWgwbDlDVElHZmRyNlBMeEdXT3ROSERcLzRRNDB2ZnVUQ2oyTDlmOVRHa3JNUUZJTnBkUT09IiwibWFjIjoiZWM3ZjFjYmQ2ZThkNmRlM2JmOTY5OWZiYWMxOTA4ZmZiZjcxZjU1ODJjNjU1ODgzYjczMmUxMGY1NGMwMjNlMCJ9; expires=Thu, 08-Feb-2018 14:39:30 GMT; Max-Age=7200; path=/; httponly
  Set-Cookie:XSRF-TOKEN=eyJpdiI6IjExMjBaRHNmWHVLZTBzSURYZFwvUmF3PT0iLCJ2YWx1ZSI6InQyWUE5QzZEd2xmZU5rMjlyekV1Z2JcL3lGNkNvbHl1TnBHMVh5eWtLeWtNb3JHcTJJSFpyR0lDVkxNV2h2cGsrTUhYMGl3ZDBET0hucHdpNzV0YkRpdz09IiwibWFjIjoiNzcxODBhYjIzYjEzNDU1OTNhNGRhNjI3OTAxNWY1MjFkYjI5MWQ5NjgwNGE4ZjVmMzQzZThkNWUzZWE0YTgwYSJ9; expires=Thu, 08-Feb-2018 14:39:30 GMT; Max-Age=7200; path=/
  Vary:Accept-Encoding

Request Headers
  Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
  Accept-Encoding:gzip, deflate, br
  Accept-Language:en-US,en;q=0.9
  Cache-Control:max-age=0
  Connection:keep-alive
  Content-Length:1943
  Content-Type:multipart/form-data; boundary=----WebKitFormBoundary1tOuJdyWl1bn7H4X
  Cookie:XSRF-TOKEN=eyJpdiI6IjZaWHdTa3FPYmNHbkxsNVpoUlE3T0E9PSIsInZhbHVlIjoiQWMraGlLekd1akkrc0RDTzNMRGNIcVFkVGdBNjZFa2h4XC8xcUI0VmtIVG9CTnVPNW1IUW55NU9iNGlGY0NCWkFkd0hDZnJOaXBaT3J0VHZTSXl6b1FBPT0iLCJtYWMiOiJmMjE3N2JkZDIyMjRkNTY3ZGE4MDhlNGY5OWJiMDAwYjNiNzYyNGJjMTc2YzA4NTQwODcxZTM3YjI0YjQ5MWUyIn0%3D; nlsdb_session=eyJpdiI6IjByb2dtS0Q1ekFBU1F0WURJUk8rWnc9PSIsInZhbHVlIjoiM3lMNFU5Y2hBXC9BVU0xT0RUNnhVaUJ0ckJ0RnB5QlJqbk15alNSNkM4MjhNTGd6TFwvR0dwd0ZpWE9pU3piekhWb3ZzQjNZYVQ4ODdHeUxUMVJWM0pwUT09IiwibWFjIjoiYTE1Y2Q2NmRlN2M4Yjc1MzEyZTQxYjcwMzVmYjNiNjA1YjdiNjU4ODkxZWJhM2JmYTAwYTk1MWNhZWNkNTczMiJ9
  DNT:1
  Host:rostlab.org
  Origin:https://rostlab.org
  Referer:https://rostlab.org/services/nlsdb/
  Upgrade-Insecure-Requests:1
  User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36

Request Payload  
  ------WebKitFormBoundary1tOuJdyWl1bn7H4X
  Content-Disposition: form-data; name="_token"

  GnjGT2Ejrrpo4Nlf2EbwtmLtY29GNFnoTJpl5z5o
  ------WebKitFormBoundary1tOuJdyWl1bn7H4X
  Content-Disposition: form-data; name="input-data"


  ------WebKitFormBoundary1tOuJdyWl1bn7H4X
  Content-Disposition: form-data; name="file-upload"; filename="some.txt"
  Content-Type: text/plain


  ------WebKitFormBoundary1tOuJdyWl1bn7H4X
  Content-Disposition: form-data; name="query-sig2"

  sF4MZkIaMc1K9TPZ6uYJuQ
  ------WebKitFormBoundary1tOuJdyWl1bn7H4X--

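I think the data object is incorrect, but I can't get it right, and omitting it doesn't seem to work either. Any suggestions on how to retrieve the data correctly and then download the corresponding CSV file?

1 answer:

Answer 0 (score: 0)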

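The site uses cross-site request forgery (CSRF) tokens to protect against a common class of attacks. In addition, it uses a generated token as the value of the submit button.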

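To be able to post anything, you need to:

  • Store and return cookies. Using a session object is the easiest way to do this.
  • Load the form page and read out the CSRF token and the submit button value.
  • Use the extracted tokens in your POST request.

I used BeautifulSoup to parse the form page and extract the tokens: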

from bs4 import BeautifulSoup
import requests

form_url = 'https://rostlab.org/services/nlsdb/'

# The session stores the cookies from the GET and sends them back with the POST.
with requests.session() as sess:
    response = sess.get(form_url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Read the hidden CSRF token, the submit button value and the form target.
    csrf_token = soup.find('input', {'name': '_token'})['value']
    submit_token = soup.find('button', id='submit-sig2')['value']
    action_url = soup.find('form', id='input-form')['action']

    data = {'_token': csrf_token, 'query-sig2': submit_token, 'input-data':''}
    with open('C:\\some.txt', 'rb') as some_text:
        files = {'file-upload': ('some.txt', some_text, 'text/plain')}
        response = sess.post(action_url, data=data, files=files)

Note that I also extracted the action attribute of the <form> tag; it is best to stick with what the server tells us to use.
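Passing the form's action attribute straight to sess.post works when the attribute holds an absolute URL. In case a form uses a relative value instead, a minimal sketch of resolving it against the page URL could look like the following. The helper name resolve_action and the relative value 'query' are assumptions made for illustration, not something taken from the site's markup.

```python
from urllib.parse import urljoin

form_url = 'https://rostlab.org/services/nlsdb/'


def resolve_action(page_url, action):
    """Return an absolute URL for a form's action attribute (hypothetical helper)."""
    # An empty or missing action means the form submits to the page itself.
    return urljoin(page_url, action or '')


# 'query' is a made-up relative action used only to show the resolution.
print(resolve_action(form_url, 'query'))
# An absolute action is returned unchanged.
print(resolve_action(form_url, 'https://rostlab.org/services/nlsdb/query'))
```

The result can then be passed to sess.post in place of the raw attribute value.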

The code above produces a 200 OK response whose HTML page lists the matching results in a table.
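How the site's "CSV" button works is not covered here, so one possible alternative is to turn the table in the returned HTML into CSV locally. The sketch below assumes the results arrive as an ordinary table built from tr, th and td elements. It uses only the standard library so that it runs on its own, though BeautifulSoup would do the same job. The names TableRows and html_table_to_csv and the sample markup are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def _flush_cell(self):
        # Close the current cell, if any, and add its text to the current row.
        if self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._flush_cell()  # tolerate a missing closing tag
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self._flush_cell()
        elif tag == 'tr' and self._row is not None:
            self._flush_cell()
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator='\n').writerows(parser.rows)
    return out.getvalue()


# Made-up sample markup; the real column names are not known here.
sample = ('<table><tr><th>A</th><th>B</th></tr>'
          '<tr><td>1</td><td>x, y</td></tr></table>')
print(html_table_to_csv(sample))
```

For the 10k files, the same session could be reused in a loop, calling html_table_to_csv(response.text) after each POST and writing the result to a file named after the input. Whether the CSRF token stays valid across several posts is not known from the above, so re-fetching the form page before each upload is the cautious choice.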