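I'm trying to scrape some data from a website using a POST request with the Python requests library. Unfortunately, I can't post a link to the page, because you have to be logged in to the site to use it.
The request I want to replicate has the file extension .ehtml, and this is part of the request payload I'm trying to recreate: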
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="session_id"
W0pNKn8AAQEAACD-XkYAAAAJ
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="p_session_id"
W0pMOH8AAQEAABZSUVkAAAAD
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="attach_key"
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="chosen"
0
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="debug"
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="language"
en
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="game_system_id"
NULL
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="collection_detail_id"
NULL
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="competition_id"
NULL
With help from a few Stack Overflow questions, I have managed to recreate this much so far:
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="session_id"
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="p_session_id"
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="attach_key"
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="chosen"
0
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="debug"
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="language"
en
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="game_system_id"
NULL
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="collection_detail_id"
NULL
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="competition_id"
NULL
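This was done with the following code:
import requests
from requests import Request

# Each value is a (filename, content) tuple; a filename of None makes
# requests emit a plain form field rather than a file upload.
Q = {
    "session_id": (None, ""),
    "p_session_id": (None, ""),
    "attach_key": (None, ""),
    "chosen": (None, "0"),
    "debug": (None, ""),
    "language": (None, "en"),
    "game_system_id": (None, "NULL"),
    "collection_detail_id": (None, "NULL"),
    "competition_id": (None, "NULL")
}
with requests.Session() as s:
    # login_URL2, payload and req_url are not shown in the question
    p = s.post(login_URL2, data=payload)
    # print(p.text)
    # d = s.post(req_url, files=Q)
    # Build the request without sending it, to inspect the encoded body
    d2 = Request("POST", req_url, files=Q)
    d3 = d2.prepare()
    print(d3.body.decode('utf-8'))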
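I believe the last thing I'm missing is the WebKitFormBoundary part, and I can't find anywhere how to insert it. This is my first time scraping with .ehtml files, so if I've missed anything else obvious, I'd appreciate having it pointed out.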
Answer 0 (score: 3)
import random
import string

import requests
from requests_toolbelt import MultipartEncoder

fields = {
    # your_data is a placeholder: the file contents or an open binary file
    'file': ('test.png', your_data, "image/png"),
    'file_id': "0"
}
# Browser-style boundary: four dashes, "WebKitFormBoundary", 16 random characters
boundary = '----WebKitFormBoundary' \
    + ''.join(random.sample(string.ascii_letters + string.digits, 16))
m = MultipartEncoder(fields=fields, boundary=boundary)
headers = {
    "Host": "xxxx",
    "Connection": "keep-alive",
    # m.content_type already carries the boundary parameter
    "Content-Type": m.content_type
}
req = requests.post('https://xxxx/api/upload', headers=headers, data=m)
print(req.text)
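This way we can produce boundary lines of the form ------WebKitFormBoundary8rntuVzldIBHkILv (the two extra leading dashes are the multipart delimiter prefix added in front of the boundary string).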
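For the text-only fields in the question, the same idea can be sketched with the standard library alone. This is a minimal illustration, not the approach from the answer above: `webkit_boundary` and `encode_multipart` are hypothetical helper names, and the field values are placeholders taken from the question's own code.

```python
import secrets
import string


def webkit_boundary() -> str:
    """Return a browser-style boundary: '----WebKitFormBoundary' + 16 random alphanumerics."""
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(16))
    return "----WebKitFormBoundary" + suffix


def encode_multipart(fields: dict, boundary: str) -> tuple:
    """Encode plain text fields as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")  # delimiter = "--" + boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")               # blank line separates headers from value
        lines.append(value)
    lines.append(f"--{boundary}--")    # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


# Placeholder values, mirroring the dict in the question
fields = {"session_id": "", "chosen": "0", "language": "en", "game_system_id": "NULL"}
boundary = webkit_boundary()
body, content_type = encode_multipart(fields, boundary)
print(content_type)
print(body.decode("utf-8"))
```

Assuming the server accepts it, the resulting bytes could then be sent with something like `s.post(req_url, data=body, headers={"Content-Type": content_type})` inside the logged-in session.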
Answer 1 (score: 1)
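The exact name of the boundary doesn't matter, as long as the boundary is declared in the header:
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p
With this header, each boundary line in the body will be
--gc0p4Jq0M2Yt08jU534c0p
The server looks at the Content-Type header to find the boundary and uses it to split the body into its parts.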
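To make that concrete, here is a rough sketch of what a receiver might do, using only the standard library. `split_multipart` is a hypothetical helper, the body is a made-up two-field example, and the naive split assumes the boundary string never occurs inside a field value; real servers use a proper multipart parser.

```python
from email.message import Message


def split_multipart(content_type: str, body: bytes) -> list:
    """Split a multipart body into raw parts using the boundary from Content-Type."""
    header = Message()
    header["Content-Type"] = content_type
    boundary = header.get_boundary()  # None when no boundary parameter is present
    if boundary is None:
        raise ValueError("Content-Type header has no boundary parameter")
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # The first chunk is the (usually empty) preamble before the first delimiter
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):   # closing delimiter: "--boundary--"
            break
        parts.append(chunk.strip(b"\r\n"))
    return parts


content_type = "multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p"
body = (
    b"--gc0p4Jq0M2Yt08jU534c0p\r\n"
    b'Content-Disposition: form-data; name="chosen"\r\n'
    b"\r\n"
    b"0\r\n"
    b"--gc0p4Jq0M2Yt08jU534c0p\r\n"
    b'Content-Disposition: form-data; name="language"\r\n'
    b"\r\n"
    b"en\r\n"
    b"--gc0p4Jq0M2Yt08jU534c0p--\r\n"
)
for part in split_multipart(content_type, body):
    print(part)
```

Nothing here depends on the boundary looking like a WebKit one; any string works as long as the header and the body agree.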
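Answer 2 (score: 1)
When you send an ajax request via jQuery and want to send FormData, you don't need to use JSON.stringify on that FormData. Also, when you send a file, the content type must be multipart/form-data including the boundary, something like multipart/form-data; boundary=----WebKitFormBoundary0BPm0koKA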
So to send FormData including some file via jQuery ajax you need to:
Set data to the FormData without any modifications.
Set processData to false (Lets you prevent jQuery from automatically transforming the data into a query string).
Set the contentType to false (This is needed because otherwise jQuery will set it incorrectly).
Your request should look like this:
var formData = new FormData();
formData.append('name', dogName);
// ...
formData.append('file', document.getElementById("dogImg").files[0]);
$.ajax({
    type: "POST",
    url: "/foodoo/index.php?method=insertNewDog",
    data: formData,
    processData: false,
    contentType: false,
    success: function(response) {
        console.log(response);
    },
    error: function(errResponse) {
        console.log(errResponse);
    }
});
Answer 3 (score: 0)
------WebKitFormBoundary89uZMBZwSHfYjySK
Content-Disposition: form-data; name="account_number"
blah
------WebKitFormBoundary89uZMBZwSHfYjySK
Content-Disposition: form-data; name="date_of_birth"
blah
------WebKitFormBoundary89uZMBZwSHfYjySK
Content-Disposition: form-data; name="first_name"
blah
------WebKitFormBoundary89uZMBZwSHfYjySK
Content-Disposition: form-data; name="last_name"
blah
------WebKitFormBoundary89uZMBZwSHfYjySK--
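I basically converted these WebKit form boundary fields into a JSON-like dict, as shown here (note that a dict passed via data= is sent form-encoded, not as JSON or multipart):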
import requests

data = {
    "account_number": "blah",
    "date_of_birth": "blah",
    "first_name": "blah",
    "last_name": "blah"
}
headers = {
    "Authorization": "Bearer blah"
}
req = requests.post('https://rest.blah/v1/blah/sign-in', headers=headers, data=data)
print(req.content)
Response:
b'{"code":200,"data":{"user_id":"15442","building_id":"11","apartment_id":"4192"}}'
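It may help to see what this approach actually puts on the wire. When a plain dict is passed as `data=`, requests form-encodes it (Content-Type application/x-www-form-urlencoded) instead of building a multipart body, so this only works if the server accepts that encoding. The sketch below uses the standard library's `urlencode` to show roughly the body that results; the values are the same placeholders as above.

```python
from urllib.parse import urlencode

# Same placeholder fields as in the answer above
data = {
    "account_number": "blah",
    "date_of_birth": "blah",
    "first_name": "blah",
    "last_name": "blah",
}

# Form-encoded body: key=value pairs joined with "&", no boundaries involved
encoded = urlencode(data)
print(encoded)
```

If the server insists on multipart/form-data, passing the fields through `files=` with `(None, value)` tuples, as in the question, is the usual requests-side alternative.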
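Answer 4 (score: 0)
Setting the Content-Type header manually means it is missing the boundary parameter. Remove that header and let fetch generate the full content type. It will look something like this: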
Content-Type: multipart/form-data;boundary=----WebKitFormBoundaryyrV7KO0BoCBuDbTL
Fetch knows which Content-Type header to create based on the FormData object passed in as the request body.
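The same pitfall applies on the Python side: a hand-written "multipart/form-data" header carries no boundary, so the receiver has nothing to split the body on. A small standard-library check illustrates the difference; `boundary_of` is a hypothetical helper name used only for this sketch.

```python
from email.message import Message


def boundary_of(content_type: str):
    """Return the boundary parameter of a Content-Type value, or None if absent."""
    header = Message()
    header["Content-Type"] = content_type
    return header.get_boundary()


# Hand-set header: no boundary parameter, so the body cannot be parsed
print(boundary_of("multipart/form-data"))

# Header generated from the form data: boundary is present
print(boundary_of("multipart/form-data;boundary=----WebKitFormBoundaryyrV7KO0BoCBuDbTL"))
```

With requests, the analogous fix is to leave Content-Type out of `headers` when using `files=`, so the library can generate the header together with the body.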