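I am trying to implement the Yandex OCR translator tool in my code. With the help of Burp Suite, I managed to find that the following request is the one used to send the image: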
I am trying to emulate this request with the following code:
import requests
from requests_toolbelt import MultipartEncoder
files={
'file':("blob",open("image_path", 'rb'),"image/jpeg")
}
#(<filename>, <file object>, <content type>, <per-part headers>)
burp0_url = "https://translate.yandex.net:443/ocr/v1.1/recognize?srv=tr-image&sid=9b58493f.5c781bd4.7215c0a0&lang=en%2Cru"
m = MultipartEncoder(files, boundary='-----------------------------7652580604126525371226493196')
burp0_headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0", "Accept": "*/*", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://translate.yandex.com/", "Content-Type": "multipart/form-data; boundary=-----------------------------7652580604126525371226493196", "Origin": "https://translate.yandex.com", "DNT": "1", "Connection": "close"}
print(requests.post(burp0_url, headers=burp0_headers, files=m.to_string()).text)
Unfortunately, it produces the following output:
{"error":"BadArgument","description":"Bad argument: file"}
Does anyone know how to fix this?
Thanks a lot!
Answer 0 (score: 2)
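You are passing the MultipartEncoder.to_string() result to the files parameter. That asks requests to encode the output of the multipart encoder as a multipart component of its own. That is one encoding too many.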
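A simplified model of why encoding twice breaks the upload, using a hypothetical stdlib helper encode_multipart (not part of requests or requests_toolbelt) that writes a single-file multipart/form-data body in the RFC 7578 layout. Encoding the image once yields a part named file that holds the JPEG bytes; encoding that result again buries the whole first body inside the outer part, so the server no longer finds an image under file. This is a sketch of the idea only; the bytes requests actually sends differ in detail (it picks its own random boundary, for example).

```python
def encode_multipart(field_name: str, filename: str, content: bytes,
                     content_type: str, boundary: str) -> bytes:
    """Build a minimal multipart/form-data body with one file part.

    Hypothetical helper for illustration; real code should let requests do this.
    """
    lines = [
        b"--" + boundary.encode(),
        (f'Content-Disposition: form-data; name="{field_name}"; '
         f'filename="{filename}"').encode(),
        f"Content-Type: {content_type}".encode(),
        b"",          # blank line separates part headers from part content
        content,
        b"--" + boundary.encode() + b"--",  # closing boundary
        b"",
    ]
    return b"\r\n".join(lines)


image_bytes = b"\xff\xd8\xff\xe0fake-jpeg-bytes"  # stand-in for a real JPEG

# Encoded once: the "file" part contains the image bytes themselves.
once = encode_multipart("file", "blob", image_bytes, "image/jpeg", "inner")

# Encoded again: the "file" part now contains an entire multipart document
# (boundaries and part headers included) instead of the image.
twice = encode_multipart("file", "blob", once, "image/jpeg", "outer")

print(once.count(b"Content-Disposition"))   # 1
print(twice.count(b"Content-Disposition"))  # 2
```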
You don't need to replicate every byte here; just post the file, and perhaps set the User-Agent, Referer and Origin headers.
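The Connection header is best left to requests, which can decide on its own when to keep a connection alive. The Accept* headers tell the server what your client can handle, and requests sends Accept and Accept-Encoding by default anyway.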
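Applied to the headers copied from Burp, that advice amounts to filtering the captured dict before sending it. requests sends Accept, Accept-Encoding and Connection by default, and when you pass files= it generates its own random boundary and only sets Content-Type if you have not supplied one, so a hard-coded Content-Type from the capture would advertise a boundary that does not match the body. The helper strip_managed_headers and the header set below are assumptions made for this sketch, not a requests API; Accept-Language is in the set only because the answer's code omits it too.

```python
# Assumed for this sketch: headers to leave out and let requests handle.
# Content-Type is dropped so requests can set it with its own multipart boundary.
MANAGED_BY_REQUESTS = {"connection", "accept", "accept-encoding",
                       "accept-language", "content-type"}


def strip_managed_headers(captured: dict) -> dict:
    """Return a copy of a captured header dict without the headers listed above.

    Hypothetical helper, not part of requests.
    """
    return {name: value for name, value in captured.items()
            if name.lower() not in MANAGED_BY_REQUESTS}


burp_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Referer": "https://translate.yandex.com/",
    "Content-Type": "multipart/form-data; boundary=-----------------------------7652580604126525371226493196",
    "Origin": "https://translate.yandex.com",
    "DNT": "1",
    "Connection": "close",
}

print(sorted(strip_managed_headers(burp_headers)))
# ['DNT', 'Origin', 'Referer', 'User-Agent']
```

Passing the filtered dict as headers= together with files= leaves Content-Type and the boundary entirely to requests.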
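I get a 200 OK response with this code:
import requests

url = "https://translate.yandex.net:443/ocr/v1.1/recognize?srv=tr-image&sid=9b58493f.5c781bd4.7215c0a0&lang=en%2Cru"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0",
    "Referer": "https://translate.yandex.com/",
    "Origin": "https://translate.yandex.com",
}

# Pass a plain files dict and let requests build the multipart body and boundary
with open("image_path", "rb") as image:
    files = {
        'file': ("blob", image, "image/jpeg")
    }
    response = requests.post(url, headers=headers, files=files)

print(response.status_code)  # a Response exposes status_code, not status
print(response.json())
However, if you don't set the extra headers at all (drop the headers=headers argument), the request works too, so Yandex doesn't appear to be filtering out bots here.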
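The query string of the recognize URL can also be assembled from its parts instead of being copied verbatim; lang=en%2Cru is simply the URL-encoded form of en,ru. A minimal sketch with urllib.parse.urlencode, using a hypothetical helper name build_ocr_url. The sid value is copied from the captured request, and nothing in the question says how long such a session ID stays valid.

```python
from urllib.parse import urlencode

# Endpoint copied from the captured request.
OCR_ENDPOINT = "https://translate.yandex.net:443/ocr/v1.1/recognize"


def build_ocr_url(sid: str, langs: list[str]) -> str:
    """Rebuild the recognize URL from its query parameters (hypothetical helper).

    urlencode percent-encodes the comma in "en,ru" as %2C.
    """
    query = urlencode({"srv": "tr-image", "sid": sid, "lang": ",".join(langs)})
    return f"{OCR_ENDPOINT}?{query}"


print(build_ocr_url("9b58493f.5c781bd4.7215c0a0", ["en", "ru"]))
```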