Receiving incomplete data when using Python requests

Date: 2018-07-25 07:58:58

Tags: python json django post request

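I am scraping a website and collecting its data. I have several crawler machines that send their data to a central server. The part of the crawler code that sends data to the server is: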

requests.post(url, json=data, timeout=timeout, cookies=cookies, headers=headers)

On the central server, which runs Django, I have the following code:

import json

def save_users_data(request):
    body = json.loads(request.body)
    # do something with the received data

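Sometimes the server receives incomplete data from a crawler, so the json package cannot load it and raises an error. For example, the server received the following data in request.body: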

b'{"social_network": "some network", "text": "\\u0646\\u06cc\\u0633 \\u0628\\u0627\\u06cc\\u062f \\u0622\\u062a\\u06cc\\u0634 \\u062f\\u0631\\u0633\\u062a \\u06a9\\u0631\\u062f\\u0628\\u0631\\u06af\\u0634\\u062a\\u'

and the following error was raised:

json.decoder.JSONDecodeError: Invalid \uXXXX escape

What is the problem?

Edit

These are some lines from the nginx error.log file:

2018/07/25 12:54:39 [info] 29199#29199: *2520751 client 45.55.4.47 closed keepalive connection
2018/07/25 12:54:39 [info] 29199#29199: *2520753 client 188.166.71.114 closed keepalive connection
2018/07/25 12:55:35 [info] 29199#29199: *2520755 client 45.55.4.47 closed keepalive connection
2018/07/25 12:55:58 [info] 29199#29199: *2520757 client 45.55.4.47 closed keepalive connection
2018/07/25 12:55:59 [info] 29199#29199: *2520759 client 45.55.197.140 closed keepalive connection
2018/07/25 12:56:03 [info] 29199#29199: *2520761 client 188.166.71.114 closed keepalive connection
2018/07/25 12:56:04 [info] 29197#29197: *2520715 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 167.99.189.246, server: 91.208.165.33, request: "POST /crawler/save/users-data/ HTTP/1.1", upstream: "http://unix:/home/social/centralsystem/centralsystem.sock:/crawler/save/users-data/", host: "91.208.165.33"
2018/07/25 12:56:11 [info] 29197#29197: *2520723 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 159.89.20.103, server: 91.208.165.33, request: "POST /crawler/save/users-data/ HTTP/1.1", upstream: "http://unix:/home/social/centralsystem/centralsystem.sock:/crawler/save/users-data/", host: "91.208.165.33"
2018/07/25 12:56:12 [info] 29197#29197: *2520724 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 209.97.142.45, server: 91.208.165.33, request: "POST /crawler/save/users-data/ HTTP/1.1", upstream: "http://unix:/home/social/centralsystem/centralsystem.sock:/crawler/save/users-data/", host: "91.208.165.33"
2018/07/25 12:56:16 [info] 29199#29199: *2520765 client 67.207.92.190 closed keepalive connection
2018/07/25 12:56:17 [info] 29197#29197: *2520729 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 188.226.178.98, server: 91.208.165.33, request: "POST /crawler/save/users-data/ HTTP/1.1", upstream: "http://unix:/home/social/centralsystem/centralsystem.sock:/crawler/save/users-data/", host: "91.208.165.33"
2018/07/25 12:56:22 [info] 29199#29199: *2520770 client 188.166.71.114 closed keepalive connection
2018/07/25 12:56:26 [info] 29199#29199: *2520767 client 159.89.20.103 closed keepalive connection
2018/07/25 12:56:27 [info] 29197#29197: *2520777 client 159.89.20.103 closed keepalive connection
2018/07/25 12:56:28 [info] 29199#29199: *2520773 client 188.226.178.98 closed keepalive connection
2018/07/25 12:56:28 [info] 29197#29197: *2520779 client 45.55.197.140 closed keepalive connection
2018/07/25 12:56:29 [info] 29197#29197: *2520782 client 188.226.178.98 closed keepalive connection
2018/07/25 12:56:30 [info] 29199#29199: *2520768 client 209.97.142.45 closed keepalive connection
2018/07/25 12:56:30 [info] 29197#29197: *2520781 client 67.207.92.190 closed keepalive connection
2018/07/25 12:56:31 [info] 29197#29197: *2520786 client 209.97.142.45 closed keepalive connection
2018/07/25 12:56:36 [info] 29199#29199: *2520775 client 67.207.92.190 closed keepalive connection

2 answers:

Answer 0 (score: 0)

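As noted in the comments on the question, my requests.post timeout was too low for the server's response time, so the client closed the connection before the server responded.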
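A minimal sketch of that fix, assuming the crawler can tolerate a longer wait. The helper name send_payload and the timeout values are hypothetical and not taken from the original code. The sender is passed in as an argument so that requests.post, or a test double, can be supplied. requests accepts a (connect, read) tuple for timeout, which lets the read side be raised without slowing down the detection of an unreachable server.

```python
def send_payload(post, url, data, connect_timeout=5.0, read_timeout=120.0, **kwargs):
    """Send `data` as JSON using `post`, a callable shaped like requests.post.

    The default timeouts are illustrative assumptions; tune them to how long
    the central server really needs to process and answer one request.
    """
    # (connect, read): a short limit for opening the connection,
    # a generous one for waiting on the server.
    response = post(
        url,
        json=data,
        timeout=(connect_timeout, read_timeout),
        **kwargs  # e.g. cookies=..., headers=...
    )
    # Raise on 4xx/5xx so a rejected upload is not silently ignored.
    response.raise_for_status()
    return response
```

In the crawler this would be called as send_payload(requests.post, url, data, cookies=cookies, headers=headers).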

Answer 1 (score: -1)

Edit

Can you try running it with json.dumps instead of json.loads?

On Python versions before 3.6, json.loads accepts only str (unicode) input, so the body may need to be decoded first.

body_unicode = request.body.decode('utf-8')
body = json.loads(body_unicode)
content = body['content']

Read more here: Trying to parse `request.body` from POST in Django
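Decoding alone will not repair a body that was cut off in transit, so the server may also want to reject such requests cleanly instead of crashing. The following is only a sketch: parse_json_body is a hypothetical helper, and comparing the byte count with the declared Content-Length is an assumed way of detecting truncation, not something the original code does.

```python
import json

def parse_json_body(raw, declared_length=None):
    """Return (payload, error); error is None when parsing succeeded."""
    # A body shorter than the declared Content-Length was cut off in transit.
    if declared_length not in (None, ""):
        try:
            expected = int(declared_length)
        except ValueError:
            return None, "invalid Content-Length"
        if len(raw) < expected:
            return None, "truncated body: got %d of %d bytes" % (len(raw), expected)
    try:
        # Decode explicitly so the same code also works where
        # json.loads does not accept bytes.
        return json.loads(raw.decode("utf-8")), None
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return None, "invalid JSON: %s" % exc
```

In the Django view this could be called as parse_json_body(request.body, request.META.get("CONTENT_LENGTH")), returning an HTTP 400 response whenever error is not None so the crawler knows to resend.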