I am crawling a website and collecting its data. I have several crawler machines that send the data to a central server. The part of the crawler code that sends the data to the server looks like this:
requests.post(url, json=data, timeout=timeout, cookies=cookies, headers=headers)
On the central server side, which runs Django, I have the following code:
def save_users_data(request):
    body = json.loads(request.body)
    # do something with the data received
Sometimes the server receives incomplete data from a crawler, so the json module cannot parse it and raises an error. For example, the server received the following data in request.body:
b'{"social_network": "some network", "text": "\\u0646\\u06cc\\u0633 \\u0628\\u0627\\u06cc\\u062f \\u0622\\u062a\\u06cc\\u0634 \\u062f\\u0631\\u0633\\u062a \\u06a9\\u0631\\u062f\\u0628\\u0631\\u06af\\u0634\\u062a\\u'
and raised the following error:
json.decoder.JSONDecodeError: Invalid \uXXXX escape
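The captured body is cut off in the middle of a \uXXXX escape, which by itself is enough to reproduce this error. A minimal sketch; the truncated bytes below are an illustrative stand-in for the captured request.body, not the actual payload:

import json

# Bytes cut off mid-escape, modeled on the truncated request.body above.
truncated = b'{"social_network": "some network", "text": "\\u0646\\u'

try:
    json.loads(truncated)  # json.loads accepts bytes on Python 3.6+
except json.JSONDecodeError as exc:
    print(exc)  # Invalid \uXXXX escape: ...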
Where is the problem?
EDIT
Here are a few lines from the nginx error.log file:
2018/07/25 12:54:39 [info] 29199#29199: *2520751 client 45.55.4.47 closed keepalive connection
2018/07/25 12:54:39 [info] 29199#29199: *2520753 client 188.166.71.114 closed keepalive connection
2018/07/25 12:55:35 [info] 29199#29199: *2520755 client 45.55.4.47 closed keepalive connection
2018/07/25 12:55:58 [info] 29199#29199: *2520757 client 45.55.4.47 closed keepalive connection
2018/07/25 12:55:59 [info] 29199#29199: *2520759 client 45.55.197.140 closed keepalive connection
2018/07/25 12:56:03 [info] 29199#29199: *2520761 client 188.166.71.114 closed keepalive connection
2018/07/25 12:56:04 [info] 29197#29197: *2520715 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 167.99.189.246, server: 91.208.165.33, request: "POST /crawler/save/users-data/ HTTP/1.1", upstream: "http://unix:/home/social/centralsystem/centralsystem.sock:/crawler/save/users-data/", host: "91.208.165.33"
2018/07/25 12:56:11 [info] 29197#29197: *2520723 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 159.89.20.103, server: 91.208.165.33, request: "POST /crawler/save/users-data/ HTTP/1.1", upstream: "http://unix:/home/social/centralsystem/centralsystem.sock:/crawler/save/users-data/", host: "91.208.165.33"
2018/07/25 12:56:12 [info] 29197#29197: *2520724 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 209.97.142.45, server: 91.208.165.33, request: "POST /crawler/save/users-data/ HTTP/1.1", upstream: "http://unix:/home/social/centralsystem/centralsystem.sock:/crawler/save/users-data/", host: "91.208.165.33"
2018/07/25 12:56:16 [info] 29199#29199: *2520765 client 67.207.92.190 closed keepalive connection
2018/07/25 12:56:17 [info] 29197#29197: *2520729 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 188.226.178.98, server: 91.208.165.33, request: "POST /crawler/save/users-data/ HTTP/1.1", upstream: "http://unix:/home/social/centralsystem/centralsystem.sock:/crawler/save/users-data/", host: "91.208.165.33"
2018/07/25 12:56:22 [info] 29199#29199: *2520770 client 188.166.71.114 closed keepalive connection
2018/07/25 12:56:26 [info] 29199#29199: *2520767 client 159.89.20.103 closed keepalive connection
2018/07/25 12:56:27 [info] 29197#29197: *2520777 client 159.89.20.103 closed keepalive connection
2018/07/25 12:56:28 [info] 29199#29199: *2520773 client 188.226.178.98 closed keepalive connection
2018/07/25 12:56:28 [info] 29197#29197: *2520779 client 45.55.197.140 closed keepalive connection
2018/07/25 12:56:29 [info] 29197#29197: *2520782 client 188.226.178.98 closed keepalive connection
2018/07/25 12:56:30 [info] 29199#29199: *2520768 client 209.97.142.45 closed keepalive connection
2018/07/25 12:56:30 [info] 29197#29197: *2520781 client 67.207.92.190 closed keepalive connection
2018/07/25 12:56:31 [info] 29197#29197: *2520786 client 209.97.142.45 closed keepalive connection
2018/07/25 12:56:36 [info] 29199#29199: *2520775 client 67.207.92.190 closed keepalive connection
Answer 0 (score: 0)
As mentioned in the comments on the question, the timeout of my requests.post call was too low for the server's response, and the client closed the connection before the server had responded.
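A rough sketch of that fix on the crawler side: use a more generous (connect, read) timeout and retry failed posts. The timeout and retry values here are illustrative assumptions, and url, data, cookies, and headers are the variables from the question:

import time
import requests

def post_with_retry(url, data, cookies=None, headers=None, retries=3):
    # (connect timeout, read timeout): give the server enough time to reply
    # so the client does not close the connection mid-request.
    timeout = (5, 60)
    for attempt in range(1, retries + 1):
        try:
            response = requests.post(url, json=data, cookies=cookies,
                                      headers=headers, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException:
            if attempt == retries:
                raise
            time.sleep(2 ** attempt)  # simple backoff before retrying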
Answer 1 (score: -1)
EDIT
Could you try running it with json.dumps instead of json.loads?
json.loads only accepts a unicode string, so it may need to be decoded first:
body_unicode = request.body.decode('utf-8')
body = json.loads(body_unicode)
content = body['content']
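Decoding alone will not help when the body arrives truncated, so the view still has to handle a failed parse. A hedged sketch of a more defensive version of the view; the csrf_exempt decorator, the error payload, and the status codes are illustrative choices, not the poster's code:

import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def save_users_data(request):
    try:
        body = json.loads(request.body.decode('utf-8'))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Truncated or malformed payload: tell the crawler so it can retry.
        return JsonResponse({'status': 'invalid json'}, status=400)
    # do something with the data received
    return JsonResponse({'status': 'ok'})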