I'm fairly new to Python in general. I don't know much about anything yet, but I decided to try automating some data processing asynchronously. I found out about aiohttp and everything worked: the POST requests went out asynchronously, and checking the server I could see the input arriving.

My problem starts when I try larger files, around 300,000 requests in total, each built from one line per request (in practice there is one file with 50,000 lines and another with 6 lines, combined as data1=&lt;line from file1&gt;&amp;data2=&lt;line from file2&gt;).

I tried using threads with a queue to read the files while posting the requests at the same time, but it didn't work for me. Anyway, here is my code:
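In other words, the payloads are the cross product of the two files (50,000 × 6 = 300,000 combinations). A toy sketch of what I'm generating, with small lists standing in for the real files:

```python
from itertools import product

# Stand-ins for the lines of data1.txt and data2.txt
data1_lines = ["a", "b"]
data2_lines = ["x", "y", "z"]

# Every data1 line paired with every data2 line
pairs = list(product(data1_lines, data2_lines))
print(len(pairs))  # 6 combinations for this toy input
```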
import random
import asyncio
import re
import aiohttp
from aiohttp import ClientSession
import queue
headers = [ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17",
"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
]
port = input("Port: ")
thread = input("Thread: ")
async def fetch(url, data1, data2):
    url = "https://" + url + ":%s" % str(port) + "/?&d1=" + data1 + "&d2=" + data2 + "&data3=testing"
    header = {"User-Agent": random.choice(headers)}
    async with ClientSession(connector=aiohttp.TCPConnector(verify_ssl=False)) as session:
        try:
            # with aiohttp.Timeout(10):
            async with session.post(url, headers=header) as response:
                test = response.status
                if test == 200:
                    print("POSTED")
                else:
                    print("WRONG [%d]" % test)
                return test
        except Exception as e:
            print("ERROR [%s]" % type(e).__name__)
            return 0

async def bound_fetch(sem, url, data1, data2):
    # getter function with semaphore
    async with sem:
        await fetch(url, data1, data2)

async def run(tuples):
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(int(thread))
    for (url, data1, data2) in tuples:
        # pass Semaphore to every POST request
        task = asyncio.ensure_future(bound_fetch(sem, url, data1, data2))
        tasks.append(task)
    await asyncio.gather(*tasks)
This is where I read the data and process it. I'm building a list of (url, data1, data2) tuples in case I decide to change post.php later or add more fields. The rest of the code:
def main():
    global tuples
    with open("data1.txt") as log_file:
        loop = asyncio.get_event_loop()
        for line in log_file:
            data1 = line.strip('\n')
            tuples = []
            with open("data2.txt") as data_file:
                for data2 in data_file:
                    data2 = data2.strip('\n')
                    tuples.append((url, data1, data2))  # url is set earlier in my script
            future = asyncio.ensure_future(run(tuples))
            # why doesn't it stop after the first 6
            # (one data1 from one file, 6 data2 values from the other)
            # links are done?
            loop.run_until_complete(future)
            print("Done a batch.")
    loop.close()

main()
I'm pretty sure I'm not understanding the aiohttp documentation. Is there a function like thread.join from the threading module that would pause the file reading until run_until_complete actually finishes its work and I really get a reply? I don't really understand Future objects.
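From what I can tell so far, run_until_complete should already behave like the join I'm asking about: it blocks the calling thread until the gathered tasks are all done. A minimal sketch of that behavior (with a sleep standing in for the POST request):

```python
import asyncio

async def work(i):
    # stands in for one POST request
    await asyncio.sleep(0.01)
    return i * 2

async def run_batch(items):
    # gather waits for every task in the batch to finish before returning
    return await asyncio.gather(*(work(i) for i in items))

loop = asyncio.get_event_loop()
# run_until_complete blocks here until the whole batch is done,
# so the next line of the outer file is only read afterwards
results = loop.run_until_complete(run_batch(range(6)))
print(results)  # [0, 2, 4, 6, 8, 10]
loop.close()
```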
Also, after the script runs for a while on my Debian Linux VM where I test it, I get an OSError saying the name or service is not known. Even when things are working, some requests still get this error. Any advice? Thanks!