What can I do to the code below (I assumed sessions would solve this?) to stop every GET request from opening a new TCP connection? I'm making roughly 1,000 requests a second, and after around 10,000 requests I run out of sockets:
import json
from multiprocessing import Pool, cpu_count
from urllib3 import HTTPConnectionPool

def ReqOsrm(url_input):
    ul, qid = url_input
    conn_pool = HTTPConnectionPool(host='127.0.0.1', port=5005, maxsize=1)
    try:
        response = conn_pool.request('GET', ul)
        json_geocode = json.loads(response.data.decode('utf-8'))
        status = int(json_geocode['status'])
        if status == 200:
            tot_time_s = json_geocode['route_summary']['total_time']
            tot_dist_m = json_geocode['route_summary']['total_distance']
            used_from, used_to = json_geocode['via_points']
            out = [qid, status, tot_time_s, tot_dist_m, used_from[0], used_from[1], used_to[0], used_to[1]]
            return out
        else:
            print("Done but no route: %d %s" % (qid, ul))
            return [qid, 999, 0, 0, 0, 0, 0, 0]
    except Exception as err:
        print("%s: %d %s" % (err, qid, ul))
        return [qid, 999, 0, 0, 0, 0, 0, 0]

# run:
pool = Pool(int(cpu_count()))
calc_routes = pool.map(ReqOsrm, url_routes)
pool.close()
pool.join()
HTTPConnectionPool(host='127.0.0.1', port=5005): Max retries exceeded with url: /viaroute?loc=44.779708,4.2609877&loc=44.648439,4.2811959&alt=false&geometry=false (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted',))
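A minimal sketch of the connection reuse I'm after, assuming urllib3: build the HTTPConnectionPool once per worker process via a multiprocessing.Pool initializer rather than once per request, so sockets are kept alive instead of piling up in TIME_WAIT (init_pool and ReqOsrmReuse are illustrative names, not from the code above):

from multiprocessing import Pool, cpu_count
from urllib3 import HTTPConnectionPool

conn_pool = None  # one pool per worker process, created by the initializer

def init_pool():
    global conn_pool
    conn_pool = HTTPConnectionPool(host='127.0.0.1', port=5005, maxsize=1)

def ReqOsrmReuse(url_input):
    ul, qid = url_input
    # conn_pool persists across calls within this process,
    # so the underlying TCP connection is reused
    response = conn_pool.request('GET', ul)
    return qid, response.status

if __name__ == '__main__':
    pool = Pool(cpu_count(), initializer=init_pool)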
Eric - thanks very much for the response, I think this is exactly what I need. However, I can't quite get the modification right. The code correctly returns 10,000 responses for the first few cycles, but then it seems to break and returns fewer than 10,000, which makes me think I've implemented the queue incorrectly?
import csv
import json
import os
import threading
import time
from multiprocessing import Process, JoinableQueue, Queue, cpu_count

import pandas as pd
from urllib3 import HTTPConnectionPool

ghost = 'localhost'
gport = 8989

def CreateUrls(routes, ghost, gport):
    return [
        ["http://{0}:{1}/route?point={2}%2C{3}&point={4}%2C{5}&vehicle=car&calc_points=false&instructions=false".format(
            ghost, gport, alat, alon, blat, blon),
         qid] for qid, alat, alon, blat, blon in routes]

def LoadRouteCSV(csv_loc):
    if not os.path.isfile(csv_loc):
        raise Exception("Could not find CSV with addresses at: %s" % csv_loc)
    else:
        return pd.read_csv(csv_loc, sep=',', header=None, iterator=True, chunksize=1000 * 10)

class Worker(Process):
    def __init__(self, qin, qout, *args, **kwargs):
        super(Worker, self).__init__(*args, **kwargs)
        self._qin = qin
        self._qout = qout

    def run(self):
        # Create threadsafe connection pool, shared by this process's threads
        conn_pool = HTTPConnectionPool(host=ghost, port=gport, maxsize=10)

        class Consumer(threading.Thread):
            def __init__(self, qin, qout):
                threading.Thread.__init__(self)
                self.__qin = qin
                self.__qout = qout

            def run(self):
                while True:
                    msg = self.__qin.get()
                    ul, qid = msg
                    try:
                        response = conn_pool.request('GET', ul)
                        s = float(response.status)
                        if s == 200:
                            json_geocode = json.loads(response.data.decode('utf-8'))
                            tot_time_s = json_geocode['paths'][0]['time']
                            tot_dist_m = json_geocode['paths'][0]['distance']
                            out = [qid, s, tot_time_s, tot_dist_m]
                        elif s == 400:
                            print("Done but no route for row: ", qid)
                            out = [qid, 999, 0, 0]
                        else:
                            print("Done but unknown error for: ", s)
                            out = [qid, 999, 0, 0]
                    except Exception as err:
                        print(err)
                        out = [qid, 999, 0, 0]
                    self.__qout.put(out)
                    self.__qin.task_done()

        num_threads = 10
        [Consumer(self._qin, self._qout).start() for _ in range(num_threads)]

if __name__ == '__main__':
    done_count = 0
    with open(os.path.join(directory_loc, 'gh_output.csv'), 'w') as outfile:
        wr = csv.writer(outfile, delimiter=',', lineterminator='\n')
        for x in LoadRouteCSV(csv_loc=os.path.join(directory_loc, 'gh_input.csv')):
            routes = x.values.tolist()
            url_routes = CreateUrls(routes, ghost, gport)
            del routes

            stime = time.time()

            qout = Queue()
            qin = JoinableQueue()
            [qin.put(url_q) for url_q in url_routes]
            [Worker(qin, qout).start() for _ in range(cpu_count())]
            # Block until all urls in qin are processed
            qin.join()

            calc_routes = []
            while not qout.empty():
                calc_routes.append(qout.get())

            # Time diagnostics
            dur = time.time() - stime
            print("Calculated %d distances in %.2f seconds: %.0f per second" % (
                len(calc_routes), dur, len(calc_routes) / dur))
            del url_routes
            wr.writerows(calc_routes)
            done_count += len(calc_routes)
            # Continually update progress in terms of millions
            print("Saved %d calculations" % done_count)
Answer 0 (score: 1)
I was thinking of something more like this. The idea is to spawn one process per core and, within each process, a pool of threads. Each process has a separate connection pool that is shared among that process's threads. I don't think you can get a more performant solution without some kind of threading.
from multiprocessing import Pool, cpu_count
import Queue
from urllib3 import HTTPConnectionPool
import threading

def ReqOsrm(url_input):
    # Create threadsafe connection pool
    conn_pool = HTTPConnectionPool(host='127.0.0.1', port=5005, maxsize=1000)

    # Create consumer thread class
    class Consumer(threading.Thread):
        def __init__(self, queue):
            threading.Thread.__init__(self)
            self._queue = queue

        def run(self):
            while True:
                msg = self._queue.get()
                try:
                    response = conn_pool.request('GET', msg)
                    print response
                except Exception as err:
                    print err
                self._queue.task_done()

    # Create work queue and a pool of workers
    queue = Queue.Queue()
    num_threads = 20
    workers = []
    for _ in xrange(num_threads):
        worker = Consumer(queue)
        worker.start()
        workers.append(worker)

    for url in url_input:
        queue.put(url)

    queue.join()

url_routes = [
    ["/proc1-0", "/proc1-1"],
    ["/proc2-0", "/proc2-1"],
    ["/proc3-0", "/proc3-1"],
    ["/proc4-0", "/proc4-1"],
    ["/proc5-0", "/proc5-1"],
    ["/proc6-0", "/proc6-1"],
    ["/proc7-0", "/proc7-1"],
    ["/proc8-0", "/proc8-1"],
    ["/proc9-0", "/proc9-1"],
]

pool = Pool(int(cpu_count()))
calc_routes = pool.map(ReqOsrm, url_routes)
pool.close()
pool.join()
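One detail worth flagging in this sketch: the consumer threads loop forever and are non-daemon, so each pool worker process can linger after queue.join() returns. Marking them daemon before start() (an addition, not part of the original answer) lets each process exit once its queue is drained:

worker = Consumer(queue)
worker.daemon = True  # don't keep the worker process alive once the queue is done
worker.start()

Also note the sketch only prints each response; to actually collect results you still need to push them onto an output queue, as in the working solution below.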
Answer 1 (score: 0)
Thanks for the help - my working solution:
# (imports, ghost/gport, CreateUrls and LoadRouteCSV as in the listing above)

class Worker(Process):
    def __init__(self, qin, qout, *args, **kwargs):
        super(Worker, self).__init__(*args, **kwargs)
        self._qin = qin
        self._qout = qout

    def run(self):
        # Create threads to run in process
        class Consumer(threading.Thread):
            def __init__(self, qin, qout):
                threading.Thread.__init__(self)
                self.__qin = qin
                self.__qout = qout

            def run(self):
                # Close once queue empty (otherwise process will linger)
                while not self.__qin.empty():
                    msg = self.__qin.get()
                    ul, qid = msg
                    try:
                        response = conn_pool.request('GET', ul)
                        s = float(response.status)
                        if s == 200:
                            json_geocode = json.loads(response.data.decode('utf-8'))
                            tot_time_s = json_geocode['paths'][0]['time']
                            tot_dist_m = json_geocode['paths'][0]['distance']
                            out = [qid, s, tot_time_s, tot_dist_m]
                        elif s == 400:
                            #print("Done but no route for row: ", qid)
                            out = [qid, 999, 0, 0]
                        else:
                            print("Done but unknown error for: ", s)
                            out = [qid, 999, 0, 0]
                    except Exception as err:
                        print(err)
                        out = [qid, 999, 0, 0]
                    #print(out)
                    self.__qout.put(out)
                    self.__qin.task_done()

        # Create thread-safe connection pool, shared by this process's threads
        concurrent = 10
        conn_pool = HTTPConnectionPool(host=ghost, port=gport, maxsize=concurrent)
        num_threads = concurrent
        # Start threads (concurrent) per process
        [Consumer(self._qin, self._qout).start() for _ in range(num_threads)]
        # Block until all urls in self._qin are processed
        self._qin.join()
        return

if __name__ == '__main__':
    # Fill queue input
    qin = JoinableQueue()
    [qin.put(url_q) for url_q in url_routes]
    # Queue to collect output
    qout = Queue()
    # Start cpu_count number of processes (which will launch threads and sessions)
    workers = []
    for _ in range(cpu_count()):
        workers.append(Worker(qin, qout))
        workers[-1].start()
    # Block until all urls in qin are processed
    qin.join()
    # Fill routes
    calc_routes = []
    while not qout.empty():
        calc_routes.append(qout.get())
    del qin, qout
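One caveat, sketched below as an assumption rather than something from the solution above: with multiprocessing queues, qout.empty() can transiently return True while a worker's feeder thread is still flushing items, which matches the symptom of occasionally collecting fewer than 10,000 results. Since every task calls qout.put() before qin.task_done(), qin.join() guarantees all results are in flight, so draining by count is safer:

# Count-based drain (hypothetical alternative to the empty() loop):
# after qin.join(), exactly len(url_routes) results have been put,
# and blocking get() calls collect them all without racing qout.empty().
calc_routes = [qout.get() for _ in range(len(url_routes))]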