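I am currently writing a Python program that receives data from a TCP/UDP socket and writes that data to a file. Right now my program is I/O bound because it writes every datagram to the file as it arrives (I do this for very large files, so the slowdown is substantial). With that in mind, I decided to try receiving data from the socket in one thread and writing it to the file in a different thread. So far I have come up with the draft below. Currently it only writes a single chunk of data (512 bytes) to the file.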
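```python
import socket
import threading

buf = 512  # chunk size (512 bytes, as described above)
# `sock` is assumed to be a UDP socket created and bound elsewhere

f = open("t1.txt", "wb")

def write_to_file(data):
    f.write(data)

def recv_data():
    dataChunk, addr = sock.recvfrom(buf)  # THIS IS THE DATA THAT GETS WRITTEN
    try:
        w = threading.Thread(target=write_to_file, args=(dataChunk,))
        threads.append(w)
        w.start()
        while dataChunk:
            sock.settimeout(4)
            # later chunks are received here but never handed to a writer thread
            dataChunk, addr = sock.recvfrom(buf)
    except socket.timeout:
        print("Timeout")
        sock.close()
        f.close()

threads = []
r = threading.Thread(target=recv_data)
threads.append(r)
r.start()
```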
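I think I'm doing something wrong; I'm just not sure what the best way to use threads here is. My problem right now is that I have to supply the argument when I create the thread, but the value of that argument doesn't change to reflect the new chunks of data coming in. However, if I put the line `w = threading.Thread(target=write_to_file, args=(dataChunk,))` inside the loop, won't I create a new thread on every iteration?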
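The reason the argument never "updates" is that `args` is evaluated once, when the `Thread` object is constructed: the thread keeps a reference to that tuple, not to the variable `dataChunk`. A minimal sketch, with `write_to_file` replaced by a hypothetical stand-in that records its argument instead of writing to disk:

```python
import threading

received = []  # records what the thread was actually given


def write_to_file(data):
    # Hypothetical stand-in for f.write(data): just remember the argument
    received.append(data)


dataChunk = b"chunk-1"
w = threading.Thread(target=write_to_file, args=(dataChunk,))  # args tuple is built here
dataChunk = b"chunk-2"  # rebinding the name afterwards does not touch that tuple
w.start()
w.join()
print(received)  # [b'chunk-1'] on Python 3
```

Creating a fresh thread per chunk would work around this, but at the cost of one thread per datagram; a single long-lived writer fed through a shared buffer avoids both problems.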
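Also, for what it's worth, this is just a small proof of concept for using separate receive and write threads. It is not the larger program that this concept will eventually be used in.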
Answer 0 (score: 1)
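You need a buffer that the reading thread writes to and the writing thread reads from. A `deque` from the `collections` module is a good fit, because it supports thread-safe appends and pops from either end in roughly O(1) time.

So instead of passing `dataChunk` to your thread, pass it the buffer.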
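Pairing `appendleft` on the reader side with `pop` on the writer side turns the deque into a first-in, first-out queue, so chunks reach the file in the order they arrived. A small standalone check of that ordering (no sockets or threads involved):

```python
import collections

buffer = collections.deque()

# Reader side: each new chunk goes onto the LEFT end
for chunk in (b"first", b"second", b"third"):
    buffer.appendleft(chunk)

# Writer side: chunks come off the RIGHT end, i.e. oldest first
out = []
while buffer:
    out.append(buffer.pop())

print(out)  # [b'first', b'second', b'third']
```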
```python
import collections  # for the buffer
import socket
import threading
import time  # to ease polling

buf = 512  # chunk size from the question
# Placeholder setup: the address and port are assumptions; use your own socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5005))


def write_to_file(path, buffer, terminate_signal):
    with open(path, 'wb') as out_file:  # close file automatically on exit
        # keep going until the end is signaled AND the buffer is drained
        while not terminate_signal.is_set() or buffer:
            try:
                data = buffer.pop()  # pop from RIGHT end of buffer
            except IndexError:
                time.sleep(0.5)  # buffer empty: wait for new data
            else:
                out_file.write(data)  # write a chunk


def read_from_socket(sock, buffer, terminate_signal):
    sock.settimeout(4)
    try:
        while True:
            data, _ = sock.recvfrom(buf)
            buffer.appendleft(data)  # append to LEFT of buffer
    except socket.timeout:
        print("Timeout")
    finally:
        terminate_signal.set()  # signal writer that we are done, even on other errors
        sock.close()


buffer = collections.deque()  # buffer for reading/writing
terminate_signal = threading.Event()  # shared signal

threads = [
    threading.Thread(target=read_from_socket, kwargs=dict(
        sock=sock,
        buffer=buffer,
        terminate_signal=terminate_signal
    )),
    threading.Thread(target=write_to_file, kwargs=dict(
        path="t1.txt",
        buffer=buffer,
        terminate_signal=terminate_signal
    ))
]

for t in threads:  # start both threads
    t.start()
for t in threads:  # wait for both threads to finish
    t.join()
```
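The writer above polls with `time.sleep(0.5)`. One alternative (a different technique from the answer's deque, swapped in here for illustration) is Python 3's `queue.Queue` (named `Queue` in Python 2): `get()` blocks until data arrives, and a sentinel object replaces the `Event`. In this sketch `produce` is a hypothetical stand-in for `read_from_socket` so it can run without a network; `_STOP` and `run` are likewise illustrative names.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel meaning "no more data"


def write_to_file(path, chunks):
    """Single long-lived writer: blocks on the queue instead of sleeping."""
    with open(path, "wb") as out_file:
        while True:
            data = chunks.get()  # blocks until a chunk (or the sentinel) arrives
            if data is _STOP:
                break
            out_file.write(data)


def produce(chunks, items):
    """Stand-in for read_from_socket: enqueue chunks, then the sentinel."""
    try:
        for data in items:
            chunks.put(data)
    finally:
        chunks.put(_STOP)  # always tell the writer to finish


def run(path, items):
    chunks = queue.Queue()
    writer = threading.Thread(target=write_to_file, args=(path, chunks))
    reader = threading.Thread(target=produce, args=(chunks, items))
    writer.start()
    reader.start()
    reader.join()
    writer.join()
```

For example, `run("t1.txt", [b"abc", b"def"])` leaves `b"abcdef"` in the file. In a real program the loop in `produce` would call `sock.recvfrom(buf)` until the timeout, with the sentinel still placed in the `finally` block so the writer cannot hang.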