Question

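I have a program that reads some input text files and writes all of them into a single file. I used two threads so that it would run faster. I tried the following Python code with one thread and with two threads. Why does it run faster with one thread than with two?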

import glob
import time
import timeit
from threading import Thread

processedFiles=[]
# Define a function for the threads
def print_time( threadName, delay):
   for file in glob.glob("*.txt"):
      #check if file has been read by another thread already 
      if file not in processedFiles:
         processedFiles.append(file)
         f = open(file,"r")
         lines = f.readlines()
         f.close()
         time.sleep(delay)
         f = open('myfile','a')
         f.write("%s \n" %lines) # python will convert \n to os.linesep
         f.close() # you can omit in most cases as the destructor will call it
         print "%s: %s" % ( threadName, time.ctime(time.time()) )

   
# Create two threads as follows
try:
   f = open('myfile', 'r+')
   f.truncate()

   start = timeit.default_timer()

   t1 = Thread(target=print_time, args=("Thread-1", 0,))
   t2 = Thread(target=print_time, args=("Thread-2", 0,))
   t1.start()
   t2.start()


   stop = timeit.default_timer()

   print stop - start

except:
   print "Error: unable to start thread"

Answer 1

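I see a few problems, but in general your program is disk bound (it can't go faster than your hard drive), so even a correctly threaded program would not be faster. Disk performance can be hard to measure because of the file system cache: you run once with threads and it goes at hard-drive speed, then you run again without threads and the files are already in the system cache, so it goes fast. It is hard to tell how the code will perform when the data is no longer in the system cache.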

So now for the problems.

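if file not in processedFiles: is not thread safe. Both threads can look at the empty list and decide to copy the same file. At the very least you need a lock. Alternatively, you could run glob once and hand the file names to the threads through a queue.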
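The queue approach could look roughly like the sketch below. It is written for Python 3, and the names distribute and worker are made up for illustration; the "processing" step only records which thread claimed which file, so the real copy logic would go where the comment indicates.

```python
import queue
import threading


def worker(file_queue, claimed, claimed_lock):
    """Take file names off the queue until none are left."""
    while True:
        try:
            name = file_queue.get_nowait()
        except queue.Empty:
            return  # queue drained, this thread is done
        # Real per-file work would go here; this sketch only records the claim.
        with claimed_lock:
            claimed.append((threading.current_thread().name, name))


def distribute(names, num_threads=2):
    """Hand each name to exactly one worker thread; return the claims."""
    file_queue = queue.Queue()
    for name in names:
        file_queue.put(name)  # the queue is filled once, before any thread starts

    claimed = []
    claimed_lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(file_queue, claimed, claimed_lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Calling distribute(glob.glob("*.txt")) would run glob once, and because a Queue hands each item to only one consumer, no file can be picked up twice.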

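Reading a file line by line and then joining the lines with \n is an insanely slow way to write a file. (As written, "%s \n" % lines actually writes the list's repr rather than the file's contents.) Use shutil.copyfileobj instead; it copies file data efficiently.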
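A minimal sketch of the copyfileobj approach, assuming Python 3 and binary mode so the bytes are copied unchanged. The function name concatenate is hypothetical, not something from the question.

```python
import shutil


def concatenate(src_paths, dst_path):
    """Write the contents of each source file to dst_path, in order."""
    # "wb" truncates the output, so no separate truncate() step is needed.
    with open(dst_path, "wb") as dst:
        for path in src_paths:
            with open(path, "rb") as src:
                # Copies in chunks without loading the whole file into memory.
                shutil.copyfileobj(src, dst)
```

With the question's layout this would be called as concatenate(glob.glob("*.txt"), "myfile"). If the output name ever matched the glob pattern, it would need to be filtered out of the input list first.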

f = open('myfile','a'): you now have multiple file descriptors on one file, and each of them advances independently, so one can overwrite the other.

f.write("%s \n" %lines) is not thread safe either. You may end up with the files interleaved with each other in the output.
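If several threads really must write to the same output, one option is to share a single open file and guard each write with a lock. The sketch below assumes Python 3; LockedWriter and write_chunk are hypothetical names for illustration, not a standard-library API.

```python
import threading


class LockedWriter:
    """Wrap one open file so each chunk is written without interruption."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._lock = threading.Lock()

    def write_chunk(self, data):
        # Only one thread at a time may write, so chunks never interleave.
        with self._lock:
            self._file.write(data)
```

Each thread would call write_chunk with the complete contents of one input file, so the files land in the output whole, although their order still depends on thread scheduling.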

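stop = timeit.default_timer(): you never wait for the threads to finish their work, so you are not measuring anything useful. The code severely underestimates the execution time.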
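To time the threaded run fairly, the clock should be read only after every thread has been joined. A rough sketch, assuming Python 3; time_threads is a made-up helper name.

```python
import threading
import timeit


def time_threads(target, args_list):
    """Run target once per args tuple, each in its own thread; return elapsed seconds."""
    threads = [threading.Thread(target=target, args=args) for args in args_list]
    start = timeit.default_timer()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for the work to finish before reading the clock
    return timeit.default_timer() - start
```

For the question's code this would be time_threads(print_time, [("Thread-1", 0), ("Thread-2", 0)]). Because of the file system cache mentioned earlier, repeated runs can still give very different numbers.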

You would be much better off with a simple single-threaded script.
