I am currently writing code that lets a user take multiple screenshots of different web pages simultaneously, using multithreading.
Code:
import selenium
import threading
import time, datetime
from datetime import date, timedelta
from selenium import webdriver
from selenium.common.exceptions import TimeoutException  # needed by the except clauses below

domain_file = r'C:\Users\a\testfiles\testdomains.txt'
driver = webdriver.PhantomJS()

def file_len(file):
    # Count the lines in the file
    with open(file, 'r') as f:
        for i, l in enumerate(f):
            pass
    return i + 1

current_date = date.today().strftime('%Y-%m-%d_')

def threadedloop(d):
    with open(domain_file, 'r') as f:
        for line in f:
            stripped_line = line.rstrip()
            url1 = 'http://' + stripped_line
            url2 = 'https://' + stripped_line
            imgname = current_date + 'http_' + stripped_line + '.png'
            imgSname = current_date + 'https_' + stripped_line + '.png'

            ### Screenshot function ###
            def scrshot():
                print('Taking screenshot of {}.'.format(stripped_line))
                try:
                    driver.get(url1)
                except TimeoutException:
                    print('{} timed out'.format(url1))
                    pass
                except Exception:
                    print('Unknown error at {}'.format(stripped_line))
                driver.maximize_window()
                driver.save_screenshot(imgname)
                try:
                    driver.get(url2)
                except TimeoutException:
                    print('{} timed out'.format(url2))
                    pass
                except Exception:
                    print('Unknown error at {}'.format(stripped_line))
                driver.maximize_window()
                driver.save_screenshot(imgSname)

            scrshot()

d = threading.local
start = time.time()
for i in range(file_len(domain_file)):
    t = threading.Thread(target = threadedloop, args=(d,))
    t.start()
t.join()
end = time.time()
print(end - start)
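The test file contains 4 domains. The problem is that instead of each page being assigned to its own thread, every page is processed by all 4 threads, which results in: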
Taking screenshot of google.com.
Taking screenshot of google.com.
Taking screenshot of google.com.
Taking screenshot of google.com.
Taking screenshot of reddit.com.
Taking screenshot of reddit.com.
Taking screenshot of reddit.com.
Taking screenshot of reddit.com.
Taking screenshot of facebook.com.
Taking screenshot of facebook.com.
Taking screenshot of facebook.com.
Taking screenshot of facebook.com.
Taking screenshot of facebook.com.
Taking screenshot of twitter.com.
Taking screenshot of twitter.com.
Taking screenshot of twitter.com.
Taking screenshot of twitter.com.
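Thank you very much for your help.
Answer 0 (score: 0)
I went through your code and realized that you are not dividing the work into subtasks correctly.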
def threadedloop(d):
    with open(domain_file, 'r') as f:
        for line in f:
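The `with` and `for` lines of the `threadedloop` function read every line of the file. This means that every time the function is called, every URL is read and processed.
Next, in the multithreading part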
for i in range(file_len(domain_file)):
    t = threading.Thread(target = threadedloop, args=(d,))
    t.start()
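you start one thread per line of the file, and each of those threads calls `threadedloop`, which reads the whole file again. With 4 domains, that gives 4 threads that each process all 4 URLs, which matches the output you are seeing. I think you can see the problem now.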
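The effect can be reproduced without Selenium. The following is only a minimal sketch: `threadedloop_like`, `seen` and the in-memory `domains` list are stand-ins I made up for illustration, not part of your code.

```python
import threading

# Hypothetical stand-in for the contents of the domain file.
domains = ['google.com', 'reddit.com', 'facebook.com', 'twitter.com']
seen = []
lock = threading.Lock()

def threadedloop_like():
    # Mirrors the structure of threadedloop: every thread walks the whole list.
    for domain in domains:
        with lock:
            seen.append(domain)

# One thread per domain, but no thread is told which domain is "its own".
threads = [threading.Thread(target=threadedloop_like) for _ in domains]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(seen))  # 16: 4 threads x 4 domains
```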
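A better approach is to do the URL distribution only once, before creating the threads, in the loop that starts them (the second part of your code). Pass each URL to the function through the `args` parameter, which you currently use to pass `threading.local`.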
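As a rough sketch of that idea, assuming one thread per domain: `load_domains`, `process_domain` and `run_threads` are hypothetical names, and `process_domain` only builds the two URLs as a placeholder for the real `driver.get` / `driver.save_screenshot` calls. As far as I know, a WebDriver instance is not meant to be shared between threads, so in the real version each thread would probably need its own driver rather than the global one.

```python
import threading

def load_domains(lines):
    """Return the stripped, non-empty entries from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]

def process_domain(domain, results, lock):
    # Placeholder for the screenshot work on a single domain.
    # In the real script this is where the driver calls would go.
    urls = ['http://' + domain, 'https://' + domain]
    with lock:
        results[domain] = urls

def run_threads(domains):
    results = {}
    lock = threading.Lock()
    threads = []
    # The file is read once by the caller; each thread gets exactly one domain.
    for domain in domains:
        t = threading.Thread(target=process_domain, args=(domain, results, lock))
        t.start()
        threads.append(t)
    # Wait for every thread, not just the last one started.
    for t in threads:
        t.join()
    return results

if __name__ == '__main__':
    sample = ['google.com\n', 'reddit.com\n', 'facebook.com\n', 'twitter.com\n']
    for domain, urls in run_threads(load_domains(sample)).items():
        print(domain, urls)
```

In your script, `load_domains` would be fed the open file object for `domain_file` instead of the sample list. Keeping every thread in a list and joining all of them also means the timing covers all threads, whereas joining only `t` after the loop waits for the last one alone.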