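I am using BeautifulSoup in a Django view to scrape a web page, collect some img src values, and display them on my page. My problem: when I run the code in a Jupyter Notebook, the task takes less than a second, but when I run it in the Django view it takes about 10 seconds (depending on the query).
Here is my code: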
from bs4 import BeautifulSoup
import requests
import re
try:
    import urllib.request as urllib2
except ImportError:
    import urllib2
import time
import json

def get_soup(url,header):
    return BeautifulSoup(urllib2.urlopen(urllib2.Request(url,headers=header)),'html.parser')

def get_images(query):
    start_time = time.time()
    url="https://www.google.co.in/search?q="+query+"&source=lnms&tbm=isch"
    header={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    }
    soup = get_soup(url,header)
    print("--- %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    ActualImages=[]# contains the link for Large original images, type of image
    for a in soup.find_all("div",{"class":"rg_meta"}):
        link =json.loads(a.text)["ou"]
        ActualImages.append(link)
    print("--- %s seconds ---" % (time.time() - start_time))
    dic = {'images': ActualImages[:10]}
    return dic

get_images('thunder')
The time is spent in the get_soup() function. Is this normal? Any ideas on how to make it faster?
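To narrow this down further, one option is to time the download and the parse as two separate steps, since get_soup() currently does both in a single expression. The following is only a minimal sketch: timed() and profile_get_soup() are hypothetical helper names that are not part of the code above, and the fetch and parse callables are passed in so the same helper can run unchanged in both Jupyter and the Django view.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds), printing the timing."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("--- %s: %.3f seconds ---" % (label, elapsed))
    return result, elapsed


def profile_get_soup(url, header, fetch, parse):
    """Hypothetical helper: time the download and the parse separately.

    fetch(url, header) is assumed to return the raw HTML;
    parse(html) is assumed to return the parsed document.
    """
    html, download_s = timed("download", fetch, url, header)
    soup, parse_s = timed("parse", parse, html)
    return soup, {"download": download_s, "parse": parse_s}
```

With the names from my code, fetch could be lambda url, header: urllib2.urlopen(urllib2.Request(url, headers=header)).read() and parse could be lambda html: BeautifulSoup(html, 'html.parser'). Comparing the two timings in both environments should show whether the extra seconds come from the network request or from parsing.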