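I'm practising Django by building a web app that I can email a word to; the app translates it (English to Spanish or vice versa, to start with) and then emails me a few words each day to study.
My problem: I don't know where to put the web scraper code that translates the search term, or how to trigger it when a search term is received so that the results are added to the Result model.
Models: I currently have two models. The first holds my search terms and the second holds the translation results. Both inherit from an abstract model with common fields: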
from django.db import models
from django.conf import settings


class CommonInfo(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True


class Search(CommonInfo):
    search_term = models.CharField(max_length=100)
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.SET_NULL,
        null=True
    )

    def __str__(self):
        return self.search_term


class Result(CommonInfo):
    search = models.ForeignKey(
        Search,
        on_delete=models.SET_NULL,
        null=True
    )
    translation = models.CharField(max_length=100)
    example = models.TextField()
    is_english = models.BooleanField(default=True)

    def __str__(self):
        return self.translation
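My view: My view has a single entry point, which receives an HTTP POST request containing the parsed email from the SendGrid parser. It pulls the word to translate out of the subject line, adds it to the Search model, and links it to the relevant user: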
import re

from django.http import HttpResponse
from django.utils.decorators import method_decorator
from django.views import View
from django.views.decorators.csrf import csrf_exempt

from users.models import CustomUser
from vocab.models import Search


@method_decorator(csrf_exempt, name='dispatch')
class Parser(View):

    def post(self, request, *args, **kwargs):
        # pull out the from field, e.g. "Name <user@example.com>"
        sender = request.POST.get('from', '')
        # regex the actual email out from between the angle brackets
        match = re.search(r"(?<=<).*?(?=>)", sender)
        if match is None:
            # without this check, .group(0) raises AttributeError on a None match
            return HttpResponse("Could not read the sender address", status=400)
        result_email = match.group(0)
        # look the user up in the DB and return an error if there is no account
        user = CustomUser.objects.filter(email=result_email).first()
        if user is None:
            return HttpResponse("You do not have an account, please sign up first", status=401)
        # PARSING
        # parse subject
        subject = str(request.POST.get('subject'))
        Search.objects.create(search_term=subject, user=user)
        return HttpResponse("OK")
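As an aside on the regex in the view: a minimal sketch of the same extraction using the standard library's email.utils.parseaddr, which as far as I can tell also copes with a bare address that has no angle brackets. The function name extract_sender_email and the sample headers are made up for illustration; this is not what the view currently does.

```python
from email.utils import parseaddr


def extract_sender_email(from_header):
    """Return the address part of a From header, or None if there is none."""
    # parseaddr returns a (display_name, address) pair; address is '' on failure
    _display_name, address = parseaddr(from_header or "")
    return address if "@" in address else None


print(extract_sender_email("Jane Doe <Jane@Example.com>"))
print(extract_sender_email("jane@example.com"))
print(extract_sender_email(""))
```

Returning None for an unreadable header would let the view answer with an error response instead of raising.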
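Webscraper: I've written an outline of a web scraper that should take the searched word, build a URL from it (to the SpanishDict site), and then use BeautifulSoup to extract the translation and example sentences: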
from contextlib import closing

from bs4 import BeautifulSoup
from requests import get
from requests.exceptions import RequestException


# creates a url from the word
def url_creator(word):
    return 'https://www.spanishdict.com/translate/' + str(word).lower()


# get request using the url; returns the raw HTML, or None on failure
def simple_get(url):
    try:
        with closing(get(url, stream=True)) as resp:
            if is_good_response(resp):
                return resp.content
            else:
                return None
    except RequestException as error:
        log_error('Error during request to %s : %s ' % (url, error))
        return None


# checks the get request response is HTML
def is_good_response(resp):
    # .get() avoids a KeyError when the Content-Type header is missing
    content_type = resp.headers.get('Content-Type', '').lower()
    return resp.status_code == 200 and 'html' in content_type


# logs an error if there are any issues
def log_error(error):
    print(error)


# creates a beautiful soup object from the raw html
def bs_html_maker(raw_html):
    return BeautifulSoup(raw_html, 'html.parser')


# finds the translation and example for the word being searched
def first_definition_finder(bs_html):
    return bs_html.find(class_="dictionary-neodict-indent-1")


# works out the language being searched (inferring it from the results of the get request)
def language_finder(bs_html):
    if bs_html.find(id="headword-and-quickdefs-es"):
        return False
    elif bs_html.find(id="headword-and-quickdefs-en"):
        return True
    else:
        raise Exception("The word you searched didn't return anything, check your spelling")


# returns the translation, the example sentences and what language the search was in in a dictionary
def result_outputter(bs_html):
    translation_dictionary = {}
    is_english = language_finder(bs_html)
    definition_block = first_definition_finder(bs_html)
    definition = definition_block.find(class_="dictionary-neodict-translation-translation").string
    examples = definition_block.find(class_="dictionary-neodict-example").strings
    example_string = "%s - %s" % (next(examples), next(examples))
    translation_dictionary["definition"] = definition
    translation_dictionary["example"] = example_string
    translation_dictionary["is_english"] = is_english
    return translation_dictionary


# pulls it all together in one method which will ideally be called whenever a search is saved to the database and the results can then be used to add the translation to the database
def vocab_translator(word):
    url = url_creator(word)
    raw_html = simple_get(url)
    if raw_html is None:
        # simple_get returns None on a failed request; BeautifulSoup cannot parse None
        raise Exception("Could not fetch a page for %r" % word)
    bs_html = bs_html_maker(raw_html)
    return result_outputter(bs_html)
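To make the question concrete, here is a rough, framework-free sketch of the flow I have in mind: once a search term is stored, call the translator and map the dictionary it returns onto the Result fields. Everything in it is a stand-in and an assumption on my part: FakeResult mimics the Result model as a plain dataclass (with search_term in place of the search foreign key), fake_translator replaces vocab_translator with canned data so no network request is made, and store_translation is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Plain stand-in for the Result model (not a Django model)."""
    search_term: str
    translation: str
    example: str
    is_english: bool


def fake_translator(word):
    """Stand-in for vocab_translator: same dictionary keys, canned data, no network."""
    return {
        "definition": "hola",
        "example": "Hello, how are you? - Hola, ¿cómo estás?",
        "is_english": True,
    }


def store_translation(search_term, translator=fake_translator):
    """Translate a stored search term and build the matching result record."""
    data = translator(search_term)
    # the scraper's "definition" key maps onto the model's "translation" field
    return FakeResult(
        search_term=search_term,
        translation=data["definition"],
        example=data["example"],
        is_english=data["is_english"],
    )


result = store_translation("hello")
print(result.translation, "|", result.example)
```

In the real app the last step would presumably become Result.objects.create(...), called either from the view right after the Search is created or from something like a post_save signal handler. Which of those is the right place is the part I'm unsure about.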
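Any help would be greatly appreciated. I'm still learning Django and would welcome any feedback, so comments on the code itself would also be very useful.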