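I have a Python class that takes a url as a parameter and launches a crawler on a news website.
Once the object has been created, it is stored in an Elasticsearch cluster.
I want to write a method that takes an Elasticsearch document as input and builds an object from it.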
class NewsArticle():
    def __init__(self, url):
        self.url = url
        # Launch a crawler and fill in the other fields like author, date, etc.

    @classmethod
    def from_elasticsearch(cls, elasticsearch_article):
        document = elasticsearch_article['_source']
        obj = cls(document['url'])
        obj.url = document['url']
        obj.author = document['author']
        .
        .
        .
The problem is that when I call...
# response is my document from elasticsearch
res = NewsArticle.from_elasticsearch(response)
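...the __init__ method is called and launches my crawler. Is there any way to keep it from launching my crawler, or from calling the __init__ method at all?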
Answer 0: (score: 1)
How about using a simple if and a default argument crawl:
class NewsArticle():
    def __init__(self, url, crawl=True):
        self.url = url
        if crawl:
            # Launch a crawler and fill in the other fields like author, date, etc.
            pass  # placeholder so the snippet parses; the real crawler call goes here

    @classmethod
    def from_elasticsearch(cls, elasticsearch_article):
        document = elasticsearch_article['_source']
        # crawl=False skips the crawler; the fields come from the stored document instead
        obj = cls(document['url'], crawl=False)
        obj.url = document['url']
        obj.author = document['author']
        return obj
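
To see the flag at work, here is a minimal runnable sketch of the same idea. The `_crawl` method, the `crawled` attribute, and the sample `response` dictionary are assumptions made up for illustration; they stand in for the real crawler and a real Elasticsearch hit, neither of which is shown in the question.

```python
class NewsArticle:
    def __init__(self, url, crawl=True):
        self.url = url
        self.author = None
        self.crawled = False
        if crawl:
            self._crawl()

    def _crawl(self):
        # Hypothetical stand-in for the real crawler: it only records that it ran.
        self.crawled = True
        self.author = "filled in by crawler"

    @classmethod
    def from_elasticsearch(cls, elasticsearch_article):
        document = elasticsearch_article["_source"]
        # crawl=False keeps __init__ from launching the crawler.
        obj = cls(document["url"], crawl=False)
        obj.author = document["author"]
        return obj


# Hypothetical document shaped like an Elasticsearch hit.
response = {"_source": {"url": "https://example.com/story", "author": "Jane Doe"}}
res = NewsArticle.from_elasticsearch(response)
```

Here `res.crawled` stays `False` and `res.author` comes from the stored document, while a plain `NewsArticle(url)` still crawls as before. If you would rather not touch the `__init__` signature, another option is to allocate the instance with `cls.__new__(cls)`, which skips `__init__` entirely; the trade-off is that every attribute then has to be set by hand.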