I am using the newspaper3k library and wrote the following script:

import newspaper

config = {
    'language': 'ar',
    'fetch_images': False,
    'MAX_FILE_MEMO': 200000,
    'number_threads': 1,
}

sources = ['url1', 'url2', ..., 'urln']


def process_articles(source, articles):
    for article in articles:
        try:
            article.download()
            # Process the article, which involves text extraction,
            # analysis, and persisting data for each article.
        except Exception as e:
            print('failed on article {}, error message: {}'.format(article.url, str(e)))


# The helper is defined before this loop so the name exists when it is called.
for source in sources:
    try:
        articles = newspaper.build(source, **config).articles
        if len(articles) > 0:
            process_articles(source, articles)
    except Exception:
        print('Failed on Source: {}'.format(source))

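The sources object is a list of news website URLs. When I run the script above (python script.py), my local machine hangs after a certain number of articles have been fetched. When I run the same script on a server, the process is killed without showing any error. Any idea what is causing the problem, or how I can detect or debug it?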
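
One way to narrow this down, assuming (the symptoms do not confirm it) that growing memory use is what makes the machine hang and gets the process killed, is to log memory after each article. The sketch below uses the standard library's tracemalloc module. The names run_with_memory_log, handle, and log_every are hypothetical helpers for illustration and are not part of newspaper3k.

```python
import tracemalloc


def run_with_memory_log(items, handle, log_every=1):
    """Call handle(item) for each item and sample traced memory as it goes.

    Returns a list of (index, current_bytes, peak_bytes) tuples.
    """
    samples = []
    tracemalloc.start()
    try:
        for index, item in enumerate(items, start=1):
            handle(item)
            if index % log_every == 0:
                # Bytes currently allocated by Python, and the peak so far.
                current, peak = tracemalloc.get_traced_memory()
                samples.append((index, current, peak))
                print('after {} items: current={} KiB, peak={} KiB'.format(
                    index, current // 1024, peak // 1024))
    finally:
        tracemalloc.stop()
    return samples


if __name__ == '__main__':
    # Stand-in workload that keeps every result alive, so memory grows.
    retained = []
    run_with_memory_log(range(5), lambda i: retained.append('x' * 100000))
```

In the script above, items would be the articles list and handle the per-article processing. If the "current" figure keeps climbing from one article to the next, something is holding on to processed articles. Note that tracemalloc only sees allocations made through Python's allocator, so memory held by C extensions may not show up here.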