How do I debug performance problems with newspaper3k?

Asked: 2019-07-14 20:10:37

Tags: python web-scraping python-newspaper

I am using the newspaper3k library and have written the following script:

import newspaper

# Settings passed to newspaper.build() as keyword arguments.
config = {
    'language': 'ar',
    'fetch_images': False,
    'MAX_FILE_MEMO': 200000,
    'number_threads': 1,
}


def process_articles(source, articles):
    for article in articles:
        try:
            article.download()

            # Process the article: text extraction, analysis,
            # and persisting the data for each article.

        except Exception as e:
            print('Failed on article {}, error message: {}'.format(article.url, e))


# Defined after process_articles so the function exists when the loop runs.
sources = ['url1', 'url2', ..., 'urln']
for source in sources:
    try:
        articles = newspaper.build(source, **config).articles
        if len(articles) > 0:
            process_articles(source, articles)
    except Exception as e:
        # Report the actual error instead of hiding it behind a bare except.
        print('Failed on source {}: {}'.format(source, e))

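sources is a list of news-site URLs. When I run the script above (python script.py), my local machine hangs after a certain number of articles have been fetched. When I run the same script on a server, the process is killed without showing any error. Any idea what is causing this, or how I can detect or debug it?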
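
A hang on one machine and a silent kill on another would be consistent with memory running out, although that is only a guess at this point. One way to check is to log memory use while the loop runs. The sketch below uses the standard-library tracemalloc module. The helper name run_with_memory_log and the "every" parameter are made up for illustration, and the demo workload stands in for the real article processing. Note that tracemalloc only sees allocations made through Python's allocators, so memory held by C extensions would not show up here.

```python
import gc
import tracemalloc


def run_with_memory_log(items, handler, every=10):
    """Call handler(item) for each item and sample traced memory periodically.

    Returns a list of (items_processed, current_bytes, peak_bytes) tuples.
    The name and signature are hypothetical, not part of newspaper3k.
    """
    tracemalloc.start()
    samples = []
    try:
        for count, item in enumerate(items, start=1):
            handler(item)
            if count % every == 0:
                gc.collect()  # drop unreachable objects before measuring
                current, peak = tracemalloc.get_traced_memory()
                samples.append((count, current, peak))
                print('after {} items: current={} KiB, peak={} KiB'.format(
                    count, current // 1024, peak // 1024))
    finally:
        tracemalloc.stop()
    return samples


if __name__ == '__main__':
    # Stand-in workload: the handler keeps every payload alive, much as
    # holding on to downloaded article objects would.
    retained = []
    run_with_memory_log(range(50),
                        lambda i: retained.append(bytearray(100000)),
                        every=10)
```

In the script above, items would be the articles list and handler the per-article processing step. If "current" keeps climbing from sample to sample, something is retaining data between articles. On a Linux server it may also be worth checking the kernel log for an out-of-memory kill message around the time the process died.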

0 answers:

No answers yet.