I am trying to download/extract articles from multiple URLs stored in a text file, and then save those same articles to a CSV file.
I am building a blog that aggregates news on a specific topic, and I want to use Python to extract the news articles from the URLs in the text file.
from newspaper import Article

with open("untitled.txt") as url_file:
    lines = url_file.readlines()
    url = lines
for line in lines:
    article = Article(url)
AttributeError Traceback (most recent call last)
<ipython-input-47-ac8a2b1aab1a> in <module>
1 for line in lines:
----> 2 article = Article(url)
~\Anaconda3\lib\site-packages\newspaper\article.py in __init__(self, url, title, source_url, config, **kwargs)
58
59 if source_url == '':
---> 60 scheme = urls.get_scheme(url)
61 if scheme is None:
62 scheme = 'http'
~\Anaconda3\lib\site-packages\newspaper\urls.py in get_scheme(abs_url, **kwargs)
277 if abs_url is None:
278 return None
--> 279 return urlparse(abs_url, **kwargs).scheme
280
281
~\Anaconda3\lib\urllib\parse.py in urlparse(url, scheme, allow_fragments)
365 Note that we don't break the components up in smaller bits
366 (e.g. netloc is a single string) and we don't expand % escapes."""
--> 367 url, scheme, _coerce_result = _coerce_args(url, scheme)
368 splitresult = urlsplit(url, scheme, allow_fragments)
369 scheme, netloc, url, query, fragment = splitresult
~\Anaconda3\lib\urllib\parse.py in _coerce_args(*args)
121 if str_input:
122 return args + (_noop,)
--> 123 return _decode_args(args) + (_encode_result,)
124
125 # Result objects are more helpful than simple tuples
~\Anaconda3\lib\urllib\parse.py in _decode_args(args, encoding, errors)
105 def _decode_args(args, encoding=_implicit_encoding,
106 errors=_implicit_errors):
--> 107 return tuple(x.decode(encoding, errors) if x else '' for x in args)
108
109 def _coerce_args(*args):
~\Anaconda3\lib\urllib\parse.py in <genexpr>(.0)
105 def _decode_args(args, encoding=_implicit_encoding,
106 errors=_implicit_errors):
--> 107 return tuple(x.decode(encoding, errors) if x else '' for x in args)
108
109 def _coerce_args(*args):
AttributeError: 'list' object has no attribute 'decode'
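The traceback appears to come from passing the whole `lines` list to `Article(url)`, which expects a single URL string; `urlparse` then tries to call `.decode()` on the list. A minimal stdlib-only reproduction of the same failure (the example URL is made up):

```python
from urllib.parse import urlparse

# Handing urlparse a list, as Article(url) effectively does above,
# hits _decode_args, and a list has no .decode method:
try:
    urlparse(["https://example.com"])
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'decode'

# One URL string at a time parses fine:
print(urlparse("https://example.com").scheme)  # https
```

So the loop should pass each `line` (a string) to `Article`, not the list `url`.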
I want to repeat this process so that I can extract text from hundreds of URLs. Is there a way to set this up so that I can keep the list in a text file and extract every article from it?
Update 1: Following the suggestions, I updated the code, but I still cannot extract all the articles from the URLs:
from newspaper import Article

with open("untitled.txt") as url_file:
    lines = url_file.readlines()
for line in lines:
    article = Article(line)
    article.download()
    article.text
I want to extract all of the articles from the list of URLs.
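A sketch of one way this could work, assuming newspaper3k is installed, that untitled.txt holds one URL per line, and that the output filename articles.csv is just an example. Two details matter: `readlines()` keeps the trailing newline, so each line needs `.strip()`, and `article.parse()` must run after `download()` before `.text` has content:

```python
import csv

def load_urls(path):
    # Read one URL per line, dropping blank lines and surrounding whitespace
    # (readlines() would otherwise leave a trailing "\n" on every URL).
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def extract_articles(urls):
    # newspaper3k is imported lazily so the file-handling helpers above
    # work even without the dependency installed.
    from newspaper import Article
    rows = []
    for url in urls:
        try:
            article = Article(url)
            article.download()
            article.parse()  # parse() must run before .title/.text are populated
            rows.append({"url": url, "title": article.title, "text": article.text})
        except Exception as exc:  # one dead link should not stop the whole batch
            print(f"skipped {url}: {exc}")
    return rows

def write_csv(rows, path):
    # One row per article: URL, title, full text.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "title", "text"])
        writer.writeheader()
        writer.writerows(rows)

# write_csv(extract_articles(load_urls("untitled.txt")), "articles.csv")
```

The try/except keeps the loop going over hundreds of URLs even when some fail to download, which is the usual behavior wanted for batch scraping.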