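I am trying to fetch and parse multiple URLs with urllib and BeautifulSoup, but I get the following error:
AttributeError: 'list' object has no attribute 'timeout'
As I understand it, the error is telling me that I passed a list where a single URL was expected. How can I process multiple URLs?
Here is my code: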
from bs4 import BeautifulSoup
from bs4.element import Comment
import urllib.request

def tag_visible(element):
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, Comment):
        return False
    return True

addresses = ["https://en.wikipedia.org", "https://stackoverflow.com", "https://techcrunch.com"]

def text_from_html(body):
    soup = BeautifulSoup(body, 'html.parser')
    texts = soup.findAll(text=True)
    visible_texts = filter(tag_visible, texts)
    return u" ".join(t.strip() for t in visible_texts)

html = urllib.request.urlopen(addresses).read()
print(text_from_html(html))
Answer 0 (score: 1)
Your error states the problem clearly:
'list' object has no attribute 'timeout'
This happens because urlopen does not accept a list; it expects a single URL string (or a Request object). You should call it inside a loop, like this:
my_texts = []
for each in addresses:
    html = urllib.request.urlopen(each).read()
    print(text_from_html(html))  # or collect the result in a list:
    my_texts.append(text_from_html(html))
I suggest using a better HTTP module than urllib, such as requests.
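As a rough illustration of the one-URL-at-a-time approach, here is a minimal sketch that stays with the standard library and adds a timeout and per-URL error handling, so that one failing site does not stop the whole run. The names fetch_html and fetch_all and the 10-second timeout are assumptions made for this example; they are not part of the original code.

```python
import urllib.request
from urllib.error import URLError


def fetch_html(url, timeout=10):
    """Fetch a single URL and return the raw response body as bytes."""
    # urlopen takes one URL string (or Request object), never a list.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_html):
    """Fetch each URL in turn.

    Returns a dict mapping url -> body, with None for URLs that failed.
    The fetch parameter is a hypothetical hook so the function can be
    swapped out (for example with a requests-based fetcher).
    """
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except (URLError, OSError, ValueError) as exc:
            # Report the failure and carry on with the remaining URLs.
            print(f"Skipping {url}: {exc}")
            results[url] = None
    return results
```

Calling fetch_all(addresses) and passing each non-None body to text_from_html should give the visible text per page. Keeping the fetch step as a parameter is only one possible design; it makes it easy to move to another HTTP client later without touching the loop.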