I have a list of URLs, and I scrape each web page's title by looping over the whole list.
The problem is that the code breaks when a URL in the list is invalid. I tried using try and except to pass over the error, but the try/except is not working as expected.
Below is the code I'm using (please correct me if I'm missing something here):
import requests
from bs4 import BeautifulSoup as BS

url_list = ['http://www.aurecongroup.com',
            'http://www.bendigoadelaide.com.au',
            'http://www.burrell.com.au',
            'http://www.dsdbi.vic.gov.au',
            'http://www.energyaustralia.com.au',
            'http://www.executiveboard.com',
            'http://www.mallesons.com',
            'https://www.minterellison.com',
            'http://www.mta.org.nz',
            'http://www.services.nsw.gov.au']

for link in url_list:
    try:
        r = requests.get(link)
        r.encoding = 'utf-8'
        html_content = r.text
        soup = BS(html_content, 'lxml')
        df = soup.title.string
        print(df)
    except IOError:
        pass
Running the code above gives me AttributeError: 'NoneType' object has no attribute 'string'.
Can someone help me with this?
Answer 0 (score: 3)
If you want to skip only the failing iterations, keep the try/except inside the loop and catch AttributeError as well, since soup.title is None for pages without a <title> tag:
for link in url_list:
    try:
        r = requests.get(link)
        ...
    except (IOError, AttributeError):
        pass
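For reference, a complete version of that loop might look like the sketch below; the timeout argument is an extra defensive assumption, not part of the original answer:

import requests
from bs4 import BeautifulSoup as BS

for link in url_list:
    try:
        r = requests.get(link, timeout=10)  # timeout is an added assumption
        r.encoding = 'utf-8'
        soup = BS(r.text, 'lxml')
        # soup.title is None when the page has no <title> tag,
        # so soup.title.string raises AttributeError in that case.
        print(soup.title.string)
    except (IOError, AttributeError):
        # requests' RequestException subclasses IOError (requests 2.x),
        # so bad URLs and missing titles are both skipped.
        pass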
Answer 1 (score: 1)
How about this:
import requests
from bs4 import BeautifulSoup

url_list = [
    'http://www.aurecongroup.com',
    'http://www.bendigoadelaide.com.au',
    'http://www.burrell.com.au',
    'http://www.dsdbi.vic.gov.au',
    'http://www.energyaustralia.com.au',
    'http://www.executiveboard.com',
    'http://www.mallesons.com',
    'https://www.minterellison.com',
    'http://www.mta.org.nz',
    'http://www.services.nsw.gov.au'
]

for link in url_list:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, 'lxml')
    try:
        df = soup.title.string.strip()
    except Exception:
        df = ""
    print(df)
Partial output, including the None case:
Aurecon – A global engineering and infrastructure advisory company
####It gives the none value
Stockbroking & Superannuation Brisbane | Burrell
Home | Economic Development
Electricity Providers - Gas Suppliers | EnergyAustralia
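An alternative to catching a broad Exception is to test for the missing tag explicitly. This is a sketch, not part of the original answer, reusing the same url_list:

import requests
from bs4 import BeautifulSoup

for link in url_list:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, 'lxml')
    # soup.title is None when the page has no <title> tag, and
    # soup.title.string is None when the tag contains more than one child.
    df = soup.title.string.strip() if soup.title and soup.title.string else ""
    print(df)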
Answer 2 (score: 0)
Try: should be lowercase try:, and a tab is missing after for link in url_list:.
import requests
from bs4 import BeautifulSoup as BS

url_list = ['Http://www.aurecongroup.com',
            'Http://www.burrell.com.au',
            'Http://www.dsdbi.vic.gov.au',
            'Http://www.energyaustralia.com.au',
            'Http://www.executiveboard.com',
            'Http://www.mallesons.com',
            'Https://www.minterellison.com',
            'Http://www.mta.org.nz',
            'Http://www.services.nsw.gov.au']

try:
    for link in url_list:
        r = requests.get(link)
        r.encoding = 'utf-8'
        html_content = r.text
        soup = BS(html_content, 'lxml')
        df = soup.title.string
        print(df)
except IOError:
    pass
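Note that with the try wrapped around the entire loop, as above, the first failing URL still ends all remaining iterations; keeping the try/except inside the loop, as in Answer 0, skips only the failing link.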