Question

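I'm new to Python, so this may be a very simple mistake, but I can't work it out. I'm trying to get the links from a website that contain a specific substring, but when I do, I get "TypeError: 'NoneType' object is not iterable". I believe the problem is related to the links I get from the site. Does anyone know what is wrong here?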

from bs4 import BeautifulSoup
from urllib.request import urlopen

html_page = urlopen("http://www.scoresway.com/?sport=soccer&page=competition&id=87&view=matches")
soup = BeautifulSoup(html_page, 'html.parser')
lista=[]
for link in soup.find_all('a'):
    lista.append(link.get('href'))

for text in lista:
    if "competition" in text:
        print (text)

Answer 1

In the line lista.append(link.get('href')), the call link.get('href') can return None. Afterwards you evaluate "competition" in text, where text can be None, and None is not iterable. To avoid this, use link.get('href', '') so that the default value is '' (an empty string), which is iterable.
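A minimal sketch of the difference the default makes. It assumes nothing beyond the standard library: plain dicts stand in for the attributes of three 'a' tags (hypothetical data, not taken from the real page), since a tag's get() with a default behaves like dict.get().

```python
# Stand-ins for the attributes of three <a> tags; the second has no href.
# (Hypothetical data; plain dicts are used here instead of bs4 tags.)
tags = [
    {"href": "/competition/87"},
    {"name": "top"},
    {"href": "http://twitter.com/scoresway"},
]

# Without a default, a missing key yields None, and `in` fails on None.
try:
    "competition" in tags[1].get("href")
    failed = False
except TypeError:
    failed = True

# With '' as the default, every value is a string, so `in` is always safe.
hrefs = [attrs.get("href", "") for attrs in tags]
matches = [h for h in hrefs if "competition" in h]
print(failed, matches)
```

The empty string simply never matches the substring test, so tags without an href drop out of the result on their own.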

Answer 2

You get a TypeError because some 'a' tags have no 'href' attribute, so get('href') returns None, which is not iterable.

You can fix this by replacing this:

soup.find_all('a')

with this:

soup.find_all('a', href=True)

This ensures that every link returned has an 'href' attribute.
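As a rough illustration of the href=True filter, the sketch below parses a small inline HTML string instead of the live page. The markup and the variable names are assumptions made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for the downloaded page.
html = """
<a name="top">anchor without href</a>
<a href="/competition/87">Matches</a>
<a href="http://facebook.com/scoresway">Facebook</a>
"""
soup = BeautifulSoup(html, "html.parser")

all_tags = soup.find_all("a")              # includes the tag with no href
with_href = soup.find_all("a", href=True)  # only tags that have an href

# Every tag in with_href has the attribute, so indexing is safe here.
links = [a["href"] for a in with_href if "competition" in a["href"]]
print(len(all_tags), len(with_href), links)
```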

Answer 3

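I found two things to check.

First, the import depends on your Python version. In Python 3, urlopen lives in urllib.request, so the import in the question is correct. Only on Python 2, where urllib has no request submodule, would you need to change it:

 from urllib.request import urlopen  # Python 3
 # on Python 2 this should be
 from urllib import urlopen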
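If the script has to run under either interpreter, one common pattern (a sketch, not something the question requires) is to try the Python 3 location first and fall back to the Python 2 one:

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule.
    from urllib.request import urlopen
except ImportError:
    # Python 2: urllib has no request submodule.
    from urllib import urlopen

print(callable(urlopen))
```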

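Second, when you collect the links from the page, BeautifulSoup returns None for several of them.

 print(lista)
 # prints [None, u'http://facebook.com/scoresway', u'http://twitter.com/scoresway', ...., None]

As you can see, your list contains None entries (two are visible above), which is why you get "TypeError: 'NoneType'" when you iterate over it and test each entry.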

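How to fix it? Keep None out of the list. Note that link itself is never None; it is link.get('href') that can be, so test the href value:

  for link in soup.find_all('a'):
      href = link.get('href')
      if href is not None:  # Add this check
          lista.append(href)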
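Putting the pieces together, here is a small self-contained sketch. The helper name links_containing and the sample HTML are made up for illustration, and it parses an inline string rather than fetching the real page.

```python
from bs4 import BeautifulSoup


def links_containing(html, needle):
    """Return the href values of <a> tags in `html` that contain `needle`."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for link in soup.find_all("a"):
        href = link.get("href")  # None when the tag has no href attribute
        if href is not None and needle in href:
            result.append(href)
    return result


# Hypothetical sample input: one tag without href, two with.
sample = '<a>no href</a><a href="/competition/87">x</a><a href="/team/1">y</a>'
print(links_containing(sample, "competition"))
```

Checking href before the substring test means a tag without the attribute is skipped instead of raising the TypeError.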

TypeError: 'NoneType' object is not iterable with BeautifulSoup
