Why is the find object not iterable?

Asked: 2019-05-24 05:50:43

Tags: python mongodb beautifulsoup

I want to scrape a data site, but there is a problem with my code.

I want to find out why the find object causes this error. I searched on Stack Overflow, but I could not figure out what is wrong with this code.

from bs4 import BeautifulSoup
from pymongo import MongoClient
import requests
from matplotlib import font_manager, rc

client = MongoClient("localhost", 27017)
database = client.datadb
collection = database.datacol

page = requests.get("https://www.worlddata.info/average-income.php")

soup = BeautifulSoup(page.content, 'html.parser')

general_list = soup.find("tr")

#list_of_tr = general_list.find("tr")

for in_each_tr in general_list:
    list_of_td0 = general_list.find_all("td")[0]
    list_of_td1 = general_list.find_all("td")[1]
    general_list = collection.insert_one({"country":list_of_td0.get_text(), "income":list_of_td1.get_text()})


Traceback (most recent call last):
  File "C:/Users/SAMSUNG/PycharmProjects/simple/data.py", line 18, in <module>
    for in_each_tr in general_list:
TypeError: 'NoneType' object is not iterable

4 Answers:

Answer 0 (score: 0)

Your general_list has the value None.

You need to add a check before operating on the object; iterating over None raises TypeError: 'NoneType' object is not iterable.

I assume this address returned a Forbidden error, so the response contains no <tr>.

If you change the address to:

page = requests.get("https://www.google.com")

soup = BeautifulSoup(page.content, 'html.parser')

general_list = soup.find("tr")

for tr in general_list: 
    print(tr)

it works.
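The point about validation can be shown offline: find() returns None when nothing matches, and iterating that None is exactly what raises the error in the traceback. A minimal sketch against a hypothetical inline document (no network involved):

```python
from bs4 import BeautifulSoup

# Hypothetical document with no <tr> at all, standing in for a 403 page.
html = "<html><body><p>no table here</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

row = soup.find("tr")   # no match, so find() returns None
if row is None:
    rows = []           # validate before iterating instead of crashing
else:
    rows = list(row)
print(rows)
```

The same guard dropped into the original script would turn the cryptic TypeError into an explicit "no rows found" path.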

Answer 1 (score: 0)

https://www.worlddata.info/average-income.php loads its data via AJAX requests, so you need to use Selenium to download the dynamic content.

First, install the Selenium web driver that matches your browser.

Import the Selenium webdriver:

from selenium import webdriver

Download the web page content:

driver = webdriver.Chrome("/usr/bin/chromedriver")
driver.get('https://www.worlddata.info/average-income.php')

where "/usr/bin/chromedriver" is the path to the webdriver.

Get the HTML content:

soup = BeautifulSoup(driver.page_source, 'lxml')

Now you will get the tr tag object:

general_list = soup.find("tr")

Answer 2 (score: 0)

It seems that requests.get("https://www.worlddata.info/average-income.php") returns a 403 response, which means access to the web page is forbidden.

I did a quick Google search and found this StackOverflow post. It says that some web pages can reject GET requests that do not identify a User-Agent.

If you add a header to requests.get like this:

header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
page = requests.get("https://www.worlddata.info/average-income.php", headers=header)

then the response to the GET request will be 200, and your code should work as expected.
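Even with the 403 fixed, the original loop has a second problem: soup.find("tr") returns only the first matching row (a single Tag), while find_all("tr") returns every row. A minimal sketch of the corrected parsing loop, run here against a small hypothetical inline table so it works offline (the real script would pass page.content instead of this string):

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real page content.
html = """
<table>
  <tr><td>CountryA</td><td>1000</td></tr>
  <tr><td>CountryB</td><td>2000</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

records = []
# find_all("tr") yields every row; find("tr") would return only the first.
for tr in soup.find_all("tr"):
    tds = tr.find_all("td")
    if len(tds) >= 2:   # skip header rows or rows with missing cells
        records.append({"country": tds[0].get_text(),
                        "income": tds[1].get_text()})
print(records)
```

Each dict in records is already in the shape that collection.insert_one expects.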

Answer 3 (score: 0)

I still have more problems:

from bs4 import BeautifulSoup
from pymongo import MongoClient
import requests
from selenium import webdriver
from matplotlib import font_manager, rc

client = MongoClient("localhost", 27017)
database = client.datadb
collection = database.datacol

driver = webdriver.Chrome("C:\chromedriver")
driver.get('https://www.worlddata.info/average-income.php')

page = requests.get("https://www.worlddata.info/average-income.php")

soup = BeautifulSoup(driver.page_source, 'lxml')
#soup = BeautifulSoup(page.content, 'html.parser')

general_list = soup.find("tr")

for in_each_tr in general_list:
    list_of_td0 = general_list.find_all("a")
    list_of_td1 = general_list.find_all(class_="right nowrap")[0]
    list_all = collection.insert_one({"country:" + list_of_td0.get_text() + ", income:" + list_of_td1.get_text()})

I get this error:

selenium.common.exceptions.WebDriverException: Message: chrome not reachable
  (Session info: chrome=74.0.3729.169)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}), platform=Windows NT 10.0.17763 x86_64)
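Separately from the WebDriver error, note that this version of the insert_one call passes a set literal (string concatenations inside {}), whereas the first version of the code correctly passed a dict; MongoDB documents must be mappings with distinct keys and values. A minimal sketch of building the document correctly, using hypothetical values and no database connection:

```python
# Hypothetical row values standing in for the parsed <td> text.
country = "CountryA"
income = "1000"

# insert_one expects a mapping like {"key": value}, not a set such as
# {"country:" + country + ", income:" + income}.
document = {"country": country, "income": income}
print(document)
```

This dict can then be handed to collection.insert_one(document) unchanged.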