Beautiful Soup AttributeError: 'NoneType' object has no attribute 'find'

Asked: 2017-01-05 01:00:27

Tags: python python-3.x web-scraping beautifulsoup

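I'm trying to run a Beautiful Soup script that scrapes menus. It first gets a list of food items, then in a for loop walks up the tree to find which meal each food is served at and which dining hall serves it. It then adds that information to a dictionary, with the food as the key and the meal and dining hall as the value. Here is the code: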

foodDict = {}
foodList = bsObj.findAll("td")
for foodItem in foodList:
    print("foodItems: " +foodItem.getText())
    meal = foodItem.parent.parent.parent.find("h4").getText().lower()
    print("Meal: " +meal)
    diningHall = foodItem.parent.parent.parent.parent.parent.parent.find("h2").getText().lower()
    s = "-"
    seq = (meal, diningHall)
    mealAndHall = s.join(seq)
    foodDict[foodItem.getText().lower().strip()] = mealAndHall
    print(foodDict)

It gets through the first iteration of the loop, but on the second iteration it raises an error:

foodItems: Bacon
Meal: breakfast
{'bacon': 'breakfast-chase/duckett'}
foodItems: Hard & Soft Cooked Eggs
Traceback (most recent call last):
  File "menuscrape.py", line 24, in <module>
    meal = foodItem.parent.parent.parent.find("h4").getText().lower()
AttributeError: 'NoneType' object has no attribute 'find'

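Can someone explain why I'm getting this error? Is the NoneType object foodItem? Why does my code get the information I need on the first iteration but not on later ones? I don't fully understand. Also, if anyone has tips on how to avoid the repeated parent.parent.parent... chain, that would be great. I'm still learning, so if you'd rather hold that back, that's fine. Thanks in advance.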

Edit:

Here is how the source is loaded:

import requests
from bs4 import BeautifulSoup

url = "https://www.smith.edu/diningservices/menu_poc/cbord_menus.php"
response = requests.get(url)
bsObj = BeautifulSoup(response.content, "html.parser")

Desired output:

{'bacon': 'breakfast-chase/duckett', 'hard & soft cooked eggs': 'breakfast-chase/duckett', 'fried eggs': 'breakfast-chase/duckett', 'morning glory muffins': 'breakfast-chase/duckett', 'rolled oats': 'breakfast-chase/duckett', 'red grapes': 'breakfast-chase/duckett', 'red grapes': 'breakfast-chase/duckett', 'fresh pineapple': 'breakfast-chase/duckett', 'crudites & dip': 'lunch-chase/duckett', 'vegan pesto pizza': 'lunch-chase-duckett', 'pepperoni pizza': 'lunch-chase/duckett', 'extra cheese pizza': 'lunch-chase/duckett', 'caesar salad': 'lunch-chase/duckett', 'chocolate chip bars': 'lunch-chase/duckett', 'assorted fruit': 'dinner-chase/duckett', 'london broil': 'dinner-chase/duckett', 'vegan mushroom tofu': 'dinner-chase/duckett', 'oven-browned red potatoes': 'dinner-chase/duckett', 'baby carrots w/ parsley': 'dinner-chase/duckett', 'hummingbird cake w/ frosting': 'dinner-chase/duckett'}

1 answer:

Answer 0 (score: 0)

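foodDict = {}
# soup is the parsed page (named bsObj in the question)
Chase = soup.select_one('.context')    # first element with class "context"
h2 = Chase.h2.text.lower()             # dining hall name
for div in Chase.select('.col-xs-4'):  # one column per meal
    h4 = div.h4.text.lower()           # meal name
    value = '-'.join((h4, h2))

    # calling a tag is shorthand for find_all()
    for food in div('td'):
        key = food.text.strip().lower()
        foodDict[key] = value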

Out:

{'assorted fruit': 'dinner-chase/duckett',
 'baby carrots w/ parsley': 'dinner-chase/duckett',
 'bacon': 'breakfast-chase/duckett',
 'caesar salad': 'lunch-chase/duckett',
 'chocolate chip bars': 'lunch-chase/duckett',
 'crudites & dip': 'lunch-chase/duckett',
 'extra cheese pizza': 'lunch-chase/duckett',
 'fresh pineapple': 'breakfast-chase/duckett',
 'fried eggs': 'breakfast-chase/duckett',
 'hard & soft cooked eggs': 'breakfast-chase/duckett',
 'hummingbird cake w/ frosting': 'dinner-chase/duckett',
 'london broil': 'dinner-chase/duckett',
 'morning glory muffins': 'breakfast-chase/duckett',
 'oven-browned red potatoes': 'dinner-chase/duckett',
 'pepperoni pizza': 'lunch-chase/duckett',
 'red grapes': 'breakfast-chase/duckett',
 'rolled oats': 'breakfast-chase/duckett',
 'vegan mushroom tofu': 'dinner-chase/duckett',
 'vegan pesto pizza': 'lunch-chase/duckett'}
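
As for why the original loop failed: in the traceback, .find is called on the result of foodItem.parent.parent.parent, so that expression is what evaluated to None, not foodItem itself. In Beautiful Soup the .parent of the top-level BeautifulSoup object is None, so the second td apparently has fewer ancestors in the parsed tree than the first one. A fixed chain of .parent hops breaks whenever the nesting depth differs, whereas find_parent searches upward by tag name and class. Below is a minimal sketch of that idea, assuming the .context and .col-xs-4 class names used in the code above. The sample markup, the stray cell, and the function name build_food_dict are made up for illustration and are not taken from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the assumed page layout.
# The last <td> sits outside any container, like a cell whose
# ancestors are not where a fixed .parent chain expects them.
SAMPLE_HTML = """
<div class="context">
  <h2>Chase/Duckett</h2>
  <div class="col-xs-4">
    <h4>Breakfast</h4>
    <table>
      <tr><td>Bacon</td></tr>
      <tr><td>Fried Eggs</td></tr>
    </table>
  </div>
</div>
<td>Stray Cell</td>
"""


def build_food_dict(soup):
    """Map each food name to 'meal-dining hall', skipping stray cells."""
    food_dict = {}
    for cell in soup.find_all("td"):
        # find_parent walks up the tree and returns None if nothing matches.
        meal_div = cell.find_parent("div", class_="col-xs-4")
        hall_div = cell.find_parent("div", class_="context")
        if meal_div is None or hall_div is None:
            continue  # this cell is not inside a meal column
        meal_heading = meal_div.find("h4")
        hall_heading = hall_div.find("h2")
        if meal_heading is None or hall_heading is None:
            continue  # container found, but its heading is missing
        meal = meal_heading.get_text().strip().lower()
        hall = hall_heading.get_text().strip().lower()
        food_dict[cell.get_text().strip().lower()] = "-".join((meal, hall))
    return food_dict


result = build_food_dict(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(result)
```

On the real page you would pass bsObj to build_food_dict instead of the sample markup. Checking each lookup for None before chaining another call is what keeps a single unexpected cell from stopping the whole loop.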