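I'm trying to run a Beautiful Soup script that scrapes menus. It first gets a list of food items, then, in a for loop, climbs up the tree to find which meal each food is served at and which dining hall serves it. It then adds that information to a dictionary, with the food as the key and the meal and dining hall as the value. Here is the code: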
foodDict = {}
foodList = bsObj.findAll("td")
for foodItem in foodList:
    print("foodItems: " + foodItem.getText())
    meal = foodItem.parent.parent.parent.find("h4").getText().lower()
    print("Meal: " + meal)
    diningHall = foodItem.parent.parent.parent.parent.parent.parent.find("h2").getText().lower()
    s = "-"
    seq = (meal, diningHall)
    mealAndHall = s.join(seq)
    foodDict[foodItem.getText().lower().strip()] = mealAndHall
    print(foodDict)
It gets through the first iteration of the loop, but on the second iteration it raises an error:
foodItems: Bacon
Meal: breakfast
{'bacon': 'breakfast-chase/duckett'}
foodItems: Hard & Soft Cooked Eggs
Traceback (most recent call last):
  File "menuscrape.py", line 24, in <module>
    meal = foodItem.parent.parent.parent.find("h4").getText().lower()
AttributeError: 'NoneType' object has no attribute 'find'
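Can someone explain why I'm getting this error? Is foodItem the NoneType object? Why does my code get the information I need on the first iteration but not on later ones? I don't fully understand. Also, if anyone has tips on how to avoid the repeated .parent.parent.parent..., that would be great. I'm still learning, so if you'd rather withhold that information, that's fine. Thanks in advance.
Edit:
Here is the source: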
url = "https://www.smith.edu/diningservices/menu_poc/cbord_menus.php"
response = requests.get(url)
bsObj = BeautifulSoup(response.content, "html.parser")
Desired output:
{'bacon': 'breakfast-chase/duckett', 'hard & soft cooked eggs': 'breakfast-chase/duckett', 'fried eggs': 'breakfast-chase/duckett', 'morning glory muffins': 'breakfast-chase/duckett', 'rolled oats': 'breakfast-chase/duckett', 'red grapes': 'breakfast-chase/duckett', 'red grapes': 'breakfast-chase/duckett', 'fresh pineapple': 'breakfast-chase/duckett', 'crudites & dip': 'lunch-chase/duckett', 'vegan pesto pizza': 'lunch-chase-duckett', 'pepperoni pizza': 'lunch-chase/duckett', 'extra cheese pizza': 'lunch-chase/duckett', 'caesar salad': 'lunch-chase/duckett', 'chocolate chip bars': 'lunch-chase/duckett', 'assorted fruit': 'dinner-chase/duckett', 'london broil': 'dinner-chase/duckett', 'vegan mushroom tofu': 'dinner-chase/duckett', 'oven-browned red potatoes': 'dinner-chase/duckett', 'baby carrots w/ parsley': 'dinner-chase/duckett', 'hummingbird cake w/ frosting': 'dinner-chase/duckett'}
Answer 0 (score: 0)
foodDict = {}
# soup is the parsed BeautifulSoup object (bsObj in the question)
Chase = soup.select_one('.context')
h2 = Chase.h2.text.lower()
for div in Chase.select('.col-xs-4'):
    h4 = div.h4.text.lower()
    value = '-'.join((h4, h2))
    # div('td') is shorthand for div.find_all('td')
    for food in div('td'):
        key = food.text.strip().lower()
        foodDict[key] = value
Out:
{'assorted fruit': 'dinner-chase/duckett',
'baby carrots w/ parsley': 'dinner-chase/duckett',
'bacon': 'breakfast-chase/duckett',
'caesar salad': 'lunch-chase/duckett',
'chocolate chip bars': 'lunch-chase/duckett',
'crudites & dip': 'lunch-chase/duckett',
'extra cheese pizza': 'lunch-chase/duckett',
'fresh pineapple': 'breakfast-chase/duckett',
'fried eggs': 'breakfast-chase/duckett',
'hard & soft cooked eggs': 'breakfast-chase/duckett',
'hummingbird cake w/ frosting': 'dinner-chase/duckett',
'london broil': 'dinner-chase/duckett',
'morning glory muffins': 'breakfast-chase/duckett',
'oven-browned red potatoes': 'dinner-chase/duckett',
'pepperoni pizza': 'lunch-chase/duckett',
'red grapes': 'breakfast-chase/duckett',
'rolled oats': 'breakfast-chase/duckett',
'vegan mushroom tofu': 'dinner-chase/duckett',
'vegan pesto pizza': 'lunch-chase/duckett'}
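On the NoneType question: the traceback points at the result of foodItem.parent.parent.parent, not at foodItem itself. In Beautiful Soup the top-level BeautifulSoup object has a .parent of None, so a fixed-length .parent chain fails as soon as a cell sits at a shallower depth than the first one did. Below is a minimal sketch of one way to avoid counting parents, using find_parent and skipping cells with no matching ancestor. The markup and the class names "hall" and "meal" are invented for illustration and are not taken from the real page, and build_food_dict is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real menu page may differ.
# The last <td> is deliberately placed outside any meal/hall container.
HTML = """
<div class="hall">
  <h2>Chase/Duckett</h2>
  <div class="meal">
    <h4>Breakfast</h4>
    <table>
      <tr><td>Bacon</td></tr>
      <tr><td> Fried Eggs </td></tr>
    </table>
  </div>
  <div class="meal">
    <h4>Lunch</h4>
    <table>
      <tr><td>Caesar Salad</td></tr>
    </table>
  </div>
</div>
<td>Stray Cell</td>
"""


def build_food_dict(soup):
    """Map each food name to 'meal-hall', skipping cells without both ancestors."""
    food_dict = {}
    for cell in soup.find_all("td"):
        # find_parent searches upward for the nearest matching ancestor,
        # so the nesting depth of the cell no longer matters.
        meal_div = cell.find_parent("div", class_="meal")
        hall_div = cell.find_parent("div", class_="hall")
        if meal_div is None or hall_div is None:
            # No matching ancestor: skip instead of raising AttributeError.
            continue
        meal = meal_div.find("h4").get_text().strip().lower()
        hall = hall_div.find("h2").get_text().strip().lower()
        food_dict[cell.get_text().strip().lower()] = "-".join((meal, hall))
    return food_dict


soup = BeautifulSoup(HTML, "html.parser")
food_dict = build_food_dict(soup)
print(food_dict)
```

The stray cell is skipped because neither find_parent call finds an ancestor for it. On the real page, the selectors passed to find_parent would need to match whatever containers actually wrap each meal and dining hall.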