For some reason, I can't extract the data from this simple HTML table.
from bs4 import BeautifulSoup
import requests

def main():
    html_doc = requests.get(
        'http://www.wolfson.cam.ac.uk/old-site/cgi/catering-menu?week=0;style=/0,vertical')
    soup = BeautifulSoup(html_doc.text, 'html.parser')
    table = soup.find('table')
    print table

if __name__ == '__main__':
    main()
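I can get the table, but I don't understand the BeautifulSoup documentation well enough to know how to extract the data from it. The data is in the tr tags.
The website shows a simple HTML food menu.
I would like to output the day of the week and the menu for that day: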
Monday:
Lunch: some_lunch, Supper: some_food
Tuesday:
Lunch: some_lunch, Supper: some_supper
and so on for all the days of the week. 'Formal Hall' can be ignored.
How can I iterate over the tr tags so that I can create this output?
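Answer 0 (score: 1)
I don't normally provide direct solutions. You should try some code first and post here if you run into any problems. But anyway, here is what I wrote, and it should give you a head start.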
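# r is the requests response (html_doc in the question's code)
soup = BeautifulSoup(r.content)
rows = soup.findAll("tr")
# rows[0] is the header row; rows 1-7 hold the seven days
for i in xrange(1,8):
    row = rows[i]
    print row.find("th").text  # day name
    # only the first two columns (Lunch, Supper); Formal Hall is skipped
    for j in xrange(0,2):
        print rows[0].findAll("th")[j+1].text.strip(), ": ",
        td = row.findAll("td")[j]
        for p in td.findAll("p"):
            print p.text, ",",
        print
    print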
The output looks like this:
Monday
Lunch: Leek and Potato Soup, Spaghetti Bolognese with Garlic Bread, Red Pepper and Chickpea Stroganoff with Brown Rice, Chicken Goujons with Garlic Mayonnaise Dip, Vegetable Grills with Sweet Chilli Sauce, Coffee and Walnut Sponge with Custard,
Supper: Leek and Potato Soup, Breaded Haddock with Lemon and Tartare Sauce, Vegetable Samosa with Lentil Dahl, Chilli Beef Wraps, Steamed Strawberry Sponge with Custard,

Tuesday
Lunch: Tomato and Basil Soup, Pan-fried Harrisa Spiced Chicken with Roasted Vegetables, Vegetarian Spaghetti Bolognese with Garlic Bread, Jacket Potato with Various Fillings, Apple and Plum Pie with Custard,
Supper: Tomato and Basil Soup, Lamb Tagine with Fruit Couscous, Vegetable Biryani with Naan Bread, Pan-fried Turkey Escalope, Raspberry Shortbread,
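For anyone on Python 3, here is a rough sketch of the same row-by-row idea that produces something closer to the format asked for in the question. It is not tested against the live page: SAMPLE_HTML is made-up markup that only assumes the layout the code above relies on (a header row of th cells, then one tr per day with the day name in a th and each meal's dishes in p tags inside a td). The function name format_menu and its meals parameter are my own, and I separate dishes with "; " so they don't blur into the ", " between Lunch and Supper.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the menu page.
# The real page's structure is assumed here, not verified.
SAMPLE_HTML = """
<table>
  <tr><th></th><th>Lunch</th><th>Supper</th><th>Formal Hall</th></tr>
  <tr><th>Monday</th>
      <td><p>Leek and Potato Soup</p><p>Spaghetti Bolognese with Garlic Bread</p></td>
      <td><p>Breaded Haddock with Lemon and Tartare Sauce</p></td>
      <td><p>ignored</p></td></tr>
  <tr><th>Tuesday</th>
      <td><p>Tomato and Basil Soup</p></td>
      <td><p>Lamb Tagine with Fruit Couscous</p></td>
      <td><p>ignored</p></td></tr>
</table>
"""


def format_menu(html, meals=("Lunch", "Supper")):
    """Return one 'Day:' line plus one menu line per day row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    # Column titles from the header row, skipping the empty corner cell.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")][1:]
    lines = []
    for row in rows[1:]:
        day = row.find("th")
        if day is None:
            continue  # not a day row
        parts = []
        # Pair each td with its column title; skip columns we don't want.
        for title, cell in zip(headers, row.find_all("td")):
            if title not in meals:
                continue
            dishes = [p.get_text(strip=True) for p in cell.find_all("p")]
            parts.append("{}: {}".format(title, "; ".join(dishes)))
        lines.append("{}:".format(day.get_text(strip=True)))
        lines.append(", ".join(parts))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_menu(SAMPLE_HTML))
```

To try it on the real page, you would pass the text of the requests response (html_doc.text in the question) instead of SAMPLE_HTML. If the page's header cells or day rows are laid out differently from what I assumed, the header slicing and the th/td lookups would need adjusting.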