How to extract rows from a simple HTML table?

Posted: 2015-11-20 04:06:24

Tags: python html beautifulsoup

For some reason, I can't extract the rows from this simple HTML table.

from bs4 import BeautifulSoup
import requests

def main():
    html_doc = requests.get(
        'http://www.wolfson.cam.ac.uk/old-site/cgi/catering-menu?week=0;style=/0,vertical')

    soup = BeautifulSoup(html_doc.text, 'html.parser')
    table = soup.find('table')
    print(table)


if __name__ == '__main__':
    main()

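I can get the table, but I don't understand the BeautifulSoup documentation well enough to work out how to extract the data. The data is inside the tr tags.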
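For reference, a minimal sketch of walking the `<tr>` tags of a table with BeautifulSoup 4. The HTML string is a made-up stand-in, not the real Wolfson page; `find_all` (which accepts a list of tag names) and `get_text(separator, strip=True)` are standard bs4 calls.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real menu page (not its actual markup)
HTML = """
<table>
  <tr><th></th><th>Lunch</th><th>Supper</th></tr>
  <tr><th>Monday</th><td><p>Soup</p><p>Pasta</p></td><td><p>Fish</p></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

rows = []
for tr in table.find_all("tr"):
    # A row can mix header (<th>) and data (<td>) cells; keep them in document order
    cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

for cells in rows:
    print(cells)
```

Each row comes back as a plain list of strings, so the first row holds the column names and the rest hold one day each.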

The site displays a simple HTML food menu.

I want to print each day of the week along with that day's menu:

Monday:
    Lunch: some_lunch, Supper: some_supper
Tuesday:
    Lunch: some_lunch, Supper: some_supper

And so on for every day of the week. The 'Formal Hall' column can be ignored.

How can I iterate over the tr tags to produce this output?
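One possible sketch of producing exactly that format. The markup below is hypothetical: it assumes (as the answer's code also does) a header row of meal names with an empty first cell, then one row per day whose `<th>` is the day name and whose `<td>` cells each hold one `<p>` per dish. `WANTED` and `format_menu` are illustrative names, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to mirror the real page's layout
HTML = """
<table>
  <tr><th></th><th>Lunch</th><th>Supper</th><th>Formal Hall</th></tr>
  <tr><th>Monday</th><td><p>Soup</p><p>Pasta</p></td><td><p>Fish</p></td><td><p>Steak</p></td></tr>
  <tr><th>Tuesday</th><td><p>Curry</p></td><td><p>Pie</p></td><td><p>Duck</p></td></tr>
</table>
"""

WANTED = ("Lunch", "Supper")  # columns to keep; "Formal Hall" is dropped by name


def format_menu(html):
    rows = BeautifulSoup(html, "html.parser").find_all("tr")
    # Skip the empty corner cell above the day names
    meals = [th.get_text(strip=True) for th in rows[0].find_all("th")[1:]]
    lines = []
    for tr in rows[1:]:
        day = tr.find("th").get_text(strip=True)
        parts = []
        # Pair each meal name with the <td> in the same column position
        for meal, td in zip(meals, tr.find_all("td")):
            if meal in WANTED:
                dishes = [p.get_text(strip=True) for p in td.find_all("p")]
                parts.append(meal + ": " + ", ".join(dishes))
        lines.append(day + ":")
        lines.append("    " + ", ".join(parts))
    return "\n".join(lines)


print(format_menu(HTML))
```

Selecting columns by header text rather than by position means an extra or reordered column (such as Formal Hall) doesn't shift the output, and iterating over `rows[1:]` avoids hard-coding the number of days.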

1 answer:

Answer 0 (score: 1)

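I don't usually hand out complete solutions. You should try writing some code yourself and post here if you run into a specific problem. Anyway, here is what I wrote; it should give you a head start.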

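import requests
from bs4 import BeautifulSoup

URL = 'http://www.wolfson.cam.ac.uk/old-site/cgi/catering-menu?week=0;style=/0,vertical'
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')

rows = soup.find_all("tr")
# rows[0] is the header row; its <th> cells after the first are the meal names
headers = rows[0].find_all("th")

# rows[1] to rows[7] hold Monday to Sunday
for i in range(1, 8):
    row = rows[i]
    print(row.find("th").text)  # day name
    # Only the first two meal columns (Lunch, Supper); Formal Hall is skipped
    for j in range(2):
        print(headers[j + 1].text.strip() + ": ", end="")
        td = row.find_all("td")[j]
        for p in td.find_all("p"):
            print(p.text + ", ", end="")
        print()
    print()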
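The loop above prints ", " after every `<p>`, which is why each line of the output below ends with a trailing comma. A small sketch of collecting the dishes first and joining them instead; `dishes` is a hypothetical helper name and the cell markup is an assumed example in the shape the answer's code expects.

```python
from bs4 import BeautifulSoup

# Hypothetical single menu cell, shaped like the cells the answer iterates over
CELL = "<td><p>Leek and Potato Soup</p><p>Spaghetti Bolognese with Garlic Bread</p></td>"


def dishes(td):
    """Return the whitespace-trimmed text of every <p> in a menu cell."""
    return [p.get_text(strip=True) for p in td.find_all("p")]


td = BeautifulSoup(CELL, "html.parser").find("td")
print("Lunch: " + ", ".join(dishes(td)))
```

Using `get_text(strip=True)` rather than `.text` also drops any stray whitespace or newlines the page puts inside each `<p>`.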

The output looks like this:

Monday
Lunch:  Leek and Potato Soup, Spaghetti Bolognese with Garlic Bread, Red Pepper and Chickpea Stroganoff with Brown Rice, Chicken Goujons with Garlic Mayonnaise Dip, Vegetable Grills with Sweet Chilli Sauce, Coffee and Walnut Sponge with Custard,
Supper:  Leek and Potato Soup, Breaded Haddock with Lemon and Tartare Sauce, Vegetable Samosa with Lentil Dahl, Chilli Beef Wraps, Steamed Strawberry Sponge with Custard,

Tuesday
Lunch:  Tomato and Basil Soup, Pan-fried Harrisa Spiced Chicken with Roasted Vegetables, Vegetarian Spaghetti Bolognese with Garlic Bread, Jacket Potato with Various Fillings, Apple and Plum Pie with Custard,
Supper:  Tomato and Basil Soup, Lamb Tagine with Fruit Couscous, Vegetable Biryani with Naan Bread, Pan-fried Turkey Escalope, Raspberry Shortbread,