I can scrape the individual menu items with the following code:
import requests, bs4
# Fetch the menu page for the A - I Residential Restaurant
aiMenu = requests.get('http://138.23.12.141/foodpro/shortmenu.asp?sName=University+of+California%2C+Riverside+Dining+Services&locationNum=03&locationName=A+-+I+Residential+Restaurant&naFlag=1&WeeksMenus=This+Week%27s+Menus&myaction=read&dtdate=1%2F4%2F2016')
# Name the parser explicitly so bs4 does not have to guess one
aiMenuSoup = bs4.BeautifulSoup(aiMenu.text, 'html.parser')
# Every menu item on the page carries the class "shortmenurecipes"
rawBreakfast = aiMenuSoup.select('.shortmenurecipes')
# Loop over the matched elements themselves, not over the characters of str(rawBreakfast)
for item in rawBreakfast:
    print(item.getText())
What I want to do is list the items under their meal time and category heading. How can I do that with bs4?
Answer 0 (score: 0)
You need something a bit more involved to get that.
I use lxml instead of bs4 so that I can use XPath:
import requests
import lxml.html
url = 'http://138.23.12.141/foodpro/shortmenu.asp?sName=University+of+California%2C+Riverside+Dining+Services&locationNum=03&locationName=A+-+I+Residential+Restaurant&naFlag=1&WeeksMenus=This+Week%27s+Menus&myaction=read&dtdate=1%2F4%2F2016'
response = requests.get(url)
html = lxml.html.fromstring(response.text)
# Each meal sits in its own table cell with width="30%"
columns = html.xpath('//td[@width="30%"]')
for col in columns:
    # Meal time for this column
    meal_times = col.xpath('*//div[@class="shortmenumeals"]/text()')
    for x in meal_times:
        print(x)
    # A row holds either a category heading or a menu item
    rows = col.xpath('*//tr')
    for row in rows:
        headers = row.xpath('*/div[@class="shortmenucats"]/span/text()')
        for x in headers:
            print(x)
        titles = row.xpath('*/div[@class="shortmenurecipes"]/span/a/text()')
        for x in titles:
            print(x)
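
Since the question asked about bs4, here is a rough sketch of how the same grouping might be done without XPath. It assumes the three kinds of div (shortmenumeals, shortmenucats, shortmenurecipes) appear in document order, so each item belongs to the most recent meal and category seen before it. The function names group_menu and pairs_from_html and the sample data are made up for illustration; I have not checked this against the live page.

```python
# The three CSS classes used in the code above.
CLASSES = ("shortmenumeals", "shortmenucats", "shortmenurecipes")


def pairs_from_html(html_text):
    """Yield (css_class, text) for each relevant div, in document order.

    Hypothetical helper. bs4 is imported lazily so that group_menu below
    can be used and tested without it being installed.
    """
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_text, "html.parser")
    # Passing a list to class_ matches a div carrying any of the classes.
    for div in soup.find_all("div", class_=list(CLASSES)):
        css_class = next(c for c in div["class"] if c in CLASSES)
        yield css_class, div.get_text(strip=True)


def group_menu(pairs):
    """Fold a flat (css_class, text) stream into {meal: {category: [items]}}."""
    menu = {}
    meal = category = None
    for css_class, text in pairs:
        if css_class == "shortmenumeals":
            # A new meal starts; forget the previous category.
            meal, category = text, None
            menu.setdefault(meal, {})
        elif css_class == "shortmenucats":
            category = text
            menu.setdefault(meal, {}).setdefault(category, [])
        elif css_class == "shortmenurecipes":
            menu.setdefault(meal, {}).setdefault(category, []).append(text)
    return menu


if __name__ == "__main__":
    # Made-up sample standing in for pairs_from_html(response.text).
    sample = [
        ("shortmenumeals", "Breakfast"),
        ("shortmenucats", "-- Entrees --"),
        ("shortmenurecipes", "Scrambled Eggs"),
        ("shortmenumeals", "Lunch"),
        ("shortmenucats", "-- Soups --"),
        ("shortmenurecipes", "Tomato Soup"),
    ]
    for meal_name, categories in group_menu(sample).items():
        print(meal_name)
        for category_name, items in categories.items():
            print("  " + category_name)
            for item in items:
                print("    " + item)
```

With the real page you would call group_menu(pairs_from_html(response.text)). Building a dict instead of printing straight away makes it easy to pull out a single meal later.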