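I'm trying to get the Deli title and then, under the Deli title, the two menu items Made to Order Deli Core and Turkey Chipotle Petite Wrap. I'm using BeautifulSoup 4 for this, but it isn't working. How would I do the same for the Entrée items?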
<html>
<head>
<title></title>
</head>
<body>
<table class="dayinner">
<tr class="lun">
<td class="mealname" colspan="3">LUNCH</td>
</tr>
<tr class="lun">
<td class="station"> Deli</td>
<td class="menuitem">
<div class="menuitem">
<input class="chk" id="S1L0000010000047598_35356" onclick=
"rptlist(this);" onmouseout="wschk(0);" onmouseover=
"wschk(1);" type="checkbox"> <span class="ul" onclick=
"nf('0000047598_35356');" onmouseout="pcls(this);"
onmouseover="ws(this);">Made to Order Deli Core</span>
</div>
</td>
<td class="price"></td>
</tr>
<tr class="lun">
<td class="station"> </td>
<td class="menuitem">
<div class="menuitem">
<input class="chk" id="S1L0000020000047933_06835" onclick=
"rptlist(this);" onmouseout="wschk(0);" onmouseover=
"wschk(1);" type="checkbox"> <span class="ul" onclick=
"nf('0000047933_06835');" onmouseout="pcls(this);"
onmouseover="ws(this);">Turkey Chipotle Petite Wrap</span>
</div>
</td>
<td class="price"></td>
</tr>
<tr class="lun">
<td colspan="3" style="height:3px;"></td>
</tr>
<tr class="lun">
<td colspan="3" style="background-color:#c0c0c0; height:1px;"></td>
</tr>
<tr class="lun">
<td class="station"> Entrée</td>
<td class="menuitem">
<div class="menuitem"><input class="chk" id=
"S1L0000030000044794_08943" onclick="rptlist(this);"
onmouseout="wschk(0);" onmouseover="wschk(1);" type="checkbox">
<span class="ul" onclick="nf('0000044794_08943');" onmouseout=
"pcls(this);" onmouseover="ws(this);">Steamed
Corn</span><img alt="Vegan" class="icon" src=
"images/g_062.gif"><img alt="Mindful Item" class="icon" src=
"images/m_051.gif"></div>
</td>
<td class="price"></td>
</tr>
<tr class="lun">
<td class="station"> </td>
<td class="menuitem">
<div class="menuitem">
<input class="chk" id="S1L0000040000033087_22244" onclick=
"rptlist(this);" onmouseout="wschk(0);" onmouseover=
"wschk(1);" type="checkbox"> <span class="ul" onclick=
"nf('0000033087_22244');" onmouseout="pcls(this);"
onmouseover="ws(this);">Cuban Mojo Roasted Pork Loin</span>
</div>
</td>
<td class="price"></td>
</tr>
</table>
</body>
</html>
Or, if possible, I'd like to convert it into XML like this:
<counter name="Deli">
<dish>
<name>Made to Order Deli Core</name>
</dish>
<dish>
<name>Turkey Chipotle Petite Wrap</name>
</dish>
</counter>
Thank you very much, I really appreciate you taking the time to help me.
Answer 0 (score: 1)
I actually used BeautifulSoup together with ElementTree (for building the XML) to collect the <span> elements:
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET
html='''<html>
<head>
<title></title>
</head>
<body>
<table class="dayinner">
<tr class="lun">
<td class="mealname" colspan="3">LUNCH</td>
</tr>
<tr class="lun">
<td class="station"> Deli</td>
<td class="menuitem">
<div class="menuitem">
<input class="chk" id="S1L0000010000047598_35356" onclick=
"rptlist(this);" onmouseout="wschk(0);" onmouseover=
"wschk(1);" type="checkbox"> <span class="ul" onclick=
"nf('0000047598_35356');" onmouseout="pcls(this);"
onmouseover="ws(this);">Made to Order Deli Core</span>
</div>
</td>
<td class="price"></td>
</tr>
<tr class="lun">
<td class="station"> </td>
<td class="menuitem">
<div class="menuitem">
<input class="chk" id="S1L0000020000047933_06835" onclick=
"rptlist(this);" onmouseout="wschk(0);" onmouseover=
"wschk(1);" type="checkbox"> <span class="ul" onclick=
"nf('0000047933_06835');" onmouseout="pcls(this);"
onmouseover="ws(this);">Turkey Chipotle Petite Wrap</span>
</div>
</td>
<td class="price"></td>
</tr>
<tr class="lun">
<td colspan="3" style="height:3px;"></td>
</tr>
<tr class="lun">
<td colspan="3" style="background-color:#c0c0c0; height:1px;"></td>
</tr>
<tr class="lun">
<td class="station"> Entrée</td>
<td class="menuitem">
<div class="menuitem"><input class="chk" id=
"S1L0000030000044794_08943" onclick="rptlist(this);"
onmouseout="wschk(0);" onmouseover="wschk(1);" type="checkbox">
<span class="ul" onclick="nf('0000044794_08943');" onmouseout=
"pcls(this);" onmouseover="ws(this);">Steamed
Corn</span><img alt="Vegan" class="icon" src=
"images/g_062.gif"><img alt="Mindful Item" class="icon" src=
"images/m_051.gif"></div>
</td>
<td class="price"></td>
</tr>
<tr class="lun">
<td class="station"> </td>
<td class="menuitem">
<div class="menuitem">
<input class="chk" id="S1L0000040000033087_22244" onclick=
"rptlist(this);" onmouseout="wschk(0);" onmouseover=
"wschk(1);" type="checkbox"> <span class="ul" onclick=
"nf('0000033087_22244');" onmouseout="pcls(this);"
onmouseover="ws(this);">Cuban Mojo Roasted Pork Loin</span>
</div>
</td>
<td class="price"></td>
</tr>
</table>
</body>
</html> '''
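soup = BeautifulSoup(html, 'html.parser')
counter = ET.Element('counter')
counter.set("name", "Deli")
# Note: this collects every <span> on the page, so the Entrée dishes end up under "Deli" too
for i in soup.find_all('span'):
    dish = ET.SubElement(counter, 'dish')
    name = ET.SubElement(dish, 'name')
    # turn the line break inside "Steamed\nCorn" into a single space
    name.text = i.text.replace('\n', ' ')
ET.dump(counter)  # writes the XML to stdout; dump() itself returns None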
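The answer prints the result with ET.dump, which the ElementTree documentation describes as a debugging aid that writes to sys.stdout. If you need the XML as a string instead (to save or compare it), ET.tostring(..., encoding="unicode") returns one. A minimal sketch, with the two Deli dish names hard-coded in place of the parsed spans:

```python
import xml.etree.ElementTree as ET

# Build the same kind of <counter> element as the answer above
# (dish names hard-coded here instead of taken from the parsed <span> tags)
counter = ET.Element("counter", name="Deli")
for text in ["Made to Order Deli Core", "Turkey Chipotle Petite Wrap"]:
    dish = ET.SubElement(counter, "dish")
    ET.SubElement(dish, "name").text = text

# tostring(..., encoding="unicode") returns a str instead of writing to stdout
xml_text = ET.tostring(counter, encoding="unicode")
print(xml_text)
```

With encoding="unicode" you get a str; without it, tostring returns bytes.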
Answer 1 (score: 1)
You can do it like this:
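# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

# html holds the markup from the question
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('td', class_='station').text.strip()
spans = soup.find_all('span', class_='ul')
# create the root of the XML file
root = ET.Element("counter")
root.set("name", title)
for item in spans:
    # retrieve the text inside the <td class="station"> of the dish's row:
    # span -> enclosing <td class="menuitem"> -> the <td> just before it
    text = item.find_parent('td').find_previous_sibling('td').text.strip()
    # stop as soon as the next station (Entrée) begins
    if text == u'Entrée':
        break
    dish = ET.SubElement(root, 'dish')
    name = ET.SubElement(dish, 'name')
    name.text = item.text.rstrip()
tree = ET.ElementTree(root)
ET.indent(tree)  # pretty-print the output (Python 3.9+)
tree.write("filename.xml", encoding="utf-8")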
Here is the content of the resulting XML file:
<counter name="Deli">
<dish>
<name>Made to Order Deli Core</name>
</dish>
<dish>
<name>Turkey Chipotle Petite Wrap</name>
</dish>
</counter>
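To handle the Entrée part of the question without hard-coding station names or stopping at the first one, you can walk the table row by row and remember the most recent non-empty station cell. The sketch below assumes the layout shown in the question (a blank station cell continues the previous station); menu_to_xml and the <menu> wrapper element are hypothetical names, since the requested format only shows a single <counter>.

```python
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup


def menu_to_xml(html):
    """Group every dish (<span class="ul">) under the station named in its row.

    Assumes the layout from the question: each dish row has a
    <td class="station"> cell, and a blank station cell means the dish belongs
    to the same station as the row above it.
    """
    soup = BeautifulSoup(html, "html.parser")
    # <menu> is a hypothetical wrapper so several <counter> elements fit in one document
    menu = ET.Element("menu")
    counter = None
    for row in soup.find_all("tr"):
        station_td = row.find("td", class_="station")
        if station_td is None:
            continue  # meal-name and spacer rows have no station cell
        station = station_td.get_text(strip=True)
        if station:
            # a non-empty station cell starts a new counter
            counter = ET.SubElement(menu, "counter", name=station)
        if counter is None:
            continue  # dish rows before any named station are skipped
        for span in row.find_all("span", class_="ul"):
            dish = ET.SubElement(counter, "dish")
            # split/join collapses the line break inside "Steamed Corn"
            ET.SubElement(dish, "name").text = " ".join(span.get_text().split())
    return menu


# Trimmed-down copy of the question's table (onclick handlers and images removed)
sample = """<table class="dayinner">
<tr class="lun"><td class="mealname" colspan="3">LUNCH</td></tr>
<tr class="lun"><td class="station"> Deli</td><td class="menuitem">
<div class="menuitem"><span class="ul">Made to Order Deli Core</span></div></td></tr>
<tr class="lun"><td class="station"> </td><td class="menuitem">
<div class="menuitem"><span class="ul">Turkey Chipotle Petite Wrap</span></div></td></tr>
<tr class="lun"><td colspan="3" style="height:3px;"></td></tr>
<tr class="lun"><td class="station"> Entrée</td><td class="menuitem">
<div class="menuitem"><span class="ul">Steamed
Corn</span></div></td></tr>
</table>"""

menu = menu_to_xml(sample)
print(ET.tostring(menu, encoding="unicode"))
```

Each station becomes its own <counter>, so the Deli counter matches the desired output and the Entrée dishes land in a second counter. For the real page, pass the full HTML string instead of sample.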
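It is very important to put the line # -*- coding: utf-8 -*- at the very top of the file to avoid problems with the accented character in 'Entrée'; for details, see SyntaxError: Non-ASCII character '\xa3' in file when function returns '£'. (This applies to Python 2; Python 3 reads source files as UTF-8 by default, so the line is optional there.)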
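A separate accent pitfall can still break the text == u'Entrée' check even when the file encoding is right: the é may be stored as one precomposed code point or as a plain e followed by a combining accent, and the two strings compare unequal. Normalizing both sides with unicodedata.normalize("NFC", ...) is a standard way to handle this. A small sketch; same_text is a hypothetical helper name:

```python
import unicodedata


def same_text(a, b):
    """Compare two strings after NFC normalization, so a precomposed é and
    an 'e' followed by a combining acute accent count as equal."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "Entr\u00e9e"     # é as one code point (U+00E9)
decomposed = "Entre\u0301e"  # e + COMBINING ACUTE ACCENT (U+0301)

print(composed == decomposed)           # False
print(same_text(composed, decomposed))  # True
```

In the answer's loop this would mean writing if same_text(text, u'Entrée'): instead of the plain == comparison.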