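I am trying to use a regular expression to match tags with class="calendar-days-list2" but not class="calendar-days-list2 prev-next-month". I loaded a sample of HTML that contains one tag for each of the two cases.
When I search the sample HTML with re.findall(), the regular expression matches what I want. When I use the same regular expression in BeautifulSoup, it returns both the wanted class and the unwanted one. I don't understand why this happens. Any ideas? Thanks!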
Code:
import re

from bs4 import BeautifulSoup

html = '''<td id="pagestructure_0_pagecontent_0_calendar1_2016_1_7_0" class="calendar-days-list2" width="14%">
<span class="date-number">7</span>
<p>
<img src="/wac/wacassets/images/icons/h1.gif" border="0">
<a href="http://www.woodruffcenter.org/Commerce/MuseumAdmissions?performanceId=86514">Special Exhibitions</a>
10:00 AM
</p>
<td id="pagestructure_0_pagecontent_0_calendar1_2015_11_29_1" class="calendar-days-list2 prev-next-month" width="14%"></td>
'''
soup = BeautifulSoup(html, "html.parser")

# WORKS: on the raw string, the lookahead rejects the "prev-next-month" variant
print(re.findall(r"(calendar\-days\-list2)(?!\sprev\-next\-month)", html), "\n\n")

regex = re.compile(r"(calendar\-days\-list2)(?!\sprev\-next\-month)")

# DOESN'T WORK: returns both <td> elements
tds = soup.find_all("td", {"class": regex})
print(tds)
Answer 0 (score: 1)
regex = re.compile(r"(calendar\-days\-list2)(?!\sprev\-next\-month)")
# DOESN'T WORK
tds = soup.find_all("td", {"class": regex})
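This does not work because the regular expression is applied to each class value separately rather than to the whole attribute value. The reason is that class is a special multi-valued attribute: BeautifulSoup parses it into a list of individual class names, so the regex sees "calendar-days-list2" on its own and the negative lookahead never gets a chance to see " prev-next-month". Several recent posts deal with related problems.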
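The difference between the two matching modes can be sketched with the standard re module alone. This is only a rough illustration of the idea, not BeautifulSoup's actual implementation; the helper names matches_whole_attribute and matches_any_token are hypothetical.

```python
import re

regex = re.compile(r"(calendar\-days\-list2)(?!\sprev\-next\-month)")


def matches_whole_attribute(class_attr):
    """Search the full attribute string, as re.findall() does on raw HTML."""
    return regex.search(class_attr) is not None


def matches_any_token(class_attr):
    """Hypothetical stand-in for per-value matching: test each class name alone."""
    return any(regex.search(token) for token in class_attr.split())


wanted = "calendar-days-list2"
unwanted = "calendar-days-list2 prev-next-month"

# Whole-string search: the lookahead can see " prev-next-month" and rejects it.
whole = (matches_whole_attribute(wanted), matches_whole_attribute(unwanted))

# Per-token search: "calendar-days-list2" is tested in isolation, so it matches.
per_token = (matches_any_token(wanted), matches_any_token(unwanted))

print(whole)      # (True, False)
print(per_token)  # (True, True)
```

The per-token result is why both td elements come back from find_all() in the question.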
The simplest approach is probably to use a CSS selector that matches the complete class attribute:
soup.select('[class="calendar-days-list2"]')
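As a minimal sketch of that selector in use, the snippet below runs it against a simplified, well-formed version of the sample markup (the short ids "a" and "b" are made up for the example). It also shows one possible alternative, a function filter that compares the parsed class list directly; that variant is an addition for illustration and assumes the default behavior where class is parsed into a list.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the question's markup (ids are hypothetical).
html = """
<table><tr>
<td id="a" class="calendar-days-list2">7</td>
<td id="b" class="calendar-days-list2 prev-next-month"></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact match on the whole class attribute via a CSS attribute selector.
by_selector = soup.select('td[class="calendar-days-list2"]')

# Alternative: a function filter that compares the parsed list of class names.
by_function = soup.find_all(
    lambda tag: tag.name == "td" and tag.get("class") == ["calendar-days-list2"]
)

selector_ids = [td["id"] for td in by_selector]
function_ids = [td["id"] for td in by_function]

print(selector_ids)
print(function_ids)
```

Both approaches only accept a td whose class attribute is exactly calendar-days-list2, so an element carrying any additional class is excluded.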