Regex with beautifulsoup won't return the expected match

Time: 2015-12-24 06:51:27

Tags: python regex beautifulsoup

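I'm trying to use a regex to match tags with class="calendar-days-list2" but not class="calendar-days-list2 prev-next-month". I've loaded a sample of HTML that contains tags with both variants.

When I search the sample HTML with re.findall(), the regex matches what I want. When I use the same regex in beautifulsoup, it returns both the wanted class and the unwanted one. I don't understand why this happens. Any ideas? Thanks!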

Code:

import re
from bs4 import BeautifulSoup

html = '''<td id="pagestructure_0_pagecontent_0_calendar1_2016_1_7_0" class="calendar-days-list2" width="14%">
       <span class="date-number">7</span>
            <p>
              <img src="/wac/wacassets/images/icons/h1.gif" border="0">
              <a href="http://www.woodruffcenter.org/Commerce/MuseumAdmissions?performanceId=86514">Special Exhibitions</a>
              10:00 AM
            </p>

          <td id="pagestructure_0_pagecontent_0_calendar1_2015_11_29_1"    class="calendar-days-list2 prev-next-month" width="14%"></td>
       '''

soup = BeautifulSoup(html, "html.parser")
# WORKS: run against the raw HTML string, the lookahead rejects the second <td>
print(re.findall(r"(calendar\-days\-list2)(?!\sprev\-next\-month)", html), "\n\n")

regex = re.compile(r"(calendar\-days\-list2)(?!\sprev\-next\-month)")
# DOESN'T WORK: returns both <td> elements
tds = soup.find_all("td", {"class": regex})
print(tds)

1 Answer:

Answer 0 (score: 1)

regex = re.compile(r"(calendar\-days\-list2)(?!\sprev\-next\-month)")
# DOESN'T WORK
tds = soup.find_all("td", {"class": regex})

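This doesn't work because the regex is applied to each class value separately rather than to the whole attribute value. That is because class is a special multi-valued attribute. Several recent posts deal with related problems.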
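To make the multi-valued behaviour concrete, here is a minimal sketch, assuming BeautifulSoup 4 with the built-in html.parser. It reuses the regex from the question on a one-tag sample; the variable names are only illustrative.

```python
import re
from bs4 import BeautifulSoup

html = '<td class="calendar-days-list2 prev-next-month"></td>'
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

# class is multi-valued, so BeautifulSoup stores it as a list of tokens
class_values = td["class"]
print(class_values)

regex = re.compile(r"(calendar\-days\-list2)(?!\sprev\-next\-month)")

# Tested token by token, the first token matches: nothing follows it,
# so the negative lookahead has nothing to reject
per_token = [bool(regex.search(value)) for value in class_values]
print(per_token)

# Tested against the full attribute string, the lookahead rejects the match
full_class = " ".join(class_values)
whole = bool(regex.search(full_class))
print(whole)
```

Since one token on its own satisfies the pattern, the unwanted tag passes the filter even though the full attribute string would not.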

The simplest approach is probably to use a CSS selector that matches the full class attribute:

soup.select('[class="calendar-days-list2"]')
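As a rough usage sketch of that selector, assuming BeautifulSoup 4 and a trimmed-down version of the sample HTML: the ids "a" and "b" and the helper only_list2 are hypothetical, introduced only for illustration. A function filter that compares the whole class list is shown as one possible alternative, not as part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed sample; the ids "a" and "b" are made up for this example
html = '''<table><tr>
<td id="a" class="calendar-days-list2" width="14%"></td>
<td id="b" class="calendar-days-list2 prev-next-month" width="14%"></td>
</tr></table>'''
soup = BeautifulSoup(html, "html.parser")

# The attribute selector compares the complete class string
exact = soup.select('td[class="calendar-days-list2"]')
exact_ids = [td["id"] for td in exact]
print(exact_ids)

# Alternative: a function filter that checks the whole list of class tokens
def only_list2(tag):
    return tag.name == "td" and tag.get("class") == ["calendar-days-list2"]

by_function = [td["id"] for td in soup.find_all(only_list2)]
print(by_function)
```

Both variants compare the class attribute as a whole, so the td that also carries prev-next-month is excluded.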