Question

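My Django (v1.5) model has a method that takes a body of text, finds all of my tags (of the form <note nickname>...</note>, as in the code below), converts the ones that belong to the current user, and removes all the others.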

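The following function currently works, but it requires me to switch to note_tags = '<note .*?>.*?</span>\r\n', because group 0 of the tag match finds every tag, whether or not the user's nickname is in it. I'm curious how I could use the groups so that I can remove all the useless tags without having to modify the regex.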

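import re

def format_for_user(self, user):
    body = self.body
    # Default pattern: every <note ...>...</note> block plus its trailing line break.
    note_tags = '<note .*?>.*?</note>\r\n'
    user_msg = False
    if user is not None:
        # Group 1 captures the opening tag that carries this user's nickname.
        user_tags = re.compile('(<note %s>).*?</note>' % user.nickname)
        for tag in user_tags.finditer(body):
            if tag.group(1):
                # Swap this user's opening tag for <span>.
                body = body.replace(tag.group(1), '<span>')
                # The last 7 characters of the match are '</note>'. Note that
                # str.replace() rewrites every '</note>' in body, not just this one.
                body = body.replace(tag.group(0)[-7:], '</span>')
                user_msg = True
                # All closing tags are now '</span>', so the removal pattern must change.
                note_tags = '<note .*?>.*?</span>\r\n'
    # Remove every remaining note block (the ones addressed to other users).
    for tag in re.compile(note_tags).finditer(body):
        body = body.replace(tag.group(0), '')
    return (body, user_msg)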

Answer 1

So abarnert is right: I shouldn't be using regular expressions to parse my HTML, and should use something along the lines of BeautifulSoup instead.

So I used BeautifulSoup, and this is the resulting code, which solves a lot of the problems the regex version was running into.

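from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 is installed

def format_for_user(self, user):
    body = self.body
    # Naming the parser explicitly keeps the result independent of which parsers are installed.
    soup = BeautifulSoup(body, 'html.parser')
    user_msg = False
    if user is not None:
        # Assumes notes are written as <note class="nickname">...</note>.
        user_tags = soup.find_all('note', {'class': user.nickname})
        for tag in user_tags:
            # Renaming keeps the content and excludes the tag from the 'note' search below.
            tag.name = 'span'
            # Record that this user had at least one note.
            user_msg = True
    # Whatever is still a <note> belongs to someone else: remove it along with its content.
    for tag in soup.find_all('note'):
        tag.decompose()
    return (soup.prettify(), user_msg)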
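
For comparison, here is a rough sketch of what the question originally asked about: using the match groups directly, so that a single pattern handles both the user's notes and everyone else's. This is not the code I ended up with, just one way it could be done. The names NOTE_RE and format_for_nickname are made up for illustration, and the sketch assumes Python 3 (for nonlocal) and the <note nickname>...</note> markup from the question.

```python
import re

# One pattern for every note block. Group 1 is the name in the opening tag,
# group 2 is the note's content, group 3 is the optional trailing line break.
NOTE_RE = re.compile(r'<note ([^>]*)>(.*?)</note>(\r\n)?', re.DOTALL)


def format_for_nickname(body, nickname=None):
    """Turn notes addressed to `nickname` into <span>s and drop all other notes."""
    user_msg = False

    def replace(match):
        nonlocal user_msg
        name = match.group(1)
        content = match.group(2)
        line_break = match.group(3) or ''
        if nickname is not None and name == nickname:
            # This note is for the current user: keep its content in a <span>.
            user_msg = True
            return '<span>' + content + '</span>' + line_break
        # Any other note is removed together with its trailing line break.
        return ''

    # re.sub calls replace() once per match, so each note is judged on its own groups.
    return NOTE_RE.sub(replace, body), user_msg
```

Because the decision is made per match inside the replacement function, the pattern never has to be swapped out after the user's tags are rewritten. The usual caveat still applies: this only holds up for flat, non-nested note tags, which is the reason for switching to a real parser.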
