Regex "or" statement matching keywords in any order

Time: 2012-05-11 23:33:00

Tags: python regex

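I have a string that contains keywords, but sometimes a keyword is missing, and the keywords are not in any particular order. I need help with a regular expression.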

The keywords are:

Up-to-date
date added
date trained

These are the keywords I need to find among many other keywords. Any of them may be absent, and they can appear in any order.

What the string looks like:

<div>
<h2 class='someClass'>text</h2>

 blah blah blah Up-to-date blah date added blah

</div>

What I tried:

regex = re.compile('</h2>.*(Up\-to\-date|date\sadded|date\strained)*.*</div>') 

regex = re.compile('</h2>.*(Up\-to\-date?)|(date\sadded?)|(date\strained?).*</div>')

re.findall(regex,string) 

The result I am looking for would be:

If all exist:
['Up-to-date','date added','date trained']

If only some exist:
['Up-to-date','','date trained']

2 answers:

Answer 0: (score: 0)

Does it have to be a regex? If not, you can use str.find, which returns -1 when the substring is absent:

In [12]: sentence = 'hello world cat dog'

In [13]: words = ['cat', 'bear', 'dog']

In [15]: [w*(sentence.find(w)>=0) for w in words]
Out[15]: ['cat', '', 'dog']
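The same idea can be applied to the keywords from the question. This is a minimal sketch, assuming a plain substring test is enough; the function name `find_keywords` and the sample `html` string are made up for illustration, and the `in` operator stands in for `find(...) >= 0`.

```python
def find_keywords(text, keywords):
    """Return each keyword if it occurs in text, otherwise '' (keyword order kept)."""
    return [kw if kw in text else '' for kw in keywords]

# Keywords and sample string taken from the question.
keywords = ['Up-to-date', 'date added', 'date trained']
html = ("<div>\n<h2 class='someClass'>text</h2>\n\n"
        " blah blah blah Up-to-date blah date added blah\n\n</div>")

print(find_keywords(html, keywords))  # ['Up-to-date', 'date added', '']
```

Note that this searches the whole string, not only the part between </h2> and </div>.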

Answer 1: (score: 0)

This code does what you need, but it smells a bit:

import re

def check(the_str):
    """Return the keywords that appear between </h2> and </div>."""
    output_list = []
    # Raw strings keep the backslashes intact. The stray '*' after each
    # keyword (which made its last letter optional) is removed, and
    # re.DOTALL lets '.' match newlines.
    u2d = re.compile(r'</h2>.*Up-to-date.*</div>', re.DOTALL)
    da = re.compile(r'</h2>.*date\sadded.*</div>', re.DOTALL)
    dt = re.compile(r'</h2>.*date\strained.*</div>', re.DOTALL)
    # search() instead of match(), so text before </h2> is allowed.
    if u2d.search(the_str):
        output_list.append("Up-to-date")
    if da.search(the_str):
        output_list.append("date added")
    if dt.search(the_str):
        output_list.append("date trained")

    return output_list

the_str = "</h2>My super cool string with the date added and then some more text</div>"
print(check(the_str))
the_str2 = "</h2>My super cool string date added with the date trained and then some more text</div>"
print(check(the_str2))
the_str3 = "</h2>My super cool string date added with the date trained and then Up-to-date some more text</div>"
print(check(the_str3))
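An alternative, offered only as a sketch: one alternation pattern can collect every keyword in a single pass, and the result can then be laid out in a fixed order with '' for missing keywords, as the question asks. The names `KEYWORDS`, `BODY_RE`, `KEYWORD_RE` and `find_keywords` are hypothetical, and the sketch assumes only the first </h2> ... </div> span matters.

```python
import re

KEYWORDS = ['Up-to-date', 'date added', 'date trained']

# Text between </h2> and the next </div>; DOTALL lets '.' span newlines.
BODY_RE = re.compile(r'</h2>(.*?)</div>', re.DOTALL)
# A single alternation built from the escaped keywords.
KEYWORD_RE = re.compile('|'.join(re.escape(kw) for kw in KEYWORDS))

def find_keywords(html):
    """Return KEYWORDS in fixed order, with '' for each one that is absent."""
    body = BODY_RE.search(html)
    if body is None:
        return [''] * len(KEYWORDS)
    found = set(KEYWORD_RE.findall(body.group(1)))
    return [kw if kw in found else '' for kw in KEYWORDS]

sample = ("<div>\n<h2 class='someClass'>text</h2>\n\n"
          " blah blah blah Up-to-date blah date added blah\n\n</div>")
print(find_keywords(sample))  # ['Up-to-date', 'date added', '']
```

Because findall returns matches in the order they occur in the text, the final list comprehension is what restores the fixed keyword order.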