How to wrap specific words in brackets?

Date: 2016-04-12 17:31:05

Tags: python regex

I'm new to Python, and here is my problem. I have a set of words:

entities = ['blab', 'r1', 'zss']

I want to detect these words in my sentences and wrap them in brackets, unless they are already bracketed.

For instance:

this r1 is about zsse -> this [r1] is about [zsse]

If the text is already in brackets, I don't want to change anything; for example, [ blablab r1 blabala ] should stay exactly the same.

I tried something, but it doesn't work:

for s in sentences:
    for e in entities:
        if re.search(r"\[\[%s\]\]" % e, s):
            pass
        else:
            s=s.replace(e,'[['+e+']]')

        New_sentences.append(s)

4 Answers:

Answer 0 (score: 2)

This is how I would do it. Note that I use two different regular expressions:

  • (\[.*?]) identifies the regions inside brackets
  • '({})'.format('|'.join(entities)) matches any entity within the regions that are not inside brackets.
import re

# Matches a bracketed region (non-greedy), e.g. "[ blablab r1 blabala ]"
brackets = re.compile(r'(\[.*?])')

def rewrite(sentence, entities):
    # The capture group makes split() keep the bracketed regions in the list
    parts = brackets.split(sentence)
    pattern = re.compile('({})'.format('|'.join(entities)))
    for i, phrase in enumerate(parts):
        # Only rewrite the pieces that are not already inside brackets
        if not phrase.startswith('['):
            parts[i] = pattern.sub(r'[\1]', phrase)
    return ''.join(parts)

print(rewrite('this r1 is about zsse', ['blab', 'r1', 'zss']))
print(rewrite('[ blablab r1 blabala ]', ['blab', 'r1', 'zss']))

Result:

$ python x.py 
this [r1] is about [zss]e
[ blablab r1 blabala ]
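For comparison, the two regexes can also be folded into a single re.sub call with a replacement function. This is a minimal sketch under my own assumptions, not part of the answer above: bracket_entities is a hypothetical name, and re.escape is added in case an entity contains regex metacharacters. The pattern tries to match a whole bracketed region first, so anything inside brackets is consumed and returned unchanged, and only bare entities get wrapped.

```python
import re

def bracket_entities(sentence, entities):
    # Alternative 1: an existing bracketed region (no capture group).
    # Alternative 2: one of the entities (captured as group 1).
    # At each position the engine tries alternative 1 first, so a bracketed
    # region is consumed whole before any entity inside it can match.
    pattern = re.compile(r'\[.*?\]|({})'.format('|'.join(map(re.escape, entities))))

    def replace(match):
        if match.group(1) is None:
            return match.group(0)  # bracketed region: leave it as it is
        return '[' + match.group(1) + ']'

    return pattern.sub(replace, sentence)

print(bracket_entities('this r1 is about zsse', ['blab', 'r1', 'zss']))
print(bracket_entities('[ blablab r1 blabala ]', ['blab', 'r1', 'zss']))
print(bracket_entities('r1 [ r1 ] r1', ['r1']))
```

Like the split-based version, this wraps only the matched substring, so zsse becomes [zss]e rather than [zsse].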

Answer 1 (score: 0)

If you only care whether an entity is a substring of a word in the sentence, you don't need regular expressions at all.

sentences = ['this r1 is about zsse', '[blablab r1 blabala]']
entities = ['blab', 'r1', 'zss']
new_sentences = []

for sentence in sentences:
    if sentence.startswith('[') and sentence.endswith(']'):
        new_sentences.append(sentence)
        continue

    sentence = sentence.split(' ')

    for index, word in enumerate(sentence):
        for entity in entities:
            if entity in word:
                sentence[index] = '[{w}]'.format(w=word)

    new_sentences.append(' '.join(sentence))

print(new_sentences)
# >>> ['this [r1] is about [zsse]', '[blablab r1 blabala]']
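The code above only skips a sentence when the whole sentence starts with [ and ends with ]. As a rough sketch (bracket_words is a hypothetical name, and it assumes words are separated by single spaces as in the examples), the same word-by-word idea can count how many brackets are open, so that only words outside every bracketed region get wrapped:

```python
def bracket_words(sentence, entities):
    words = sentence.split(' ')
    depth = 0  # number of '[' currently open (never allowed below zero)
    for i, word in enumerate(words):
        # Wrap only words that are outside brackets and contain an entity
        if depth == 0 and not word.startswith('[') and any(e in word for e in entities):
            words[i] = '[{}]'.format(word)
        # Update the depth from the original word, not the wrapped one
        depth = max(0, depth + word.count('[') - word.count(']'))
    return ' '.join(words)

entities = ['blab', 'r1', 'zss']
print(bracket_words('this r1 is about zsse', entities))
print(bracket_words('[blablab r1 blabala]', entities))
print(bracket_words('intro [ blablab r1 blabala ] and r1', entities))
```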

Answer 2 (score: 0)

Try this code:

import re

entities = ['blab', 'r1', 'zss']
sentences = ['this r1 is about zsse', 'this [r1] is about [zss]e']
new_sentences = []

for s in sentences:
    for e in entities:
        if re.search(r"\[%s\]" % e, s):
            pass
        else:
            s=s.replace(e,'['+e+']')
    new_sentences.append(s)

print(new_sentences)
# >>> ['this [r1] is about [zss]e', 'this [r1] is about [zss]e']
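One thing to be aware of: str.replace does not look at brackets, and the re.search check only skips an entity that already appears as an exact [entity]. Running the same loop body on the question's already-bracketed example shows that the text inside the brackets still gets rewritten:

```python
import re

entities = ['blab', 'r1', 'zss']
s = '[ blablab r1 blabala ]'

# Same loop body as the code above, applied to one already-bracketed sentence
for e in entities:
    if re.search(r"\[%s\]" % e, s):
        pass
    else:
        s = s.replace(e, '[' + e + ']')

print(s)  # [ [blab]lab [r1] [blab]ala ]
```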


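The main problem with your code is that New_sentences.append(s) is indented one level too deep. It sits inside the for e in entities loop, so it runs once per entity instead of once per sentence; with three entities, every sentence is appended to New_sentences three times.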

You also seem to be using [[ ]] (and \[\[ \]\] in the regex) everywhere, instead of [ ] (and \[ \]).
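The effect of the extra indentation is easy to see on a single sentence. In this sketch the list names too_deep and correct are made up for illustration, and the re.search check is dropped for brevity; it simply counts what each version appends:

```python
sentences = ['this r1 is about zsse']
entities = ['blab', 'r1', 'zss']

too_deep = []
for s in sentences:
    for e in entities:
        s = s.replace(e, '[' + e + ']')
        too_deep.append(s)  # inside the inner loop: runs once per entity

correct = []
for s in sentences:
    for e in entities:
        s = s.replace(e, '[' + e + ']')
    correct.append(s)  # after the inner loop: runs once per sentence

print(len(too_deep))  # 3
print(len(correct))   # 1
```

The first list also holds the intermediate, half-processed versions of the sentence, which is why the output looked wrong.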

Answer 3 (score: -1)

You can try this:

In [48]: entities = ['blab', 'r1', 'zss']

In [49]: s = 'this r1 is about zsse'

In [50]: import re

In [52]: rs = re.compile('|'.join(entities))

In [60]: sl = list()

for se in s.split():
    if rs.match(se):
        sl.append('[{0}]'.format(se))
    else:
        sl.append(se)

In [62]: sl
Out[62]: ['this', '[r1]', 'is', 'about', '[zsse]']

In [63]: ' '.join(sl)
Out[63]: 'this [r1] is about [zsse]'