I'm new to Python, and here is my question. I have this list of words:
entities = ['blab', 'r1', 'zss']
I want to detect them in a sentence and wrap each one in square brackets if it is not already wrapped. For instance:
this r1 is about zsse
-> this [r1] is about [zsse]
If the text is already inside brackets, nothing should change; for example, [ blablab r1 blabala ] stays the same.
I tried the following, but it doesn't work:
for s in sentences:
    for e in entities:
        if re.search(r"\[\[%s\]\]" % e, s):
            pass
        else:
            s=s.replace(e,'[['+e+']]')
        New_sentences.append(s)
Answer 0 (score: 2)
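Here is how I would do it. Note that I use two different regular expressions:
(\[.*?]) identifies the regions inside brackets.
'({})'.format('|'.join(entities)) matches any entity inside the non-bracketed regions.
import re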
brackets = re.compile(r'(\[.*?])')

def rewrite(sentence, entities):
    # The capture group makes split() keep the bracketed pieces in the list.
    sentence = brackets.split(sentence)
    entities = re.compile('({})'.format('|'.join(entities)))
    for i, phrase in enumerate(sentence):
        # Only rewrite pieces that are not already bracketed.
        if not phrase.startswith('['):
            sentence[i] = entities.sub(r'[\1]', phrase)
    sentence = ''.join(sentence)
    return sentence

print(rewrite('this r1 is about zsse', ['blab', 'r1', 'zss']))
print(rewrite('[ blablab r1 blabala ]', ['blab', 'r1', 'zss']))
Result:
$ python x.py
this [r1] is about [zss]e
[ blablab r1 blabala ]
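Note that the result above wraps only the matched entity ([zss]e), whereas the question asks for the whole word ([zsse]). A minimal sketch of one way to get that, assuming "word" means a run of \w characters; the name wrap_whole_words is made up for illustration, and re.escape is added on the assumption that entities may contain regex metacharacters:

```python
import re

# Bracketed regions are captured so that split() keeps them in the result.
BRACKETED = re.compile(r'(\[.*?\])')

def wrap_whole_words(sentence, entities):
    """Wrap every unbracketed word that contains an entity in [ ]."""
    # re.escape guards against entities containing regex metacharacters.
    alternatives = '|'.join(re.escape(e) for e in entities)
    # \w* on both sides extends the match to the whole surrounding word.
    word = re.compile(r'\w*(?:{})\w*'.format(alternatives))
    parts = BRACKETED.split(sentence)
    for i, part in enumerate(parts):
        # With one capture group, odd indices hold the bracketed regions.
        if i % 2 == 0:
            parts[i] = word.sub(lambda m: '[' + m.group(0) + ']', part)
    return ''.join(parts)

print(wrap_whole_words('this r1 is about zsse', ['blab', 'r1', 'zss']))
print(wrap_whole_words('[ blablab r1 blabala ]', ['blab', 'r1', 'zss']))
```

The first call should print this [r1] is about [zsse], and the second should leave the bracketed sentence unchanged.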
Answer 1 (score: 0)
If you only care whether an entity is a substring of a word in the sentence, you don't need regular expressions.
sentences = ['this r1 is about zsse', '[blablab r1 blabala]']
entities = ['blab', 'r1', 'zss']
new_sentences = []
for sentence in sentences:
    # Leave sentences that are entirely enclosed in brackets untouched.
    if sentence.startswith('[') and sentence.endswith(']'):
        new_sentences.append(sentence)
        continue
    sentence = sentence.split(' ')
    for index, word in enumerate(sentence):
        for entity in entities:
            if entity in word:
                sentence[index] = '[{w}]'.format(w=word)
    new_sentences.append(' '.join(sentence))
print(new_sentences)
>>> ['this [r1] is about [zsse]', '[blablab r1 blabala]']
Answer 2 (score: 0)
Try this code:
import re
entities = ['blab', 'r1', 'zss']
sentences = ['this r1 is about zsse', 'this [r1] is about [zss]e']
new_sentences = []
for s in sentences:
    for e in entities:
        if re.search(r"\[%s\]" % e, s):
            pass
        else:
            s=s.replace(e,'['+e+']')
    new_sentences.append(s)
print(new_sentences)
# >>> ['this [r1] is about [zss]e', 'this [r1] is about [zss]e']
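The only problem with your code is that new_sentences.append(s) was indented too far. It ran on every iteration of the entities loop, so each sentence added 3 entries to new_sentences.
You also seem to have used [[]] (or \[\[\]\] in the regex) instead of [] (or \[\]).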
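To make the effect of the indentation concrete, here is a small sketch that runs the same loop with append at either level. The collect function and its flag are invented for this illustration, and a plain substring test stands in for re.search:

```python
entities = ['blab', 'r1', 'zss']
sentences = ['this r1 is about zsse']

def collect(append_inside_inner_loop):
    """Run the loop with append at either indentation level."""
    results = []
    for s in sentences:
        for e in entities:
            # Substring test used here instead of re.search (an assumption
            # made to keep the sketch short).
            if '[' + e + ']' not in s:
                s = s.replace(e, '[' + e + ']')
            if append_inside_inner_loop:
                results.append(s)  # runs once per entity
        if not append_inside_inner_loop:
            results.append(s)      # runs once per sentence
    return results

print(len(collect(True)))   # 3
print(len(collect(False)))  # 1
```

With the append inside the inner loop, the single input sentence yields three entries (one per entity, each a partially rewritten copy); at the outer level it yields one.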
Answer 3 (score: -1)
You can try this:
In [48]: eentities = ['blab', 'r1', 'zss']
In [49]: s = 'this r1 is about zsse'
In [50]: import re
In [52]: rs = re.compile('|'.join(eentities))
In [60]: sl = list()
for se in s.split():
    if(rs.match(se)):
        sl.append('[{0}]'.format(se))
    else:
        sl.append(se)
In [62]: sl
Out[62]: ['this', '[r1]', 'is', 'about', '[zsse]']
In [63]: ' '.join(sl)
Out[63]: 'this [r1] is about [zsse]'
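One caveat: re.match only matches at the start of each word, so a word that contains an entity later on is not wrapped. A short sketch comparing match with search; the input 'xr1 and zsse' and the helper wrap_words are made up for illustration, and neither variant skips text that is already bracketed:

```python
import re

entities = ['blab', 'r1', 'zss']
pattern = re.compile('|'.join(re.escape(e) for e in entities))

def wrap_words(sentence, finder):
    """Wrap each whitespace-separated word for which finder(word) succeeds."""
    return ' '.join('[{0}]'.format(w) if finder(w) else w
                    for w in sentence.split())

# match() anchors at the start of the word, so 'xr1' is left alone.
print(wrap_words('xr1 and zsse', pattern.match))   # xr1 and [zsse]
# search() finds the entity anywhere in the word.
print(wrap_words('xr1 and zsse', pattern.search))  # [xr1] and [zsse]
```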