I am new to Python and am trying something new. I have two lists in a dictionary. Let's say:
List1:                      List2:
Anterior                    cord
cuneate nucleus             Medulla oblongata
nucleus                     Spinal cord
Intermediolateral nucleus   Spinal
sksdsj
british                     7
I have some lines of text like these:
<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s>
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal.</s>
<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>
<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>
I have to return the lines that contain strings from both list1 and list2. So I tried the following code:
result = ""
if list1 in line and list2 in line:
    i1 = re.sub('(?i)(\s+)(%s)(\s+)'%list1, '\\1<e1>\\2</e1>\\3', line)
    i2 = re.sub('(?i)(\s+)(%s)(\s+)'%list2, '\\1<e2>\\2</e2>\\3', i1)
    result = result + i2 + "\n"
    continue
But I get the following result:
<s id="5239778-2">The name refers collectively to the <e1>cuneate nucleus</e1> and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="3691284-1">In the medulla oblongata, the arcuate <e1>nucleus</e1> is a group of neurons located on the anterior surface of the medullary pyramids.</s>
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal.</s>
<s id="1053949-16">The <e1>Anterior</e1> <e2>cord</e2> syndrome results from injury to the <e1>anterior</e1> part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>
<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s>
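Here, only result line 4 has matches for strings from both lists, which is what I want. However, I do not want the lines that match only one string or none at all (e.g. result lines 1 and 3). Also, when a line matches strings from both lists, both matches should be tagged (e.g. result line 2).
Any kind of help is much appreciated.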
Answer 0 (score: 5)
Basically, you want to wrap some words in <e1> tags and other words in <e2> tags. Is that right?
If so, something like this will do:
#!/usr/bin/python
from __future__ import print_function
import re
text = '''\
<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s>
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal cord.</s>
<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>'''
list1 = ('Anterior', 'cuneate nucleus', 'Intermediolateral nucleus')
list2 = ('cord', 'Medulla oblongata', 'Spinal cord')
# wrap each phrase in \b so that it matches whole words only
re1 = re.compile("(%s)" % "|".join(r"\b%s\b" % i for i in list1), re.IGNORECASE)
re2 = re.compile("(%s)" % "|".join(r"\b%s\b" % i for i in list2), re.IGNORECASE)
for line in text.split("\n"):
    line = re1.sub(r"<e1>\1</e1>", line)
    line = re2.sub(r"<e2>\1</e2>", line)
    print(line)
Output:
<s id="5239778-2">The name refers collectively to the <e1>cuneate nucleus</e1> and gracile nucleus, which are present at the junction between the <e2>spinal cord</e2> and the <e2>medulla oblongata</e2>.</s>
<s id="3691284-1">In the <e2>medulla oblongata</e2>, the arcuate nucleus is a group of neurons located on the <e1>anterior</e1> surface of the medullary pyramids.</s>
<s id="21120-99"><e1>Anterior</e1> horn cells, motoneurons located in the <e2>spinal cord</e2>.</s>
<s id="1053949-16">The <e1>Anterior</e1> <e2>cord</e2> syndrome results from injury to the <e1>anterior</e1> part of the <e2>spinal cord</e2>, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the <e2>spinal cord</e2>.</s>
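Note that the loop above tags every line, including lines that match only one of the lists. If only lines with a match from both lists should be kept, a check can be added before tagging. The following is a minimal sketch and not part of the original answer: `build_pattern` and `tag_lines` are hypothetical helper names, `re.escape` is added on the assumption that phrases might contain regex metacharacters, and the two sample lines are shortened versions of the question's text.

```python
import re

def build_pattern(phrases):
    # Hypothetical helper: try longer phrases first so "Spinal cord"
    # is preferred over "cord" when both could match at one position.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(%s)\b" % alternation, re.IGNORECASE)

def tag_lines(lines, list1, list2):
    # Hypothetical helper: keep and tag only lines matching BOTH lists.
    re1 = build_pattern(list1)
    re2 = build_pattern(list2)
    tagged = []
    for line in lines:
        if re1.search(line) and re2.search(line):
            line = re1.sub(r"<e1>\1</e1>", line)
            line = re2.sub(r"<e2>\1</e2>", line)
            tagged.append(line)
    return tagged

list1 = ('Anterior', 'cuneate nucleus', 'Intermediolateral nucleus')
list2 = ('cord', 'Medulla oblongata', 'Spinal cord')
# Shortened sample lines, for illustration only.
lines = [
    "Anterior horn cells, motoneurons located in the spinal.",
    "The Anterior cord syndrome results from injury.",
]
result = tag_lines(lines, list1, list2)
print("\n".join(result))
```

With these lists, the first sample line is dropped because it has no match from list2, and the second is kept with both matches tagged. The `\b` anchors assume every phrase starts and ends with a word character.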
Answer 1 (score: 1)
How about this:
result = ""
lines = ['<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>',
'<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s>',
'<s id="21120-99">Anterior horn cells, motoneurons located in the spinal cord.</s>',
'<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>']
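# list1 and list2 are the two lists from the question.
for line in lines:
    for item1 in list1:
        if line.find(item1) != -1:
            # The line contains a list1 item; now look for any list2 item.
            for item2 in list2:
                if line.find(item2) != -1:
                    result = result + line + '\n'
                    break
            # Stop after the first list1 hit so the line is added at most once.
            break
print(result)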
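One caveat with the loop above is that `str.find` is case-sensitive, so 'Medulla oblongata' would not be found in a line that spells it 'medulla oblongata'. A case-insensitive variant of the same filter could look like the sketch below. It is not part of the original answer: `matches_both` is a hypothetical helper name, the lists are borrowed from Answer 0, and the sample lines are shortened for illustration.

```python
def matches_both(line, list1, list2):
    # Hypothetical helper: compare in lower case so that
    # "Medulla oblongata" also matches "medulla oblongata".
    lowered = line.lower()
    in_list1 = any(item.lower() in lowered for item in list1)
    in_list2 = any(item.lower() in lowered for item in list2)
    return in_list1 and in_list2

list1 = ('Anterior', 'cuneate nucleus', 'Intermediolateral nucleus')
list2 = ('cord', 'Medulla oblongata', 'Spinal cord')
# Shortened sample lines, for illustration only.
lines = [
    'In the medulla oblongata, the arcuate nucleus lies on the anterior surface.',
    'Anterior horn cells, motoneurons located in the spinal.',
    'Meanwhile is the studio 7 album by British pop band 10cc.',
]
kept = [line for line in lines if matches_both(line, list1, list2)]
print("\n".join(kept))
```

This is plain substring matching, so 'cord' would also be found inside a word such as 'record'. If whole-word matching matters, a regex with `\b` as in Answer 0 is the safer choice.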