I am trying to get the positions of the `<` and `>` that belong to the real tag when they are embedded in text like `<tag "510270">calculate</>`.
I have sentences like these:
```python
sentence1 = ("After six weeks and seventeen tentative approaches the only serious "
             "tender came from Daniel. He had offered a paltry #2 a week for the one-time "
             "woodman's home, sane enough in this, at least, to <tag \"510270\">calculate</> "
             "safety to the nearest new penny piece. ")
sentence2 = ("After six weeks and seventeen tentative approaches the only serious "
             "tender came from Daniel. He had offered a paltry #2 a week for the one-time "
             "woodman's < home, sane enough in this, at least, to <tag \"510270\">calculate</> "
             "safety to the nearest new penny > piece. ")
sentence3 = ("After six weeks and seventeen tentative approaches the only serious "
             "tender came from Daniel. He had offered a paltry #2 a week for the one-time "
             "woodman's > home, sane enough in this, at least, to <tag \"510270\">calculate</> "
             "safety to the nearest new penny < piece. ")
```
I need `cfrom` and `incfrom` to be the positions of the 1st and 2nd `<` in `<tag "XXXX">...</>`, and `cto` and `incto` to be the positions of the 2nd and 1st `>` in `<tag "XXXX">...</>`.
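To pin down which index is which, here is a hypothetical worked example on a shortened string that contains only the tag, using plain `str.index` searches. It assumes the naming from the description above (`cfrom`/`incfrom` for the 1st/2nd `<`, `incto`/`cto` for the 1st/2nd `>`) and does not yet cope with stray `<` or `>`.

```python
# Hypothetical example string; naming follows the question's description:
#   cfrom   = 1st "<"  (start of the opening tag)
#   incto   = 1st ">"  (end of the opening tag)
#   incfrom = 2nd "<"  (start of the closing "</>")
#   cto     = 2nd ">"  (end of the closing "</>")
s = 'to <tag "510270">calculate</> safety'

cfrom = s.index("<")
incto = s.index(">", cfrom)
incfrom = s.index("<", incto)
cto = s.index(">", incfrom)

print(cfrom, incto, incfrom, cto)  # 3 16 26 28
print(s[cfrom:incto + 1])          # <tag "510270">
print(s[incto + 1:incfrom])        # calculate
print(s[incfrom:cto + 1])          # </>
```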
How can I handle sentences like `sentence2` and `sentence3`, where a `<` or `>` appears outside `<tag "XXXX">...</>`?
For `sentence1`, I can do this:
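```python
cfrom, cto = 0, 0
# first "<" in the sentence
for i, c in enumerate(sentence1):
    if c == "<":
        cfrom = i
        break
# last ">" in the sentence (strings have no .reverse, so use reversed())
for i, c in enumerate(reversed(sentence1)):
    if c == ">":
        cto = len(sentence1) - 1 - i
        break

incfrom, incto = 0, 0
# first ">" after cfrom: end of the opening tag
for i, c in enumerate(sentence1[cfrom:]):
    if c == ">":
        incfrom = cfrom + i
        break
# first "<" after that: start of the closing "</>"
for i, c in enumerate(sentence1[incfrom:cto]):
    if c == "<":
        incto = incfrom + i
        break
```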
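The loops above key on single characters, so in `sentence2` the stray `<` before the tag and the stray `>` after it are picked up instead of the real ones. A minimal sketch, assuming every real tag begins with the literal text `<tag "` and closes with `</>` as in the three samples, anchors the search on those delimiters instead; `tag_positions` is a hypothetical helper name.

```python
def tag_positions(sentence):
    """Return (cfrom, incto, incfrom, cto) for the first <tag "...">...</>,
    or None if no complete tag is found.

    Assumption (hypothetical): a real tag always starts with '<tag "'
    and ends with '</>', as in the question's three sample sentences.
    """
    cfrom = sentence.find('<tag "')        # 1st "<": start of the real tag
    if cfrom == -1:
        return None
    quote_gt = sentence.find('">', cfrom)  # closing quote of the ID, then ">"
    if quote_gt == -1:
        return None
    incto = quote_gt + 1                   # 1st ">": end of the opening tag
    incfrom = sentence.find("</>", incto)  # 2nd "<": start of "</>"
    if incfrom == -1:
        return None
    cto = incfrom + 2                      # 2nd ">": end of "</>"
    return cfrom, incto, incfrom, cto


# Shortened versions of sentence2 and sentence3 with stray "<" and ">"
s2 = 'home < sane, at least, to <tag "510270">calculate</> safety > piece.'
s3 = 'home > sane, at least, to <tag "510270">calculate</> safety < piece.'
for s in (s2, s3):
    cfrom, incto, incfrom, cto = tag_positions(s)
    # each iteration prints: <tag "510270"> calculate </>
    print(s[cfrom:incto + 1], s[incto + 1:incfrom], s[incfrom:cto + 1])
```

Because the stray characters never match `<tag "` or `</>`, they are skipped regardless of whether they come before or after the tag.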
Answer 0 (score: 1)
You can keep track of where you are as you find the tags, like this:
```python
def parseSentence(sentence):
    cfrom, cto, incfrom, incto = 0, 0, 0, 0
    place = ''  # to keep track of where we are
    for i, c in enumerate(sentence):
        if c == '<':
            # check for 'cfrom'
            if sentence.startswith('<tag', i):
                cfrom = i
                place = 'botag'  # begin-open-tag
            # check for 'incfrom' (startswith avoids an IndexError if '<' is the last char)
            elif sentence.startswith('</', i) and place == 'intag':
                incfrom = i
                place = 'bctag'  # begin-close-tag
        elif c == '>':
            # check for 'cto'
            if place == 'botag':  # just after '<tag...'
                cto = i
                place = 'intag'  # now within the XML tag
            # check for 'incto'
            elif place == 'bctag':
                incto = i
                place = ''
                # one complete <tag>...</> has been seen: report its positions
                yield (cfrom, cto, incfrom, incto)
```
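This should work for all of your sentences, but note that it only really works if the sentence contains a single `<tag>...</>`. If there are several, it will return the positions of the last `<tag>...</>`.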
Edit: if you add `yield` to the function (see above), it will iterate over the positions of every tag in the sentence when there is more than one `<tag>...</>`.
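For sentences with several tags, a regular expression is one alternative to the `place` state machine above (a swapped-in technique, not what this answer uses). A minimal sketch, assuming tags always look exactly like `<tag "ID">text</>`; `TAG_RE` and `iter_tag_positions` are hypothetical names, and the tuple order follows the question's naming rather than the answer's.

```python
import re

# Hypothetical pattern: assumes every tag has the exact shape <tag "ID">text</>.
# re.DOTALL lets the tagged text span a line break.
TAG_RE = re.compile(r'<tag "[^"]*">(.*?)</>', re.DOTALL)


def iter_tag_positions(sentence):
    """Yield (cfrom, incto, incfrom, cto) for every <tag "...">...</> in sentence."""
    for m in TAG_RE.finditer(sentence):
        cfrom = m.start()       # "<" that opens <tag
        incto = m.start(1) - 1  # ">" that ends the opening tag
        incfrom = m.end(1)      # "<" of the closing "</>"
        cto = m.end() - 1       # ">" of the closing "</>"
        yield cfrom, incto, incfrom, cto


s = 'a <tag "1">x</> b < c <tag "22">yy</> d > e'
for cfrom, incto, incfrom, cto in iter_tag_positions(s):
    print(s[cfrom:incto + 1], s[incto + 1:incfrom])
# <tag "1"> x
# <tag "22"> yy
```

The trade-off is that the pattern is only as flexible as you write it: a tag with a different attribute layout would be skipped silently, whereas the state machine above only looks for the `<tag` and `</` prefixes.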
Answer 1 (score: 0)
If I understand correctly, this should work (assuming you don't change the variables `i` and `c`):
```python
cfrom, cto = 0, 0
for i in range(len(sentence1)):
    if sentence1.startswith("<tag", i):  # compare a substring: a single char never equals "<tag"
        cfrom = i
        break
for i, c in enumerate(sentence1[cfrom:]):
    if c == ">":
        cto = cfrom + i  # going forward from cfrom
        break

incfrom, incto = 0, 0
for i in range(cto, len(sentence1)):  # after the tag is opened, look for the start of the closing tag
    if sentence1.startswith("</", i):
        incfrom = i
        break
for i, c in enumerate(sentence1[incfrom:]):
    if c == ">":
        incto = incfrom + i  # the ">" of the closing "</>"
        break
```