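Changing multiple substrings of a string using offset and length

Time: 2015-11-18 01:02:19

Tags: python arrays string offset opencalais

I'm working on extracting data from the OpenCalais API. Here are the details:

Input: a paragraph (a string, e.g. "Barack Obama is the President of United States."). The API also returns entity instances, each with an offset and a length, but not necessarily in order of occurrence.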
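To make the input concrete, here is a made-up sketch of the shape that the code further down indexes into. The field names (`name`, `instances`, `offset`, `length`) are taken from that code; the values and the helper `instance_text` are invented for illustration and are not real API output.

```python
# Hypothetical sample mirroring the fields used in this post; not real API output.
text = "Barack Obama is the President of United States"

entities = [
    # Deliberately listed out of order of occurrence, as the API may do.
    {"name": "United States", "instances": [{"offset": 33, "length": 13}]},
    {"name": "Barack Obama", "instances": [{"offset": 0, "length": 12}]},
]

def instance_text(text, instance):
    """Return the substring that an instance's offset and length point at."""
    start = instance["offset"]
    return text[start:start + instance["length"]]

for entity in entities:
    for instance in entity["instances"]:
        print(entity["name"], "->", repr(instance_text(text, instance)))
```

Each offset counts characters from the start of the original, unmodified string, which is why the order of processing matters later on.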

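Output (what I want): the same string, but with each identified entity instance turned into a hyperlink (still a single string), i.e.

output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>.'

But this is really a Python question.

Here is what I have: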

# API CALLS ABOVE, WHICH ARE NOT RELEVANT.

output = input
for x in range(0, result.print_entities()):
    print(len(result.entities[x]["instances"]))
    previdx = 0
    idx = 0
    for y in range(0, len(result.entities[x]["instances"])):

        try:
            url = "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid']

        except:
            url = "https://en.wikipedia.org/wiki/" + result.entities[x]["name"].replace(" ", "_")

        print("Generating wiki page link")
        print(url + "\n")

        # THE PROBLEM STARTS HERE

        offsetstr = result.entities[x]["instances"][y]["offset"]
        lenstr = result.entities[x]["instances"][y]["length"]

        output = output[:offsetstr] + "<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>" + output[offsetstr+lenstr:]

print(output)

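The problem is that, if you read the code carefully, you will see that the output string changes after the first iteration, so in later iterations the offset values no longer point to the same positions. As a result, I can't make the intended changes.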
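A minimal sketch of this drift, using made-up offsets for the example sentence and a placeholder `<a>` tag instead of a real link (the helper `wrap` is hypothetical, not part of the code above):

```python
text = "Barack Obama is the President of United States"
first = (0, 12)    # (offset, length) of "Barack Obama"
second = (33, 13)  # (offset, length) of "United States"

def wrap(s, offset, length):
    """Wrap s[offset:offset+length] in a placeholder <a>...</a> tag."""
    end = offset + length
    return s[:offset] + "<a>" + s[offset:end] + "</a>" + s[end:]

once = wrap(text, *first)

# In the original string the second span is correct.
print(repr(text[33:33 + 13]))

# In the modified string the same offset lands too early, because
# "<a>" and "</a>" were inserted in front of it.
print(repr(once[33:33 + 13]))

# Shifting by the number of inserted characters finds the span again.
shift = len(once) - len(text)
print(repr(once[33 + shift:33 + shift + 13]))
```

Any fix therefore has to either keep slicing the original string or account for the characters already inserted.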

Basically I'm trying to get:

input = "Barack Obama is the President of United States"

output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>'

How can I do this? I tried slicing and concatenating, but the string just comes out garbled.

2 answers:

Answer 0: (score: 0)

Try using another variable to store the result:

output = input
res, preOffsetstr = [], 0
# NOTE: this only works if the instances are visited in ascending offset order.
for x in range(0, result.print_entities()):
    print(len(result.entities[x]["instances"]))
    for y in range(0, len(result.entities[x]["instances"])):

        try:
            url = "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid']

        except:
            # No PermID resolution available; fall back to a Wikipedia URL.
            url = "https://en.wikipedia.org/wiki/" + result.entities[x]["name"].replace(" ", "_")

        print("Generating wiki page link")
        print(url + "\n")

        offsetstr = result.entities[x]["instances"][y]["offset"]
        lenstr = result.entities[x]["instances"][y]["length"]

        # Copy the untouched text since the previous instance, then the linked instance.
        res.append(output[preOffsetstr:offsetstr] + "<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>")

        # Resume after the end of this instance.
        preOffsetstr = offsetstr + lenstr

# Append whatever follows the last instance.
res.append(output[preOffsetstr:])
print(''.join(res))

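Answer 1: (score: 0)

I finally solved it. It took some serious arithmetic, but the intuition from my last comment ("Maybe a solution could be to store the {offset, length} tuples in an array, sort them by offset, and then run the loop. Any help building that structure?") turned out to be the trick.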

output = input
l = []
for x in range(0, result.print_entities()):
    print(len(result.entities[x]["instances"]))

    for y in range(0, len(result.entities[x]["instances"])):

        try:
            url = r'"' + "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid'] + r'"'

        except:
            url = r'"' + "https://en.wikipedia.org/wiki/" + result.entities[x]["name"].replace(" ", "_") + r'"'

        print("Generating wiki page link")

        # THE PROBLEM WAS HERE

        offsetstr = result.entities[x]["instances"][y]["offset"]
        lenstr = result.entities[x]["instances"][y]["length"]

        # THE KEY TO THE SOLUTION IS HERE
        l.append((offsetstr, lenstr, url))
        # res.append(output[preOffsetstr:offsetstr]+"<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>" + output[offsetstr+lenstr:])

print(l)

def getKey(item):
    return item[0]

# Sort by offset so the spans can be processed from left to right.
l_sorted = sorted(l, key=getKey)


a = []
pos = 0  # end of the previous span in the original string

# And then simply run a for loop
for offsetstr, lenstr, url in l_sorted:
    # Untouched text before this span, then the span wrapped in a link.
    a.append(output[pos:offsetstr])
    a.append("<a href=" + str(url) + ">" + output[offsetstr:offsetstr+lenstr] + "</a>")
    pos = offsetstr + lenstr

# Text after the last span.
a.append(output[pos:])

print("".join(a))

And voilà! :) Thanks to everyone who helped. Hope this helps someone someday.
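For readers who want the sort-then-loop idea from this answer in a compact, self-contained form, here is a rough sketch. The function name `link_spans` and the sample spans are made up for illustration, and it assumes the spans do not overlap.

```python
def link_spans(text, spans):
    """Wrap each (offset, length, url) span of text in an <a> tag.

    Assumes the spans do not overlap; they may arrive in any order.
    """
    pieces = []
    pos = 0  # end of the previous span in the ORIGINAL text
    for offset, length, url in sorted(spans):
        end = offset + length
        pieces.append(text[pos:offset])  # untouched text before the span
        pieces.append('<a href="%s">%s</a>' % (url, text[offset:end]))
        pos = end
    pieces.append(text[pos:])  # untouched text after the last span
    return "".join(pieces)

text = "Barack Obama is the President of United States"
spans = [
    # Made-up spans, deliberately out of order.
    (33, 13, "https://en.wikipedia.org/wiki/United_States"),
    (0, 12, "https://en.wikipedia.org/wiki/Barack_Obama"),
]
print(link_spans(text, spans))
```

Because every slice is taken from the original string, no offset ever needs adjusting. Applying the replacements from the highest offset to the lowest would be another way to get the same effect.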