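I'm struggling with data extracted from the OpenCalais API. Here are the details:
Input: a paragraph (a string, e.g. "Barack Obama is the President of United States."). The API also returns entity instances, each with an offset and a length, but not necessarily in order of occurrence.
Output (what I want): the same string, but with the identified entity instances turned into hyperlinks (also a string), i.e.
output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>.'
But this is really a Python question.
This is what I have: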
#API CALLS ABOVE WHICH IS NOT RELEVANT.
output=input
for x in range(0,result.print_entities()):
    print len(result.entities[x]["instances"])
    previdx=0
    idx=0
    for y in range(0,len(result.entities[x]["instances"])):
        try:
            url= "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid']
        except:
            url="https://en.wikipedia.org/wiki/"+result.entities[x]["name"].replace(" ", "_")
            print "Generating wiki page link"
        print url+"\n"
        #THE PROBLEM STARTS HERE
        offsetstr=result.entities[x]["instances"][y]["offset"]
        lenstr=result.entities[x]["instances"][y]["length"]
        output=output[:offsetstr]+"<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>" + output[offsetstr+lenstr:]
print output
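The problem, as you can see from the code, is that the output string changes after the first iteration, so on later iterations the offsets no longer point at the same positions. As a result, I can't make the intended replacements.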
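To make the drift concrete, here is a minimal sketch in plain Python. The `wrap` helper and the hard-coded offsets are assumptions for illustration and are not part of the OpenCalais response; the sketch only shows that after one insertion the original offset of the second entity no longer points at it.

```python
text = "Barack Obama is the President of United States"


def wrap(s, offset, length, url):
    """Wrap s[offset:offset+length] in an anchor tag (hypothetical helper)."""
    end = offset + length
    return s[:offset] + '<a href="' + url + '">' + s[offset:end] + "</a>" + s[end:]


# Assumed offsets: "Barack Obama" is at (0, 12), "United States" at (33, 13).
once = wrap(text, 0, 12, "https://en.wikipedia.org/wiki/Barack_Obama")

# Number of characters the first insertion added in front of the second entity.
shift = len(once) - len(text)

# The original offset now lands on the wrong characters...
stale = once[33:33 + 13]
# ...and the entity is only found again after correcting for the shift.
moved = once[33 + shift:33 + shift + 13]

print(repr(stale))
print(repr(moved))
```

Every insertion to the left of an entity pushes it further right, so the offsets have to be either corrected as you go or applied to the original, unmodified string.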
Basically, I'm trying to get:
input = "Barack Obama is the President of United States"
output = '<a href="https://en.wikipedia.org/Barack_Obama">Barack Obama</a> is the President of <a href="https://en.wikipedia.org/United_States">United States</a>'
How can I do this? I tried concatenating slices, but the string just comes out garbled.
Answer 0 (score: 0)
Try using another variable to store the result:
output=input
res,preOffsetstr = [],0
for x in range(0,result.print_entities()):
    print len(result.entities[x]["instances"])
    previdx=0
    idx=0
    for y in range(0,len(result.entities[x]["instances"])):
        try:
            url= "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid']
        except:
            url="https://en.wikipedia.org/wiki/"+result.entities[x]["name"].replace(" ", "_")
            print "Generating wiki page link"
        print url+"\n"
        offsetstr=result.entities[x]["instances"][y]["offset"]
        lenstr=result.entities[x]["instances"][y]["length"]
        # Slice the untouched input, so earlier insertions cannot shift the offsets.
        # Append only the text since the previous instance plus the linked instance.
        # NOTE: this assumes the instances are visited in ascending offset order.
        res.append(output[preOffsetstr:offsetstr]+"<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>")
        # Continue from the END of this instance, not from its start
        preOffsetstr = offsetstr+lenstr
# Append whatever text follows the last instance
res.append(output[preOffsetstr:])
print ''.join(res)
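Answer 1 (score: 0)
I finally solved it. It took some basic math logic, but my hunch from my last comment turned out to be the trick: "Maybe one solution would be to store the {offset, length} tuples in an array, sort them by offset, and then run the loop. Any help building that structure?"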
output=input
l=[]
for x in range(0,result.print_entities()):
    print len(result.entities[x]["instances"])
    for y in range(0,len(result.entities[x]["instances"])):
        try:
            url=r'"'+ "https://permid.org/1-" + result.entities[x]['resolutions'][0]['permid'] + r'"'
        except:
            url=r'"'+"https://en.wikipedia.org/wiki/"+result.entities[x]["name"].replace(" ", "_") + r'"'
            print "Generating wiki page link"
        #THE PROBLEM WAS HERE
        offsetstr=result.entities[x]["instances"][y]["offset"]
        lenstr=result.entities[x]["instances"][y]["length"]
        #The KEY TO THE SOLUTION IS HERE
        l.append((offsetstr,lenstr,url))
print l
# Sort the tuples by offset so the text can be rebuilt from left to right
def getKey(item):
    return item[0]
l_sorted=sorted(l, key=getKey)
a=[]
pos=0  # end of the last entity already copied
#And then simply run a for loop
for offsetstr,lenstr,url in l_sorted:
    # plain text between the previous entity and this one
    a.append(output[pos:offsetstr])
    # the entity itself, wrapped in a link
    a.append("<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>")
    pos=offsetstr+lenstr
# whatever text follows the last entity
a.append(output[pos:])
print "".join(a)
And voilà! :) Thanks to everyone who helped. Hope this helps someone someday.
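The sort-then-stitch idea from this answer can also be sketched as a small self-contained function. `link_entities` and the hard-coded `spans` list are hypothetical names standing in for the (offset, length, url) tuples collected from `result.entities`; the sketch assumes the spans do not overlap and that all offsets refer to the original string.

```python
def link_entities(text, spans):
    """Return text with each (offset, length, url) span wrapped in an <a> tag.

    Spans may arrive in any order but are assumed not to overlap.
    """
    pieces = []
    pos = 0  # end of the last span already emitted
    for offset, length, url in sorted(spans):  # tuples sort by offset first
        end = offset + length
        pieces.append(text[pos:offset])  # untouched text before the span
        pieces.append('<a href="%s">%s</a>' % (url, text[offset:end]))
        pos = end
    pieces.append(text[pos:])  # tail after the last span
    return "".join(pieces)


text = "Barack Obama is the President of United States."
# Deliberately out of order, as the API does not guarantee ordering.
spans = [
    (33, 13, "https://en.wikipedia.org/wiki/United_States"),
    (0, 12, "https://en.wikipedia.org/wiki/Barack_Obama"),
]
linked = link_entities(text, spans)
print(linked)
```

Because every slice is taken from the original `text` and the pieces are joined only once at the end, no offset ever needs to be adjusted.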