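I have a question.
I want to write a query that retrieves values similar (under some similarity function, such as Levenshtein distance) to a given string, "Londn", comparing against the values of the predicate rdfs:label in DBpedia. For example, I would like the output to include "London". I have read that one available approach might be iSPARQL ("imprecise SPARQL"), although it is not widely used in the literature.
Can I use iSPARQL, or is there some way to do the same thing in plain SPARQL?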
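Answer 0 (score: 5)
You can use a query like this to find cities whose names resemble "Londn", ordered by similarity (according to one possible measure). The rest of this answer explains how it works: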
select ?city ?percent where {
?city a dbpedia-owl:City ;
rdfs:label ?label .
filter langMatches( lang(?label), 'en' )
bind( replace( concat( 'x', str(?label) ), "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$", '$1$2$3$4$5' ) as ?match )
bind( xsd:float(strlen(?match))/strlen(str(?label)) as ?percent )
}
order by desc(?percent)
limit 100
city                                    percent
----------------------------------------------
http://dbpedia.org/resource/London      0.833333
http://dbpedia.org/resource/Bonn        0.75
http://dbpedia.org/resource/Loudi       0.6
http://dbpedia.org/resource/Ladnu       0.6
http://dbpedia.org/resource/Lonar       0.6
http://dbpedia.org/resource/Longnan     0.571429
http://dbpedia.org/resource/Longyan     0.571429
http://dbpedia.org/resource/Luoding     0.571429
http://dbpedia.org/resource/Lodhran     0.571429
http://dbpedia.org/resource/Lom%C3%A9   0.5
http://dbpedia.org/resource/Andong      0.5
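Note: the code in this part of the answer works in Apache Jena. There is an edge case that causes it to fail (correctly) in Virtuoso; the update at the end addresses that.
SPARQL has no built-in function for computing string edit distance, but you can get part of the way there with its regular-expression replace function. Suppose you want to match the sequence "cat" in some strings. You can use a query like this to find how much of the sequence "cat" is present in each string: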
select ?string ?match where {
values ?string { "cart" "concatenate" "hat" "pot" "hop" }
bind( replace( ?string, "^[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$", "$1$2$3" ) as ?match )
}
-------------------------
| string | match |
=========================
| "cart" | "cat" |
| "concatenate" | "cat" |
| "hat" | "at" |
| "pot" | "t" |
| "hop" | "" |
-------------------------
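The pattern above follows a regular shape: for each character of the input, skip anything that is not one of the remaining characters, then optionally capture that character. As a rough sketch (not part of the original answer), the Python helper below builds the pattern and replacement string for an arbitrary input and emulates the match with Python's `re` module. The names `build_pattern` and `emulate_match` are hypothetical, and the escaping uses `re.escape`, which is an assumption that may not carry over exactly to the XPath regex dialect used by SPARQL for inputs containing special characters.

```python
import re

def build_pattern(target):
    """Return (pattern, replacement) in the style used by the queries in this answer."""
    parts = ["^"]
    for i, ch in enumerate(target):
        # Characters of the input that are still to be matched, from position i on.
        remaining = "".join(re.escape(c) for c in target[i:])
        # Skip anything that is not a remaining character, then optionally capture ch.
        parts.append("[^" + remaining + "]*([" + re.escape(ch) + "]?)")
    parts.append(".*$")
    # SPARQL/XPath replacement syntax: $1$2...$n
    replacement = "".join("$%d" % (i + 1) for i in range(len(target)))
    return "".join(parts), replacement

def emulate_match(target, text):
    """Emulate replace(text, pattern, replacement) by joining the captured groups."""
    pattern, _ = build_pattern(target)
    # DOTALL so that the trailing .* also consumes newlines; the pattern then always matches.
    m = re.match(pattern, text, flags=re.DOTALL)
    return "".join(m.groups())

if __name__ == "__main__":
    print(build_pattern("cat"))
    for s in ["cart", "concatenate", "hat", "pot", "hop"]:
        print(s, repr(emulate_match("cat", s)))
```

The generated pattern and replacement can be pasted into the `replace(...)` call of the query; the emulation is only meant for checking the pattern locally before running it against an endpoint.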
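By comparing the lengths of the string and the match, you should be able to compute several different similarity measures. Here is a more complex example that uses the "Londn" input you mentioned. The percent column is the fraction of each string that matches the input.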
select ?input
?string
(strlen(?match)/strlen(?string) as ?percent)
where {
values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
"concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }
values (?input ?pattern ?replacement) {
("cat" "^[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$" "$1$2$3")
("Londn" "^[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
}
bind( replace( ?string, ?pattern, ?replacement) as ?match )
}
order by ?pattern desc(?percent)
--------------------------------------------------------
| input | string | percent |
========================================================
| "Londn" | "Londn" | 1.0 |
| "Londn" | "London" | 0.833333333333333333333333 |
| "Londn" | "Lando" | 0.6 |
| "Londn" | "London Fog" | 0.5 |
| "Londn" | "Land Ho!" | 0.375 |
| "Londn" | "concatenate" | 0.272727272727272727272727 |
| "Londn" | "port" | 0.25 |
| "Londn" | "catnap" | 0.166666666666666666666666 |
| "Londn" | "cat" | 0.0 |
| "Londn" | "chart" | 0.0 |
| "Londn" | "chat" | 0.0 |
| "Londn" | "hat" | 0.0 |
| "Londn" | "part" | 0.0 |
| "cat" | "cat" | 1.0 |
| "cat" | "chat" | 0.75 |
| "cat" | "hat" | 0.666666666666666666666666 |
| "cat" | "chart" | 0.6 |
| "cat" | "part" | 0.5 |
| "cat" | "catnap" | 0.5 |
| "cat" | "concatenate" | 0.272727272727272727272727 |
| "cat" | "port" | 0.25 |
| "cat" | "Lando" | 0.2 |
| "cat" | "Land Ho!" | 0.125 |
| "cat" | "Londn" | 0.0 |
| "cat" | "London" | 0.0 |
| "cat" | "London Fog" | 0.0 |
--------------------------------------------------------
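The percent values in this table are simply the length of the extracted match divided by the length of the candidate string. A minimal Python sketch of that arithmetic follows; the function name `percent_matched` is hypothetical, the regex is emulated with Python's `re` rather than a SPARQL engine, and returning 0.0 for an empty candidate is an assumption made here to avoid dividing by zero.

```python
import re

def percent_matched(target, text):
    """Fraction of `text` covered by the characters of `target` that the pattern captures."""
    if not text:
        return 0.0  # assumption: treat an empty candidate as no match
    # Same pattern shape as in the queries: skip non-remaining characters,
    # then optionally capture the next character of the input.
    pattern = "^" + "".join(
        "[^%s]*([%s]?)" % ("".join(map(re.escape, target[i:])), re.escape(ch))
        for i, ch in enumerate(target)
    ) + ".*$"
    match = "".join(re.match(pattern, text, flags=re.DOTALL).groups())
    return len(match) / len(text)

if __name__ == "__main__":
    for candidate in ["Londn", "London", "Lando", "London Fog", "port"]:
        print(candidate, percent_matched("Londn", candidate))
```

Because the denominator is the candidate's length, the score is not symmetric and is not a Levenshtein distance; short candidates that share a few characters with the input can rank fairly high.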
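The code above works in Apache Jena but fails in Virtuoso, because the pattern can match the empty string. For example, if you try the following query on DBpedia's endpoint (which is powered by Virtuoso), you will get the error shown below it: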
select (replace( "foo", ".*", "x" ) as ?bar) where {}
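Virtuoso 22023 Error The regex-based XPATH/XQuery/SPARQL replace() function can not search for a pattern that can be found even in an empty string
This surprised me, but the specification of replace says that it is based on XPath's fn:replace. The documentation for fn:replace says:
An error is raised [err:FORX0003] if the pattern matches a zero-length string, that is, if the expression fn:matches("", $pattern, $flags) returns true. It is not an error, however, if a captured substring is zero-length.
We can work around this, however, by adding a character to the beginning of both the pattern and the string: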
select ?input
?string
(strlen(?match)/strlen(?string) as ?percent)
where {
values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
"concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }
values (?input ?pattern ?replacement) {
("cat" "^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$" "$1$2$3")
("Londn" "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
}
bind( replace( concat('x',?string), ?pattern, ?replacement) as ?match )
}
order by ?pattern desc(?percent)
--------------------------------------------------------
| input | string | percent |
========================================================
| "Londn" | "Londn" | 1.0 |
| "Londn" | "London" | 0.833333333333333333333333 |
| "Londn" | "Lando" | 0.6 |
| "Londn" | "London Fog" | 0.5 |
| "Londn" | "Land Ho!" | 0.375 |
| "Londn" | "concatenate" | 0.272727272727272727272727 |
| "Londn" | "port" | 0.25 |
| "Londn" | "catnap" | 0.166666666666666666666666 |
| "Londn" | "cat" | 0.0 |
| "Londn" | "chart" | 0.0 |
| "Londn" | "chat" | 0.0 |
| "Londn" | "hat" | 0.0 |
| "Londn" | "part" | 0.0 |
| "cat" | "cat" | 1.0 |
| "cat" | "chat" | 0.75 |
| "cat" | "hat" | 0.666666666666666666666666 |
| "cat" | "chart" | 0.6 |
| "cat" | "part" | 0.5 |
| "cat" | "catnap" | 0.5 |
| "cat" | "concatenate" | 0.272727272727272727272727 |
| "cat" | "port" | 0.25 |
| "cat" | "Lando" | 0.2 |
| "cat" | "Land Ho!" | 0.125 |
| "cat" | "Londn" | 0.0 |
| "cat" | "London" | 0.0 |
| "cat" | "London Fog" | 0.0 |
--------------------------------------------------------
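The reason the prefix helps is that the guarded pattern requires a literal "x", so it can no longer match the empty string, while the captured groups stay the same once the input is prefixed with "x" too. The sketch below checks both properties using Python's `re` as a stand-in for the XPath regex engine (an assumption; the two dialects agree on the constructs used here, but not in general). The names `matches_empty` and `extract` are hypothetical.

```python
import re

# Patterns copied from the queries in this answer.
ORIGINAL = "^[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$"
GUARDED = "^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$"

def matches_empty(pattern):
    """Rough analogue of fn:matches("", pattern): can the pattern match the empty string?"""
    return re.search(pattern, "") is not None

def extract(pattern, text):
    """Join the captured groups, as the $1$2$3 replacement does in the query."""
    m = re.match(pattern, text, flags=re.DOTALL)
    return "".join(m.groups())

if __name__ == "__main__":
    print(matches_empty(ORIGINAL))  # the case Virtuoso rejects
    print(matches_empty(GUARDED))   # the literal x rules out the empty string
    for s in ["cart", "hat", "hop"]:
        # The guarded pattern is applied to the x-prefixed string, as in concat('x', ?string).
        print(s, extract(ORIGINAL, s) == extract(GUARDED, "x" + s))
```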