Split a string into triples (tuples of three fields): subject, predicate and object.

Asked: 2014-04-21 06:36:59

Tags: python algorithm rdf

For example:

Sample RDF string:

<Tom_Wilkinson_(actor)> <actedIn> "In_the_Bedroom", "The_Patriot_(2000_film)", "Black_Knight_(film)", "The_Last_Kiss", "Cassandras_Dream"; <bornOnDate> "1948-12-12"; <isCalled> "Tom Wilkinson (Schauspieler)", "טום וילקינסון", "トム・ウィルキンソン", "Tom Wilkinson", "ום וילקינסון", "ム・ウィルキンソン";

The triples for the given string:

<Tom_Wilkinson_(actor)> <actedIn> "In_the_Bedroom"     
<Tom_Wilkinson_(actor)> <actedIn> "The_Patriot_(2000_film)" 
<Tom_Wilkinson_(actor)> <actedIn> "Black_Knight_(film)" 
<Tom_Wilkinson_(actor)> <actedIn> "The_Last_Kiss" 
<Tom_Wilkinson_(actor)> <actedIn> "Cassandras_Dream"
<Tom_Wilkinson_(actor)> <bornOnDate> "1948-12-12"
<Tom_Wilkinson_(actor)> <isCalled> "Tom Wilkinson (Schauspieler)"

Note: objects can contain spaces. For example, "Tom Wilkinson (Schauspieler)" is a single object that contains a space.

2 answers:

Answer 0 (score: 5)

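The input you have provided is actually already a Turtle (or N3) serialization of some RDF. It would typically be formatted like this, with some @base specified: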

@base <http://stackoverflow.com/q/23192184/1281433> .

<Tom_Wilkinson_(actor)> <actedIn> "In_the_Bedroom" , "The_Patriot_(2000_film)" ,
                                  "Black_Knight_(film)" , "The_Last_Kiss" ,
                                  "Cassandras_Dream";
                        <bornOnDate> "1948-12-12";
                        <isCalled> "Tom Wilkinson (Schauspieler)" ,
                                   "טום וילקינסון" , "トム・ウィルキンソン" ,
                                   "Tom Wilkinson" , "ום וילקינסון" ,
                                   "ム・ウィルキンソン" .
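To make the abbreviations in the snippet above concrete: in Turtle, `;` repeats the subject with a new predicate, and `,` repeats both the subject and the predicate with a new object. The following is only a minimal standard-library sketch of that expansion, assuming the input contains nothing but `<...>` terms and double-quoted strings (no prefixes, datatypes, language tags or blank nodes). The name `expand_triples` is made up for illustration; a real Turtle parser is the safer choice.

```python
import re

# One token is an <IRI>, a "quoted string" (may contain spaces), or one of ; , .
TOKEN = re.compile(r'<[^<>\s]*>|"(?:[^"\\]|\\.)*"|[;,.]')

def expand_triples(text):
    """Expand Turtle's ';' and ',' shorthand into (subject, predicate, object) tuples."""
    triples = []
    subject = predicate = None
    expect = "subject"              # which kind of term the next non-punctuation token is
    for tok in TOKEN.findall(text):
        if tok == ".":              # end of statement: next term starts a new subject
            subject = predicate = None
            expect = "subject"
        elif tok == ";":            # same subject, new predicate follows
            expect = "predicate"
        elif tok == ",":            # same subject and predicate, new object follows
            expect = "object"
        elif expect == "subject":
            subject, expect = tok, "predicate"
        elif expect == "predicate":
            predicate, expect = tok, "object"
        else:
            triples.append((subject, predicate, tok))
    return triples

if __name__ == "__main__":
    sample = ('<Tom_Wilkinson_(actor)> <actedIn> "In_the_Bedroom", "The_Last_Kiss"; '
              '<bornOnDate> "1948-12-12"; <isCalled> "Tom Wilkinson (Schauspieler)";')
    for s, p, o in expand_triples(sample):
        print(s, p, o)
```

Because quoted strings are matched as single tokens, spaces and punctuation inside an object such as "Tom Wilkinson (Schauspieler)" do not split it.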

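If you add an appropriate @base declaration, you can read your input and write the output with any library that can read Turtle and serialize to N-Triples. For example, using Jena's rdfcat, you can convert to a number of different formats, including N-Triples: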

$ rdfcat -out N-TRIPLES input.ttl
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/actedIn> "Black_Knight_(film)" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/isCalled> "ム・ウィルキンソン" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/isCalled> "トム・ウィルキンソン" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/isCalled> "Tom Wilkinson (Schauspieler)" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/isCalled> "ום וילקינסון" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/isCalled> "טום וילקינסון" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/actedIn> "The_Last_Kiss" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/bornOnDate> "1948-12-12" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/actedIn> "The_Patriot_(2000_film)" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/actedIn> "In_the_Bedroom" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/isCalled> "Tom Wilkinson" .
<http://stackoverflow.com/q/23192184/Tom_Wilkinson_(actor)> <http://stackoverflow.com/q/23192184/actedIn> "Cassandras_Dream" .

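Since you tagged this with Python, you may find RDFLib more useful than Jena, but the real question here should be how to do the conversion rather than a library request (since library requests are off-topic for Stack Overflow).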

Answer 1 (score: -2)

Try using RDFLib. It looks like they have examples on parsing ntriples.

Edit: the format is actually n3. See these docs on parse().