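I need to create a set of tuples and then run some matching operations on them. This is my df:

To get these pairs, I need to extract them from the 'subject' column of the dataframe named diff, based on a specific condition: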
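Then, for each matching term, I want to create a list of candidates. For example, for all terms whose comment is "part of something", I want a list such as [Book, Booklet, Article], or any other suitable data structure, because afterwards I want to use nltk.wordnet to check whether they are similar.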
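To make the target structure concrete, here is a minimal sketch of the grouping I have in mind. It uses plain Python rather than my dataframe, and the sample triples and the helper name `group_candidates` are made up for illustration; only the rdfs:comment predicate comes from my actual data.

```python
from collections import defaultdict

COMMENT = 'http://www.w3.org/2000/01/rdf-schema#comment'

# Hypothetical sample triples (subject, predicate, object), not my real data.
triples = [
    ('Book', COMMENT, 'part of something'),
    ('Booklet', COMMENT, 'part of something'),
    ('Article', COMMENT, 'part of something'),
    ('Person', COMMENT, 'a human being'),
    ('Book', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'Class'),
]


def group_candidates(rows, predicate=COMMENT):
    """Map each comment text to the list of subjects that carry it."""
    candidates = defaultdict(list)
    for subject, pred, obj in rows:
        # Only comment triples define a candidate group.
        if pred == predicate:
            candidates[obj].append(subject)
    return dict(candidates)


candidates = group_candidates(triples)
print(candidates['part of something'])  # ['Book', 'Booklet', 'Article']
```

A dict keyed by comment text keeps every candidate list addressable by the term that produced it, which seems convenient for the later pairwise similarity checks.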
Here is the code:
import re
import pandas as pd

ont1 = pd.read_csv('1.tsv', sep='\t', names=['subject', 'predicate', 'object'])
ont2 = pd.read_csv('2.tsv', sep='\t', names=['subject', 'predicate', 'object'])
# Create the diff df that contains the rows not shared by both dfs
diff = pd.concat([ont1, ont2]).drop_duplicates(keep=False)
# Unique object values; each one is the condition for one set of matching candidates
terms = diff['object'].unique()
comment = 'http://www.w3.org/2000/01/rdf-schema#comment'
prefix = 'http://oaei.ontologymatching.org/tests/101/'
# Collect the candidates per term outside the loop so they are not reset on every pass
maps = {}
for term in terms:
    # Subjects whose rdfs:comment equals the current term
    mask = (diff['object'] == term) & (diff['predicate'] == comment)
    # Use re.sub to strip the URL prefix from each subject for readability
    uris = [re.sub(re.escape(prefix), '', str(s)) for s in diff.loc[mask, 'subject']]
    if uris:
        maps[term] = uris
print(maps)