Question

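I need to create a set of tuples and then run some matching operations on them. To get these pairs, I need to extract them from the "subject" column of my DataFrame, named diff, according to the following conditions:

The value in the "object" column must belong to the list of values extracted from the diff DataFrame
The "predicate" column must be equal to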

http://www.w3.org/2000/01/rdf-schema#comment
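
The two conditions above can be combined into a single boolean mask. The following is a minimal sketch, not a definitive solution: the toy rows and the `LABEL` predicate are hypothetical stand-ins for the real diff DataFrame, and only `isin` and ordinary boolean indexing from pandas are used.

```python
import pandas as pd

COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # assumed second predicate, for contrast

# Hypothetical toy rows standing in for the real diff DataFrame.
diff = pd.DataFrame(
    [
        ("Book", COMMENT, "part of something"),
        ("Booklet", COMMENT, "part of something"),
        ("Article", COMMENT, "part of something"),
        ("Journal", COMMENT, "a periodical"),
        ("Book", LABEL, "part of something"),
    ],
    columns=["subject", "predicate", "object"],
)

# Unique object values that act as the membership condition.
terms = diff["object"].unique().tolist()

# Condition 1: object is one of the terms. Condition 2: predicate is rdfs:comment.
mask = diff["object"].isin(terms) & (diff["predicate"] == COMMENT)
matched = diff.loc[mask, ["subject", "object"]]
```

Here `matched` keeps only the rows that satisfy both conditions, so the row with the label predicate is dropped.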

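Then, for each matched term, I want to create a list of candidates. For example, for all terms whose comment is "part of something", create a list such as [Book, Booklet, Article], or any other suitable data structure, because I then want to use nltk.wordnet to find out whether they are similar.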
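
One possible way to build such candidate lists is to group the matched subjects by their comment text and then enumerate the pairs within each group. This is only a sketch under assumptions: the `matched` frame below is hypothetical toy data, and the similarity check itself (for example with WordNet) is left out.

```python
from itertools import combinations

import pandas as pd

# Hypothetical rows that already passed the predicate filter.
matched = pd.DataFrame(
    {
        "subject": ["Book", "Booklet", "Article", "Journal"],
        "object": ["part of something"] * 3 + ["a periodical"],
    }
)

# One candidate list per distinct comment, in order of first appearance.
candidates = (
    matched.groupby("object", sort=False)["subject"].apply(list).to_dict()
)

# All unordered pairs (tuples) inside each candidate list.
pairs = {
    comment: list(combinations(names, 2))
    for comment, names in candidates.items()
}
```

Each tuple in `pairs` could then be passed to whatever similarity function is chosen. A group with a single term yields an empty list of pairs.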

The code is as follows:

    import re
    import pandas as pd

    ont1 = pd.read_csv('1.tsv', sep='\t', names=['subject', 'predicate', 'object'])
    ont2 = pd.read_csv('2.tsv', sep='\t', names=['subject', 'predicate', 'object'])
    # create the diff df that contains the rows not shared by the two dfs
    diff = pd.concat([ont1, ont2]).drop_duplicates(keep=False)
    # create a list of the unique values to act as a condition to create the set of matching candidates
    terms = list(diff['object'].unique())
    maps = []
    for term in terms:
        # subjects whose object is the current term and whose predicate is rdfs:comment
        uris = diff.loc[(diff['object'] == term) & (diff['predicate'] == 'http://www.w3.org/2000/01/rdf-schema#comment'), 'subject']
        # use re.sub to strip the URL prefix from each subject for readability
        names = [re.sub('http://oaei.ontologymatching.org/tests/101/', '', str(uri)) for uri in uris]
        # one list of candidates per term
        maps.append(names)
    print(maps)

How to select DataFrame rows based on specific values and create a list of lists from them

0 answers: