How to select DataFrame rows based on specific values and create a list of lists from them

Time: 2019-02-24 16:08:33

Tags: python pandas

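I need to build a set of tuples and then run some matching operations on them. To get these pairs, I need to extract them from the "subject" column of a DataFrame named diff, subject to the following conditions:

  1. The value in the "object" column must belong to the list of values extracted from the diff DataFrame
  2. The "predicate" column must equal http://www.w3.org/2000/01/rdf-schema#comment

Then, for each matching term, I want to create a list of candidates. For example, for all terms whose comment is "A part of something", create a list such as [Book, Booklet, Article], or any other suitable data structure, because I then want to use nltk.wordnet to check whether they are similar.

The code is as follows: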

    import pandas as pd

    ont1 = pd.read_csv('1.tsv', sep='\t', names=['subject', 'predicate', 'object'])
    ont2 = pd.read_csv('2.tsv', sep='\t', names=['subject', 'predicate', 'object'])
    # create the diff df that contains the rows not shared between the two dfs
    diff = pd.concat([ont1, ont2]).drop_duplicates(keep=False)
    # unique object values act as the condition for building each list of matching candidates
    terms = list(diff['object'].unique())
    comment = 'http://www.w3.org/2000/01/rdf-schema#comment'
    prefix = 'http://oaei.ontologymatching.org/tests/101/'
    maps = []
    for term in terms:
        # subjects whose comment triple has this term as its object
        uris = diff.loc[(diff['object'] == term) & (diff['predicate'] == comment), 'subject']
        # strip the ontology prefix from each URI for readability
        candidates = [uri.replace(prefix, '') for uri in uris]
        if candidates:
            maps.append(candidates)
    print(maps)
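
The same grouping can probably be done without an explicit loop. The following is only a sketch under assumptions: the small `diff` frame is made-up sample data standing in for the real TSV contents, `candidate_lists` is a hypothetical helper name, and the subject URIs are assumed to start with the prefix shown in the question. Because the terms are taken from `diff['object']` itself, condition 1 holds automatically, so only the predicate filter is applied before grouping.

```python
import pandas as pd

COMMENT = 'http://www.w3.org/2000/01/rdf-schema#comment'
LABEL = 'http://www.w3.org/2000/01/rdf-schema#label'
PREFIX = 'http://oaei.ontologymatching.org/tests/101/'

# Made-up sample rows standing in for the real `diff` DataFrame (assumption).
diff = pd.DataFrame({
    'subject': [PREFIX + 'Book', PREFIX + 'Booklet', PREFIX + 'Article',
                PREFIX + 'Person', PREFIX + 'Book'],
    'predicate': [COMMENT, COMMENT, COMMENT, COMMENT, LABEL],
    'object': ['A part of something', 'A part of something',
               'A part of something', 'A human being', 'Book'],
})


def candidate_lists(frame, predicate=COMMENT, prefix=PREFIX):
    """Map each object value to the list of subjects that share it.

    Hypothetical helper: keeps only rows with the given predicate, strips
    the URI prefix from the subjects, then groups them by object value.
    """
    matched = frame[frame['predicate'] == predicate]
    # Literal (non-regex) replacement of the shared URI prefix.
    names = matched['subject'].str.replace(prefix, '', regex=False)
    # Group the cleaned subject names by the object column of the same rows.
    grouped = names.groupby(matched['object'], sort=False).agg(list)
    return grouped.to_dict()


candidates = candidate_lists(diff)
print(candidates)                  # dict: comment text -> list of subject names
print(list(candidates.values()))   # the same data as a list of lists
```

Keeping the result as a dict preserves which comment each candidate list belongs to; `list(candidates.values())` gives the plain list of lists if only the groups are needed for the later similarity check.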

0 answers:

No answers yet