Check, sentence by sentence, whether elements of two lists exist in a dictionary?

Time: 2019-03-12 09:11:46

Tags: python json nlp

I have a JSON file...

    "1": {"address": "1",
          "ctag": "Ne",
          "feats": "_",
          "head": "6",
          "lemma": "Ghani",
          "rel": "SBJ",
          "tag": "Ne",
          "word": "Ghani"},
    "2": {"address": "2",
          "ctag": "AJ",
          "feats": "_",
          "head": "1",
          "lemma": "born",
          "rel": "NPOSTMOD",
          "tag": "AJ",
          "word": "born"},
    "3": {"address": "3",
          "ctag": "P",
          "feats": "_",
          "head": "6",
          "lemma": "in",
          "rel": "ADV",
          "tag": "P",
          "word": "in"},
    "4": {"address": "4",
          "ctag": "N",
          "feats": "_",
          "head": "3",
          "lemma": "Kabul",
          "rel": "POSDEP",
          "tag": "N",
          "word": "Kabul"},
    "5": {"address": "5",
          "ctag": "PUNC",
          "feats": "_",
          "head": "6",
          "lemma": ".",
          "rel": "PUNC",
          "tag": "PUNC",
          "word": "."},

I read the JSON file and stored it in a dictionary.

import json

# Read the parsed token data from disk.
with open('../data/data.txt', 'r') as json_file:
    obj = json.load(json_file)

d = dict(obj)  # store it in a dict

From this dict I extracted two lists, one containing the entities and one containing the relations in the text, as follows:

 entities(d) = ['Ghani', 'Kabul', 'Afghanistan'....]
 relation(d) = ['president', 'capital', 'located'...]

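Now I want to go through each sentence in the dictionary d, and if any element of entities(d) or relation(d) occurs in it, store that element in another list. This is what I did: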

to_match = set(relation(d) + entities(d))
entities_and_relation = [[j for j in to_match if j in i] 
                    for i in ''.join(d).split('.')[:-1]]
print(entities_and_relation)

But this returns an empty list. Can you tell me what is wrong here?

The output should look something like: [Afghan President Ghani] ...
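
A likely cause, as far as the snippet shows: `''.join(d)` iterates over the dictionary's keys ("1", "2", ...), not over the words, so the joined string contains no period and `split('.')[:-1]` leaves nothing. The sketch below reproduces this with a small stand-in for `d` (only the `word` field is kept, and the two lists are the truncated ones shown above, so both are assumptions) and then rebuilds the sentence text from the `word` values instead.

```python
# Stand-in for the parsed JSON above; only the "word" field is kept (assumption).
d = {
    "1": {"word": "Ghani"},
    "2": {"word": "born"},
    "3": {"word": "in"},
    "4": {"word": "Kabul"},
    "5": {"word": "."},
}
# Shortened versions of the two lists from the question (assumption).
entities = ["Ghani", "Kabul", "Afghanistan"]
relations = ["president", "capital", "located"]
to_match = set(relations + entities)

# Joining a dict joins its KEYS: "12345". There is no "." to split on,
# so split() returns a single piece and [:-1] throws that piece away.
broken = "".join(d).split(".")[:-1]
print(broken)  # []

# Build the text from the token words instead, in address order.
text = " ".join(d[k]["word"] for k in sorted(d, key=int))
sentences = [s.strip() for s in text.split(".") if s.strip()]

# For each sentence, keep the tokens that appear in either list.
matches = [[w for w in s.split() if w in to_match] for s in sentences]
print(matches)  # [['Ghani', 'Kabul']]
```

Matching whole tokens (`s.split()`) rather than substrings avoids false hits such as a short entity name appearing inside a longer word; whether that is the desired behaviour depends on the data.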

1 Answer:

Answer 0: (score: 0)

I solved this part of the problem here, but I don't know how to put each group of related entities into a specific format.

# Print every token whose word appears in either list, centered in 10 columns.
for i in d.values():
    if i['word'].split('.')[-1] in to_match:
        print('{: ^10}'.format(i['word']))

Output:

 Ghani
 Kabul
 Born
 Kabul
 Captial
 Afghanistan

My expected output:

 (Ghani, born, Kabul), (Kabul, capital, Afghanistan) or ...
 Born_in(Ghani, Kabul), Capital_of(Kabul, Afghanistan)

I don't know how to map or structure this so that it gives me the expected output.
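
One possible way to get from the flat list of matches to triples, offered only as a rough sketch: walk the matched tokens in order and emit a triple whenever an entity is followed by a relation and then another entity. The function name `extract_triples`, the second example sentence, and the presence of 'born' in the relation list are all assumptions made for illustration. The heuristic is purely positional and ignores the `head` and `rel` dependency fields in the JSON.

```python
def extract_triples(words, entities, relations):
    """Return (entity, relation, entity) for each such run of matched tokens."""
    entities, relations = set(entities), set(relations)
    # Keep only the tokens of interest, preserving their order in the text.
    hits = [w for w in words if w in entities or w in relations]
    triples = []
    # Slide a window of three over the matched tokens.
    for left, rel, right in zip(hits, hits[1:], hits[2:]):
        if left in entities and rel in relations and right in entities:
            triples.append((left, rel, right))
    return triples


# First sentence is from the question; the second one is hypothetical.
words = ["Ghani", "born", "in", "Kabul", ".",
         "Kabul", "capital", "of", "Afghanistan", "."]
entities = ["Ghani", "Kabul", "Afghanistan"]
relations = ["born", "capital"]  # assumes 'born' is among the relations

triples = extract_triples(words, entities, relations)
print(triples)
# [('Ghani', 'born', 'Kabul'), ('Kabul', 'capital', 'Afghanistan')]

# Predicate-style formatting of the same triples.
formatted = ["{}({}, {})".format(rel, a, b) for a, rel, b in triples]
print(", ".join(formatted))
# born(Ghani, Kabul), capital(Kabul, Afghanistan)
```

Because the window slides over all matched tokens, a triple can in principle span a sentence boundary; running the function once per sentence would avoid that.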