I have a JSON file...
"1": {"address": "1",
"ctag": "Ne",
"feats": "_",
"head": "6",
"lemma": "Ghani",
"rel": "SBJ",
"tag": "Ne",
"word": "Ghani"},
"2": {"address": "2",
"ctag": "AJ",
"feats": "_",
"head": "1",
"lemma": "born",
"rel": "NPOSTMOD",
"tag": "AJ",
"word": "born"},
"3": {"address": "3",
"ctag": "P",
"feats": "_",
"head": "6",
"lemma": "in",
"rel": "ADV",
"tag": "P",
"word": "in"},
"4": {"address": "4",
"ctag": "N",
"feats": "_",
"head": "3",
"lemma": "Kabul",
"rel": "POSDEP",
"tag": "N",
"word": "Kabul"},
"5": {"address": "5",
"ctag": "PUNC",
"feats": "_",
"head": "6",
"lemma": ".",
"rel": "PUNC",
"tag": "PUNC",
"word": "."},
I read the JSON file and stored it in a dictionary.
import json

# Read the file and parse it as JSON
with open('../data/data.txt', 'r') as JSON_file:
    obj = json.load(JSON_file)
d = dict(obj)  # store it in a dict
From this dict I extracted two lists, one holding the relations and one holding the entities found in the text, as follows:
entities(d) = ['Ghani', 'Kabul', 'Afghanistan'....]
relation(d) = ['president', 'capital', 'located'...]
Now I want to check each sentence in the dictionary d, and if any element of entities(d) or relation(d) occurs in it, store that element in another list.
What I tried:
to_match = set(relation(d) + entities(d))
entities_and_relation = [[j for j in to_match if j in i]
for i in ''.join(d).split('.')[:-1]]
print(entities_and_relation)
But this returns an empty list. Can you tell me what is wrong here?
The output should look like: [Afghan President Ghani] ...
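A note on why the list comes back empty: iterating over a dict yields its keys, so ''.join(d) produces the string of keys ('12345' for the sample above), not the words. That string contains no '.', so split('.') returns a single item and [:-1] drops it. Below is a minimal sketch of one way to rebuild the sentences from the "word" fields instead. The helper split_sentences and the two sample lists are hypothetical stand-ins for entities(d) and relation(d), and 'born' is assumed to be in the relation list. Unlike the original substring test, it matches whole tokens.

```python
# Sample tokens copied from the question (only the "word" field is needed here).
d = {
    "1": {"word": "Ghani"},
    "2": {"word": "born"},
    "3": {"word": "in"},
    "4": {"word": "Kabul"},
    "5": {"word": "."},
}

# Hypothetical stand-ins for entities(d) and relation(d); 'born' is an assumption.
entities = ["Ghani", "Kabul", "Afghanistan"]
relations = ["president", "capital", "located", "born"]

# Joining a dict joins its keys, which is why the original attempt was empty.
joined_keys = "".join(d)
print(joined_keys)  # 12345


def split_sentences(tokens):
    """Group token words into sentences, closing a sentence at each '.' token."""
    sentences, current = [], []
    for key in sorted(tokens, key=int):  # keys are numeric strings
        word = tokens[key]["word"]
        if word == ".":
            if current:
                sentences.append(current)
            current = []
        else:
            current.append(word)
    if current:  # keep a trailing sentence that has no final '.'
        sentences.append(current)
    return sentences


to_match = set(entities + relations)
entities_and_relation = [
    [word for word in sentence if word in to_match]
    for sentence in split_sentences(d)
]
print(entities_and_relation)  # [['Ghani', 'born', 'Kabul']]
```

Sorting the keys with key=int assumes every key is a numeric string, as in the sample.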
Answer 0 (score: 0)
I solved this part, but I don't know how to give each pair of related entities a specific format.
# Print every token whose word is one of the entities or relations
for i in d.values():
    if i['word'].split('.')[-1] in to_match:
        print('{: ^10}'.format(i['word']))
Output:
Ghani
Kabul
Born
Kabul
Capital
Afghanistan
My expected output:
(Ghani, born, Kabul), (Kabul, capital, Afghanistan) or ...
Born_in(Ghani, Kabul), Capital_of(Kabul, Afghanistan)
I don't know how to map or structure the matches to get this expected output.
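One possible direction, as a rough sketch rather than a definitive solution: walk through each sentence in order and emit a triple whenever an entity, a relation, and another entity appear in that sequence. The functions extract_triples, as_tuple_text and as_predicate are hypothetical names. The second sample sentence is invented to mirror the expected output, and 'born' and 'capital' are assumed to be in the relation list. It does not use the "head" and "rel" fields, and it does not attach the preposition, so it yields Born(...) rather than Born_in(...).

```python
def extract_triples(words, entities, relations):
    """Return (entity, relation, entity) tuples that appear in that order."""
    triples = []
    subject = relation = None
    for word in words:
        key = word.lower()  # relations are compared case-insensitively
        if key in relations and subject is not None:
            relation = key
        elif word in entities:
            if subject is not None and relation is not None:
                triples.append((subject, relation, word))
                relation = None
            subject = word  # the latest entity may start the next triple
    return triples


def as_tuple_text(triple):
    """Format a triple as '(Ghani, born, Kabul)'."""
    return "(" + ", ".join(triple) + ")"


def as_predicate(triple):
    """Format a triple as 'Born(Ghani, Kabul)'."""
    subject, relation, obj = triple
    return f"{relation.capitalize()}({subject}, {obj})"


# Hypothetical inputs: the first sentence is from the question, the second is invented.
entities = {"Ghani", "Kabul", "Afghanistan"}
relations = {"born", "capital"}
sentences = [
    ["Ghani", "born", "in", "Kabul"],
    ["Kabul", "is", "capital", "of", "Afghanistan"],
]

triples = [
    triple
    for sentence in sentences
    for triple in extract_triples(sentence, entities, relations)
]
tuple_line = ", ".join(as_tuple_text(t) for t in triples)
predicate_line = ", ".join(as_predicate(t) for t in triples)
print(tuple_line)      # (Ghani, born, Kabul), (Kabul, capital, Afghanistan)
print(predicate_line)  # Born(Ghani, Kabul), Capital(Kabul, Afghanistan)
```

Relying on word order is a simplification. The "head" and "rel" fields in the parse may give more reliable subject and object links for longer sentences.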