Question

I have three lists containing the following data:

Entities:  ['Ashraf', 'Afghanistan', 'Afghanistan', 'Kabul']
Relations:  ['Born', 'President', 'employee', 'Capital', 'Located', 'Lecturer', 'University']
sentence_list: ['Ashraf', 'Born', 'in', 'Kabul', '.', 'Ashraf', 'is', 'the', 'president', 'of', 'Afghanistan', '.', ...]

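sentence_list is a list of sentences. For each sentence, I want to check whether it contains any words from Entities and Relations, and add the combination of matched words to another list. For example, (Ashraf, Born, Kabul) for the first sentence.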

What I have done:

First incomplete solution:

import json

# read file
with open('../data/parse.txt', 'r') as myfile:
    json_data = json.load(myfile)

for i in range(len(json_data)):  # the dataset was in json format
    if json_data[i]['word'] in relation(json_data)[0]:  # I extract the relations
        print(json_data[i]['word'])
    if json_data[i]['word'] in entities(json_data)[0]:
        print(json_data[i]['word'])

Output: (Ashraf, Born, Ashraf), but I want (Ashraf, Born, Kabul).

Next incomplete solution: I stored the words from json_data in a list and then did this:

json_data2 = []
for i in range(len(json_data)):
    json_data2.append(json_data[i]['word'])
print(json_data2)


'''
Now I tried if I can find any element of `Entities` list and `Relations` list
in each sentence of `sentence_list`. And then it should store matched 
entities and relations based on sentence to a list. '''

for line in json_data2:
    for rel in relation(obj):
        for ent in entities(obj):
            match = re.findall(rel,  line['word'])
            if match:
                print('word matched relations: %s ==> word: %s' % (rel,  line['address']))
            match2 = re.findall(ent, line['word'])
            if match2:
                print('word matched entities: %s ==> word: %s' % (ent,  line['address']))

Unfortunately, this does not work.

Answer 1

You can use the following list comprehension:

to_match = set(Entities+Relations)
l = [{j for j in to_match if j in i} 
        for i in ' '.join(sentence_list).split('.')[:-1]]

Output

[{'Ashraf', 'Born', 'Kabul'}, {'Afghanistan', 'Ashraf'}]

Note that I am returning a list of sets to avoid duplicate values; for example, Afghanistan appears twice in Entities.
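One caveat with the comprehension above: `j in i` is a case-sensitive substring test on the joined sentence, so 'President' does not match 'president', and sets do not keep word order. If that matters, a token-level variant could look like the sketch below. The function name `match_per_sentence` is hypothetical, and the sketch assumes that `sentence_list` is a flat list of tokens in which '.' marks the end of a sentence (the list is cut off after the second sentence here, since the rest is elided in the question).

```python
def match_per_sentence(tokens, entities, relations):
    """Split tokens into sentences at '.' and collect the known words in each."""
    # Map lowercase form -> original spelling so 'president' matches 'President'.
    known = {word.lower(): word for word in entities + relations}
    results, current = [], []
    for token in tokens:
        if token == '.':
            # End of sentence: store its matches and start a new one.
            results.append(current)
            current = []
            continue
        canonical = known.get(token.lower())
        # Keep first-occurrence order and skip repeats within a sentence.
        if canonical is not None and canonical not in current:
            current.append(canonical)
    if current:
        results.append(current)  # trailing sentence without a final '.'
    return results


Entities = ['Ashraf', 'Afghanistan', 'Afghanistan', 'Kabul']
Relations = ['Born', 'President', 'employee', 'Capital', 'Located', 'Lecturer', 'University']
sentence_list = ['Ashraf', 'Born', 'in', 'Kabul', '.',
                 'Ashraf', 'is', 'the', 'president', 'of', 'Afghanistan', '.']

print(match_per_sentence(sentence_list, Entities, Relations))
# [['Ashraf', 'Born', 'Kabul'], ['Ashraf', 'President', 'Afghanistan']]
```

Each result is a list rather than a set, so the words come back in the order they appear in the sentence. A sentence with no matches yields an empty list.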
