I have three lists containing the following data:
Entities: ['Ashraf', 'Afghanistan', 'Afghanistan', 'Kabul']
Relations: ['Born', 'President', 'employee', 'Capital', 'Located', 'Lecturer', 'University']
sentence_list: ['Ashraf', 'Born', 'in', 'Kabul', '.', 'Ashraf', 'is', 'the', 'president', 'of', 'Afghanistan', '.', ...]
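sentence_list holds the sentences as a flat list of tokens. For each sentence, I want to check whether it contains any word from Entities or Relations, and add the combination of matched words to another list. For example, for the first sentence: (Ashraf, Born, Kabul).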
What I did:
First incomplete solution:
import json

# read file
with open('../data/parse.txt', 'r') as myfile:
    json_data = json.load(myfile)

# relation() and entities() are my own helper functions (not shown here)
for i in range(len(json_data)):  # the dataset was in json format
    if json_data[i]['word'] in relation(json_data)[0]:  # I extract the relations
        print(json_data[i]['word'])
    if json_data[i]['word'] in entities(json_data)[0]:
        print(json_data[i]['word'])
Output: (Ashraf, Born, Ashraf), but I want (Ashraf, Born, Kabul).
Next incomplete solution: I stored the words from json_data in a list and then did this:
json_data2 = []
for i in range(len(json_data)):
    json_data2.append(json_data[i]['word'])
print(json_data2)
import re

'''
Now I tried to find any element of the `Entities` list and the `Relations` list
in each sentence of `sentence_list`, and then store the matched
entities and relations, grouped by sentence, in a list. '''
for line in json_data2:
    for rel in relation(obj):
        for ent in entities(obj):
            match = re.findall(rel, line['word'])
            if match:
                print('word matched relations: %s ==> word: %s' % (rel, line['address']))
            match2 = re.findall(ent, line['word'])
            if match2:
                print('word matched entities: %s ==> word: %s' % (ent, line['address']))
Unfortunately, this doesn't work.
Answer 0 (score: 1)
You can use the following list comprehension:
to_match = set(Entities+Relations)
l = [{j for j in to_match if j in i}
for i in ' '.join(sentence_list).split('.')[:-1]]
Output:
[{'Ashraf', 'Born', 'Kabul'}, {'Afghanistan', 'Ashraf'}]
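For reference, here is a self-contained version of the snippet above that can be run directly. It uses the data from the question with the trailing `...` dropped, so it assumes only the two sentences shown. The names `sentences` and `matches` are introduced here for readability and are not part of the original answer. Note that `word in sentence` is a substring test on the joined sentence text, which is why the lowercase `president` does not match `President`.

```python
Entities = ['Ashraf', 'Afghanistan', 'Afghanistan', 'Kabul']
Relations = ['Born', 'President', 'employee', 'Capital', 'Located',
             'Lecturer', 'University']
# Assumption: only the two sentences shown in the question are included.
sentence_list = ['Ashraf', 'Born', 'in', 'Kabul', '.',
                 'Ashraf', 'is', 'the', 'president', 'of', 'Afghanistan', '.']

# A set removes the duplicate 'Afghanistan' before matching.
to_match = set(Entities + Relations)

# Join the tokens into one string, then split it back into sentences at '.'.
# The final piece after the last '.' is empty, so it is dropped with [:-1].
sentences = ' '.join(sentence_list).split('.')[:-1]

# For each sentence, collect every known word that occurs in it (substring test).
matches = [{word for word in to_match if word in sentence}
           for sentence in sentences]
print(matches)
```

Because the results are sets, the order of the words inside each set is not guaranteed when printed.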
Note that I'm returning a list of sets to avoid duplicate values, since Afghanistan, for example, appears twice in Entities.
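If the order of the matched words matters, as in the (Ashraf, Born, Kabul) tuple the question asks for, a token-based variant may be a better fit. The following is only a sketch under some assumptions: sentences end at a '.' token, matching is done on whole tokens rather than substrings, and case is ignored so that `president` matches `President`. The function name `match_per_sentence` and the helper `vocab` are hypothetical and do not come from the question or the answer above.

```python
def match_per_sentence(tokens, entities, relations):
    """Split tokens into sentences at '.' and keep matched words in order.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Map lowercase form -> spelling used in the lists.
    # Dict keys also remove duplicates such as the repeated 'Afghanistan'.
    vocab = {word.lower(): word for word in entities + relations}

    results, current = [], []
    for token in tokens:
        if token == '.':
            # End of sentence: store what was matched and start a new one.
            results.append(tuple(current))
            current = []
            continue
        canonical = vocab.get(token.lower())
        # Keep each matched word once per sentence, in order of appearance.
        if canonical is not None and canonical not in current:
            current.append(canonical)
    if current:
        # Handle a final sentence that has no closing '.'.
        results.append(tuple(current))
    return results


Entities = ['Ashraf', 'Afghanistan', 'Afghanistan', 'Kabul']
Relations = ['Born', 'President', 'employee', 'Capital', 'Located',
             'Lecturer', 'University']
# Assumption: only the two sentences shown in the question are included.
sentence_list = ['Ashraf', 'Born', 'in', 'Kabul', '.',
                 'Ashraf', 'is', 'the', 'president', 'of', 'Afghanistan', '.']

result = match_per_sentence(sentence_list, Entities, Relations)
print(result)
```

With the two sentences shown, this gives ('Ashraf', 'Born', 'Kabul') for the first sentence and ('Ashraf', 'President', 'Afghanistan') for the second. Matching whole tokens avoids false positives from substrings, at the cost of missing multi-word entities.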
Useful reading: