My .txt file looks like this:
As you can see, it lists several relationships between verbs (ignore the numbers). The file has 5000 lines.
The data is available under "Download & Use VerbOcean" here: http://demo.patrickpantel.com/demos/verbocean/
What I want is one dictionary per relationship, so that I can write, for example:
similar-to['anger'] = 'energize'
happens-before['X'] = 'Y'
stronger-than['A'] = 'B'
and so on.
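Hyphenated names such as similar-to or happens-before are not valid Python identifiers, because Python reads similar-to as "similar minus to". One way to get the lookups described above is a single outer dictionary keyed by relation name, where each value is an inner dictionary that maps the first verb to the second. Below is a minimal sketch with hand-written entries. The verbs are the placeholders from the examples above, not data read from the file, and the variable name relations is only illustrative:

```python
# Outer dict: relation name -> inner dict mapping first verb -> second verb.
relations = {
    "similar-to": {"anger": "energize"},
    "happens-before": {"X": "Y"},
    "stronger-than": {"A": "B"},
}

# Valid-Python equivalent of the notation similar-to['anger'] used above.
print(relations["similar-to"]["anger"])  # energize
```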
So far I have only handled the [stronger-than] relationship. How can I extend this so that it covers all the other relationships as well?
import csv

term1 = "[stronger-than]"  # relation tag to look for
strongerverb = []  # first verb of each matching line
secondverb = []    # second verb of each matching line

with open("C:\\Users\\shide\\Desktop\\Independent study\\data.txt") as f:
    for line in f:
        words = line.split()  # split the line into tokens
        if term1 in words:  # the line contains "[stronger-than]"
            strongerverb.append(words[0])  # first verb
            secondverb.append(words[2])    # second verb

# Map each stronger verb to the verb it is stronger than.
stronger = {}
for i in range(len(strongerverb)):
    stronger[strongerverb[i]] = secondverb[i]

# Write a CSV file whose first column holds the verb that is stronger than the verb in the second column.
with open('output.csv', 'w') as output:
    writer = csv.writer(output, lineterminator='\n')
    for strong, weak in stronger.items():
        writer.writerow([strong, weak])
One way would be to repeat the same steps for every other relationship, but I suspect that isn't a smart approach. Any ideas?
I'm new to Python, so any help is much appreciated.
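For comparison with the regex approach, here is a minimal sketch that generalizes the split-based loop above to every relation in one pass. It assumes every relation line has the form verb1 [relation] verb2 :: score, with the bracketed relation always as the second token. The function name parse_relations is hypothetical:

```python
def parse_relations(lines):
    """Build {relation: {verb1: verb2}} from lines like 'annex [stronger-than] occupy :: ....'.

    Assumes the relation is the second whitespace-separated token, wrapped in
    square brackets; any other line is skipped.
    """
    relations = {}
    for line in lines:
        words = line.split()
        if len(words) < 3:
            continue  # blank or too short to hold verb, [relation], verb
        verb1, tag, verb2 = words[0], words[1], words[2]
        if not (tag.startswith("[") and tag.endswith("]")):
            continue  # not a relation line
        relation = tag[1:-1]  # "[stronger-than]" -> "stronger-than"
        # Create the inner dict on first use; a later line with the same
        # relation and first verb overwrites the earlier second verb.
        relations.setdefault(relation, {})[verb1] = verb2
    return relations


sample = [
    "annex [similar] invade :: ....",
    "annex [stronger-than] occupy :: ....",
    "invade [happens-before] annex :: ....",
]
rels = parse_relations(sample)
print(rels["stronger-than"]["annex"])  # occupy
```

Because a file object yields its lines when iterated, the same function should also work on the real file: with open(path) as f: rels = parse_relations(f).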
Answer 0 (score: 0)
This can be done with a regular expression:
import re
regexp = re.compile(r'^([^\[\]\s]+)\s*\[([^\[\]\s]+)\]\s*([^\[\]\s]+)\s*.*$', re.MULTILINE)
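To check what the three groups capture, you can apply the same pattern to a single line taken from the sample data. The second call shows that a line without a bracketed relation does not match, so match() returns None:

```python
import re

# Same pattern as above: verb, [relation], verb, then the rest of the line.
regexp = re.compile(r'^([^\[\]\s]+)\s*\[([^\[\]\s]+)\]\s*([^\[\]\s]+)\s*.*$', re.MULTILINE)

m = regexp.match("annex [stronger-than] occupy :: ....")
print(m.groups())  # ('annex', 'stronger-than', 'occupy')

# A line with no bracketed relation does not match at all.
print(regexp.match("no relation here"))  # None
```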
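^ (at the start) means matching begins at the start of a line. Because of re.MULTILINE, that is the start of every line in the string.
$ (at the end) means the match must extend to the end of that line.
[^\[\]\s]+ matches one or more characters that are not [, ] or whitespace. The ^ directly after the opening [ negates the class, so it means "any character except the ones that follow".
() wraps the expressions above to mark them as capturing groups, whose text is returned by m.groups(). Since we want the two verbs and the relationship between them, those three parts are wrapped in ().
\s* matches any amount of whitespace, and .* matches the rest of the line. Neither is wrapped in (), so both are matched but not captured.

data = """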
invade [happens-before] annex :: ....
annex [similar] invade :: ....
annex [opposite-of] cede :: ....
annex [stronger-than] occupy :: ....
"""
from pprint import pprint

relationships = {}
for m in regexp.finditer(data):
    v1, r, v2 = m.groups()
    # Create the inner dict for relation r on first use, then store v1 -> v2.
    relationships.setdefault(r, {})[v1] = v2
pprint(relationships)

Output:
{'happens-before': {'invade': 'annex'},
 'opposite-of': {'annex': 'cede'},
 'similar': {'annex': 'invade'},
 'stronger-than': {'annex': 'occupy'}}
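relationships.setdefault(r, {})[v1] = v2 keeps only one second verb per first verb. If the same verb appears in several lines with the same relation, only the last one survives. If you need all of them, one option is to store lists instead. The sketch below assumes that, and its data includes one made-up line, "annex [stronger-than] take", added purely for illustration. The helper name build_relations is hypothetical:

```python
import re
from collections import defaultdict

regexp = re.compile(r'^([^\[\]\s]+)\s*\[([^\[\]\s]+)\]\s*([^\[\]\s]+)\s*.*$', re.MULTILINE)


def build_relations(text):
    """Map relation -> first verb -> list of every second verb seen."""
    relationships = defaultdict(lambda: defaultdict(list))
    for m in regexp.finditer(text):
        v1, r, v2 = m.groups()
        relationships[r][v1].append(v2)  # keep every match, in file order
    return relationships


data = """
annex [stronger-than] occupy :: ....
annex [stronger-than] take :: ....
annex [similar] invade :: ....
"""
rels = build_relations(data)
print(rels["stronger-than"]["annex"])  # ['occupy', 'take']
```

To run it on the real file, read the whole text first: with open(path) as f: rels = build_relations(f.read()). Note that indexing a missing key on a defaultdict silently creates an empty entry instead of raising KeyError.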
Then, to get the 'similar' relationship of the verb 'annex', use:
relationships['similar']['annex']
which returns: 'invade'
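Indexing with relationships['similar'][...] raises KeyError when the relation or the verb is not present. A small sketch that returns None instead, using dict.get on both levels, is shown below. The helper name lookup is hypothetical, and the dictionary is the output shown above:

```python
relationships = {
    'happens-before': {'invade': 'annex'},
    'opposite-of': {'annex': 'cede'},
    'similar': {'annex': 'invade'},
    'stronger-than': {'annex': 'occupy'},
}


def lookup(relationships, relation, verb):
    """Return the related verb, or None if the relation or the verb is missing."""
    return relationships.get(relation, {}).get(verb)


print(lookup(relationships, 'similar', 'annex'))   # invade
print(lookup(relationships, 'similar', 'occupy'))  # None
```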