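I want to run a for loop over each element of a list and end up with a data frame in which the first column holds the element i from the list and the following columns hold the matching entries from WordNet.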
active active active_voice decent decent
from nltk.corpus import wordnet

synonyms = []
list = ["active", "decent"]
for i in list:
    for syn in wordnet.synsets(i):
        for l in syn.lemmas():
            synonyms.append(i)
            synonyms.append(l.name())
I am not getting the list I expect: the first element, "active", seems to go through the loop twice.
Answer 0 (score: 0)
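Your code is basically correct, but the way it builds the synonym list leaves a lot of duplicates in it. I have modified it below to build a list of (word, synonym) pairs that can go straight into a data frame.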
import pandas as pd
from nltk.corpus import wordnet

# Words to look up.
words = ["active", "decent"]

# Build a list of (word, synonym) pairs.
synonyms = []
for i in words:
    for syn in wordnet.synsets(i):
        for l in syn.lemmas():
            synonyms.append((i, l.name()))

df = pd.DataFrame(sorted(synonyms), columns=('word', 'synonym'))
# drop_duplicates() returns a new frame and keeps the original row labels.
df.drop_duplicates()
which produces
word synonym
0 active active
17 active active_agent
18 active active_voice
19 active alive
20 active combat-ready
21 active dynamic
22 active fighting
23 active participating
24 decent adequate
25 decent becoming
26 decent comely
27 decent comme_il_faut
28 decent decent
35 decent decently
36 decent decorous
37 decent enough
38 decent in_good_order
39 decent nice
40 decent properly
41 decent right
42 decent seemly
43 decent the_right_way
You can eliminate the duplicates up front by building a set for each word:
synonyms = {}
for i in words:
    entries = set()
    for syn in wordnet.synsets(i):
        for l in syn.lemmas():
            entries.add(l.name())
    synonyms[i] = entries

# Flatten the word -> set mapping into sorted (word, synonym) pairs.
pairs = ((k, v) for k, entries in sorted(synonyms.items())
         for v in sorted(entries))
df = pd.DataFrame(pairs, columns=('name', 'synonym'))
df
which produces
name synonym
0 active active
1 active active_agent
2 active active_voice
3 active alive
4 active combat-ready
5 active dynamic
6 active fighting
7 active participating
8 decent adequate
9 decent becoming
10 decent comely
11 decent comme_il_faut
12 decent decent
13 decent decently
14 decent decorous
15 decent enough
16 decent in_good_order
17 decent nice
18 decent properly
19 decent right
20 decent seemly
21 decent the_right_way
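The de-duplication step itself does not depend on WordNet or pandas, so it may help to see it in isolation. The sketch below is only an illustration: `synonym_pairs`, `fake_lookup`, and the `FAKE_LEMMAS` data are hypothetical stand-ins I made up, not part of NLTK. With NLTK, the lookup would instead walk `wordnet.synsets(word)` and collect `l.name()` for each lemma.

```python
from typing import Callable, Iterable, List, Tuple


def synonym_pairs(words: Iterable[str],
                  lookup: Callable[[str], Iterable[str]]) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (word, synonym) pairs."""
    # A set of tuples drops repeated (word, synonym) combinations.
    pairs = {(word, name) for word in words for name in lookup(word)}
    return sorted(pairs)


# Hypothetical stand-in data with deliberate repeats, mimicking how the
# same lemma name shows up in several synsets of one word.
FAKE_LEMMAS = {
    "active": ["active", "active_agent", "active", "alive", "active"],
    "decent": ["decent", "nice", "decent", "adequate"],
}


def fake_lookup(word: str) -> List[str]:
    """Stand-in for the WordNet lookup; unknown words yield no lemmas."""
    return FAKE_LEMMAS.get(word, [])


pairs = synonym_pairs(["active", "decent"], fake_lookup)
for word, synonym in pairs:
    print(word, synonym)
```

The resulting list of tuples can be passed to `pd.DataFrame(pairs, columns=('word', 'synonym'))` just like in the code above, and because the duplicates are removed before the frame is built, the row labels come out consecutive.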