早上好,
我正在使用NLTK从单词框架中获取同义词。
print(df)
col_1 col_2
Book 5
Pen 5
Pencil 6
def get_synonyms(df, column_name):
df_1 = df["col_1"]
for i in df_1:
syn = wn.synsets(i)
for synset in list(wn.all_synsets('n'))[:2]:
print(i, "-->", synset)
print("-----------")
for lemma in synset.lemmas():
print(lemma.name())
ci = lemma.name()
return(syn)
它确实有效,但我希望获得以下数据框,其中第一个“n”个同义词,“col_1”中的每个单词:
print(df_final)
col_1 synonym
Book booklet
Book album
Pen cage
...
我尝试在synset和lemma循环之前初始化一个空列表,并附加,但在这两种情况下它都不起作用;例如:
synonyms = []
for lemma in synset.lemmas():
print(lemma.name())
ci = lemma.name()
synonyms.append(ci)
答案 0 :(得分:1)
您可以使用:
from nltk.corpus import wordnet
from itertools import chain
def get_synonyms(df, column_name, N):
L = []
for i in df[column_name]:
syn = wordnet.synsets(i)
#flatten all lists by chain, remove duplicates by set
lemmas = list(set(chain.from_iterable([w.lemma_names() for w in syn])))
for j in lemmas[:N]:
#append to final list
L.append([i, j])
#create DataFrame
return (pd.DataFrame(L, columns=['word','syn']))
#add number of filtered synonyms
df1 = get_synonyms(df, 'col_1', 3)
print (df1)
word syn
0 Book record_book
1 Book book
2 Book Word
3 Pen penitentiary
4 Pen compose
5 Pen pen
6 Pencil pencil