Lemmatization on a pandas DataFrame column is not working

Time: 2020-06-15 09:53:22

Tags: python pandas nlp spacy

How do I apply lemmatization to a pandas DataFrame column?

I am using this function for lemmatization, and it works perfectly on a single string:

xx='kenichan dived times ball managed save 50 rest'

def make_to_base(x):
    x_list = []
    doc = nlp(x)
    for token in doc:
        lemma=str(token.lemma_)
        if lemma=='-PRON-' or lemma=='be':
            lemma=token.text
        x_list.append(lemma)
    print(" ".join(x_list))    
make_to_base(xx)

But when I apply this function to my pandas DataFrame column, it neither works nor raises any error:

x = list(df['text']) #my df column
x = str(x)#converting into string otherwise it is giving error
make_to_base(x)

I tried different approaches, but nothing worked. For example:

df["texts"] =  df.text.apply(lambda x: make_to_base(x))

make_to_base(df['text'])

My dataset looks like this:

df['text'].head()
Out[17]: 
0    Hope you are having a good week. Just checking in
1                              K..give back my thanks.
2          Am also doing in cbe only. But have to pay.
3    complimentary 4 STAR Ibiza Holiday or £10,000 ...
4    okmail: Dear Dave this is your final notice to...
Name: text, dtype: object

1 answer:

Answer 0 (score: 1)

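You need to actually return the value you build in make_to_base. The original version only prints it, so the function returns None. Use: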

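def make_to_base(x):
    # nlp is assumed to be an already loaded spaCy pipeline
    x_list = []
    for token in nlp(x):
        lemma = str(token.lemma_)
        # keep the original text for pronouns and forms of "be"
        if lemma == '-PRON-' or lemma == 'be':
            lemma = token.text
        x_list.append(lemma)
    # return the joined lemmas instead of printing them
    return " ".join(x_list)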

Then use:

df['texts'] =  df['text'].apply(lambda x: make_to_base(x))
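To see why returning matters, here is a minimal sketch that runs without pandas or a spaCy model. A plain list stands in for df['text'], and TOY_LEMMAS, lemmatize_print and lemmatize_return are hypothetical names invented for this illustration, not part of spaCy. Series.apply keeps whatever the function returns for each element, much as the list comprehensions below do.

```python
# Hypothetical stand-in for spaCy's lemmatizer so the sketch runs without a model.
TOY_LEMMAS = {"dived": "dive", "times": "time", "managed": "manage"}


def lemmatize_print(text):
    """Mirrors the question's version: prints the result and returns None."""
    print(" ".join(TOY_LEMMAS.get(word, word) for word in text.split()))


def lemmatize_return(text):
    """Mirrors the answer's version: returns the result to the caller."""
    return " ".join(TOY_LEMMAS.get(word, word) for word in text.split())


# A plain list stands in for the DataFrame column (assumption for illustration).
texts = ["kenichan dived times ball", "managed save 50 rest"]

# Collect what each function hands back per element.
printed = [lemmatize_print(t) for t in texts]    # [None, None]
returned = [lemmatize_return(t) for t in texts]  # the lemmatized strings
```

The printing version shows text on the console but hands back nothing to store, which is why the new column appeared empty. Since make_to_base takes a single argument, the lambda is also optional: df['text'].apply(make_to_base) behaves the same.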