Question

I have this data frame:

index      sentences                                            category
1          the side effects are terrible !                         SSRI
2          They are killing me,,, I want to stop                   SNRI
3          I need to contact my physicians ?                        SSRI
4          How to stop it.. I am surprised because of its effect.   SSRI

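I need to tokenize the sentences and then count the number of tokens for each category. I know I can use the following code to tokenize them, but I don't know how to count the tokens.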

df['tokenized_sents'] = df.apply(lambda row: nltk.word_tokenize(row['sentences']), axis=1)

Any suggestions?

Answer 1

You can count the tokens in each row with the same apply method:

df['len_tokens'] = df['tokenized_sents'].apply(len)
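
The line above gives a token count per row, while the question asks for a count per category. One way to get there is to sum the per-row counts within each category using groupby. The following is a minimal sketch, not necessarily what the answerer had in mind. It rebuilds the sample data frame from the question and, so that it runs without downloading NLTK data, uses a hypothetical regex-based helper named simple_tokenize as a stand-in for nltk.word_tokenize. The two tokenizers can split text differently, so the exact counts may differ with NLTK.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sentences": [
        "the side effects are terrible !",
        "They are killing me,,, I want to stop",
        "I need to contact my physicians ?",
        "How to stop it.. I am surprised because of its effect.",
    ],
    "category": ["SSRI", "SNRI", "SSRI", "SSRI"],
})


def simple_tokenize(text):
    # Hypothetical stand-in for nltk.word_tokenize (an assumption, not from
    # the original post): each run of word characters is one token, and each
    # punctuation character is its own token.
    return re.findall(r"\w+|[^\w\s]", text)


# Tokenize each sentence, then count the tokens in each row.
df["tokenized_sents"] = df["sentences"].apply(simple_tokenize)
df["len_tokens"] = df["tokenized_sents"].apply(len)

# Sum the per-row counts within each category.
tokens_per_category = df.groupby("category")["len_tokens"].sum()
print(tokens_per_category)
```

To use NLTK instead, passing nltk.word_tokenize to apply in place of simple_tokenize should work the same way, provided the tokenizer data it needs is installed.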
