I have this DataFrame:
index sentences category
1 the side effects are terrible ! SSRI
2 They are killing me,,, I want to stop SNRI
3 I need to contact my physicians ? SSRI
4 How to stop it.. I am surprised because of its effect. SSRI
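I need to tokenize the sentences and then count the number of tokens for each category. I know I can do the tokenization with the following code, but I don't know how to count the tokens.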
df['tokenized_sents'] = df.apply(lambda row: nltk.word_tokenize(row['sentences']), axis=1)
Any suggestions?
Answer 0 (score: 2)
Wouldn't it be simple to compute the count with the same apply method?
df['len_tokens'] = df['tokenized_sents'].apply(len)
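Note that len_tokens is a per-row count, while the question asks for a count per category. One possible way to get there is to group by category and sum the per-row counts. The following is a minimal sketch, not a definitive solution. It rebuilds the sample data from the question and, as an assumption, uses a hypothetical regex helper simple_tokenize in place of nltk.word_tokenize so that it runs without downloading NLTK tokenizer data. The two tokenizers can split text differently, so the counts may not match NLTK's exactly.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sentences": [
        "the side effects are terrible !",
        "They are killing me,,, I want to stop",
        "I need to contact my physicians ?",
        "How to stop it.. I am surprised because of its effect.",
    ],
    "category": ["SSRI", "SNRI", "SSRI", "SSRI"],
})


def simple_tokenize(text):
    # Hypothetical stand-in for nltk.word_tokenize (an assumption, not NLTK's API):
    # each run of word characters is one token, each punctuation character is one token.
    return re.findall(r"\w+|[^\w\s]", text)


# Tokenize each sentence, then count the tokens in each row.
df["tokenized_sents"] = df["sentences"].apply(simple_tokenize)
df["len_tokens"] = df["tokenized_sents"].apply(len)

# Sum the per-row counts within each category.
tokens_per_category = df.groupby("category")["len_tokens"].sum()
print(tokens_per_category)
```

With NLTK installed and its tokenizer data available, simple_tokenize can be swapped for nltk.word_tokenize and the groupby step stays the same.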