Question

I have a data frame with two columns: ID and TEXT. Suppose the data looks like this:

ID      TEXT
265     The farmer plants grain. The fisher catches tuna.
456     The sky is blue.
434     The sun is bright.
921     I own a phone. I own a book.

I know that the nltk functions do not operate on data frames directly. How can I apply sent_tokenize to the data frame above?

When I try:

df.TEXT.apply(nltk.sent_tokenize)

the output is unchanged from the original data frame. The output I want is:

TEXT
The farmer plants grain.
The fisher catches tuna.
The sky is blue.
The sun is bright.
I own a phone.
I own a book.
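
One possible way to get this shape, offered as a minimal sketch rather than a definitive answer: `Series.apply` returns a new Series of lists and does not modify `df` in place, so the result has to be captured and then expanded to one row per sentence with `Series.explode` (available in pandas 0.25 and later). The helper name `split_sentences` is hypothetical, and its regex fallback is an assumption added so the sketch still runs when NLTK or its punkt model is not installed; it is not NLTK's algorithm.

```python
import re

import pandas as pd


def split_sentences(text):
    """Split text into sentences with NLTK, or a naive regex if NLTK is unavailable."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # Fallback (assumption, not NLTK's algorithm): split after ., ! or ?
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


df = pd.DataFrame({
    "ID": [265, 456, 434, 921],
    "TEXT": [
        "The farmer plants grain. The fisher catches tuna.",
        "The sky is blue.",
        "The sun is bright.",
        "I own a phone. I own a book.",
    ],
})

# apply returns a new Series of lists; df itself is left unchanged
sentences = df["TEXT"].apply(split_sentences)

# explode turns every list element into its own row
result = sentences.explode().reset_index(drop=True).to_frame("TEXT")
print(result)
```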

In addition, I would like to tie this new (desired) data frame back to the original ID numbers, like this (after further text cleaning):

ID    TEXT
265     'farmer', 'plants', 'grain'
265     'fisher', 'catches', 'tuna'
456     'sky', 'blue'
434     'sun', 'bright'
921     'I', 'own', 'phone'
921     'I', 'own', 'book'
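
A sketch of how the ID could be carried along, under stated assumptions: `DataFrame.explode` (pandas 0.25 and later) repeats the other columns for every exploded row, so each sentence keeps its ID. The "further text cleaning" is not specified above, so the `STOP_WORDS` set, the regex word tokenizer, and the helper names `split_sentences` and `clean_tokens` are all hypothetical stand-ins chosen only to reproduce the sample output. NLTK's own English stop word list would also drop "I" and "own", which the desired output keeps.

```python
import re

import pandas as pd

# Hypothetical stop word list, chosen only to match the sample output
STOP_WORDS = {"the", "is", "a"}


def split_sentences(text):
    """Split text into sentences with NLTK, or a naive regex if NLTK is unavailable."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # Fallback (assumption, not NLTK's algorithm): split after ., ! or ?
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def clean_tokens(sentence):
    """Keep alphabetic words and drop the hypothetical stop words."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [w for w in words if w.lower() not in STOP_WORDS]


df = pd.DataFrame({
    "ID": [265, 456, 434, 921],
    "TEXT": [
        "The farmer plants grain. The fisher catches tuna.",
        "The sky is blue.",
        "The sun is bright.",
        "I own a phone. I own a book.",
    ],
})

# One row per sentence; the ID column is repeated for each exploded row
out = (
    df.assign(TEXT=df["TEXT"].apply(split_sentences))
      .explode("TEXT")
      .reset_index(drop=True)
)

# Replace each sentence with its cleaned token list
out["TEXT"] = out["TEXT"].apply(clean_tokens)
print(out)
```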

This question is related to another question of mine here. Please let me know if I can provide anything to help clarify my question!

Via a Pandas data frame

0 answers: