I have a CSV file with a msg column that contains the following text:
muchloveandhugs
dudeseriously
onemorepersonforthewin
havefreebiewoohoothankgod
thisismybestcategory
yupbabe
didfreebee
heykidforget
hecomplainsaboutit
I know that nltk.corpus.words contains a lot of sensible words. My question is how to iterate over the df['msg'] column so that I get something like:
df['msg']
much love and hugs
dude seriously
one more person for the win
Answer 0 (score: 1)
Based on this question about splitting a string without spaces into words, and without knowing much about what your data looks like:
import pandas as pd
import wordninja
filename = 'mycsv.csv' # Put your filename here
df = pd.read_csv(filename)
for wordstring in df['msg']:
    split = wordninja.split(wordstring)  # returns a list of words
    # Do something with split, e.g. ' '.join(split)
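
If you would rather see how this kind of splitting can work with a word list of your own (for example one built from nltk.corpus.words.words()), here is a minimal sketch of a dictionary-based dynamic-programming segmenter. This is not how wordninja works internally; it is a simpler technique swapped in for illustration. The names VOCAB and segment are hypothetical, and the toy vocabulary only covers the first three sample messages.

```python
# Hypothetical toy vocabulary (an assumption for illustration);
# in practice you would load a much larger word list.
VOCAB = {
    "much", "love", "and", "hugs", "dude", "seriously",
    "one", "more", "person", "for", "the", "win",
}


def segment(text, vocab=VOCAB):
    """Split text into words from vocab, preferring the fewest words.

    Returns a list of words, or None if text cannot be fully covered.
    """
    n = len(text)
    max_len = max(map(len, vocab))
    # best[i] is the shortest word list covering text[:i], or None
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if best[j] is not None and piece in vocab:
                candidate = best[j] + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n]


messages = ["muchloveandhugs", "dudeseriously", "onemorepersonforthewin"]
# Fall back to the original string when no full segmentation exists
results = [" ".join(segment(m) or [m]) for m in messages]
for line in results:
    print(line)
```

With pandas, the same function could be applied to the whole column with something like df['msg'].apply(lambda s: ' '.join(segment(s) or [s])). Be aware that a very large word list tends to contain many short or obscure entries, so a plain "fewest words" rule may produce odd splits; a frequency-aware tool such as wordninja is likely to give better results on real data.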