Splitting words in a column

Time: 2018-10-15 14:35:46

Tags: python regex nlp nltk

I have a CSV with a msg column that contains the following text:

muchloveandhugs
dudeseriously
onemorepersonforthewin
havefreebiewoohoothankgod
thisismybestcategory
yupbabe
didfreebee
heykidforget
hecomplainsaboutit

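I know that nltk.corpus.words contains a large list of valid English words. My question is how to iterate over the df['msg'] column so that I get output such as: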

df['msg']
much love and hugs
dude seriously
one more person for the win
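One way to use a word list such as nltk.corpus.words for this is dynamic programming: for every prefix of the string, remember the best way to split it into known words, then read off the answer for the whole string. The sketch below is a minimal illustration under stated assumptions, not a tested solution for your data: `WORDS` is a small hypothetical stand-in vocabulary, `segment` is a hypothetical helper, and "prefer the split with the fewest words" is just one possible tie-breaking rule (it helps because a large word list also contains many short words that would otherwise chop strings into fragments).

```python
from typing import List, Optional, Set

# Hypothetical stand-in vocabulary. In practice it could be built from NLTK,
# e.g. {w.lower() for w in nltk.corpus.words.words()} after nltk.download('words').
WORDS = {
    "much", "love", "and", "hugs", "dude", "seriously",
    "one", "more", "person", "per", "son", "for", "the", "win",
}


def segment(text: str, vocab: Set[str], max_len: int = 20) -> Optional[List[str]]:
    """Split text into vocabulary words, preferring the fewest words.

    Returns None if the whole string cannot be covered by vocabulary words.
    """
    n = len(text)
    # best[i] is the best segmentation of text[:i], or None if none exists
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        # Only consider candidate words up to max_len characters long
        for start in range(max(0, end - max_len), end):
            prev = best[start]
            piece = text[start:end]
            if prev is None or piece not in vocab:
                continue
            candidate = prev + [piece]
            current = best[end]
            # Keep the candidate if it uses fewer words than the current best
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


for msg in ["muchloveandhugs", "dudeseriously", "onemorepersonforthewin"]:
    parts = segment(msg, WORDS)
    # Fall back to the original string when no full segmentation exists
    print(" ".join(parts) if parts else msg)
```

With this toy vocabulary, "person" wins over "per son" because it uses fewer words. Applying it to the column would then be something like `df['msg'].apply(lambda s: ' '.join(segment(s, WORDS) or [s]))`.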

1 answer:

Answer 0 (score: 1)

Based on this question about splitting a string without spaces into words, and without knowing much about what your data looks like:

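import pandas as pd
import wordninja  # third-party package: pip install wordninja

filename = 'mycsv.csv'  # Put your filename here

df = pd.read_csv(filename)
split_msgs = []
# fillna/astype guard against empty cells and non-string values
for wordstring in df['msg'].fillna('').astype(str):
    split = wordninja.split(wordstring)  # list of recovered words
    # Join the words back into one space-separated string
    split_msgs.append(' '.join(split))
df['msg'] = split_msgs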
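wordninja's README describes it as splitting concatenated words probabilistically using English Wikipedia unigram frequencies, so it ranks candidate splits by how common the words are rather than by dictionary membership alone. The sketch below illustrates that general idea and is not wordninja's actual code: `RANKED` is a hypothetical frequency-ordered word list, `infer_spaces` is a hypothetical helper, and the Zipf-style cost `log((rank + 1) * log(N))` is an assumed scoring rule in which frequent words are cheap and rare words are expensive.

```python
import math
from typing import List

# Hypothetical frequency-ranked vocabulary, most frequent word first.
# A real list would contain many thousands of words.
RANKED = ["the", "and", "for", "one", "more", "win", "love", "much",
          "person", "hugs", "dude", "seriously", "per", "son"]

# Zipf-style cost: low-rank (frequent) words are cheap, rare words expensive
COST = {w: math.log((i + 1) * math.log(len(RANKED))) for i, w in enumerate(RANKED)}
MAX_LEN = max(len(w) for w in RANKED)
UNKNOWN_CHAR_COST = 1e6  # large penalty for a character no known word covers


def infer_spaces(text: str) -> List[str]:
    """Return the lowest-total-cost split of text into words.

    Characters that no known word covers become single-character pieces
    with a very high cost, so every string gets some split.
    """
    n = len(text)
    cost = [0.0] + [math.inf] * n  # cost[i]: cheapest split of text[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last word in it
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            piece = text[start:end]
            if piece in COST:
                word_cost = COST[piece]
            elif end - start == 1:
                word_cost = UNKNOWN_CHAR_COST
            else:
                continue
            total = cost[start] + word_cost
            if total < cost[end]:
                cost[end] = total
                back[end] = start
    # Walk the back-pointers from the end to recover the words
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


print(infer_spaces("onemorepersonforthewin"))
```

On this toy list the sketch returns `['one', 'more', 'person', 'for', 'the', 'win']`: the single word "person" costs less than "per" plus "son". With a frequency-ordered list, a common split can beat a rare but valid word, which a plain word set like nltk.corpus.words cannot express on its own.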