Code:
import pandas as pd
import numpy as np
import re
df=pd.read_csv('twitDB.csv',header=None, sep=',',error_bad_lines=False,encoding='utf-8')
hula=df[[0,1,2,3]]
hula=hula.fillna(0)
hula['tweet'] = hula[0].astype(str) +hula[1].astype(str)+hula[2].astype(str)+hula[3].astype(str)
dhole=hula["tweet"]
dhole = re.sub('\s+', ' ',dhole )
I get this:
Error: expected string or bytes-like object
Answer 0 (score: 1)
I think you need Series.replace or Series.str.replace, because you are working with a Series (an array), while re.sub works on scalars:
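dhole = dhole.replace(r'\s+', ' ', regex=True)
# or (regex=True is required on pandas 2.0+, where str.replace defaults to literal matching)
dhole = dhole.str.replace(r'\s+', ' ', regex=True)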
Sample:
>>> hula = pd.DataFrame({'tweet':['ss ddd s ss','d d','f t y']})
>>> dhole=hula["tweet"]
>>> print (dhole)
0 ss ddd s ss
1 d d
2 f t y
Name: tweet, dtype: object
>>> dhole = dhole.replace(r'\s+', ' ', regex=True)
>>> print (dhole)
0 ss ddd s ss
1 d d
2 f t y
Name: tweet, dtype: object
>>> dhole = dhole.str.replace(r'\s+', ' ', regex=True)
>>> print (dhole)
0 ss ddd s ss
1 d d
2 f t y
Name: tweet, dtype: object
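
The difference between the scalar and the vectorized calls can be checked with a minimal sketch. The sample strings below are made up for illustration (they are not from twitDB.csv) and deliberately contain runs of spaces, a tab, and a newline so that the effect of the replacement is visible. The names tweets, via_replace, via_str, and via_apply are placeholders.

```python
import re
import pandas as pd

# Hypothetical sample data with runs of whitespace (not the asker's CSV).
tweets = pd.Series(["ss  ddd   s ss", "d \t d", "f\n t  y"], name="tweet")

# re.sub expects a single string, so passing the whole Series raises TypeError.
try:
    re.sub(r"\s+", " ", tweets)
    raised = False
except TypeError:
    raised = True

# Vectorized alternatives that operate on every element of the Series.
via_replace = tweets.replace(r"\s+", " ", regex=True)
via_str = tweets.str.replace(r"\s+", " ", regex=True)

# Element-wise fallback: apply re.sub to one scalar string at a time.
via_apply = tweets.apply(lambda s: re.sub(r"\s+", " ", s))

print(raised)
print(via_str.tolist())
```

All three approaches should give the same result here. The vectorized forms are usually preferable, and Series.apply with re.sub is only worth reaching for when the per-element logic is more than a single substitution.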