TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')

Time: 2017-06-12 07:11:34

Tags: python string list pandas lambda

Sample DataFrame:

import pandas as pd
df = pd.DataFrame({"Hashtags": ["[u'AAPHealthCare4All']", "[]", "[u'NDTV']", "[u'CBI', u'PrannoyRoy', u'Delhi', u'Emergency']", "[u'CBI']"]})

Sample output:

({"Hashtags" : ["#AAPHealthCare4All", " ", "NDTV", "CBI", "PrannoyRoy", "Delhi", "Emergency", "CBI"]})

Here is my code:

# Splitting Hashtags
import pandas as pd
df = pd.read_csv("2.csv")
df1 = df.drop('Hashtags', axis=1).join(
             df.Hashtags
             .str
             .split(expand=True)
             .stack()
             .reset_index(drop=True, level=1)
             .rename('Hashtags')           
             )
df1.to_csv('string_HT.csv', index=False)
# Cleaning HASHTAGS
for index,row in df1.iterrows():
    df1['Hashtags'] =df1['Hashtags'].str.strip("u'  ',")

for index,row in df1.iterrows():
    df1['Hashtags'] = df1['Hashtags'].str.strip("',")

for index,row in df1.iterrows():
    df1['Hashtags'] = df1['Hashtags'].str.strip("u'")


df1['Hashtags'] = "#" + df1['Hashtags']
df1.rename(columns={'Favorite_Count' : 'Favorite Count','Retweet_Count' :'Retweet Count', 'User_Mentions':'User Mentions','User_Location'   : 'User Location','No_of_Followers': 'No of Followers','Status_Count':'Status Count','Geo_Enabled':'Geo Enabled','Compound_Score':'Compound Score'}, inplace=True) #Rename column names to suit tableau file
df1.to_csv('string_HT.csv', index=False)

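What I am trying to achieve

I am trying to prepend "#" to every hashtag string in the column, after cleaning it and removing the unnecessary brackets, characters, and quotes/commas. I do a lot of data cleaning and manipulation throughout my code, and the error points to this step.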

Error

  File "C:/../filename.py", line 469, in <module>
    df1['Hashtags'] = "#" + df1['Hashtags']

  File "C:\ANACONDA\lib\site-packages\pandas\core\ops.py", line 715, in wrapper
    result = wrap_results(safe_na_op(lvalues, rvalues))

  File "C:\ANACONDA\lib\site-packages\pandas\core\ops.py", line 676, in safe_na_op
    return na_op(lvalues, rvalues)

  File "C:\ANACONDA\lib\site-packages\pandas\core\ops.py", line 662, in na_op
    result[mask] = op(x[mask], y)

  File "C:\ANACONDA\lib\site-packages\pandas\core\ops.py", line 70, in <lambda>
    radd=arith_method(lambda x, y: y + x, names('radd'), op('+'),

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')

1 Answer:

Answer 0: (score: 0)

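I think it is better to avoid iterrows loops when a faster vectorized solution exists. In the question's code, every iteration re-strips the entire column, so each loop only repeats the same column-wide operation once per row.

It may help to replace: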

for index,row in df1.iterrows():
    df1['Hashtags'] =df1['Hashtags'].str.strip("u'  ',")

for index,row in df1.iterrows():
    df1['Hashtags'] = df1['Hashtags'].str.strip("',")

for index,row in df1.iterrows():
    df1['Hashtags'] = df1['Hashtags'].str.strip("u'")

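with two chained str.strip calls. The first removes the characters [, u, comma, space and ] from both ends, and the second removes the quote ':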

df1['Hashtags'] = df1['Hashtags'].str.strip("[u, ]").str.strip("'")
df1['Hashtags'] = "#" + df1['Hashtags']
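To see what each of the two strip calls does, here is a minimal sketch in plain Python. It relies on `Series.str.strip` removing a set of characters from both ends of each string in the same way the built-in `str.strip` does. The helper name `clean_token` is hypothetical, and the tokens are the whitespace-separated pieces of the sample data from the question.

```python
def clean_token(token):
    """Mimic .str.strip("[u, ]").str.strip("'") for a single string."""
    # First pass: drop leading/trailing '[', 'u', ',', ' ' and ']'.
    # Stripping stops at the first character not in the set, here the quote.
    without_brackets = token.strip("[u, ]")
    # Second pass: drop the surrounding single quotes.
    return without_brackets.strip("'")


tokens = ["[u'CBI',", "u'PrannoyRoy',", "u'Emergency']", "[]", "[u'uber']"]
for token in tokens:
    print(repr(token), "->", repr(clean_token(token)))
```

Because the first pass stops at the quote character, a hashtag that itself begins with "u" (the made-up `[u'uber']` above) keeps its first letter, whereas a single `strip("u'")` would remove it.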

Or add astype:

df1['Hashtags'] = "#" + df1['Hashtags'].astype(str)
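Putting both suggestions together on the sample DataFrame from the question gives the sketch below. It is not the asker's full pipeline: the CSV is not shown, so the sample data stands in for it, and `Series.explode` (pandas 0.25 or later) is swapped in for the question's split/stack chain to get one token per row. The `astype(str)` call is a guard for the case where the column is not stored as strings at that point, which is what the error message hints at; that cause is an assumption, since the question does not show the real data.

```python
import pandas as pd

# Sample data copied from the question (stands in for the unseen "2.csv").
df = pd.DataFrame({"Hashtags": ["[u'AAPHealthCare4All']", "[]", "[u'NDTV']",
                                "[u'CBI', u'PrannoyRoy', u'Delhi', u'Emergency']",
                                "[u'CBI']"]})

# One whitespace-separated token per row (explode replaces split/stack here).
tokens = df["Hashtags"].str.split().explode().reset_index(drop=True)

# Vectorized cleanup: no iterrows loop needed.
cleaned = tokens.str.strip("[u, ]").str.strip("'")

# Cast to str before concatenating, then prepend "#".
hashtags = "#" + cleaned.astype(str)
print(hashtags.tolist())
```

Note that the empty list `[]` ends up as a bare `#`; filtering out empty strings before adding the prefix would avoid that if it is not wanted.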