Groupby and join a text column

Time: 2016-03-30 09:29:16

Tags: python pandas dataframe

I have a CSV file with this header: text|business_id

I want to group together all the text associated with each business.

I used: review_data = review_data.groupby(['business_id'])['text'].apply("".join)

review_data looks like this:

                                                   text  \
0     mr hoagi institut walk doe seem like throwback...   
1     excel food superb custom servic miss mario mac...   
2     yes place littl date open weekend staff alway ... 

         business_id  
0     5UmKMjUEUNdYWqANhGckJw  
1     5UmKMjUEUNdYWqANhGckJw  
2     5UmKMjUEUNdYWqANhGckJw

I get this error: TypeError: sequence item 131: expected string, float found

These are rows 130 to 132:

130 use order fair often  past 2 year food get progress wors everi time order  doesnt help owner alway regist rude everi time  final decid im done  dont think feel let inconveni order food restaur  let alon one food isnt even good also insid dirti heck  deliv food bmw cant buy scrub brush  found golden dragon collier squar 100 time better|SQ0j7bgSTazkVQlF5AnqyQ
131 popular denni|wqu7ILomIOPSduRwoWp4AQ
132 want smth quick late night would say denni|wqu7ILomIOPSduRwoWp4AQ

1 answer:

Answer 0: (score: 0)

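I think you need to filter out the rows where text is missing before the groupby, using boolean indexing with notnull. pandas stores a missing value as NaN, which is a float, and str.join only accepts strings; that is what the TypeError reports. Note that the item number in the message counts positions within the group being joined, not rows of the file, so rows 130 to 132 are not necessarily the ones at fault.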

print(review_data)
          text             business_id
0    mr hoagi   5UmKMjUEUNdYWqANhGckJw
1  excel food   5UmKMjUEUNdYWqANhGckJw
2          NaN  5UmKMjUEUNdYWqANhGckJw
3   yes place   5UmKMjUEUNdYWqANhGckJw


review_data = review_data[review_data['text'].notnull()]
print(review_data)
          text             business_id
0    mr hoagi   5UmKMjUEUNdYWqANhGckJw
1  excel food   5UmKMjUEUNdYWqANhGckJw
3   yes place   5UmKMjUEUNdYWqANhGckJw

# join with a single space so adjacent reviews do not run together
review_data = review_data.groupby(['business_id'])['text'].apply(" ".join)
print(review_data)
business_id
5UmKMjUEUNdYWqANhGckJw    mr hoagi excel food yes place
Name: text, dtype: object
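
For reference, here is a minimal self-contained sketch of the same idea, written for Python 3. The sample rows are made up to match the text|business_id layout from the question, and the empty text field in the second row is an assumption about how the NaN may have got into the real file (read_csv loads empty fields as NaN by default). It first shows which rows are missing text, then two ways to handle them: dropping those rows, or replacing the missing text with an empty string.

```python
import io

import pandas as pd

# Hypothetical sample in the same text|business_id layout as the question.
# The second data row has an empty text field, which read_csv loads as NaN.
raw = io.StringIO(
    "text|business_id\n"
    "mr hoagi|5UmKMjUEUNdYWqANhGckJw\n"
    "|5UmKMjUEUNdYWqANhGckJw\n"
    "yes place|5UmKMjUEUNdYWqANhGckJw\n"
    "popular denni|wqu7ILomIOPSduRwoWp4AQ\n"
)
review_data = pd.read_csv(raw, sep="|")

# Locate the rows whose text is missing (NaN is a float, so join rejects it).
missing = review_data["text"].isnull()
print(review_data[missing])

# Option 1: drop rows with missing text, then join the rest per business.
joined = (
    review_data.dropna(subset=["text"])
    .groupby("business_id")["text"]
    .apply(" ".join)
)

# Option 2: keep every row, treating missing text as an empty string.
joined_filled = (
    review_data.fillna({"text": ""})
    .groupby("business_id")["text"]
    .apply(" ".join)
)

print(joined)
```

Option 1 is equivalent to the notnull filter above. Option 2 keeps the row count unchanged, but an empty string still contributes a separator, so the joined text can contain a double space where the missing review was.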