我有一个带有此标头的{cvv}文件text|business_id
我想将与一项业务相关的所有文本分组
我使用了review_data=review_data.groupby(['business_id'])['text'].apply("".join)
review_data
就像:
text \
0 mr hoagi institut walk doe seem like throwback...
1 excel food superb custom servic miss mario mac...
2 yes place littl date open weekend staff alway ...
business_id
0 5UmKMjUEUNdYWqANhGckJw
1 5UmKMjUEUNdYWqANhGckJw
2 5UmKMjUEUNdYWqANhGckJw
我收到此错误:TypeError: sequence item 131: expected string, float found
这些是第130至132行:
130 use order fair often past 2 year food get progress wors everi time order doesnt help owner alway regist rude everi time final decid im done dont think feel let inconveni order food restaur let alon one food isnt even good also insid dirti heck deliv food bmw cant buy scrub brush found golden dragon collier squar 100 time better|SQ0j7bgSTazkVQlF5AnqyQ
131 popular denni|wqu7ILomIOPSduRwoWp4AQ
132 want smth quick late night would say denni|wqu7ILomIOPSduRwoWp4AQ
答案 0 :(得分:0)
我认为您需要在groupby
之前使用notnull
过滤boolean indexing数据:
print review_data
text business_id
0 mr hoagi 5UmKMjUEUNdYWqANhGckJw
1 excel food 5UmKMjUEUNdYWqANhGckJw
2 NaN 5UmKMjUEUNdYWqANhGckJw
3 yes place 5UmKMjUEUNdYWqANhGckJw
review_data = review_data[review_data['text'].notnull()]
print review_data
text business_id
0 mr hoagi 5UmKMjUEUNdYWqANhGckJw
1 excel food 5UmKMjUEUNdYWqANhGckJw
3 yes place 5UmKMjUEUNdYWqANhGckJw
review_data=review_data.groupby(['business_id'])['text'].apply("".join)
print review_data
business_id
5UmKMjUEUNdYWqANhGckJw mr hoagi excel food yes place
Name: text, dtype: object