I have the following DataFrame in Python:
import pandas as pd

d = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                  'col1': ['normal', 'well', 'normal', 'normal', 'well', 'normal'],
                  'col2': ['bad', 'normal', 'normal', 'normal', 'normal', 'bad']})
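I want to aggregate by id. Each column should stay 'normal' if the group contains nothing else, and otherwise take the string other than 'normal' ('well' or 'bad'). The desired result looks like this: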
result = pd.DataFrame({'id': [1, 2, 3],
'col1': ['well', 'well', 'normal'],
'col2': ['bad', 'normal', 'bad']})
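I was thinking of sorting and then using groupby and .first, but I'm not sure how to get the desired level to the top of each column.
Answer 0 (score: 5)
Use a categorical to define the order: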
cats = ['well', 'bad', 'normal']
d = d.assign(
col1=pd.Categorical(d.col1, cats, ordered=True),
col2=pd.Categorical(d.col2, cats, ordered=True)
)
d.groupby('id', as_index=False).min()
id col1 col2
0 1 well bad
1 2 well normal
2 3 normal bad
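The result columns from the approach above keep their categorical dtype. As a minimal sketch, the same idea can be wrapped in a helper that applies to every non-key column and converts back to plain strings. The function name `collapse_by_priority` and its parameters are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

d = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                  'col1': ['normal', 'well', 'normal', 'normal', 'well', 'normal'],
                  'col2': ['bad', 'normal', 'normal', 'normal', 'normal', 'bad']})


def collapse_by_priority(df, key, priority):
    """Hypothetical helper: per group, keep the highest-priority value of each column.

    `priority` lists values from most to least preferred. Values that are not
    in `priority` become NaN when converted to a categorical.
    """
    value_cols = [c for c in df.columns if c != key]
    # Ordered categoricals make min() return the earliest entry of `priority`.
    as_cat = df.assign(**{
        c: pd.Categorical(df[c], categories=priority, ordered=True)
        for c in value_cols
    })
    out = as_cat.groupby(key, as_index=False).min()
    # Convert the categorical columns back to ordinary strings.
    return out.astype({c: str for c in value_cols})


result = collapse_by_priority(d, 'id', ['well', 'bad', 'normal'])
print(result)
```

Putting 'normal' last in the priority list is what makes it the fallback: it only wins when a group contains nothing else.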
Answer 1 (score: 4)
If the data contains no NaN values beforehand, first replace 'normal' with NaN and then use GroupBy.first:
import numpy as np

d = d.replace('normal', np.nan).groupby('id').first().fillna('normal')
# alternative solution
d = d.mask(d == 'normal').groupby('id').first().fillna('normal')
print(d)
col1 col2
id
1 well bad
2 well normal
3 normal bad
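One point worth checking before choosing between the two answers: GroupBy.first returns the first non-null value in row order, so when a group holds both 'well' and 'bad' in the same column, the winner depends on row order rather than on a fixed priority. The small frame below is made-up illustration data, not from the question.

```python
import pandas as pd

# Hypothetical data: one group whose column contains both 'bad' and 'well'.
df = pd.DataFrame({'id': [1, 1, 1],
                   'col1': ['normal', 'bad', 'well']})

# NaN-masking approach: first() picks the first non-null value in row order.
first_based = (df['col1'].mask(df['col1'] == 'normal')
                         .groupby(df['id'])
                         .first()
                         .fillna('normal'))

# Ordered-categorical approach: min() picks by the declared priority.
cats = ['well', 'bad', 'normal']
cat_col = pd.Series(pd.Categorical(df['col1'], categories=cats, ordered=True),
                    index=df.index)
cat_based = cat_col.groupby(df['id']).min()

print(first_based.loc[1])  # 'bad'  (it appears before 'well' in the rows)
print(cat_based.loc[1])    # 'well' (it ranks first in cats)
```

For the data in the question the two approaches agree, because no group mixes 'well' and 'bad' within one column.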
Answer 2 (score: 2)
Create a helper key to help with sorting, then use groupby.
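One possible reading of the helper-key idea, offered as a minimal sketch rather than the answer's original code: for each column, build a boolean key that is True for 'normal', sort so non-'normal' rows come first, then take the first row per id. The names `value_cols` and `_is_normal` are assumptions made for this illustration.

```python
import pandas as pd

d = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                  'col1': ['normal', 'well', 'normal', 'normal', 'well', 'normal'],
                  'col2': ['bad', 'normal', 'normal', 'normal', 'normal', 'bad']})

value_cols = ['col1', 'col2']  # assumed: every column except the group key
parts = []
for col in value_cols:
    # Helper key: False for 'well'/'bad', True for 'normal'.
    # False sorts before True, so non-'normal' rows move to the top.
    picked = (d.assign(_is_normal=d[col].eq('normal'))
                .sort_values('_is_normal', kind='mergesort')  # stable sort
                .groupby('id')[col]
                .first())
    parts.append(picked)

# Combine the per-column results and turn the group key back into a column.
result = pd.concat(parts, axis=1).reset_index()
print(result)
```

Each column needs its own sort here, because a single row order cannot put the preferred value on top for col1 and col2 at the same time.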