I have the following DataFrame df:
center status devices
1 Green [d1, d2]
1 Green [d5, d1, d2]
2 Green []
3 Green [d5, d6]
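I need to explode the lists in the devices column. The goal is to group the data by device and center, then count the number of observations in each group.
The expected result is as follows: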
device
Answer 0 (score: 3)
First flatten the lists, then aggregate with DataFrameGroupBy.size:
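import pandas as pd
# create a Series of device lists indexed by center
s = df.set_index('center')['devices']
# build a DataFrame from the lists, reshape with stack, and convert the MultiIndex to columns
df = pd.DataFrame(s.values.tolist(), index=s.index).stack().reset_index()
# 'i' is the position of each device within its original list
df.columns = ['center','i','devices']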
#aggregate count
df = df.groupby(['center','devices']).size().reset_index(name='count')
print (df)
center devices count
0 1 d1 2
1 1 d2 2
2 1 d5 1
3 3 d5 1
4 3 d6 1
Another solution with better performance:
from itertools import chain
df = pd.DataFrame({
'devices' : list(chain.from_iterable(df['devices'].tolist())),
'center' : df['center'].values.repeat(df['devices'].str.len())
})
df = df.groupby(['center','devices']).size().reset_index(name='count')
print (df)
center devices count
0 1 d1 2
1 1 d2 2
2 1 d5 1
3 3 d5 1
4 3 d6 1
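To see why the second solution lines up, it may help to look at the intermediate pieces. The sketch below rebuilds the sample frame from the question (the literal values are copied from the table above; the variable names lengths, centers and flat are only illustrative) and prints the list lengths, the repeated center values, and the flattened devices. Each center is repeated once per device in its row, so a row with an empty list contributes nothing.

```python
from itertools import chain

import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'center': [1, 1, 2, 3],
    'status': ['Green'] * 4,
    'devices': [['d1', 'd2'], ['d5', 'd1', 'd2'], [], ['d5', 'd6']],
})

# Number of devices in each row; the empty list gives 0
lengths = df['devices'].str.len()
# Repeat each center once per device in its row
centers = df['center'].values.repeat(lengths)
# Flatten all device lists into one list, in row order
flat = list(chain.from_iterable(df['devices']))

print(lengths.tolist())   # [2, 3, 0, 2]
print(centers.tolist())   # [1, 1, 1, 1, 1, 3, 3]
print(flat)               # ['d1', 'd2', 'd5', 'd1', 'd2', 'd5', 'd6']
```

Because centers and flat have the same length and the same ordering, they can be placed side by side in a new DataFrame and grouped directly.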
Answer 1 (score: 2)
Use unnesting after filtering out the empty lists, then groupby with size:
unnesting(df[df.devices.astype(bool)],['devices']).groupby(['center','devices']).size().reset_index(name='count')
Out[214]:
center devices count
0 1 d1 2
1 1 d2 2
2 1 d5 1
3 3 d5 1
4 3 d6 1
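where unnesting is a helper defined as:
import numpy as np
import pandas as pd
def unnesting(df, explode):
    # repeat each index label once per element of the first list column
    idx = df.index.repeat(df[explode[0]].str.len())
    # flatten every list column and place the results side by side
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')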
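If you are on pandas 0.25 or later, the built-in DataFrame.explode can probably stand in for a hand-written helper like the one above. This is a minimal sketch rather than part of either answer: the sample frame is reconstructed from the question, and the names exploded and counts are only illustrative. explode turns an empty list into a single NaN row, so those rows are dropped before counting.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'center': [1, 1, 2, 3],
    'status': ['Green'] * 4,
    'devices': [['d1', 'd2'], ['d5', 'd1', 'd2'], [], ['d5', 'd6']],
})

# One row per device; the empty list becomes NaN and is removed
exploded = df.explode('devices').dropna(subset=['devices'])

# Count observations per (center, device) pair
counts = (exploded.groupby(['center', 'devices'])
                  .size()
                  .reset_index(name='count'))
print(counts)
```

On the sample data this should give the same five rows as the outputs shown in both answers.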