Question

I have this data frame:

id  type    name    number  description comments
1   short   A       2       XX          xxx
1   short   B                               
1   short   C       4       YY          yyy
1   full    A/B/C
2   short   E       
2   short   F       9       ZZ          zzz                     
2   short   G       7       WW          www
2   short   H       
2   full    E/F/G/H

I want to reduce it to one row per id by collapsing the number, description and comments values (where present) from the type short rows into the type full row:

id  type    name    number  description comments
1   full    A/B/C   2/4     XX/YY       xxx/yyy
2   full    E/F/G/H 9/7     ZZ/WW       zzz/www

I tried using the aggregate and groupby functions, but without success.

Can you help me?

Thanks!

Answer 1

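You can use dict.fromkeys with a lambda function to build a dictionary dynamically for every column except id and the keys of dictionary d1, merge it with d1, and then pass the result to GroupBy.agg: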

f = lambda x: '/'.join(x.dropna().astype(str))

d1 = {'type':'last', 'name':'last'}
d2 = dict.fromkeys(df.columns.difference(['id'] + list(d1.keys())), f)
d = {**d1, **d2}    

df = df.groupby('id', sort=False, as_index=False).agg(d)
print(df)
   id  type     name comments description   number
0   1  full    A/B/C  xxx/yyy       XX/YY  2.0/4.0
1   2  full  E/F/G/H  zzz/www       ZZ/WW  9.0/7.0
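
The approach above can be reproduced end to end with the rough sketch below. It rebuilds the sample data from the question (assuming the key column is named id and that empty cells are missing values) and uses a hypothetical helper, join_present, in place of the lambda. The helper also formats floats with "g" so the joined numbers read 2/4 rather than 2.0/4.0; that formatting step is an addition for illustration, not part of the answer's code.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells are assumed to be missing.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "type": ["short"] * 3 + ["full"] + ["short"] * 4 + ["full"],
    "name": ["A", "B", "C", "A/B/C", "E", "F", "G", "H", "E/F/G/H"],
    "number": [2, np.nan, 4, np.nan, np.nan, 9, 7, np.nan, np.nan],
    "description": ["XX", None, "YY", None, None, "ZZ", "WW", None, None],
    "comments": ["xxx", None, "yyy", None, None, "zzz", "www", None, None],
})


def join_present(s):
    """Join the non-missing values of a group with '/' (hypothetical helper).

    Floats are formatted with 'g' so that 2.0 is written as '2'.
    """
    return "/".join(
        f"{v:g}" if isinstance(v, float) else str(v) for v in s.dropna()
    )


# Columns that simply take the value from the last row of each group.
fixed = {"type": "last", "name": "last"}
# Every remaining column except the grouping key is collapsed with the helper.
collapsed = dict.fromkeys(df.columns.difference(["id", *fixed]), join_present)

out = df.groupby("id", sort=False, as_index=False).agg({**fixed, **collapsed})
print(out)
```

Taking 'last' for type and name relies on the full row being the last row of each id group, as it is in the sample data.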

If you need to handle values by dtype inside the lambda function, for example summing numeric columns and joining non-numeric ones:

import numpy as np

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else '/'.join(x.dropna())

d1 = {'type':'last', 'name':'last'}
d2 = dict.fromkeys(df.columns.difference(['id'] + list(d1.keys())), f)
d = {**d1, **d2}           
df = df.groupby('id', sort=False, as_index=False).agg(d)
print(df)
   id  type     name comments description  number
0   1  full    A/B/C  xxx/yyy       XX/YY     6.0
1   2  full  E/F/G/H  zzz/www       ZZ/WW    16.0
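
The dtype-dependent variant can be sketched the same way. This version is self-contained and again assumes the key column is named id. It swaps np.issubdtype for pandas.api.types.is_numeric_dtype, which is a substitution made here for readability rather than what the answer uses, and sum_or_join is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample data rebuilt from the question; empty cells are assumed to be missing.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "type": ["short"] * 3 + ["full"] + ["short"] * 4 + ["full"],
    "name": ["A", "B", "C", "A/B/C", "E", "F", "G", "H", "E/F/G/H"],
    "number": [2, np.nan, 4, np.nan, np.nan, 9, 7, np.nan, np.nan],
    "description": ["XX", None, "YY", None, None, "ZZ", "WW", None, None],
    "comments": ["xxx", None, "yyy", None, None, "zzz", "www", None, None],
})


def sum_or_join(s):
    """Sum numeric groups, join everything else with '/' (hypothetical helper)."""
    if is_numeric_dtype(s.dtype):
        return s.sum()  # Series.sum skips missing values by default
    return "/".join(s.dropna())


# Columns that simply take the value from the last row of each group.
fixed = {"type": "last", "name": "last"}
# Every remaining column except the grouping key goes through sum_or_join.
others = dict.fromkeys(df.columns.difference(["id", *fixed]), sum_or_join)

out = df.groupby("id", sort=False, as_index=False).agg({**fixed, **others})
print(out)
```

Note that is_numeric_dtype also treats boolean columns as numeric, so a boolean column would be summed rather than joined.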
