I want to split the rows of a DataFrame into multiple DataFrames (groups). The input data is:
id channel path
15 direct a1
15 direct a2
15 direct a3
15 direct a4
213 paid b2
213 paid b1
2222 direct as25
2222 direct dw46
2222 direct 32q
3111 paid d32a
3111 paid 23ff
3111 paid www32
3111 paid 2d2
The desired output should be:
id channel p1 p2
213 paid b2 b2
id channel p1 p2 p3
2222 direct as25 dw46 dw46
id channel p1 p2 p3 p4
15 direct a1 a2 a3 a4
3111 paid d32a 23ff www32 2d2
Please explain how I can achieve this. Thanks.
Answer 0 (score: 1)
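I think you can first create a helper column, cols, with cumcount and then reshape the data with pivot_table.
Then count the non-null columns in each row with notnull (subtracting the first 2, id and channel) and groupby that length. Finally, use dropna to remove the columns that contain missing values in each group: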
df['cols'] = 'p' + (df.groupby('id')['id'].cumcount() + 1).astype(str)
df1 = df.pivot_table(index=['id', 'channel'],
                     columns='cols',
                     values='path',
                     aggfunc='first').reset_index().rename_axis(None, axis=1)
print(df1)
id channel p1 p2 p3 p4
0 15 direct a1 a2 a3 a4
1 213 paid b2 b1 None None
2 2222 direct as25 dw46 32q None
3 3111 paid d32a 23ff www32 2d2
print(df1.apply(lambda x: x.notnull().sum() - 2, axis=1))
0 4
1 2
2 3
3 4
dtype: int64
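As a possible variation on the count above, the same lengths can be computed without a row-wise apply by counting only the path columns, so nothing has to be subtracted. This is a minimal sketch: the small df1 here is a stand-in built by hand from the pivoted output shown above, and path_cols and lengths are names introduced only for illustration.

```python
import pandas as pd

# Stand-in for the pivoted frame df1 (values copied from the output above).
df1 = pd.DataFrame({
    'id': [15, 213, 2222, 3111],
    'channel': ['direct', 'paid', 'direct', 'paid'],
    'p1': ['a1', 'b2', 'as25', 'd32a'],
    'p2': ['a2', 'b1', 'dw46', '23ff'],
    'p3': ['a3', None, '32q', 'www32'],
    'p4': ['a4', None, None, '2d2'],
})

# Select every column except the two key columns (hypothetical helper name).
path_cols = df1.columns.difference(['id', 'channel'])

# Count the non-null path values per row in one vectorized call.
lengths = df1[path_cols].notna().sum(axis=1)
print(lengths)
```

Counting only the path columns also stays correct if id or channel ever contained missing values, which the "minus 2" version silently assumes they do not.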
for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2, axis=1)):
    print(i)
    print(g.dropna(axis=1))
2
id channel p1 p2
1 213 paid b2 b1
3
id channel p1 p2 p3
2 2222 direct as25 dw46 32q
4
id channel p1 p2 p3 p4
0 15 direct a1 a2 a3 a4
3 3111 paid d32a 23ff www32 2d2
To store the groups, you can use a dictionary of DataFrames.
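The code for this step did not survive in the post, so the following is only a sketch of one way a dictionary of DataFrames could be built from the steps above, not necessarily what the answer originally showed. It rebuilds the input from the question so that it runs on its own; the names lengths and frames are assumptions introduced here.

```python
import pandas as pd

# Input data copied from the question.
df = pd.DataFrame({
    'id': [15] * 4 + [213] * 2 + [2222] * 3 + [3111] * 4,
    'channel': ['direct'] * 4 + ['paid'] * 2 + ['direct'] * 3 + ['paid'] * 4,
    'path': ['a1', 'a2', 'a3', 'a4', 'b2', 'b1', 'as25', 'dw46', '32q',
             'd32a', '23ff', 'www32', '2d2'],
})

# Helper column p1, p2, ... numbering the paths within each id.
df['cols'] = 'p' + (df.groupby('id')['id'].cumcount() + 1).astype(str)

# One row per (id, channel), one column per path position.
df1 = df.pivot_table(index=['id', 'channel'],
                     columns='cols',
                     values='path',
                     aggfunc='first').reset_index().rename_axis(None, axis=1)

# Number of non-null path columns per row (minus id and channel).
lengths = df1.apply(lambda x: x.notnull().sum() - 2, axis=1)

# Dictionary keyed by path count; each value keeps only the filled columns.
frames = {n: g.dropna(axis=1) for n, g in df1.groupby(lengths)}

print(frames[2])
```

Each group can then be looked up by its number of paths, for example frames[4] for the ids with four paths. Note that the p-columns are ordered lexicographically, so with ten or more paths per id p10 would sort before p2.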