假设我们有以下输入DataFrame:
In [80]: %paste
data = {
'Item_2_id': {0: 24, 1: 41, 2: 34},
'Item_2_quantity': {0: 4, 1: 1, 2: 4},
'Item_3_id': {0: 16, 1: 33, 2: 8},
'Item_3_quantity': {0: 1, 1: 1, 2: 2},
'customer_name': {0: 'John', 1: 'Paul', 2: 'Andrew'},
'item_1_id': {0: 4, 1: 8, 2: 1},
'item_1_quantity': {0: 1, 1: 3, 2: 1},
'order_id': {0: 1, 1: 2, 2: 3}
}
cols = 'order_id customer_name item_1_id item_1_quantity Item_2_id Item_2_quantity Item_3_id Item_3_quantity'.split()
df = pd.DataFrame(data)[cols]
df
## -- End pasted text --
Out[80]:
order_id customer_name item_1_id item_1_quantity Item_2_id Item_2_quantity Item_3_id Item_3_quantity
0 1 John 4 1 24 4 16 1
1 2 Paul 8 3 41 1 33 1
2 3 Andrew 1 1 34 4 8 2
我们如何对所有id
和quantity
列进行分组,以便我们获得以下所需的DataFrame:
In [85]: result
Out[85]:
order_id customer_name id quantity
0 1 John [4, 24, 16] [1, 4, 1]
1 2 Paul [8, 41, 33] [3, 1, 1]
2 3 Andrew [1, 34, 8] [1, 4, 2]
我的尝试:
In [191]: id_vars = ['order_id','customer_name']
In [192]: df.set_index(id_vars) \
.groupby(lambda x: x.split('_')[-1], axis=1) \
.agg(lambda x: x.tolist())
Out[192]:
id quantity
order_id customer_name
1 John (i, t, e, m, _, 1, _, i, d) (i, t, e, m, _, 1, _, q, u, a, n, t, i, t, y)
2 Paul (I, t, e, m, _, 2, _, i, d) (I, t, e, m, _, 2, _, q, u, a, n, t, i, t, y)
3 Andrew (I, t, e, m, _, 3, _, i, d) (I, t, e, m, _, 3, _, q, u, a, n, t, i, t, y)
如果我只是打印它 - 它可以正常工作:
In [193]: df.set_index(id_vars) \
.groupby(lambda x: x.split('_')[-1], axis=1) \
.agg(lambda x: print(x.tolist()))
[4, 24, 16]
[8, 41, 33]
[1, 34, 8]
[1, 4, 1]
[3, 1, 1]
[1, 4, 2]
Out[193]:
id quantity
order_id customer_name
1 John None None
2 Paul None None
3 Andrew None None
实际上,当我试图回答another question并找到另一种解决方案时,PS实际上我遇到了这个问题,但我觉得必须有更优雅的解决方案,它使用的是:
df.groupby(..., axis=1).agg(...)
或
df.groupby(..., axis=1).apply(...)
答案 0 :(得分:3)
以下是a very nice solution from @DSM:
In [123]: df.set_index(['order_id','customer_name']) \
...: .groupby(lambda x: x.split('_')[-1], axis=1) \
...: .agg(lambda x: x.values.tolist())
...:
Out[123]:
id quantity
order_id customer_name
1 John [4, 24, 16] [1, 4, 1]
2 Paul [8, 41, 33] [3, 1, 1]
3 Andrew [1, 34, 8] [1, 4, 2]
答案 1 :(得分:3)
使用filter
df[['order_id', 'customer_name']].assign(
id=df.filter(regex='[Ii]tem_\d+_id').values.tolist(),
quantity=df.filter(regex='[Ii]tem_\d+_quantity').values.tolist()
)
order_id customer_name id quantity
0 1 John [4, 24, 16] [1, 4, 1]
1 2 Paul [8, 41, 33] [3, 1, 1]
2 3 Andrew [1, 34, 8] [1, 4, 2]
答案 2 :(得分:1)
不如@MaxU:
vega-lite