I have the following DataFrame.
    order_id   Clusters
0        519  Cluster 5
1        520  Cluster 1
2        521  Cluster 1
3        523  Cluster 5
4        524  Cluster 1
5        525  Cluster 4
6        526  Cluster 4
7        527  Cluster 1
8        528  Cluster 2
9        529  Cluster 5
10       530  Cluster 6
11       531  Cluster 3
12       532  Cluster 1
13       533  Cluster 4
14       534  Cluster 5
15       535  Cluster 5
I want to derive the following Series from the DataFrame above.
Cluster 1    [520, 521, 524, 527, 532]
Cluster 2    [528]
Cluster 3    [531]
Cluster 4    [525, 526, 533]
Cluster 5    [519, 523, 529, 534, 535]
Cluster 6    [530]
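This is my approach in Python.
clusters_order_id = []
df_clusters = df.groupby('Clusters')
# Iterating over the grouped column yields one (cluster name, order_id values) tuple per group
for i in df_clusters['order_id']:
    clusters_order_id.append(i)
This gives me: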
clusters_order_id
Out[196]:
0 (Cluster 1, [520, 521, 524, 527, 532])
1 (Cluster 2, [528])
2 (Cluster 3, [531])
3 (Cluster 4, [525, 526, 533])
4 (Cluster 5, [519, 523, 529, 534, 535])
5 (Cluster 6, [530])
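But I can't figure out how to turn this into the Series form shown above, where Cluster 1, Cluster 2, and so on become the index and the corresponding order ids become a list. Please help.
Answer 0 (score: 2)
Another solution, using pivot_table: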
In [473]: df.pivot_table(index='Clusters', aggfunc=pd.Series.tolist)
Out[473]:
order_id
Clusters
Cluster 1 [520, 521, 524, 527, 532]
Cluster 2 [528]
Cluster 3 [531]
Cluster 4 [525, 526, 533]
Cluster 5 [519, 523, 529, 534, 535]
Cluster 6 [530]
Answer 1 (score: 1)
print(df.groupby('Clusters')['order_id'].apply(lambda x: x.tolist()))
Clusters
Cluster 1 [520, 521, 524, 527, 532]
Cluster 2 [528]
Cluster 3 [531]
Cluster 4 [525, 526, 533]
Cluster 5 [519, 523, 529, 534, 535]
Cluster 6 [530]
Name: order_id, dtype: object
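For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is rebuilt by hand from the table in the question; the variable name `result` is only illustrative and does not appear in the original post.

```python
import pandas as pd

# Rebuild the question's DataFrame by hand (values copied from the table above).
df = pd.DataFrame({
    "order_id": [519, 520, 521, 523, 524, 525, 526, 527,
                 528, 529, 530, 531, 532, 533, 534, 535],
    "Clusters": ["Cluster 5", "Cluster 1", "Cluster 1", "Cluster 5",
                 "Cluster 1", "Cluster 4", "Cluster 4", "Cluster 1",
                 "Cluster 2", "Cluster 5", "Cluster 6", "Cluster 3",
                 "Cluster 1", "Cluster 4", "Cluster 5", "Cluster 5"],
})

# Group by cluster label and collect each group's order ids into a list.
# The result is a Series indexed by cluster name, with one list per cluster.
result = df.groupby("Clusters")["order_id"].apply(lambda x: x.tolist())
print(result)
```

Individual clusters can then be looked up by label, for example `result.loc["Cluster 4"]`.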
Timing:
In [153]: %timeit df.groupby('Clusters')['order_id'].apply(lambda x: x.tolist())
1000 loops, best of 3: 751 µs per loop
In [154]: %timeit df.pivot_table(index='Clusters', aggfunc=pd.Series.tolist)
100 loops, best of 3: 3.55 ms per loop