Consider the data below. Some values are missing from it, i.e. not every ['Project', 'Stage'] combination is present. Is there a GroupBy method that can pad the data so that it becomes symmetric? Symmetric in the sense that, after grouping by ['Project', 'Stage'], the combinations that have no data are added and filled with a predefined/computed value (0 in this example). For clarity, see the desired output at the bottom.
I skimmed the documentation but got lost in the sea of docs. I'd really appreciate some hints or pointers! Thanks!
import pandas as pd
from io import StringIO
import numpy as np
data = """ x y Project Stage
0 -10.540327 61.649657999999995 P1 S2
1 32.379704 -130.71579599999998 P1 S2
2 -2.620516 71.260701 P1 S2
3 13.978546 31.711221000000002 P1 S4
4 41.391873 88.070636 P1 S4
5 -0.303671 -84.376145 P1 S4
6 -65.370553 -7.694756 P2 S3
7 9.062894 177.233938 P2 S3
8 17.029269 -100.364923 P2 S3
9 13.354712 73.78603100000001 P2 S2
10 18.781817999999998 -109.3554 P2 S2
11 67.189193 113.94348000000001 P2 S2
12 8.762746 128.735548 P2 S1
13 59.287251 11.017662 P2 S1
14 -2.4981430000000002 -166.897776 P2 S1
15 1.1900870000000001 -82.309501 P3 S3
16 21.938575 -118.20861699999999 P3 S3
17 45.414544 122.49246000000001 P3 S3
18 -36.322626 -63.310702 P3 S2
19 53.568994999999994 141.37641100000002 P3 S2
20 26.129929999999998 153.706601 P3 S2
21 -3.9384330000000003 102.90164399999999 P3 S6
22 -8.198288 137.616797 P3 S6
23 -8.240618 -146.80676 P3 S6
24 15.193395 25.697023 P3 S5
25 20.886185 68.356337 P3 S5
26 -22.101129999999998 89.639481 P3 S5"""
df = pd.read_csv(StringIO(data), sep='\t', index_col=0)
df.groupby(['Project', 'Stage']).count()
def create_df(p, s):
return pd.DataFrame({
'Project': [p] * 3,
'Stage': [s] * 3
})
df.append([
create_df('P1', 'S1'),
create_df('P1', 'S3'),
create_df('P2', 'S4')
]).groupby(['Project', 'Stage']).count()
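A side note on the manual padding attempt above: `DataFrame.append` was removed in pandas 2.0, so on recent versions the same idea has to go through `pd.concat`. A minimal sketch on a toy frame (the values here are made up, not the question's data):

```python
import pandas as pd

# Toy stand-in for the question's df (made-up values, same column layout)
df = pd.DataFrame({
    'x': [1.0, 2.0],
    'y': [3.0, 4.0],
    'Project': ['P1', 'P1'],
    'Stage': ['S2', 'S2'],
})

def create_df(p, s, n=3):
    # Placeholder rows for a missing (Project, Stage) combination;
    # x and y stay NaN, so count() reports 0 for them
    return pd.DataFrame({'Project': [p] * n, 'Stage': [s] * n})

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
padded = pd.concat([df, create_df('P1', 'S1')], ignore_index=True)
counts = padded.groupby(['Project', 'Stage']).count()
print(counts)
```

The padded rows carry NaN in `x` and `y`, so `count()` already yields 0 for the previously missing group.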
Answer (score: 2):
Build a list `i` holding the unique values of each of the two columns, group by those columns and `count()`, then `.reindex(index=pd.MultiIndex.from_product())`, passing in `i` from step 1 and the names of the two grouped columns. Finally, fill the new rows with 0 (`fill_value=0`) and `.sort_values` on the relevant columns. Code:
i = [df['Project'].unique(), df['Stage'].unique()]
df = df.groupby(['Project', 'Stage']).count()
df = df.reindex(index=pd.MultiIndex.from_product(i, names=['Project', 'Stage']),
fill_value=0).reset_index().sort_values(['Project', 'Stage'])
df
Output:
Project Stage x y
3 P1 S1 0 0
0 P1 S2 3 3
2 P1 S3 0 0
1 P1 S4 3 3
5 P1 S5 0 0
4 P1 S6 0 0
9 P2 S1 3 3
6 P2 S2 3 3
8 P2 S3 3 3
7 P2 S4 0 0
11 P2 S5 0 0
10 P2 S6 0 0
15 P3 S1 0 0
12 P3 S2 3 3
14 P3 S3 3 3
13 P3 S4 0 0
17 P3 S5 3 3
16 P3 S6 3 3
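An alternative to the explicit `reindex`, assuming every Project and every Stage value occurs somewhere in the data: pivot the counts with `unstack(fill_value=0)` and `stack` back, which also expands to the full Project × Stage grid. A sketch on a toy frame:

```python
import pandas as pd

# Toy data: two (Project, Stage) combinations present, two missing
df = pd.DataFrame({
    'Project': ['P1', 'P1', 'P2'],
    'Stage':   ['S2', 'S2', 'S1'],
    'x':       [1.0, 2.0, 3.0],
})

# count() per group, unstack to a Project x Stage grid (missing cells -> 0),
# then stack back to one row per combination
out = (df.groupby(['Project', 'Stage'])['x'].count()
         .unstack(fill_value=0)
         .stack()
         .reset_index(name='x'))
print(out)
```

This only differs from the `MultiIndex.from_product` route when a Project or Stage value is entirely absent from the data; here both produce the full cross product.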