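I'm new to pandas DataFrames and have a grouping problem: I have a seven-column DataFrame (A through G) and want to group all rows that share the same values in the first three columns. I then want to add a new column H that holds, for every row in the group, the value of the last column (G) from the row where the fourth column (D) equals 0.
The original DataFrame looks like this: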
A B C D E F G
0 11018 20190102 0 0 1546387200 37 34
1 11018 20190102 0 1 1546390800 33 36
2 11018 20190102 0 2 1546394400 19 19
3 11018 20190102 0 3 1546398000 17 26
4 11018 20190102 0 4 1546401600 16 26
5 11018 20190102 0 5 1546405200 13 23
6 11018 20190102 0 6 1546408800 11 15
7 11018 20190102 1200 0 1546430400 25 24
8 11018 20190102 1200 1 1546434000 21 3
9 11018 20190102 1200 2 1546437600 13 4
10 11018 20190102 1200 3 1546441200 7 3
11 11018 20190102 1200 4 1546444800 2 1
12 11018 20190102 1200 5 1546448400 -3 6
13 11018 20190102 1200 6 1546452000 -7 2
14 11035 20190103 0 0 1546473600 -15 -14
15 11035 20190103 0 1 1546477200 -17 -11
16 11035 20190103 0 2 1546480800 -20 -12
17 11035 20190103 0 3 1546484400 -23 -16
18 11035 20190103 0 4 1546488000 -26 -11
19 11035 20190103 0 5 1546491600 -28 -11
20 11035 20190103 0 6 1546495200 -27 -12
21 11031 20190103 1100 0 1546516800 0 1
22 11031 20190103 1100 1 1546520400 4 -7
23 11031 20190103 1100 2 1546524000 5 -6
24 11031 20190103 1100 3 1546527600 2 -16
25 11031 20190103 1100 4 1546531200 -3 -14
26 11031 20190103 1100 5 1546534800 -8 -12
27 11031 20190103 1100 6 1546538400 -12 -14
.
.
.
.
etc.
The new DataFrame should be:
A B C D E F G H
0 11018 20190102 0 0 1546387200 37 34 34
1 11018 20190102 0 1 1546390800 33 36 34
2 11018 20190102 0 2 1546394400 19 19 34
3 11018 20190102 0 3 1546398000 17 26 34
4 11018 20190102 0 4 1546401600 16 26 34
5 11018 20190102 0 5 1546405200 13 23 34
6 11018 20190102 0 6 1546408800 11 15 34
7 11018 20190102 1200 0 1546430400 25 24 24
8 11018 20190102 1200 1 1546434000 21 3 24
9 11018 20190102 1200 2 1546437600 13 4 24
10 11018 20190102 1200 3 1546441200 7 3 24
11 11018 20190102 1200 4 1546444800 2 1 24
12 11018 20190102 1200 5 1546448400 -3 6 24
13 11018 20190102 1200 6 1546452000 -7 2 24
14 11035 20190103 0 0 1546473600 -15 -14 -14
15 11035 20190103 0 1 1546477200 -17 -11 -14
16 11035 20190103 0 2 1546480800 -20 -12 -14
17 11035 20190103 0 3 1546484400 -23 -16 -14
18 11035 20190103 0 4 1546488000 -26 -11 -14
19 11035 20190103 0 5 1546491600 -28 -11 -14
20 11035 20190103 0 6 1546495200 -27 -12 -14
21 11031 20190103 1100 0 1546516800 0 1 1
22 11031 20190103 1100 1 1546520400 4 -7 1
23 11031 20190103 1100 2 1546524000 5 -6 1
24 11031 20190103 1100 3 1546527600 2 -16 1
25 11031 20190103 1100 4 1546531200 -3 -14 1
26 11031 20190103 1100 5 1546534800 -8 -12 1
27 11031 20190103 1100 6 1546538400 -12 -14 1
.
.
.
.
etc.
Is there a simple way to solve this? Note that the rows in the original DataFrame may also be in mixed order. Thanks for your help!
Answer 0: (score: 1)
An alternative solution:
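def col_6(group):
    # Take G from the row where D == 0 and broadcast it to every row of the group.
    # assign() returns a copy, so the group passed in by apply() is not mutated.
    return group.assign(H=group.loc[group['D'] == 0, 'G'].iloc[0])

# group_keys=False keeps the original row index; assign the result back to keep it.
df = df.groupby(['A', 'B', 'C'], group_keys=False).apply(col_6)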
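For comparison, here is a minimal sketch of the same idea without `apply`, using `where` plus a grouped `transform("first")`. The helper name `add_h` and the small sample frame are made up for illustration (columns E and F are left out for brevity), and the cast to the nullable `Int64` dtype assumes G holds whole numbers, as in the question.

```python
import pandas as pd

# Small sample in the question's layout, with rows deliberately out of order.
df = pd.DataFrame({
    "A": [11018, 11018, 11018, 11018, 11035, 11035],
    "B": [20190102] * 4 + [20190103] * 2,
    "C": [0, 1200, 0, 1200, 0, 0],
    "D": [1, 0, 0, 1, 1, 0],
    "G": [36, 24, 34, 3, -11, -14],
})

def add_h(frame):
    """Return a copy with H = G taken from the D == 0 row of each (A, B, C) group."""
    # Keep G only on rows where D == 0; every other row becomes NaN.
    g_at_zero = frame["G"].where(frame["D"].eq(0))
    # "first" skips NaN, so each group's D == 0 value is broadcast to all its rows.
    h = g_at_zero.groupby([frame["A"], frame["B"], frame["C"]]).transform("first")
    # Nullable integer dtype: a group with no D == 0 row ends up as <NA>.
    return frame.assign(H=h.astype("Int64"))

result = add_h(df)
print(result)
```

Because `transform` returns a Series aligned to the original index, the row order does not matter and the index is left unchanged. If a group contained more than one row with D equal to 0, this sketch would use the first of them in row order.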