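I want to group a DataFrame with columns A to G by its first three columns, so that all rows sharing the same values in A, B and C form one group. Then I want to add a new column H that holds, for every row of a group, the value of the last column (G) taken from the row where the fourth column (D) equals 0.
The original DataFrame looks like this: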
A B C D E F G
0 11018 20190102 0 0 1546387200 37 34
1 11018 20190102 0 1 1546390800 33 36
2 11018 20190102 0 2 1546394400 19 19
3 11018 20190102 0 3 1546398000 17 26
4 11018 20190102 0 4 1546401600 16 26
5 11018 20190102 0 5 1546405200 13 23
6 11018 20190102 0 6 1546408800 11 15
7 11018 20190102 1200 0 1546430400 25 24
8 11018 20190102 1200 1 1546434000 21 3
9 11018 20190102 1200 2 1546437600 13 4
10 11018 20190102 1200 3 1546441200 7 3
11 11018 20190102 1200 4 1546444800 2 1
12 11018 20190102 1200 5 1546448400 -3 6
13 11018 20190102 1200 6 1546452000 -7 2
14 11035 20190103 0 0 1546473600 -15 -14
15 11035 20190103 0 1 1546477200 -17 -11
16 11035 20190103 0 2 1546480800 -20 -12
17 11035 20190103 0 3 1546484400 -23 -16
18 11035 20190103 0 4 1546488000 -26 -11
19 11035 20190103 0 5 1546491600 -28 -11
20 11035 20190103 0 6 1546495200 -27 -12
21 11031 20190103 1100 0 1546516800 0 1
22 11031 20190103 1100 1 1546520400 4 -7
23 11031 20190103 1100 2 1546524000 5 -6
24 11031 20190103 1100 3 1546527600 2 -16
25 11031 20190103 1100 4 1546531200 -3 -14
26 11031 20190103 1100 5 1546534800 -8 -12
27 11031 20190103 1100 6 1546538400 -12 -14
.
.
.
.
The new DataFrame should look like this:
A B C D E F G H
0 11018 20190102 0 0 1546387200 37 34 34
1 11018 20190102 0 1 1546390800 33 36 34
2 11018 20190102 0 2 1546394400 19 19 34
3 11018 20190102 0 3 1546398000 17 26 34
4 11018 20190102 0 4 1546401600 16 26 34
5 11018 20190102 0 5 1546405200 13 23 34
6 11018 20190102 0 6 1546408800 11 15 34
7 11018 20190102 1200 0 1546430400 25 24 24
8 11018 20190102 1200 1 1546434000 21 3 24
9 11018 20190102 1200 2 1546437600 13 4 24
10 11018 20190102 1200 3 1546441200 7 3 24
11 11018 20190102 1200 4 1546444800 2 1 24
12 11018 20190102 1200 5 1546448400 -3 6 24
13 11018 20190102 1200 6 1546452000 -7 2 24
14 11035 20190103 0 0 1546473600 -15 -14 -14
15 11035 20190103 0 1 1546477200 -17 -11 -14
16 11035 20190103 0 2 1546480800 -20 -12 -14
17 11035 20190103 0 3 1546484400 -23 -16 -14
18 11035 20190103 0 4 1546488000 -26 -11 -14
19 11035 20190103 0 5 1546491600 -28 -11 -14
20 11035 20190103 0 6 1546495200 -27 -12 -14
21 11031 20190103 1100 0 1546516800 0 1 1
22 11031 20190103 1100 1 1546520400 4 -7 1
23 11031 20190103 1100 2 1546524000 5 -6 1
24 11031 20190103 1100 3 1546527600 2 -16 1
25 11031 20190103 1100 4 1546531200 -3 -14 1
26 11031 20190103 1100 5 1546534800 -8 -12 1
27 11031 20190103 1100 6 1546538400 -12 -14 1
.
.
.
.
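I have already found a solution of the following form:
def col_6(group):
    # G value of the row with D == 0 in this (A, B, C) group
    group['H'] = group[group['D'] == 0]['G'].values[0]
    return group

df = df.groupby(['A','B','C'], group_keys=False).apply(col_6)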
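However, in some cases the row where the fourth column (D) equals 0 is missing. In that case, H for the group's remaining rows (D = 1, 2, ...) should be set to NaN. As written, col_6 fails for such a group with an IndexError, because .values[0] is applied to an empty array.
For example, given this original frame: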
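To see why the approach above breaks, here is a minimal sketch using a small hypothetical group built from the 11018/20190102/1200 rows of the example below, where the D == 0 row is absent. Filtering on D == 0 leaves an empty NumPy array, and indexing it with [0] raises IndexError.

```python
import pandas as pd

# Hypothetical group whose D == 0 row is missing (values from the example below)
g = pd.DataFrame({'D': [1, 2, 3], 'G': [3, 4, 3]})

selected = g[g['D'] == 0]['G'].values  # empty NumPy array
print(selected.size)

try:
    _ = selected[0]  # this is what col_6 does
    error = None
except IndexError as exc:
    error = exc
print(type(error).__name__)
```

Any fix therefore has to check whether the filtered selection is empty before taking its first element.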
A B C D E F G
0 11018 20190102 0 0 1546387200 37 34
1 11018 20190102 0 1 1546390800 33 36
2 11018 20190102 0 2 1546394400 19 19
3 11018 20190102 0 3 1546398000 17 26
4 11018 20190102 0 4 1546401600 16 26
5 11018 20190102 0 5 1546405200 13 23
6 11018 20190102 0 6 1546408800 11 15
7 11018 20190102 1200 1 1546434000 21 3
8 11018 20190102 1200 2 1546437600 13 4
9 11018 20190102 1200 3 1546441200 7 3
10 11018 20190102 1200 4 1546444800 2 1
11 11018 20190102 1200 5 1546448400 -3 6
12 11018 20190102 1200 6 1546452000 -7 2
The final frame should look like this:
A B C D E F G H
0 11018 20190102 0 0 1546387200 37 34 34
1 11018 20190102 0 1 1546390800 33 36 34
2 11018 20190102 0 2 1546394400 19 19 34
3 11018 20190102 0 3 1546398000 17 26 34
4 11018 20190102 0 4 1546401600 16 26 34
5 11018 20190102 0 5 1546405200 13 23 34
6 11018 20190102 0 6 1546408800 11 15 34
7 11018 20190102 1200 1 1546434000 21 3 nan
8 11018 20190102 1200 2 1546437600 13 4 nan
9 11018 20190102 1200 3 1546441200 7 3 nan
10 11018 20190102 1200 4 1546444800 2 1 nan
11 11018 20190102 1200 5 1546448400 -3 6 nan
12 11018 20190102 1200 6 1546452000 -7 2 nan
Is there an efficient way to handle these missing rows (building on the general solution above)?
Many thanks for your help!
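Staying close to the asker's groupby/apply approach, one possible sketch is to guard the lookup and fall back to NaN when a group has no D == 0 row. The sample frame is a hypothetical subset of the example rows (two rows per group). Note that apply calls a Python function once per group, so it is likely slower than the join-based answer on large frames, and recent pandas versions may emit a DeprecationWarning because the grouping columns are passed to the function.

```python
import numpy as np
import pandas as pd

def col_6(g):
    # G value(s) from the row(s) where D == 0 in this (A, B, C) group
    zero_rows = g.loc[g['D'] == 0, 'G']
    g = g.copy()  # do not mutate the group object pandas passes in
    # Fall back to NaN when the group has no D == 0 row
    g['H'] = zero_rows.iloc[0] if not zero_rows.empty else np.nan
    return g

# Hypothetical subset of the example: group C == 1200 has no D == 0 row
df = pd.DataFrame({
    'A': [11018] * 4,
    'B': [20190102] * 4,
    'C': [0, 0, 1200, 1200],
    'D': [0, 1, 1, 2],
    'E': [1546387200, 1546390800, 1546434000, 1546437600],
    'F': [37, 33, 21, 13],
    'G': [34, 36, 3, 4],
})

# group_keys=False keeps the original row index instead of prepending the keys
out = df.groupby(['A', 'B', 'C'], group_keys=False).apply(col_6)
print(out)
```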
Answer (score: 1)
First filter only the rows where D is 0, aggregate them per group with first, and then add the new column with DataFrame.join:
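# Keep only the D == 0 rows and take G per (A, B, C) group, named H
s = df[df['D'] == 0].groupby(['A','B','C'])['G'].first().rename('H')
# Left join on A, B, C: groups without a D == 0 row get NaN
df = df.join(s, on=['A','B','C'])
print(df)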
A B C D E F G H
0 11018 20190102 0 0 1546387200 37 34 34.0
1 11018 20190102 0 1 1546390800 33 36 34.0
2 11018 20190102 0 2 1546394400 19 19 34.0
3 11018 20190102 0 3 1546398000 17 26 34.0
4 11018 20190102 0 4 1546401600 16 26 34.0
5 11018 20190102 0 5 1546405200 13 23 34.0
6 11018 20190102 0 6 1546408800 11 15 34.0
7 11018 20190102 1200 1 1546434000 21 3 NaN
8 11018 20190102 1200 2 1546437600 13 4 NaN
9 11018 20190102 1200 3 1546441200 7 3 NaN
10 11018 20190102 1200 4 1546444800 2 1 NaN
11 11018 20190102 1200 5 1546448400 -3 6 NaN
12 11018 20190102 1200 6 1546452000 -7 2 NaN
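An alternative sketch that skips the separate join: blank out G wherever D is not 0 with Series.where, then broadcast the first non-missing value of each group back to all of its rows with groupby(...).transform('first'). A group without a D == 0 row contains only NaN and therefore gets NaN. The sample is a hypothetical subset of the second example (columns E and F omitted); if a group had several D == 0 rows, the first one would win, just as with first in the answer above.

```python
import pandas as pd

# Hypothetical subset of the second example; group C == 1200 lacks a D == 0 row
df = pd.DataFrame({
    'A': [11018] * 4,
    'B': [20190102] * 4,
    'C': [0, 0, 1200, 1200],
    'D': [0, 1, 1, 2],
    'G': [34, 36, 3, 4],
})

# Keep G only on rows where D == 0; every other row becomes NaN
g_at_zero = df['G'].where(df['D'].eq(0))

# Broadcast the first non-NaN value of each (A, B, C) group to all its rows.
# An all-NaN group (no D == 0 row) yields NaN for every row.
df['H'] = g_at_zero.groupby([df['A'], df['B'], df['C']]).transform('first')
print(df)
```

As in the join-based answer, H becomes a float column because it can contain NaN.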