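When I try to divide all float columns in a DataFrame by a pre-specified slice of those same columns, I get the correct values until I try to link the index back to the parent DataFrame. Can you help me find a way to index correctly by time?
Thanks to some help on this forum, the lambda function that divides the DataFrame works well, but something seems to change once I try to set the index.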
In [18]: df
Out[18]:
Time Well 1 2 3 4
0 0:00:00 A 0.0000 0.0000 0.0000 0.0000
1 0:00:00 B 0.0000 0.0000 0.0000 0.0000
2 0:00:00 C 0.0000 0.0000 0.0000 0.0000
3 0:00:00 D 0.0000 0.0000 0.0000 0.0000
4 0:00:00 E 0.0000 0.0000 0.0000 0.0000
5 0:00:00 F 0.0000 0.0000 0.0000 0.0000
6 0:00:00 G 0.0000 0.0000 0.0000 0.0000
7 0:00:00 H 0.0000 0.0000 0.0000 0.0000
8 0:00:14 A 0.0002 0.0014 0.0001 -0.0017
9 0:00:14 B 0.0024 -0.0020 -0.0016 -0.0006
10 0:00:14 C 0.0027 0.0018 0.0003 0.0024
11 0:00:14 D 0.0019 0.0019 0.0025 0.0013
12 0:00:14 E 0.0024 0.0021 0.0012 0.0005
13 0:00:14 F 0.0017 0.0015 -0.0003 0.0006
14 0:00:14 G 0.0003 0.0001 0.0001 -0.0017
15 0:00:14 H 0.0003 -0.0006 -0.0008 -0.0001
16 1:24:16 A 0.0293 0.0533 0.0223 0.0131
17 1:24:16 B 0.0295 0.0268 0.0200 0.0079
18 1:24:16 C 0.0373 0.0381 0.0165 0.0198
19 1:24:16 D 0.0327 0.0277 0.0282 0.0162
20 1:24:16 E 0.0400 0.0339 0.0234 0.0186
21 1:24:16 F 0.0270 0.0298 0.0141 0.0150
22 1:24:16 G 0.0215 0.0176 0.0114 0.0163
23 1:24:16 H 0.0251 0.0166 0.0292 0.0287
24 10:09:43 A 0.5072 0.6620 0.5092 0.5133
25 10:09:43 B 0.6089 0.5283 0.5426 0.4787
26 10:09:43 C 0.6340 0.6379 0.5221 0.5884
27 10:09:43 D 0.6167 0.5926 0.5856 0.5639
28 10:09:43 E 0.6512 0.6605 0.5561 0.5234
29 10:09:43 F 0.6168 0.6490 0.5577 0.5390
30 10:09:43 G 0.6312 0.5739 0.5221 0.5121
31 10:09:43 H 0.5123 0.5036 0.5052 0.4465
#### Current Output Received #####
test = df.groupby('Time').apply(lambda x: x.iloc[:,2:6].reset_index(drop=True)/df[df['Time']=='1:24:16'].iloc[:,2:6].reset_index(drop=True))
In [20]: test
Out[20]:
1 2 3 4
Time
0:00:00 0 0.000000 0.000000 0.000000 0.000000
1 0.000000 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.000000 0.000000
3 0.000000 0.000000 0.000000 0.000000
4 0.000000 0.000000 0.000000 0.000000
5 0.000000 0.000000 0.000000 0.000000
6 0.000000 0.000000 0.000000 0.000000
7 0.000000 0.000000 0.000000 0.000000
0:00:14 0 0.006826 0.026266 0.004484 -0.129771
1 0.081356 -0.074627 -0.080000 -0.075949
2 0.072386 0.047244 0.018182 0.121212
3 0.058104 0.068592 0.088652 0.080247
4 0.060000 0.061947 0.051282 0.026882
5 0.062963 0.050336 -0.021277 0.040000
6 0.013953 0.005682 0.008772 -0.104294
7 0.011952 -0.036145 -0.027397 -0.003484
10:09:43 0 17.310580 12.420263 22.834081 39.183206
1 20.640678 19.712687 27.130000 60.594937
2 16.997319 16.742782 31.642424 29.717172
3 18.859327 21.393502 20.765957 34.808642
4 16.280000 19.483776 23.764957 28.139785
5 22.844444 21.778523 39.553191 35.933333
6 29.358140 32.607955 45.798246 31.417178
7 20.410359 30.337349 17.301370 15.557491
1:24:16 0 1.000000 1.000000 1.000000 1.000000
1 1.000000 1.000000 1.000000 1.000000
2 1.000000 1.000000 1.000000 1.000000
3 1.000000 1.000000 1.000000 1.000000
4 1.000000 1.000000 1.000000 1.000000
5 1.000000 1.000000 1.000000 1.000000
6 1.000000 1.000000 1.000000 1.000000
7 1.000000 1.000000 1.000000 1.000000
#### Now attempting to adjust the index ####
test.index = df.set_index(['Time','Well']).index
In [22]: test
Out[22]:
1 2 3 4
Time Well
0:00:00 A 0.000000 0.000000 0.000000 0.000000
B 0.000000 0.000000 0.000000 0.000000
C 0.000000 0.000000 0.000000 0.000000
D 0.000000 0.000000 0.000000 0.000000
E 0.000000 0.000000 0.000000 0.000000
F 0.000000 0.000000 0.000000 0.000000
G 0.000000 0.000000 0.000000 0.000000
H 0.000000 0.000000 0.000000 0.000000
0:00:14 A 0.006826 0.026266 0.004484 -0.129771
B 0.081356 -0.074627 -0.080000 -0.075949
C 0.072386 0.047244 0.018182 0.121212
D 0.058104 0.068592 0.088652 0.080247
E 0.060000 0.061947 0.051282 0.026882
F 0.062963 0.050336 -0.021277 0.040000
G 0.013953 0.005682 0.008772 -0.104294
H 0.011952 -0.036145 -0.027397 -0.003484
1:24:16 A 17.310580 12.420263 22.834081 39.183206
B 20.640678 19.712687 27.130000 60.594937
C 16.997319 16.742782 31.642424 29.717172
D 18.859327 21.393502 20.765957 34.808642
E 16.280000 19.483776 23.764957 28.139785
F 22.844444 21.778523 39.553191 35.933333
G 29.358140 32.607955 45.798246 31.417178
H 20.410359 30.337349 17.301370 15.557491
10:09:43 A 1.000000 1.000000 1.000000 1.000000
B 1.000000 1.000000 1.000000 1.000000
C 1.000000 1.000000 1.000000 1.000000
D 1.000000 1.000000 1.000000 1.000000
E 1.000000 1.000000 1.000000 1.000000
F 1.000000 1.000000 1.000000 1.000000
G 1.000000 1.000000 1.000000 1.000000
H 1.000000 1.000000 1.000000 1.000000
#### My oversimplified approach to reintroduce 'Time' and 'Well' only puts
# in NaN values.
test[['Time', 'Well']] = df[['Time','Well']]
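You can see that the matrix of 1s has moved from Time = 1:24:16 to Time = 10:09:43. I would like to find a way to index correctly by time, or at least a way to carry the core information from the 'Time' and 'Well' columns over to the newly divided DataFrame.
Answer 0 (score: 1)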
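You need to use reset_index(drop=True) and then reassign the index of the output DataFrame. The code below also passes sort=False to groupby. By default groupby sorts the group keys, and because the Time values appear to be stored as strings, '10:09:43' sorts before '1:24:16', so the grouped rows no longer line up with the row order of df. With sort=False the groups keep their order of first appearance. Note that this example divides by the 0:00:14 slice: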
test = df.groupby('Time', sort=False).apply(lambda x: x.iloc[:,2:6].reset_index(drop=True)/df[df['Time']=='0:00:14'].iloc[:,2:6].reset_index(drop=True))
test.index = df.set_index(['Time','Well']).index
Output:
1 2 3 4
Time Well
0:00:00 A 0.000000 0.000000 0.00 -0.000000
B 0.000000 -0.000000 -0.00 -0.000000
C 0.000000 0.000000 0.00 0.000000
D 0.000000 0.000000 0.00 0.000000
E 0.000000 0.000000 0.00 0.000000
F 0.000000 0.000000 -0.00 0.000000
G 0.000000 0.000000 0.00 -0.000000
H 0.000000 -0.000000 -0.00 -0.000000
0:00:14 A 1.000000 1.000000 1.00 1.000000
B 1.000000 1.000000 1.00 1.000000
C 1.000000 1.000000 1.00 1.000000
D 1.000000 1.000000 1.00 1.000000
E 1.000000 1.000000 1.00 1.000000
F 1.000000 1.000000 1.00 1.000000
G 1.000000 1.000000 1.00 1.000000
H 1.000000 1.000000 1.00 1.000000
1:24:16 A 146.500000 38.071429 223.00 -7.705882
B 12.291667 -13.400000 -12.50 -13.166667
C 13.814815 21.166667 55.00 8.250000
D 17.210526 14.578947 11.28 12.461538
E 16.666667 16.142857 19.50 37.200000
F 15.882353 19.866667 -47.00 25.000000
G 71.666667 176.000000 114.00 -9.588235
H 83.666667 -27.666667 -36.50 -287.000000
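As a possible alternative to reassigning the index by position, the division can be done on labels, so the result no longer depends on row order at all. The following is only a minimal sketch under some assumptions: it uses a two-well subset of the question's data (wells A and B, columns 1 and 2), it assumes Time is a string column, and the names indexed, ref, wells, and result are illustrative and not from the original post. It also shows why the default sort puts '10:09:43' ahead of '1:24:16'.

```python
import pandas as pd

# Subset of the question's data (wells A and B only) to keep the sketch short.
df = pd.DataFrame({
    "Time": ["0:00:00", "0:00:00", "0:00:14", "0:00:14",
             "1:24:16", "1:24:16", "10:09:43", "10:09:43"],
    "Well": ["A", "B"] * 4,
    "1": [0.0, 0.0, 0.0002, 0.0024, 0.0293, 0.0295, 0.5072, 0.6089],
    "2": [0.0, 0.0, 0.0014, -0.0020, 0.0533, 0.0268, 0.6620, 0.5283],
})

# Strings compare character by character, and "0" sorts before ":",
# so "10:09:43" comes before "1:24:16" when the keys are sorted.
sorted_times = sorted(df["Time"].unique())
print(sorted_times)

# Keep Time and Well as index labels instead of dropping and re-adding them.
indexed = df.set_index(["Time", "Well"])

# Reference slice: one row per well, indexed by Well.
ref = indexed.xs("1:24:16", level="Time")

# Look up the reference row for each row's well, then divide by the raw
# values so the (Time, Well) index of `indexed` is kept unchanged.
wells = indexed.index.get_level_values("Well")
result = indexed / ref.loc[wells].to_numpy()
print(result)
```

Because each row is matched to its reference row by the Well label, the block of 1s stays under 1:24:16 regardless of how the times sort. Converting Time with pd.to_timedelta before grouping might be another way to get chronological ordering, though that has not been tried against the full data here.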