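Full question:
Search each column of a DataFrame for the first value that is greater than the value stored in that column's last row, and output the corresponding index.
Example, df.head():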
Well A1 A2 A3 A4
Temperature
25.0 371.335253 360.026443 253.228769 593.436104
25.2 331.957145 332.224668 233.607595 561.057715
25.4 305.472591 303.777874 213.500582 535.310186
25.6 285.713623 274.069361 202.024427 515.261876
25.8 252.716374 254.610848 181.719415 488.988468
Example, df.tail():
Well A1 A2 A3 A4
Temperature
94.79 -441.775980 -664.549239 1060.674188 1158.481056
94.99 -492.189733 -709.521424 1029.628209 1087.625128
mean 280.759521 283.417750 201.471571 519.939366
std 72.404373 69.023406 45.447202 58.150127
4*std 570.377014 559.511373 383.260378 752.539875
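I want to take the 4*std value for A1 (570.37), search A1 from the top of the column for the first value greater than 570.37, and output the corresponding Temperature. I need to repeat this for every column.
I would like the output as a new DataFrame, like the example below. I am at a loss as to how to structure this.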
Well Temp
A1 26.0
A2 27.6
A3 26.8
... ...
H12 27.2
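I would appreciate any help!
Answer 0 (score: 0)
I believe you need the following, provided every column contains at least one value greater than its threshold: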
print (df)
A1 A2 A3 A4
Well Temperature
25.0 371.335253 360.026443 253.228769 593.436104
25.2 331.957145 632.224668 233.607595 561.057715
25.4 3005.472591 303.777874 213.500582 535.310186
25.6 285.713623 274.069361 202.024427 515.261876
25.8 252.716374 254.610848 181.719415 488.988468
94.79 -441.775980 -664.549239 1060.674188 1158.481056
94.99 -492.189733 -709.521424 1029.628209 1087.625128
mean 280.759521 283.417750 201.471571 519.939366
std 72.404373 69.023406 45.447202 58.150127
4*std 570.377014 559.511373 383.260378 752.539875
df1 = df.iloc[:-3].gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
print (df1)
Well Temp
0 A1 25.4
1 A2 25.2
2 A3 94.79
3 A4 94.79
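For readers who want to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame printed above and splits the one-liner into named steps. The variable names (`measurements`, `threshold`, `exceeds`, `first_hit`) are illustrative and not part of the original answer, and the frame is assumed to end with exactly three summary rows (mean, std, 4*std).

```python
import pandas as pd

# Sample data copied from the printed frame above. The last three rows
# (mean, std, 4*std) are summary rows, not measurements.
df = pd.DataFrame(
    {
        "A1": [371.335253, 331.957145, 3005.472591, 285.713623, 252.716374,
               -441.775980, -492.189733, 280.759521, 72.404373, 570.377014],
        "A2": [360.026443, 632.224668, 303.777874, 274.069361, 254.610848,
               -664.549239, -709.521424, 283.417750, 69.023406, 559.511373],
        "A3": [253.228769, 233.607595, 213.500582, 202.024427, 181.719415,
               1060.674188, 1029.628209, 201.471571, 45.447202, 383.260378],
        "A4": [593.436104, 561.057715, 535.310186, 515.261876, 488.988468,
               1158.481056, 1087.625128, 519.939366, 58.150127, 752.539875],
    },
    index=pd.Index(
        [25.0, 25.2, 25.4, 25.6, 25.8, 94.79, 94.99, "mean", "std", "4*std"],
        name="Temperature",
    ),
)

measurements = df.iloc[:-3]           # drop the three summary rows
threshold = df.iloc[-1]               # the 4*std row, one value per column
exceeds = measurements.gt(threshold)  # compared column by column
first_hit = exceeds.idxmax()          # label of the first True in each column

result = first_hit.rename_axis("Well").reset_index(name="Temp")
print(result)
```

Selecting the threshold with `df.loc["4*std"]` instead of `df.iloc[-1]` would be less sensitive to row order, assuming that label is present in the index.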
Details:
print (df.iloc[:-3].gt(df.iloc[-1]))
A1 A2 A3 A4
Well Temperature
25.0 False False False False
25.2 False True False False
25.4 True False False False
25.6 False False False False
25.8 False False False False
94.79 False False True True
94.99 False False True True
print (df.iloc[:-3].gt(df.iloc[-1]).idxmax())
A1 25.4
A2 25.2
A3 94.79
A4 94.79
dtype: object
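The reason `idxmax` finds the first match is that `True` compares greater than `False`, so on a boolean column it returns the label of the first `True`. One caveat is worth seeing in isolation: if a column contains no `True` at all, `idxmax` still returns the first label rather than signalling a miss. The tiny frame below is made up purely to illustrate this.

```python
import pandas as pd

# Hypothetical boolean mask: A1 has a match, A2 has none.
mask = pd.DataFrame(
    {"A1": [False, True, True], "A2": [False, False, False]},
    index=[25.0, 25.2, 25.4],
)

first_true = mask.idxmax()  # A1 -> 25.2; A2 -> 25.0 even though nothing matched
has_true = mask.any()       # A1 -> True; A2 -> False

print(first_true)
print(has_true)
```

So the one-liner is only safe when every column is known to cross its threshold somewhere; otherwise the result for a non-matching column is indistinguishable from a match in the first row.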
If some column might contain no value above its threshold, one possible solution is to append a new row with a NaN index to the end:
print (df)
A1 A2 A3 A4
Well Temperature
25.0 371.335253 360.026443 253.228769 593.436104
25.2 331.957145 332.224668 233.607595 561.057715
25.4 3005.472591 303.777874 213.500582 535.310186
25.6 285.713623 274.069361 202.024427 515.261876
25.8 252.716374 254.610848 181.719415 488.988468
94.79 -441.775980 -664.549239 1060.674188 1158.481056
94.99 -492.189733 -709.521424 1029.628209 1087.625128
mean 280.759521 283.417750 201.471571 519.939366
std 72.404373 69.023406 45.447202 58.150127
4*std 570.377014 559.511373 383.260378 752.539875
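import numpy as np
import pandas as pd
# DataFrame.append was removed in pandas 2.0, so the sentinel row is added with pd.concat
df1 = pd.concat([df.iloc[:-3], (df.iloc[-1] + 1).rename(np.nan).to_frame().T])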
print (df1)
A1 A2 A3 A4
Well Temperature
25.0 371.335253 360.026443 253.228769 593.436104
25.2 331.957145 332.224668 233.607595 561.057715
25.4 3005.472591 303.777874 213.500582 535.310186
25.6 285.713623 274.069361 202.024427 515.261876
25.8 252.716374 254.610848 181.719415 488.988468
94.79 -441.775980 -664.549239 1060.674188 1158.481056
94.99 -492.189733 -709.521424 1029.628209 1087.625128
NaN 571.377014 560.511373 384.260378 753.539875
df2 = df1.gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
print (df2)
Well Temp
0 A1 25.4
1 A2 NaN
2 A3 94.79
3 A4 94.79
print (df1.gt(df.iloc[-1]))
A1 A2 A3 A4
Well Temperature
25.0 False False False False
25.2 False False False False
25.4 True False False False
25.6 False False False False
25.8 False False False False
94.79 False False True True
94.99 False False True True
NaN True True True True
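An alternative that avoids the extra sentinel row is to blank out the `idxmax` result wherever a column has no match at all. This is a sketch of a different technique, not the answer's method. It uses a trimmed copy of the data above (two wells, three temperatures, and only the 4*std summary row), and the variable names are illustrative.

```python
import pandas as pd

# Trimmed copy of the sample data: A1 crosses its threshold, A2 never does.
df = pd.DataFrame(
    {
        "A1": [371.335253, 331.957145, 3005.472591, 570.377014],
        "A2": [360.026443, 332.224668, 303.777874, 559.511373],
    },
    index=pd.Index([25.0, 25.2, 25.4, "4*std"], name="Temperature"),
)

measurements = df.drop(index="4*std")
threshold = df.loc["4*std"]
exceeds = measurements.gt(threshold)

# Keep the idxmax label only for columns with at least one True;
# the remaining columns become NaN.
first_hit = exceeds.idxmax().where(exceeds.any())

result = first_hit.rename_axis("Well").reset_index(name="Temp")
print(result)
```

This gives 25.4 for A1 and NaN for A2, matching the sentinel-row output for those two wells without modifying the measurement data.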