Pandas: how to apply a function to multiple columns row by row, in sequence

Date: 2018-11-18 19:06:16

Tags: python pandas

I have a DataFrame df1 with 1000 columns, each holding a random value. It looks like this:

     0  1   2   3   4   5   6   7   8   9   ...     990 991 992 993 994 995 996 997 998 999
0   23  15  4   4   23  0   38  14  11  14  ...     22  3   25  3   24  8   1   14  18  27

I have a second DataFrame, df2, holding one value of f per second. It looks like this:

                        dtm     f
0   2018-03-01 00:00:00 +0000   50.135
1   2018-03-01 00:00:01 +0000   50.130
2   2018-03-01 00:00:02 +0000   50.120
3   2018-03-01 00:00:03 +0000   50.112
4   2018-03-01 00:00:04 +0000   50.102
5   2018-03-01 00:00:05 +0000   50.097
6   2018-03-01 00:00:06 +0000   50.095
7   2018-03-01 00:00:07 +0000   50.095
8   2018-03-01 00:00:08 +0000   50.092
9   2018-03-01 00:00:09 +0000   50.095
10  2018-03-01 00:00:10 +0000   50.097
11  2018-03-01 00:00:11 +0000   50.097
12  2018-03-01 00:00:12 +0000   50.097
13  2018-03-01 00:00:13 +0000   50.100
14  2018-03-01 00:00:14 +0000   50.102
15  2018-03-01 00:00:15 +0000   50.105
16  2018-03-01 00:00:16 +0000   50.102
17  2018-03-01 00:00:17 +0000   50.102
18  2018-03-01 00:00:18 +0000   50.100
19  2018-03-01 00:00:19 +0000   50.100
20  2018-03-01 00:00:20 +0000   50.100
21  2018-03-01 00:00:21 +0000   50.097
22  2018-03-01 00:00:22 +0000   50.097
23  2018-03-01 00:00:23 +0000   50.095
24  2018-03-01 00:00:24 +0000   50.092
25  2018-03-01 00:00:25 +0000   50.090
26  2018-03-01 00:00:26 +0000   50.090
27  2018-03-01 00:00:27 +0000   50.087
28  2018-03-01 00:00:28 +0000   50.085
29  2018-03-01 00:00:29 +0000   50.082
...     ...     ...
86371   2018-03-01 23:59:31 +0000   49.925
86372   2018-03-01 23:59:32 +0000   49.925
86373   2018-03-01 23:59:33 +0000   49.925
86374   2018-03-01 23:59:34 +0000   49.927
86375   2018-03-01 23:59:35 +0000   49.927
86376   2018-03-01 23:59:36 +0000   49.930
86377   2018-03-01 23:59:37 +0000   49.930
86378   2018-03-01 23:59:38 +0000   49.930
86379   2018-03-01 23:59:39 +0000   49.930
86380   2018-03-01 23:59:40 +0000   49.930
86381   2018-03-01 23:59:41 +0000   49.930
86382   2018-03-01 23:59:42 +0000   49.930
86383   2018-03-01 23:59:43 +0000   49.927
86384   2018-03-01 23:59:44 +0000   49.925
86385   2018-03-01 23:59:45 +0000   49.925
86386   2018-03-01 23:59:46 +0000   49.920
86387   2018-03-01 23:59:47 +0000   49.920
86388   2018-03-01 23:59:48 +0000   49.920
86389   2018-03-01 23:59:49 +0000   49.920
86390   2018-03-01 23:59:50 +0000   49.920
86391   2018-03-01 23:59:51 +0000   49.917
86392   2018-03-01 23:59:52 +0000   49.917
86393   2018-03-01 23:59:53 +0000   49.915
86394   2018-03-01 23:59:54 +0000   49.915
86395   2018-03-01 23:59:55 +0000   49.915
86396   2018-03-01 23:59:56 +0000   49.912
86397   2018-03-01 23:59:57 +0000   49.915
86398   2018-03-01 23:59:58 +0000   49.917
86399   2018-03-01 23:59:59 +0000   49.917
86400   2018-03-02 00:00:00 +0000   49.915

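Starting from the initial values in df1, I need to increase each value by 1 whenever f > 50 and decrease it by 1 whenever f < 50. The result should be another DataFrame with one row per second, holding the running values, and 1000 columns. I tried: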

if (f.f>50).any():
    df1=df1.apply(lambda x: ((f.f/f.f)*x+1).cumsum())

But the result is only correct in the first row; the other 86400 rows are all NaN.

Any help? Thanks in advance.

1 Answer:

Answer 0: (score: 0)

This may not be the most memory-efficient solution...

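import numpy as np
import pandas as pd

# Preallocate the result DataFrame: df1's single row repeated once per row of df2
res = pd.DataFrame(np.tile(df1, (len(df2), 1)))

# Per-second step: +1 where f > 50, -1 where f < 50, 0 where f == 50
mask = np.sign(df2.f.to_numpy() - 50).astype(int)

# Repeat the steps once per column of `res` (not once per row), then accumulate down the rows
adjust = np.tile(mask, (res.shape[1], 1)).T.cumsum(axis=0)

# Add the adjustment array to the result DataFrame
res += adjust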
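
If memory is a concern, the same idea can probably be written with NumPy broadcasting, so that the running total is kept as a single column and never tiled across all 1000 columns. The following is a minimal sketch, not a tested drop-in: the small `df1` and `df2` here are made-up stand-ins for the question's data, and it assumes a row with f exactly equal to 50 should leave the values unchanged, which the question does not specify.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the question's data (values are illustrative only)
df1 = pd.DataFrame([[23, 15, 4]])  # one row of starting values
df2 = pd.DataFrame({"f": [50.135, 50.000, 49.925, 49.930, 50.100]})

# +1 where f > 50, -1 where f < 50, 0 where f == 50 (assumed behaviour)
step = np.sign(df2["f"].to_numpy() - 50).astype(int)

# Running total of the steps as a column vector of shape (n_seconds, 1)
offset = step.cumsum()[:, None]

# Broadcasting: (1, n_cols) + (n_seconds, 1) -> (n_seconds, n_cols)
res = pd.DataFrame(df1.to_numpy() + offset, columns=df1.columns, index=df2.index)

print(res)
```

As for the NaNs in the question's attempt, the likely cause is index alignment: each column of df1 is a Series with the single label 0, so multiplying it by a Series indexed 0 to 86400 leaves every label except 0 as NaN.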