Question

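I have a DataFrame with a column "rel_max" that holds, for each row, a list of that row's local maxima (I will also have a column with the indices of these local extrema, if that is more relevant or useful). I want to use this list of values or indices to mask the DataFrame so that the maxima stay in their correct positions and every other value becomes NaN or 0.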

import pandas as pd

df = pd.DataFrame({'123': [20.908, 8.743, 8.34, 2.4909],
                   '124': [2, 2.34, 0, 4.1234],
                   '412': [2, 20.123, 3.123123, 0],
                   '516': [5, 20.120, 4.12, 0],
                   '129': [6, 20.10, 3.123123, 0],
                   'rel_max': [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
                  index=['2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10'])

The above is the DataFrame with the relative maxima.

This is the expected DataFrame:

df1 = pd.DataFrame({'123': [20.908, 8.743, 8.34, 0],
                    '124': [0, 0, 0, 4.1234],
                    '412': [0, 20.123, 0, 0],
                    '516': [0, 0, 4.12, 0],
                    '129': [6, 0, 0, 0],
                    'rel_max': [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]]},
                   index=['2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10'])

Basically, I am trying to retrieve or extract a DataFrame that keeps only the local extrema.

               123     124     412   516  129          rel_max
2015-01-10  20.908  0.0000   0.000  0.00    6      [20.908, 6]
2015-02-10   8.743  0.0000  20.123  0.00    0  [8.743, 20.123]
2015-03-10   8.340  0.0000   0.000  4.12    0     [8.34, 4.12]
2015-04-10   0.000  4.1234   0.000  0.00    0         [4.1234]

Answer 1

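Use indexing. First take the two values from each rel_max list (called smax and smin here), then create two masks m1 and m2 using NumPy and pandas broadcasting. Finally, index df with the combined mask.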

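import numpy as np

# First and second entry of each rel_max list (NaN where a list has no second entry)
smax = df.rel_max.str[0]
smin = df.rel_max.str[1]

# Compare every cell in a row against that row's value, broadcast across the columns
m1 = df == np.broadcast_to(smax.values.reshape(-1, 1), df.shape)
m2 = df == np.broadcast_to(smin.values.reshape(-1, 1), df.shape)

# Keep the cells that match either value; all other cells become NaN
df[m1 | m2]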

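In more detail, smax is a Series holding the first value of each list and smin is a Series holding the second. m1 is a DataFrame of True/False values: it is True wherever a cell in df equals the corresponding broadcast value, and m2 works the same way for smin. I suggest running each part of the code separately and looking at its output, which is more intuitive ;)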

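The output (non-matching cells are shown here as 0 rather than the NaN that df[m1 | m2] returns, and rel_max is shown for reference):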

            123     124     412     516     129 rel_max
2015-01-10  20.908  0.0000  0.000   0.00    6   [20.908, 6]
2015-02-10  8.743   0.0000  20.123  0.00    0   [8.743, 20.123]
2015-03-10  8.340   0.0000  0.000   4.12    0   [8.34, 4.12]
2015-04-10  0.000   4.1234  0.000   0.00    0   [4.1234]
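
The two-mask approach above reads only the first two entries of each list, so a row with three or more local maxima would lose values. Below is a minimal sketch of one way to generalize it, assuming the lists may have any length: pad them into a rectangular frame and OR together one broadcast comparison per list position. The names values, padded, mask, masked and masked_zero are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "123": [20.908, 8.743, 8.34, 2.4909],
        "124": [2, 2.34, 0, 4.1234],
        "412": [2, 20.123, 3.123123, 0],
        "516": [5, 20.120, 4.12, 0],
        "129": [6, 20.10, 3.123123, 0],
        "rel_max": [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]],
    },
    index=["2015-01-10", "2015-02-10", "2015-03-10", "2015-04-10"],
)

# Work on the numeric columns only; rel_max is kept aside.
values = df.drop(columns="rel_max")

# Pad the ragged lists into a rectangular frame; shorter rows are filled with NaN.
padded = pd.DataFrame(df["rel_max"].tolist(), index=df.index)

# One comparison per list position, OR-ed into a single boolean mask.
mask = np.zeros(values.shape, dtype=bool)
for pos in padded.columns:
    column = padded[pos].to_numpy().reshape(-1, 1)  # shape (n_rows, 1)
    mask |= values.to_numpy() == column             # broadcasts across the columns

masked = values.where(mask)          # NaN everywhere else
masked_zero = values.where(mask, 0)  # 0 everywhere else
```

Note that this relies on exact floating-point equality, which holds here because the lists contain the same values as the cells. NaN padding never matches anything, so short lists need no special handling.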

Answer 2

You can try something like this:

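# For each row, keep the values found in that row's rel_max list and set the rest to 0,
# then reattach the rel_max column
pd.concat([df.iloc[:, :-1].where(df.apply(lambda x: x.iloc[:-1].isin(x.iloc[-1]), axis=1), 0),
           df.iloc[:, -1]], axis=1)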

Output:

               123     124     412   516  129          rel_max
2015-01-10  20.908  0.0000   0.000  0.00  6.0      [20.908, 6]
2015-02-10   8.743  0.0000  20.123  0.00  0.0  [8.743, 20.123]
2015-03-10   8.340  0.0000   0.000  4.12  0.0     [8.34, 4.12]
2015-04-10   0.000  4.1234   0.000  0.00  0.0         [4.1234]
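
The one-liner can be easier to follow when split into steps. The sketch below uses the same sample data and the same isin idea; the intermediate names (values, mask, zero_filled, nan_filled, result) are illustrative, and looking up each row's list with df.at assumes the index labels are unique.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "123": [20.908, 8.743, 8.34, 2.4909],
        "124": [2, 2.34, 0, 4.1234],
        "412": [2, 20.123, 3.123123, 0],
        "516": [5, 20.120, 4.12, 0],
        "129": [6, 20.10, 3.123123, 0],
        "rel_max": [[20.908, 6], [8.743, 20.123], [8.34, 4.12], [4.1234]],
    },
    index=["2015-01-10", "2015-02-10", "2015-03-10", "2015-04-10"],
)

# Step 1: separate the numeric columns from the list column.
values = df.drop(columns="rel_max")

# Step 2: row by row, flag the cells that appear in that row's rel_max list.
mask = values.apply(lambda row: row.isin(df.at[row.name, "rel_max"]), axis=1)

# Step 3: keep flagged cells; choose the fill for everything else.
zero_filled = values.where(mask, 0)  # 0 elsewhere
nan_filled = values.where(mask)      # NaN elsewhere

# Step 4: reattach rel_max for reference.
result = pd.concat([zero_filled, df["rel_max"]], axis=1)
```

Because isin checks membership in the whole list, this version works for lists of any length without padding.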

How to mask a DataFrame using a list of values or indices stored in the DataFrame