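How to select specific values and replace them with NaN in a pandas DataFrame, and how to drop a column from every level-1 group of a MultiIndex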

Time: 2018-09-25 21:06:37

Tags: python pandas dataframe

I have a CSV file that I read into a pandas DataFrame:

import pandas as pd


csv_file = pd.read_csv('hello.csv', engine='c', delimiter=',', index_col=0,
                       skiprows=1, header=[0, 1])
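To reproduce this without the original file, here is a minimal sketch that feeds a small in-memory CSV to the same `read_csv` call. The file contents are hypothetical: I am assuming `skiprows=1` is there to skip one extra line above the `bodyparts` and `coords` header rows, and the numbers are toy values. With `index_col=0`, the first cell of each header row becomes the name of the corresponding column level, which is where `bodyparts` and `coords` in the printout come from.

```python
import io

import pandas as pd

# Hypothetical miniature of hello.csv: one extra line (skipped by skiprows=1),
# then the 'bodyparts' and 'coords' header rows, then the data rows.
text = """scorer,net,net,net,net,net,net
bodyparts,nose,nose,nose,right_ear,right_ear,right_ear
coords,x,y,likelihood,x,y,likelihood
0,197.486369,4.545954,3.89e-07,210.0,206.351233,1.28e-06
1,319.94646,191.035224,0.99,300.0,206.321893,0.95
"""

df = pd.read_csv(io.StringIO(text), engine='c', delimiter=',', index_col=0,
                 skiprows=1, header=[0, 1])

print(df.columns.names)       # names of the two column levels
print(df.columns.tolist()[:3])  # first few (bodypart, coord) tuples
```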

Here is what the resulting DataFrame looks like (print(csv_file)):

bodyparts        nose                  ...        right_ear              
coords              x           y      ...                y    likelihood
0          197.486369    4.545954      ...       206.351233  1.280000e-06
1          319.946460  191.035224      ...       206.321893  9.680000e-07
2          319.880388  191.012984      ...       206.322207  9.520000e-07
3          320.286005  190.843329      ...       206.227396  1.020000e-06
4          320.210989  190.863304      ...         3.106570  8.350000e-07
5          320.212529  190.867178      ...         3.116692  8.460000e-07
6           -0.794705    2.462400      ...         3.112797  8.500000e-07
7           -0.785404    2.485562      ...         3.117945  8.430000e-07
8          319.786777  191.003882      ...         3.125062  8.820000e-07
9          319.947064  191.030201      ...       206.202980  9.210000e-07
10         319.845807  191.002510      ...       206.177779  8.660000e-07
11         320.135816  190.967408      ...       206.190732  8.910000e-07
12          -0.935765    2.568168      ...       206.260773  8.860000e-07
13          -0.932833    2.525062      ...       206.273504  8.780000e-07
14          -0.960939    2.500079      ...       206.272811  8.680000e-07
15          -0.832561    2.442907      ...       206.266416  8.720000e-07
16          -0.838884    2.421689      ...       206.242941  9.440000e-07
17          -0.857173    2.421467      ...       206.243972  9.950000e-07
18          -0.841627    2.414854      ...       206.225004  9.820000e-07
...               ...         ...      ...              ...           ...
10459      349.556703  301.995042      ...       307.018688  9.999745e-01
10460      348.608277  301.098244      ...       309.648986  9.999962e-01
10461      349.995217  303.397438      ...       311.149967  9.999974e-01
10462      349.109666  305.710711      ...       311.893106  9.999955e-01
10463      352.142571  310.081763      ...       317.420410  9.907742e-01
10464      351.916488  317.078128      ...       319.407211  2.706501e-01
10465      353.809847  320.086683      ...       323.478481  9.911720e-01
10466      349.233529  321.859424      ...       323.383276  8.724346e-01

The resulting DataFrame has a two-level MultiIndex on its columns:

tuple(('body_part1', 'body_part2', ..., 'body_partn'), ('x', 'y', 'likelihood'))

print(df.columns):

MultiIndex(levels=[['left_ear', 'nose', 'right_ear', 'tail'], ['likelihood', 'x', 'y']],
           labels=[[1, 1, 1, 3, 3, 3, 0, 0, 0, 2, 2, 2], [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]],
           names=['bodyparts', 'coords'])

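If a coordinate's likelihood is low, I would like to replace that coordinate with NaN. The new DataFrame should not contain the likelihood columns. Example using the first row of "nose":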

coords           x           y    likelihood
0       197.486369    4.545954  3.890000e-07

After the operation it should look like this:

coords           x           y
0              NaN         NaN

Note that all other values (those whose likelihood is not low) must stay unchanged in this process!
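The intended transformation for a single body part can be sketched on the `nose` example above. The first row is copied from the question; the second row is a toy row with a high likelihood, added to show that such rows stay untouched. The cutoff `threshold = 0.9` is a hypothetical value, since the question does not say what counts as "low".

```python
import numpy as np
import pandas as pd

# Row 0: the 'nose' example from the question; row 1: a toy high-likelihood row
nose = pd.DataFrame({'x': [197.486369, 349.556703],
                     'y': [4.545954, 301.995042],
                     'likelihood': [3.89e-07, 0.99]})
nose.columns.name = 'coords'

threshold = 0.9  # hypothetical cutoff for a "low" likelihood

low = nose['likelihood'] < threshold    # True where the detection is unreliable
nose.loc[low, ['x', 'y']] = np.nan      # blank x and y only in those rows
nose = nose.drop(columns='likelihood')  # the result keeps just x and y
print(nose)
```

On the full MultiIndexed frame, the likelihood coordinate can be removed from every body part at once with `df.drop(columns='likelihood', level='coords')`.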

1 Answer:

Answer 0 (score: 2)

Assuming you have a threshold that defines what counts as a "low" likelihood:

import numpy as np
threshold = 0.9  # example cutoff for a "low" likelihood; adjust to your data
for col in df.columns.levels[0]:
    df.loc[df[(col, 'likelihood')] < threshold, [(col, 'x'), (col, 'y')]] = np.nan

I also suspect there is a more efficient way (one that avoids looping over the columns), but this should work too.
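One possible loop-free version, sketched on a toy frame with the question's column layout (two body parts only; all numbers and `threshold = 0.9` are made up). This is a swapped-in alternative rather than the answer's code: it pulls out every likelihood column at once with `xs`, drops them with `drop(..., level=...)`, broadcasts each body part's mask across its `x` and `y` columns with `reindex(..., level=...)`, and applies it with `DataFrame.mask`. Unlike the loop, it also removes the likelihood columns, as the question asks.

```python
import pandas as pd

# Toy frame with the question's column layout (bodyparts x coords)
cols = pd.MultiIndex.from_product(
    [['nose', 'right_ear'], ['x', 'y', 'likelihood']],
    names=['bodyparts', 'coords'],
)
df = pd.DataFrame(
    [[197.5, 4.5, 4e-07, 210.0, 206.4, 0.95],
     [349.6, 302.0, 0.99, 320.0, 319.4, 0.27]],
    columns=cols,
)
threshold = 0.9  # hypothetical cutoff for a "low" likelihood

# One likelihood column per body part (xs drops the 'coords' level)
likelihood = df.xs('likelihood', axis=1, level='coords')
low = likelihood < threshold

# Keep only x and y for every body part
coords = df.drop(columns='likelihood', level='coords')

# Repeat each body part's mask across its x and y columns, then blank those cells
mask = low.reindex(columns=coords.columns, level='bodyparts')
result = coords.mask(mask)
print(result)
```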