Populate a DataFrame column based on multiple rows of another column

Time: 2020-04-07 13:34:38

Tags: python pandas dataframe where-clause multiple-conditions

I have a DataFrame nplt:

nplt
Out[120]: 
     sexage  npark16cd  population  page
0       M00  E26000001  146.631840   NaN
1       M01  E26000001  122.677630   NaN
2       M02  E26000001  127.645516   NaN
3       M03  E26000001  138.313014   NaN
4       M04  E26000001  150.898252   NaN
5       M05  E26000001  149.086291   NaN
6       M06  E26000001  145.075953   NaN
7       M07  E26000001  159.893446   NaN
8       M08  E26000001  149.886962   NaN
9       M09  E26000001  182.406901   NaN
10      M10  E26000001  182.058425   NaN
11      M11  E26000001  186.962104   NaN
12      M12  E26000001  200.875284   NaN
13      M13  E26000001  209.038917   NaN
14      M14  E26000001  163.530837   NaN
15      M15  E26000001  161.171527   NaN

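I want to fill the column page conditional on sexage. For example, for the rows where sexage equals M00, M01, M02 or M03 I want page to be p0_3, and for the rows where sexage equals M04, M05 or M06 I want page to be p4_6, like: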

nplt
Out[120]: 
     sexage  npark16cd  population  page
0       M00  E26000001  146.631840   p0_3
1       M01  E26000001  122.677630   p0_3
2       M02  E26000001  127.645516   p0_3
3       M03  E26000001  138.313014   p0_3
4       M04  E26000001  150.898252   p4_6
5       M05  E26000001  149.086291   p4_6
6       M06  E26000001  145.075953   p4_6
7       M07  E26000001  159.893446   NaN
8       M08  E26000001  149.886962   NaN
9       M09  E26000001  182.406901   NaN
10      M10  E26000001  182.058425   NaN
11      M11  E26000001  186.962104   NaN
12      M12  E26000001  200.875284   NaN
13      M13  E26000001  209.038917   NaN
14      M14  E26000001  163.530837   NaN
15      M15  E26000001  161.171527   NaN

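and so on. The sexage column in my actual DataFrame ranges over M00-M90 and F00-F90.

Is there an efficient way to do this?

Many thanks.

To explain more clearly: I want the rows for F00, F01, F02 and F03, as well as those for M00, M01, M02 and M03, to have the value p0_3 in page, and the rows for M04, M05, M06 and F04, F05, F06 to have the value p4_6 in page. For example: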

nplt
    Out[120]: 
         sexage  npark16cd  population  page
    0       M00  E26000001  146.631840   p0_3
    1       M01  E26000001  122.677630   p0_3
    2       M02  E26000001  127.645516   p0_3
    3       M03  E26000001  138.313014   p0_3
    4       M04  E26000001  150.898252   p4_6
    5       M05  E26000001  149.086291   p4_6
    6       M06  E26000001  145.075953   p4_6
    7       M07  E26000001  159.893446   p7_10
    8       M08  E26000001  149.886962   p7_10
    9       M09  E26000001  182.406901   p7_10
    10      M10  E26000001  182.058425   p7_10
    11      M11  E26000001  186.962104   NaN
    12      M12  E26000001  200.875284   NaN
    13      M13  E26000001  209.038917   NaN
    14      M14  E26000001  163.530837   NaN
    15      M15  E26000001  161.171527   NaN


2355    F80  W18000003  102.553290   nan
2356    F81  W18000003  115.013810   nan
2357    F82  W18000003   94.524735   p82_85
2358    F83  W18000003   77.677229   p82_85
2359    F84  W18000003  103.239723   p82_85
2360    F85  W18000003   82.496796   p82_85
2361    F86  W18000003   71.609379   p86_90
2362    F87  W18000003   83.220993   p86_90
2363    F88  W18000003   80.120960   p86_90
2364    F89  W18000003   65.742056   p86_90
2365    F90  W18000003  204.664775   p86_90

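I want the groupings (p0_3, p4_6, p7_10 ... p86_90) to be the same for M00-M90 and F00-F90. The values in column page will be used as sampling strata in later code.

The groups in page have different numbers of members because they are based on age groups that span different ranges, e.g. 0-3, 4-6, 5-7, 8-12, 13, 14-18 ... all the way up to 90.

I tried: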

nplt.loc[(nplt['sexage'] == {'M00', 'M01', 'M02', 'M03', 'F00', 'F01',
                  'F02', 'F03'}), 'page'] = 'p0_3'

But it doesn't work. Any help would be appreciated.
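For context on the attempt above: comparing a Series to a set with == is not a membership test, whereas Series.isin is the usual pandas call for "is this value one of these". The following is a minimal sketch on a small stand-in for nplt; the six rows and the object dtype of page are assumptions made for illustration, not the real data.

```python
import numpy as np
import pandas as pd

# Small stand-in for nplt (assumed rows, only the columns needed here).
# page is created with object dtype so that strings can be assigned into it.
nplt = pd.DataFrame({
    "sexage": ["M00", "M03", "M04", "F02", "F06", "M07"],
    "page": pd.Series([np.nan] * 6, dtype=object),
})

# Membership test: True where sexage is any of the listed codes.
group_0_3 = ["M00", "M01", "M02", "M03", "F00", "F01", "F02", "F03"]
mask = nplt["sexage"].isin(group_0_3)

# Assign the label only to the matching rows; the others stay NaN.
nplt.loc[mask, "page"] = "p0_3"
print(nplt)
```

This fixes the single group from the attempt; repeating it by hand for every age band in M00-M90 and F00-F90 would be tedious, so a lookup built from the band edges is likely the more practical route.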

1 Answer:

Answer 0: (score: 0)

Cat
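One possible approach, sketched under assumptions rather than taken from the question: strip the sex letter from sexage to get the age, then look the age up in a dictionary built from the band edges, so that M and F rows with the same age get the same label. The bands list below is hypothetical and contains only the bands that appear in the question's example output (0-3, 4-6, 7-10, 86-90); the real list would need every band. The sample rows are also made up for illustration.

```python
import pandas as pd

# Made-up sample rows; the real nplt has M00-M90 and F00-F90.
nplt = pd.DataFrame({
    "sexage": ["M00", "M03", "M04", "M06", "M07", "M10",
               "M11", "F02", "F05", "F90"],
})

# Age is the two digits after the sex letter (assumes the format X##).
age = nplt["sexage"].str[1:].astype(int)

# Hypothetical inclusive age bands; extend with the real ones.
bands = [(0, 3), (4, 6), (7, 10), (86, 90)]

# Map every age inside a band to its label, e.g. 5 -> "p4_6".
age_to_page = {
    a: f"p{lo}_{hi}"
    for lo, hi in bands
    for a in range(lo, hi + 1)
}

# Ages that fall in no band are left as NaN by Series.map.
nplt["page"] = age.map(age_to_page)
print(nplt)
```

Because the lookup uses only the age, the sex letter never has to be listed, and adding a band is a one-line change to bands. If the bands are contiguous, pd.cut on the age column would be an alternative worth considering.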