在Pandas中的几个列中应用地图

时间:2017-01-25 10:23:31

标签: python pandas

我正在尝试在pandas中的多个列中应用地图,以反映数据无效的时间。如果我的df [' Count']列中的数据无效,我想设置我的df ['值'],df ['较低的置信区间'], df ['较高置信区间']和df ['分母']列为-1。

这是数据帧的示例:

i

目前,我正在尝试:

Count   Value       Lower Confidence Interval  Upper Confidence Interval  Denominator
121743  54.15758428 53.95153779                54.36348867                224794
280     91.80327869 88.18009411                94.38654088                305
430     56.95364238 53.39535553                60.44152684                755
970     70.54545455 68.0815009                 72.89492873                1375
nan             
70      28.57142857 23.27957213                34.52488678                245
125     62.5        55.6143037                 68.91456314                200

然后:

set_minus_1s = {np.nan: -1, '*': -1, -1: -1}

并收到错误:

df[['Value', 'Count', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Denominator']] = df['Count'].map(set_minus_1s)

是否有任何方法可以链接列引用以对地图进行一次调用,而不是每列都有单独的行来将ValueError: Must have equal len keys and value when setting with an iterable 字典作为地图调用?

1 个答案:

答案 0 :(得分:1)

我认为您可以使用wheremask并在应用map后替换所有不在isnull的行:

val = df['Count'].map(set_minus_1s)
print (val)
0    NaN
1    NaN
2    NaN
3    NaN
4   -1.0
5    NaN
6    NaN
Name: Count, dtype: float64

cols =['Value','Count','Lower Confidence Interval','Upper Confidence Interval','Denominator']
df[cols] = df[cols].where(val.isnull(), val, axis=0)
print (df)
      Count      Value  Lower Confidence Interval  Upper Confidence Interval  \
0  121743.0  54.157584                  53.951538                  54.363489   
1     280.0  91.803279                  88.180094                  94.386541   
2     430.0  56.953642                  53.395356                  60.441527   
3     970.0  70.545455                  68.081501                  72.894929   
4      -1.0  -1.000000                  -1.000000                  -1.000000   
5      70.0  28.571429                  23.279572                  34.524887   
6     125.0  62.500000                  55.614304                  68.914563   

   Denominator  
0     224794.0  
1        305.0  
2        755.0  
3       1375.0  
4         -1.0  
5        245.0  
6        200.0  
cols = ['Value', 'Count', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Denominator']
df[cols] = df[cols].mask(val.notnull(), val, axis=0)
print (df)
      Count      Value  Lower Confidence Interval  Upper Confidence Interval  \
0  121743.0  54.157584                  53.951538                  54.363489   
1     280.0  91.803279                  88.180094                  94.386541   
2     430.0  56.953642                  53.395356                  60.441527   
3     970.0  70.545455                  68.081501                  72.894929   
4      -1.0  -1.000000                  -1.000000                  -1.000000   
5      70.0  28.571429                  23.279572                  34.524887   
6     125.0  62.500000                  55.614304                  68.914563   

   Denominator  
0     224794.0  
1        305.0  
2        755.0  
3       1375.0  
4         -1.0  
5        245.0  
6        200.0