Set specific values in a mixed-type DataFrame to a fixed value?

Time: 2016-10-19 13:34:44

Tags: python pandas dataframe

I have a data frame with response and predictor variables in the columns and observations in the rows. Some of the response values are below a given limit of detection (LOD). As I am planning to apply a rank transformation to the responses, I would like to set all those values equal to LOD. Say the data frame is

data.head()

  age  response1  response2  response3 risk     sex smoking
0  33   0.272206   0.358059   0.585652   no  female     yes
1  38   0.425486   0.675391   0.721062  yes  female      no
2  20   0.910602   0.200606   0.664955  yes  female      no
3  38   0.966014   0.584317   0.923788  yes  female      no
4  27   0.756356   0.550512   0.106534   no  female     yes

I would like to do

responses = ['response1', 'response2', 'response3']
LOD = 0.2

data[responses][data[responses] <= LOD] = LOD

which does not work, for several reasons (chiefly, I guess, because pandas cannot tell whether the first indexing step should produce a view or a copy of the data, so the assignment never reaches the original frame).

How do I set all values in

data[responses] <= LOD

equal to LOD?
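
A minimal sketch of why the chained form above has no effect, using a small made-up frame (the values and the names `before` and `unchanged` are illustrative assumptions, not part of the original data). Selecting columns with a list returns a new object, so the second indexing step assigns into that temporary copy. Doing the replacement in a single assignment on `data` itself avoids this.

```python
import warnings

import pandas as pd

# Illustrative data only: two numeric responses plus one non-numeric column.
data = pd.DataFrame({
    'response1': [0.05, 0.50],
    'response2': [0.30, 0.10],
    'sex': ['male', 'female'],
})
responses = ['response1', 'response2']
LOD = 0.2
before = data.copy()

# Chained indexing: data[responses] is a temporary copy, so this
# assignment is lost. pandas may warn here; the warning is silenced.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    data[responses][data[responses] <= LOD] = LOD

unchanged = data.equals(before)
print(unchanged)

# Single assignment on the original frame: the replacement sticks.
data[responses] = data[responses].mask(data[responses] <= LOD, LOD)
print(data)
```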


Minimal example:

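import numpy as np
import pandas as pd

from pandas import Series, DataFrame

# Categorical predictors: draw 0/1 codes, then label the two categories.
# rename_categories is used because assigning to .cat.categories
# no longer works in recent pandas versions.
x = Series(np.random.randint(0,2,50), dtype='category')
x = x.cat.rename_categories(['no', 'yes'])

y = Series(np.random.randint(0,2,50), dtype='category')
y = y.cat.rename_categories(['no', 'yes'])

z = Series(np.random.randint(0,2,50), dtype='category')
z = z.cat.rename_categories(['male', 'female'])

a = Series(np.random.randint(20,60,50), dtype='category')

# Responses are uniform random numbers in [0, 1)
data = DataFrame({'risk':x, 'smoking':y, 'sex':z,
    'response1': np.random.rand(50),
    'response2': np.random.rand(50),
    'response3': np.random.rand(50),
    'age':a})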

1 answer:

Answer 0 (score: 0)

You can use DataFrame.mask:

import numpy as np
import pandas as pd

np.random.seed(123)
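# rename_categories replaces direct assignment to .cat.categories,
# which no longer works in recent pandas versions
x = pd.Series(np.random.randint(0,2,10), dtype='category')
x = x.cat.rename_categories(['no', 'yes'])
y = pd.Series(np.random.randint(0,2,10), dtype='category')
y = y.cat.rename_categories(['no', 'yes'])
z = pd.Series(np.random.randint(0,2,10), dtype='category')
z = z.cat.rename_categories(['male', 'female'])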

a = pd.Series(np.random.randint(20,60,10), dtype='category')

data = pd.DataFrame({
'risk':x, 
'smoking':y, 
'sex':z, 
'response1': np.random.rand(10),
'response2': np.random.rand(10),
'response3': np.random.rand(10),
'age':a})
print (data)
  age  response1  response2  response3 risk     sex smoking
0  24   0.722443   0.425830   0.866309   no    male     yes
1  23   0.322959   0.312261   0.250455  yes    male     yes
2  22   0.361789   0.426351   0.483034   no  female      no
3  40   0.228263   0.893389   0.985560   no  female     yes
4  59   0.293714   0.944160   0.519485   no  female      no
5  22   0.630976   0.501837   0.612895   no    male     yes
6  40   0.092105   0.623953   0.120629   no  female      no
7  27   0.433701   0.115618   0.826341  yes    male     yes
8  55   0.430863   0.317285   0.603060  yes    male     yes
9  48   0.493685   0.414826   0.545068   no    male      no
responses = ['response1', 'response2', 'response3']
LOD = 0.2

print (data[responses] <= LOD)
  response1 response2 response3
0     False     False     False
1     False     False     False
2     False     False     False
3     False     False     False
4     False     False     False
5     False     False     False
6      True     False      True
7     False      True     False
8     False     False     False
9     False     False     False

data[responses] = data[responses].mask(data[responses] <= LOD, LOD)
print (data)
  age  response1  response2  response3 risk     sex smoking
0  24   0.722443   0.425830   0.866309   no    male     yes
1  23   0.322959   0.312261   0.250455  yes    male     yes
2  22   0.361789   0.426351   0.483034   no  female      no
3  40   0.228263   0.893389   0.985560   no  female     yes
4  59   0.293714   0.944160   0.519485   no  female      no
5  22   0.630976   0.501837   0.612895   no    male     yes
6  40   0.200000   0.623953   0.200000   no  female      no
7  27   0.433701   0.200000   0.826341  yes    male     yes
8  55   0.430863   0.317285   0.603060  yes    male     yes
9  48   0.493685   0.414826   0.545068   no    male      no
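
As a possible alternative, not part of the answer above: when the replacement value is the threshold itself, `DataFrame.clip(lower=LOD)` should give the same result as the `mask` call, assuming the response columns contain no missing values. The sketch below reuses three rows from the output above (columns trimmed for brevity); the names `via_mask`, `via_clip` and `same` are illustrative.

```python
import pandas as pd

# Rows 0, 6 and 7 of the answer's data, with fewer columns for brevity.
data = pd.DataFrame({
    'age': [24, 40, 27],
    'response1': [0.722443, 0.092105, 0.433701],
    'response2': [0.425830, 0.623953, 0.115618],
    'sex': ['male', 'female', 'male'],
})
responses = ['response1', 'response2']
LOD = 0.2

# Approach from the answer: replace values <= LOD with LOD.
via_mask = data.copy()
via_mask[responses] = via_mask[responses].mask(via_mask[responses] <= LOD, LOD)

# Alternative: clip the response columns from below at LOD.
via_clip = data.copy()
via_clip[responses] = via_clip[responses].clip(lower=LOD)

same = via_mask.equals(via_clip)
print(same)
print(via_clip)
```

`mask` remains the more general tool, since it also handles replacement values that differ from the threshold.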