I have a data frame with response and predictor variables in the columns and observations in the rows. Some of the response values are below a given limit of detection (LOD). As I am planning to apply a rank transformation to the responses, I would like to set all those values equal to LOD. Say the data frame is
data.head()
   age  response1  response2  response3  risk     sex  smoking
0   33   0.272206   0.358059   0.585652    no  female      yes
1   38   0.425486   0.675391   0.721062   yes  female       no
2   20   0.910602   0.200606   0.664955   yes  female       no
3   38   0.966014   0.584317   0.923788   yes  female       no
4   27   0.756356   0.550512   0.106534    no  female      yes
I would like to do
responses = ['response1', 'response2', 'response3']
LOD = 0.2
data[responses][data[responses] <= LOD] = LOD
which does not work (I guess because pandas cannot tell whether data[responses] should be a view on the data or a copy, so the assignment never reaches data).
How do I set all values that satisfy
data[responses] <= LOD
equal to LOD?
Minimal example:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
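# Categorical predictors: integer codes 0/1 renamed to readable labels.
# rename_categories is used instead of assigning to .cat.categories,
# which newer pandas versions reject.
x = Series(np.random.randint(0, 2, 50), dtype='category')
x = x.cat.rename_categories(['no', 'yes'])
y = Series(np.random.randint(0, 2, 50), dtype='category')
y = y.cat.rename_categories(['no', 'yes'])
z = Series(np.random.randint(0, 2, 50), dtype='category')
z = z.cat.rename_categories(['male', 'female'])
a = Series(np.random.randint(20, 60, 50), dtype='category')
# Responses are uniform random numbers in [0, 1)
data = DataFrame({'risk': x, 'smoking': y, 'sex': z,
                  'response1': np.random.rand(50),
                  'response2': np.random.rand(50),
                  'response3': np.random.rand(50),
                  'age': a})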
Answer 0 (score: 0)
You can use DataFrame.mask:
import numpy as np
import pandas as pd
np.random.seed(123)
# rename_categories replaces direct assignment to .cat.categories,
# which newer pandas versions reject
x = pd.Series(np.random.randint(0,2,10), dtype='category')
x = x.cat.rename_categories(['no', 'yes'])
y = pd.Series(np.random.randint(0,2,10), dtype='category')
y = y.cat.rename_categories(['no', 'yes'])
z = pd.Series(np.random.randint(0,2,10), dtype='category')
z = z.cat.rename_categories(['male', 'female'])
a = pd.Series(np.random.randint(20,60,10), dtype='category')
data = pd.DataFrame({
'risk':x,
'smoking':y,
'sex':z,
'response1': np.random.rand(10),
'response2': np.random.rand(10),
'response3': np.random.rand(10),
'age':a})
print (data)
   age  response1  response2  response3  risk     sex  smoking
0   24   0.722443   0.425830   0.866309    no    male      yes
1   23   0.322959   0.312261   0.250455   yes    male      yes
2   22   0.361789   0.426351   0.483034    no  female       no
3   40   0.228263   0.893389   0.985560    no  female      yes
4   59   0.293714   0.944160   0.519485    no  female       no
5   22   0.630976   0.501837   0.612895    no    male      yes
6   40   0.092105   0.623953   0.120629    no  female       no
7   27   0.433701   0.115618   0.826341   yes    male      yes
8   55   0.430863   0.317285   0.603060   yes    male      yes
9   48   0.493685   0.414826   0.545068    no    male       no
responses = ['response1', 'response2', 'response3']
LOD = 0.2
print (data[responses] <= LOD)
   response1  response2  response3
0      False      False      False
1      False      False      False
2      False      False      False
3      False      False      False
4      False      False      False
5      False      False      False
6       True      False       True
7      False       True      False
8      False      False      False
9      False      False      False
data[responses] = data[responses].mask(data[responses] <= LOD, LOD)
print (data)
   age  response1  response2  response3  risk     sex  smoking
0   24   0.722443   0.425830   0.866309    no    male      yes
1   23   0.322959   0.312261   0.250455   yes    male      yes
2   22   0.361789   0.426351   0.483034    no  female       no
3   40   0.228263   0.893389   0.985560    no  female      yes
4   59   0.293714   0.944160   0.519485    no  female       no
5   22   0.630976   0.501837   0.612895    no    male      yes
6   40   0.200000   0.623953   0.200000    no  female       no
7   27   0.433701   0.200000   0.826341   yes    male      yes
8   55   0.430863   0.317285   0.603060   yes    male      yes
9   48   0.493685   0.414826   0.545068    no    male       no
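As a possible alternative, here is a minimal sketch on a small hand-made frame (the values are illustrative and not taken from the question's data). Since every value at or below LOD ends up equal to LOD, DataFrame.clip with lower=LOD should give the same result as the mask call. In both variants the result is assigned back to data[responses]; the chained form from the question assigns into the temporary object returned by data[responses], so data itself is not updated.

```python
import pandas as pd

# Illustrative data only: two response columns plus one untouched column
data = pd.DataFrame({
    'age': [33, 38, 20],
    'response1': [0.27, 0.05, 0.91],
    'response2': [0.10, 0.68, 0.20],
})
responses = ['response1', 'response2']
LOD = 0.2

# Variant 1: replace every value <= LOD with LOD via mask
masked = data[responses].mask(data[responses] <= LOD, LOD)

# Variant 2: raise every value below LOD up to LOD via clip
clipped = data[responses].clip(lower=LOD)

# Both variants produce the same frame for this data
assert masked.equals(clipped)

# Assign the result back to the selected columns of the original frame
data[responses] = clipped
print(data)
```

clip is the shorter spelling when the rule is a plain lower bound; mask is more general because the condition can be any boolean frame.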