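I want to select my raw data based on whether it falls within a valid range. I have an instrument whose most sensitive setting is C, followed by B and then A. So I start with C and check whether all values are below the threshold; if they are, perfect: mark all data at this sensitivity as best = 1.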
from io import StringIO
import pandas as pd
a = """category,val,sensitivity_level
x,20,A
x,31,B
x,60,C
x,20,A
x,25,B
x,60,C
y,20,A
y,40,B
y,60,C
y,20,A
y,24,B
y,30,C"""
df = pd.read_csv(StringIO(a))
def grp_1evel_1(x):
    """
    Check the elements against the threshold (elementwise x <= 30).
    """
    return x <= 30

def grp_1evel_0(x):
    """
    Input: data grouped by category. Here I want to go through the sensitivity levels in
    descending order, that is C, then B, then A, and pick the first level for which x <= 30
    holds for all elements as the best level. Think of a device's sensitivity: at the highest
    sensitivity the data may be garbage, so you move down one level and check again.
    """
    # Select 'val' explicitly so the comparison is not applied to the string columns.
    x['islessthan30'] = x.groupby('sensitivity_level')['val'].transform(grp_1evel_1)
    return x

print(df.groupby('category').apply(grp_1evel_0))
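Unfortunately, the code above does not produce the matrix below, because (1) I cannot work out how to go through the levels in descending order, and (2) I cannot assign values to a groupby of a groupby: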
x,20,A,1
x,31,B,0
x,60,C,0
x,20,A,1
x,25,B,0
x,60,C,0
y,20,A,0
y,29,B,1
y,60,C,0
y,20,A,0
y,24,B,1
y,30,C,0
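Any hints?
The algorithm should work as follows:
Within a category, start from the highest sensitivity. If all values are below the threshold, set this sensitivity to 1 and skip the remaining lower sensitivities; otherwise, move down to the next sensitivity and check again.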
Answer 0 (score: 4)
I think you are looking for something like this:
In [28]: df
Out[28]:
   category  val sensitivity_level
0         x   20                 A
1         x   31                 B
2         x   60                 C
3         x   20                 A
4         x   25                 B
5         x   60                 C
6         y   20                 A
7         y   40                 B
8         y   60                 C
9         y   20                 A
10        y   24                 B
11        y   30                 C
In [29]: res = df.groupby(['category', 'sensitivity_level']).max()
In [30]: res
Out[30]:
                            val
category sensitivity_level
x        A                   20
         B                   31
         C                   60
y        A                   20
         B                   40
         C                   60
In [31]: res[res.val <= 30]
Out[31]:
                            val
category sensitivity_level
x        A                   20
y        A                   20
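So you can group by both category and sensitivity level. Taking the maximum per group and keeping only the groups whose maximum is <= 30 is the same as requiring every value in the group to be <= 30, so the last command gives the acceptable sensitivity level for each category. This avoids creating an intermediate column that records whether each level is within the threshold.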
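The equivalence between "the group maximum is within the threshold" and "every value in the group is within the threshold" can be checked directly. The following is only a minimal sketch on the sample data from the question; `THRESHOLD`, `passes_by_max`, `passes_by_all` and `valid_levels` are names introduced here for illustration and are not part of the original answer.

```python
from io import StringIO
import pandas as pd

# Sample data copied from the question.
csv_text = """category,val,sensitivity_level
x,20,A
x,31,B
x,60,C
x,20,A
x,25,B
x,60,C
y,20,A
y,40,B
y,60,C
y,20,A
y,24,B
y,30,C"""
df = pd.read_csv(StringIO(csv_text))

THRESHOLD = 30  # illustrative name for the cutoff used in the question

grouped = df.groupby(["category", "sensitivity_level"])["val"]

# A (category, level) group passes when its largest value is within the threshold ...
passes_by_max = grouped.max() <= THRESHOLD
# ... which is the same as checking every value in the group explicitly.
passes_by_all = grouped.apply(lambda s: (s <= THRESHOLD).all())

# Keep only the (category, level) pairs that pass.
valid_levels = passes_by_max[passes_by_max].index.tolist()
print(valid_levels)
```

Using `max()` keeps the whole check inside one vectorised aggregation, whereas the explicit `all()` version calls a Python function once per group.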
Now suppose the x value of 31 were actually 20:
In [33]: df.loc[1, 'val'] = 20
In [34]: df
Out[34]:
   category  val sensitivity_level
0         x   20                 A
1         x   20                 B
2         x   60                 C
3         x   20                 A
4         x   25                 B
5         x   60                 C
6         y   20                 A
7         y   40                 B
8         y   60                 C
9         y   20                 A
10        y   24                 B
11        y   30                 C
Then we would want x to use B and y to still use A. We can modify the last step slightly:
In [51]: res = df.groupby(['category', 'sensitivity_level']).max()
In [48]: x = res[res.val <= 30]
In [49]: x
Out[49]:
                            val
category sensitivity_level
x        A                   20
         B                   25
y        A                   20
In [71]: x.reset_index('category').sort_index(ascending=False).groupby(level='sensitivity_level').first()
Out[71]:
                  category  val
sensitivity_level
A                        y   20
B                        x   25
There may be a better way to do the last step.
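Note that the final one-liner groups by sensitivity_level, so it returns one category per level rather than one level per category; for this data the two readings happen to agree. One possible alternative, sketched here under the assumption that the level letters sort in order of increasing sensitivity (A < B < C), groups by category instead and also builds the 0/1 flag column the question asked for. `THRESHOLD`, `level_max`, `valid`, `best_level` and the column name `best` are illustrative choices, not part of the original answer.

```python
from io import StringIO
import pandas as pd

# Sample data copied from the question.
csv_text = """category,val,sensitivity_level
x,20,A
x,31,B
x,60,C
x,20,A
x,25,B
x,60,C
y,20,A
y,40,B
y,60,C
y,20,A
y,24,B
y,30,C"""
df = pd.read_csv(StringIO(csv_text))
df.loc[1, "val"] = 20  # same modification as in the answer: the x value 31 becomes 20

THRESHOLD = 30  # illustrative name for the cutoff used in the question

# Largest value per (category, sensitivity_level) group.
level_max = df.groupby(["category", "sensitivity_level"])["val"].max()

# Keep the groups in which every value is within the threshold.
valid = level_max[level_max <= THRESHOLD].reset_index()

# Highest valid level per category.
# Assumption: "A" < "B" < "C" as strings matches increasing sensitivity.
best_level = valid.groupby("category")["sensitivity_level"].max()

# Flag the rows that belong to their category's best level.
# Categories with no valid level map to NaN and therefore get 0 everywhere.
df["best"] = (df["sensitivity_level"] == df["category"].map(best_level)).astype(int)

print(best_level)
print(df)
```

If the level names did not sort naturally, an ordered categorical or an explicit rank mapping could replace the string comparison.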