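I have a function that detects and removes outliers in experimental data. I want to apply this function to my data, which is stored in a DataFrame. However, the DataFrame contains many subjects and 4 experimental conditions, and the outlier detection should be applied separately at the level of each subject + trialcode combination. This is my data: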
       subject     trialcode  correct  latency
 0  1790361018         nonsn        1     4051
 1  1790361018     neighbour        1     1266
 2  1790361018     neighbour        1     2145
 3  1790361018         nonsn        0     2959
 4  1790361018  nonneighbour        1     1086
 5  1790361018      nonwords        1     2956
 6  1790361018      nonwords        1     3814
 7  1790361018  nonneighbour        1     4924
 8  1790361018      nonwords        0     4771
 9  1790361018  nonneighbour        0     2654
10  1790361018     neighbour        1      945
11  1790361018  nonneighbour        1     1189
12  1790361018     neighbour        1     1215
13  1790361018     neighbour        1      800
14  1790361018     neighbour        1      752
15  1790361018     neighbour        1      963
16  1790361018     neighbour        1     1822
17  1790361018  nonneighbour        1      856
18  1790361018  nonneighbour        1      695
19  1790361018      nonwords        1     2020
20  1790361018     neighbour        1     1303
21  1790361018  nonneighbour        1     1597
22  1790361018      nonwords        1     1327
23  1790361018     neighbour        1     1084
24  1790361018     neighbour        1     2434
25  1790361018  nonneighbour        1      917
26  1790361018     neighbour        1     1170
27  1790361018      nonwords        0     1388
28  1790361018      nonwords        1     1871
29  1790361018     neighbour        1      967
This is my function:
import numpy as np

def reject_outliers(data, m=2):
    return data[abs(data - np.mean(data)) < m * np.std(data)]
Is there a way to apply this function to each subject + trialcode group?
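
For reference, here is a rough sketch of the kind of thing I have in mind, assuming pandas `groupby` is the right tool. The helper name `reject_outliers_by_group` and the small example frame are made up for illustration and are not my real data. It builds a boolean mask from per-group means and standard deviations, using the population standard deviation (`ddof=0`, which is NumPy's default for `np.std`), and then filters the rows:

```python
import pandas as pd

def reject_outliers_by_group(df, keys, col, m=2):
    """Keep rows whose `col` value lies within m standard deviations
    of the mean of its own group (hypothetical helper, not a pandas API)."""
    grouped = df.groupby(keys)[col]
    # Broadcast each group's mean and population std back to every row.
    group_mean = grouped.transform("mean")
    group_std = grouped.transform(lambda s: s.std(ddof=0))
    keep = (df[col] - group_mean).abs() < m * group_std
    return df[keep]

# Made-up example: two subject/trialcode groups, one extreme latency in each.
df = pd.DataFrame({
    "subject": [1] * 6 + [2] * 6,
    "trialcode": ["neighbour"] * 6 + ["nonwords"] * 6,
    "correct": [1] * 12,
    "latency": [1000, 1010, 990, 1005, 995, 9000,
                2000, 2010, 1990, 2005, 1995, 50],
})

cleaned = reject_outliers_by_group(df, ["subject", "trialcode"], "latency")
print(cleaned)
```

The original index is preserved, so the rows that were dropped can still be identified afterwards. One caveat: a group with a single row or with identical values has a standard deviation of 0, so the strict `<` comparison would remove all of its rows.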