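I am currently struggling to calculate the false acceptance rate (FAR). I have a dataset collected from 5 people. To calculate the FAR, we can use the following formula:
FAR = number of accepted impostors / total number of impostors
In this case, I compare the template with the test data using the Euclidean distance, as shown below, to get the number of accepted impostors: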
Stage 1
person 1 --> person 2
             person 3
             person 4
             person 5
Stage 2
person 2 --> person 1
             person 3
             person 4
             person 5
Stage 3
person 3 --> person 1
             person 2
             person 4
             person 5
Stage 4
person 4 --> person 1
             person 2
             person 3
             person 5
Stage 5
person 5 --> person 1
             person 2
             person 3
             person 4
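I have finished the loop for stage 1, but for stages 2 through 5 I cannot work out how to loop over all of them so that I get a single count of accepted impostors. My code for stage 1 is as follows: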
import pandas as pd
import numpy as np
from scipy.spatial import distance

# load dataset
dataset = pd.read_csv('data.csv', index_col=0)

# template: feature vector of person 1
# (distance.euclidean needs 1-D vectors; this assumes one row per person)
data_p1 = dataset[dataset['Person'] == 1].drop(columns='Person').values.ravel()
y = []

# loop to compare the template of person 1 with test persons 2 to 5
for x in range(2, 6):
    # testing
    data_p2 = dataset[dataset['Person'] == x].drop(columns='Person').values.ravel()
    # distance function
    dst = distance.euclidean(data_p1, data_p2)
    # example of threshold
    if dst <= 0.05:
        y.append("wrong")

y.count("wrong")
0
dataset.head()
x1  x2  x3  x4  x5  Person
 0   1  55   0  10       1
 0   4  87   0  17       2
 0   3  68   0  14       3
 3   7  86  26  14       4
 0   2  82   0  10       5
Answer 0 (score: 1)
Here is a permutation approach using pandas:
from __future__ import print_function
from itertools import permutations

import pandas as pd

# raw string so that \s is not treated as a string escape
df = pd.read_csv('data.csv', sep=r'\s+', index_col=['Person'])
print(df)

# all ordered (template, impostor) pairs of persons
idx = list(permutations(df.index, 2))

new = pd.DataFrame(
    {'route': [
        [(df.loc[Ind[0], 'x1'], df.loc[Ind[0], 'x2'], df.loc[Ind[0], 'x3'], df.loc[Ind[0], 'x4'], df.loc[Ind[0], 'x5']),
         (df.loc[Ind[1], 'x1'], df.loc[Ind[1], 'x2'], df.loc[Ind[1], 'x3'], df.loc[Ind[1], 'x4'], df.loc[Ind[1], 'x5'])
        ] for Ind in idx
    ]},
    index=idx
)
print(new)
Output:
        x1  x2  x3  x4  x5
Person
1        0   1  55   0  10
2        0   4  87   0  17
3        0   3  68   0  14
4        3   7  86  26  14
5        0   2  82   0  10
route
(1, 2) [(0, 1, 55, 0, 10), (0, 4, 87, 0, 17)]
(1, 3) [(0, 1, 55, 0, 10), (0, 3, 68, 0, 14)]
(1, 4) [(0, 1, 55, 0, 10), (3, 7, 86, 26, 14)]
(1, 5) [(0, 1, 55, 0, 10), (0, 2, 82, 0, 10)]
(2, 1) [(0, 4, 87, 0, 17), (0, 1, 55, 0, 10)]
(2, 3) [(0, 4, 87, 0, 17), (0, 3, 68, 0, 14)]
(2, 4) [(0, 4, 87, 0, 17), (3, 7, 86, 26, 14)]
(2, 5) [(0, 4, 87, 0, 17), (0, 2, 82, 0, 10)]
(3, 1) [(0, 3, 68, 0, 14), (0, 1, 55, 0, 10)]
(3, 2) [(0, 3, 68, 0, 14), (0, 4, 87, 0, 17)]
(3, 4) [(0, 3, 68, 0, 14), (3, 7, 86, 26, 14)]
(3, 5) [(0, 3, 68, 0, 14), (0, 2, 82, 0, 10)]
(4, 1) [(3, 7, 86, 26, 14), (0, 1, 55, 0, 10)]
(4, 2) [(3, 7, 86, 26, 14), (0, 4, 87, 0, 17)]
(4, 3) [(3, 7, 86, 26, 14), (0, 3, 68, 0, 14)]
(4, 5) [(3, 7, 86, 26, 14), (0, 2, 82, 0, 10)]
(5, 1) [(0, 2, 82, 0, 10), (0, 1, 55, 0, 10)]
(5, 2) [(0, 2, 82, 0, 10), (0, 4, 87, 0, 17)]
(5, 3) [(0, 2, 82, 0, 10), (0, 3, 68, 0, 14)]
(5, 4) [(0, 2, 82, 0, 10), (3, 7, 86, 26, 14)]
Another version, in which the two persons of each pair end up in separate columns:
new = pd.DataFrame(
    {'a': [(df.loc[Ind[0], 'x1'], df.loc[Ind[0], 'x2'], df.loc[Ind[0], 'x3'], df.loc[Ind[0], 'x4'], df.loc[Ind[0], 'x5'])
           for Ind in idx
          ],
     'b': [(df.loc[Ind[1], 'x1'], df.loc[Ind[1], 'x2'], df.loc[Ind[1], 'x3'], df.loc[Ind[1], 'x4'], df.loc[Ind[1], 'x5'])
           for Ind in idx
          ]
    },
    index=idx
)
Output:
a b
(1, 2) (0, 1, 55, 0, 10) (0, 4, 87, 0, 17)
(1, 3) (0, 1, 55, 0, 10) (0, 3, 68, 0, 14)
(1, 4) (0, 1, 55, 0, 10) (3, 7, 86, 26, 14)
(1, 5) (0, 1, 55, 0, 10) (0, 2, 82, 0, 10)
(2, 1) (0, 4, 87, 0, 17) (0, 1, 55, 0, 10)
(2, 3) (0, 4, 87, 0, 17) (0, 3, 68, 0, 14)
(2, 4) (0, 4, 87, 0, 17) (3, 7, 86, 26, 14)
(2, 5) (0, 4, 87, 0, 17) (0, 2, 82, 0, 10)
(3, 1) (0, 3, 68, 0, 14) (0, 1, 55, 0, 10)
(3, 2) (0, 3, 68, 0, 14) (0, 4, 87, 0, 17)
(3, 4) (0, 3, 68, 0, 14) (3, 7, 86, 26, 14)
(3, 5) (0, 3, 68, 0, 14) (0, 2, 82, 0, 10)
(4, 1) (3, 7, 86, 26, 14) (0, 1, 55, 0, 10)
(4, 2) (3, 7, 86, 26, 14) (0, 4, 87, 0, 17)
(4, 3) (3, 7, 86, 26, 14) (0, 3, 68, 0, 14)
(4, 5) (3, 7, 86, 26, 14) (0, 2, 82, 0, 10)
(5, 1) (0, 2, 82, 0, 10) (0, 1, 55, 0, 10)
(5, 2) (0, 2, 82, 0, 10) (0, 4, 87, 0, 17)
(5, 3) (0, 2, 82, 0, 10) (0, 3, 68, 0, 14)
(5, 4) (0, 2, 82, 0, 10) (3, 7, 86, 26, 14)
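To get from a pair table like this to the FAR asked about in the question, one possible next step is to compute the Euclidean distance for each ordered pair and count how many fall at or below the threshold. This is only a minimal sketch: it assumes one row per person, inlines the sample rows in place of data.csv, and uses `far_from_pairs`, a hypothetical helper name that is not part of pandas or SciPy.

```python
from itertools import permutations

import pandas as pd
from scipy.spatial import distance

# Sample rows from the question, inlined so the sketch runs without data.csv.
df = pd.DataFrame(
    {
        'x1': [0, 0, 0, 3, 0],
        'x2': [1, 4, 3, 7, 2],
        'x3': [55, 87, 68, 86, 82],
        'x4': [0, 0, 0, 26, 0],
        'x5': [10, 17, 14, 14, 10],
    },
    index=pd.Index([1, 2, 3, 4, 5], name='Person'),
)


def far_from_pairs(frame, threshold):
    """Hypothetical helper: FAR over all ordered (template, impostor) pairs."""
    pairs = list(permutations(frame.index, 2))
    # One Euclidean distance per ordered pair of persons.
    dists = [distance.euclidean(frame.loc[a].values, frame.loc[b].values)
             for a, b in pairs]
    # An impostor is "accepted" when the distance is within the threshold.
    accepted = int(sum(d <= threshold for d in dists))
    return accepted, len(pairs), accepted / len(pairs)


# 0.05 is the example threshold from the question.
accepted, attempts, far = far_from_pairs(df, threshold=0.05)
print(accepted, attempts, far)
```

With 5 persons there are 20 ordered pairs, so the denominator of the FAR is 20 here. On the sample rows every distance is far above 0.05, so no impostor is accepted at that threshold.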
If I understand your loop correctly (I am not sure I got it):
for x in range(1, 6):
    for y in range(1, 6):
        if x != y:
            print(x, y)
Note: you definitely do not want to use this approach with pandas!
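One way to avoid explicit Python loops altogether, which is a swapped-in technique and not part of the answer above, is `scipy.spatial.distance.cdist`: it returns the full matrix of pairwise distances in one call. The sketch below again assumes one row per person and inlines the sample rows in place of data.csv; the threshold of 10.0 is purely illustrative.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Sample rows from the question, inlined so the sketch runs without data.csv.
df = pd.DataFrame(
    {
        'x1': [0, 0, 0, 3, 0],
        'x2': [1, 4, 3, 7, 2],
        'x3': [55, 87, 68, 86, 82],
        'x4': [0, 0, 0, 26, 0],
        'x5': [10, 17, 14, 14, 10],
    },
    index=pd.Index([1, 2, 3, 4, 5], name='Person'),
)

threshold = 10.0  # illustrative value only; choose it from real data

# dist[i, j] is the Euclidean distance between row i and row j.
dist = cdist(df.values, df.values, metric='euclidean')

# Exclude the diagonal, where a person is compared with themselves.
impostor = ~np.eye(len(df), dtype=bool)

accepted = int((dist[impostor] <= threshold).sum())
attempts = int(impostor.sum())
far = accepted / attempts
print(accepted, attempts, far)
```

On the sample rows only persons 2 and 5 are closer than 10.0 to each other, and that pair is counted in both directions, so this gives 2 accepted attempts out of 20.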