How do you compute the true-positive and false-positive rates for a multi-class classification problem? Say,
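y_true = [1, -1, 0, 0, 1, -1, 1, 0, -1, 0, 1, -1, 1, 0, 0, -1, 0]
y_prediction = [-1, -1, 1, 0, 0, 0, 0, -1, 1, -1, 1, 1, 0, 0, 1, 1, -1]
The confusion matrix is computed by metrics.confusion_matrix(y_true, y_prediction), but that just shifts the problem.
Edit after @seralouk's answer: here, the class -1 is to be considered the negative, while 0 and 1 are variations of positives.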
Answer 0 (score: 8)
import numpy as np
from sklearn.metrics import confusion_matrix
y_true = [1, -1, 0, 0, 1, -1, 1, 0, -1, 0, 1, -1, 1, 0, 0, -1, 0]
y_prediction = [-1, -1, 1, 0, 0, 0, 0, -1, 1, -1, 1, 1, 0, 0, 1, 1, -1]
cnf_matrix = confusion_matrix(y_true, y_prediction)
print(cnf_matrix)
#[[1 1 3]
# [3 2 2]
# [1 3 1]]
FP = cnf_matrix.sum(axis=0) - np.diag(cnf_matrix)
FN = cnf_matrix.sum(axis=1) - np.diag(cnf_matrix)
TP = np.diag(cnf_matrix)
TN = cnf_matrix.sum() - (FP + FN + TP)
FP = FP.astype(float)
FN = FN.astype(float)
TP = TP.astype(float)
TN = TN.astype(float)
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP)
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
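The vectorized arithmetic above treats each class in turn as the positive class and lumps all other classes together as negatives (one-vs-rest). A minimal plain-Python sketch of the same counting, written out sample by sample, may make that easier to check by hand. The helper name one_vs_rest_counts is made up for this illustration and is not part of NumPy or scikit-learn.

```python
def one_vs_rest_counts(y_true, y_pred):
    """Return {label: (TP, FP, FN, TN)}, treating each label in turn as the positive class."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = {}
    for label in labels:
        tp = fp = fn = tn = 0
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # predicted this class, and it was this class
            elif p == label:
                fp += 1  # predicted this class, but it was another one
            elif t == label:
                fn += 1  # was this class, but something else was predicted
            else:
                tn += 1  # neither the truth nor the prediction is this class
        counts[label] = (tp, fp, fn, tn)
    return counts


y_true = [1, -1, 0, 0, 1, -1, 1, 0, -1, 0, 1, -1, 1, 0, 0, -1, 0]
y_prediction = [-1, -1, 1, 0, 0, 0, 0, -1, 1, -1, 1, 1, 0, 0, 1, 1, -1]

counts = one_vs_rest_counts(y_true, y_prediction)
for label, (tp, fp, fn, tn) in counts.items():
    print(label, "TPR =", round(tp / (tp + fn), 4), "FPR =", round(fp / (fp + tn), 4))
```

For this data the per-class TPR comes out as 0.2, 0.2857 and 0.2 (classes -1, 0, 1), and the per-class FPR as 0.3333, 0.4 and 0.4167.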
For the general case with many classes, these metrics can also be represented graphically (the image is not included here).
Answer 1 (score: -1)
Since there are several ways to solve this, and no single one settles the question (see https://stats.stackexchange.com/questions/202336/true-positive-false-negative-true-negative-false-positive-definitions-for-mul?noredirect=1&lq=1 and https://stats.stackexchange.com/questions/51296/how-do-you-calculate-precision-and-recall-for-multiclass-classification-using-co#51301), here is the solution that seems to be used in the paper which I was unclear about:
to count a confusion between two foreground pages as a false positive.
So the solution is to import numpy as np, use y_true and y_prediction as np.array, and then:
FP = np.logical_and(y_true != y_prediction, y_prediction != -1).sum() # 9
FN = np.logical_and(y_true != y_prediction, y_prediction == -1).sum() # 4
TP = np.logical_and(y_true == y_prediction, y_true != -1).sum() # 3
TN = np.logical_and(y_true == y_prediction, y_true == -1).sum() # 1
TPR = 1. * TP / (TP + FN) # 0.42857142857142855
FPR = 1. * FP / (FP + TN) # 0.9
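To see how much the choice of convention matters, the sketch below contrasts the counting rule above with a looser one that first merges 0 and 1 into a single positive class. The second rule is an assumption added for comparison and is not taken from the paper; both function names are hypothetical.

```python
def strict_counts(y_true, y_pred, negative=-1):
    """Rule from this answer: any wrong prediction of a non-negative class is an FP."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            if t == negative:
                tn += 1
            else:
                tp += 1
        elif p == negative:
            fn += 1
        else:
            fp += 1
    return tp, fp, fn, tn


def collapsed_counts(y_true, y_pred, negative=-1):
    """Comparison rule (assumption): merge all non-negative labels into one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        t_pos, p_pos = t != negative, p != negative
        if t_pos and p_pos:
            tp += 1  # confusing 0 with 1 still counts as a hit here
        elif p_pos:
            fp += 1
        elif t_pos:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


y_true = [1, -1, 0, 0, 1, -1, 1, 0, -1, 0, 1, -1, 1, 0, 0, -1, 0]
y_prediction = [-1, -1, 1, 0, 0, 0, 0, -1, 1, -1, 1, 1, 0, 0, 1, 1, -1]

strict = strict_counts(y_true, y_prediction)
collapsed = collapsed_counts(y_true, y_prediction)
for name, (tp, fp, fn, tn) in (("strict", strict), ("collapsed", collapsed)):
    print(name, "TPR =", round(tp / (tp + fn), 4), "FPR =", round(fp / (fp + tn), 4))
```

On this data the strict rule gives TP, FP, FN, TN = 3, 9, 4, 1 (TPR 0.4286, FPR 0.9), while the collapsed rule gives 8, 4, 4, 1 (TPR 0.6667, FPR 0.8), so the two conventions are not interchangeable.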
Answer 2 (score: -1)
Another simple way is PyCM (by me), which supports multi-class confusion matrix analysis.
Applied to your problem:
>>> from pycm import ConfusionMatrix
>>> y_true = [1, -1, 0, 0, 1, -1, 1, 0, -1, 0, 1, -1, 1, 0, 0, -1, 0]
>>> y_prediction = [-1, -1, 1, 0, 0, 0, 0, -1, 1, -1, 1, 1, 0, 0, 1, 1, -1]
>>> cm = ConfusionMatrix(actual_vector=y_true,predict_vector=y_prediction)
>>> print(cm)
Predict     -1     0      1
Actual
-1          1      1      3
0           3      2      2
1           1      3      1
Overall Statistics :
95% CI (0.03365,0.43694)
Bennett_S -0.14706
Chi-Squared None
Chi-Squared DF 4
Conditional Entropy None
Cramer_V None
Cross Entropy 1.57986
Gwet_AC1 -0.1436
Joint Entropy None
KL Divergence 0.01421
Kappa -0.15104
Kappa 95% CI (-0.45456,0.15247)
Kappa No Prevalence -0.52941
Kappa Standard Error 0.15485
Kappa Unbiased -0.15405
Lambda A 0.2
Lambda B 0.27273
Mutual Information None
Overall_ACC 0.23529
Overall_RACC 0.33564
Overall_RACCU 0.33737
PPV_Macro 0.23333
PPV_Micro 0.23529
Phi-Squared None
Reference Entropy 1.56565
Response Entropy 1.57986
Scott_PI -0.15405
Standard Error 0.10288
Strength_Of_Agreement(Altman) Poor
Strength_Of_Agreement(Cicchetti) Poor
Strength_Of_Agreement(Fleiss) Poor
Strength_Of_Agreement(Landis and Koch) Poor
TPR_Macro 0.22857
TPR_Micro 0.23529
Class Statistics :
Classes -1 0 1
ACC(Accuracy) 0.52941 0.47059 0.47059
BM(Informedness or bookmaker informedness) -0.13333 -0.11429 -0.21667
DOR(Diagnostic odds ratio) 0.5 0.6 0.35
ERR(Error rate) 0.47059 0.52941 0.52941
F0.5(F0.5 score) 0.2 0.32258 0.17241
F1(F1 score - harmonic mean of precision and sensitivity) 0.2 0.30769 0.18182
F2(F2 score) 0.2 0.29412 0.19231
FDR(False discovery rate) 0.8 0.66667 0.83333
FN(False negative/miss/type 2 error) 4 5 4
FNR(Miss rate or false negative rate) 0.8 0.71429 0.8
FOR(False omission rate) 0.33333 0.45455 0.36364
FP(False positive/type 1 error/false alarm) 4 4 5
FPR(Fall-out or false positive rate) 0.33333 0.4 0.41667
G(G-measure geometric mean of precision and sensitivity) 0.2 0.30861 0.18257
LR+(Positive likelihood ratio) 0.6 0.71429 0.48
LR-(Negative likelihood ratio) 1.2 1.19048 1.37143
MCC(Matthews correlation coefficient) -0.13333 -0.1177 -0.20658
MK(Markedness) -0.13333 -0.12121 -0.19697
N(Condition negative) 12 10 12
NPV(Negative predictive value) 0.66667 0.54545 0.63636
P(Condition positive) 5 7 5
POP(Population) 17 17 17
PPV(Precision or positive predictive value) 0.2 0.33333 0.16667
PRE(Prevalence) 0.29412 0.41176 0.29412
RACC(Random accuracy) 0.08651 0.14533 0.10381
RACCU(Random accuracy unbiased) 0.08651 0.14619 0.10467
TN(True negative/correct rejection) 8 6 7
TNR(Specificity or true negative rate) 0.66667 0.6 0.58333
TON(Test outcome negative) 12 11 11
TOP(Test outcome positive) 5 6 6
TP(True positive/hit) 1 2 1
TPR(Sensitivity, recall, hit rate, or true positive rate) 0.2 0.28571 0.2
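The TPR_Macro and TPR_Micro rows in the overall statistics can be reproduced by hand from the per-class TP and FN rows. The plain-Python sketch below does not call PyCM; it assumes the usual definitions, where the macro average is the mean of the per-class rates and the micro average pools the counts before dividing.

```python
# Per-class counts copied from the TP and FN rows above (classes -1, 0, 1).
tp = {-1: 1, 0: 2, 1: 1}
fn = {-1: 4, 0: 5, 1: 4}

# Macro average: compute TPR per class, then take the unweighted mean.
per_class_tpr = {c: tp[c] / (tp[c] + fn[c]) for c in tp}
tpr_macro = sum(per_class_tpr.values()) / len(per_class_tpr)

# Micro average: pool the counts over all classes, then compute a single rate.
tpr_micro = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))

print(round(tpr_macro, 5), round(tpr_micro, 5))
```

This prints 0.22857 and 0.23529, matching the TPR_Macro and TPR_Micro values in the report. The micro average equals the overall accuracy here because every sample belongs to exactly one class.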