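I want to compute the confusion matrix of two text files. Does anyone know of a library or tool, in Python or as a shell script, that can do this?
For example, I have two files:
File A: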
public <T extends GenericnessFactory<T>> List<T> selectAll(T returningClass){
List<T> objects = new ArrayList<T>();
String name = returningClass.getClass().getSimpleName();
Field[] fields = returningClass.getClass().getFields();
int fieldCount = fields.length;
String[] columns = new String[fieldCount];
HashMap<String,Field> fieldMap = new HashMap<String,Field>();
for (int i = 0; i < fieldCount; i++) {
Field fld = fields[i];
int mods = fld.getModifiers();
if (!Modifier.isFinal(mods) && ! Modifier.isTransient(mods) && !Modifier.isStatic(mods)) {
columns[i] = fld.getName();
fieldMap.put(columns[i], fld);
}
}
SQLiteDatabase readable = getReadableDatabase();
Cursor c = readable.query(name, columns, null,null, null, null, null);
if (c.moveToFirst()) {
T item = returningClass.generate();
for (int i = 0; i < fieldCount; i++) {
fieldMap.get(columns[i]).set(item, c.getString(i));
}
}
c.close();
return objects;
}
File B:
1
1
2
2
From these I would get the confusion matrix:
2
2
2
2
Update: I want to point out that the original post included row and column labels.
Answer 0 (score: 4)
This may be overkill, but scikit-learn makes this very easy:
from sklearn.metrics import confusion_matrix

# Read the data: one integer label per line
with open('file1', 'r') as infile:
    true_values = [int(i) for i in infile]
with open('file2', 'r') as infile:
    predictions = [int(i) for i in infile]

# Make confusion matrix (rows: true labels, columns: predicted labels)
confusion = confusion_matrix(true_values, predictions)
print(confusion)
which outputs:
[[0 2]
 [0 2]]
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
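If pulling in scikit-learn feels like overkill, the same counts can be tallied with the standard library alone. The following is a minimal sketch, not part of the original answer: `confusion_counts` is a hypothetical helper name, and the two sample lists are assumed inputs chosen to be consistent with the matrix shown above. In practice they would be read from the two files, one label per line.

```python
from collections import Counter


def confusion_counts(true_values, predictions):
    """Return (labels, matrix); rows are true labels, columns are predicted labels."""
    if len(true_values) != len(predictions):
        raise ValueError("both inputs must have the same number of entries")
    # Use every label seen in either input so the matrix is square.
    labels = sorted(set(true_values) | set(predictions))
    # Count how often each (true, predicted) pair occurs.
    pair_counts = Counter(zip(true_values, predictions))
    matrix = [[pair_counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Assumed sample data, standing in for the contents of the two files.
labels, matrix = confusion_counts([1, 1, 2, 2], [2, 2, 2, 2])
print(labels)
print(matrix)
```

A `Counter` returns 0 for pairs that never occur, so no special handling is needed for empty cells.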
Update: To print with labels, you can convert the matrix to a pandas DataFrame, or use a small helper function like this:
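def print_confusion(confusion):
    # Header row: column indices, padded to line up with the row labels
    print('  ' + ' '.join([str(n) for n in range(confusion.shape[1])]))
    # One line per row: the row index followed by that row's counts
    for rownum in range(confusion.shape[0]):
        print(str(rownum) + ' ' + ' '.join([str(n) for n in confusion[rownum]]))

print_confusion(confusion)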
which prints:
  0 1
0 0 2
1 0 2
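The pandas route mentioned in the update could look roughly like the sketch below. It is an illustration under assumptions rather than the answerer's code: `labelled_confusion` is a hypothetical helper, the matrix is hard-coded to the values printed earlier, and the labels `[1, 2]` are assumed to be the class values in the files (the helper above prints positional indices 0 and 1 instead).

```python
import pandas as pd


def labelled_confusion(confusion, labels):
    """Wrap a square confusion matrix in a DataFrame with named axes."""
    return pd.DataFrame(
        confusion,
        index=pd.Index(labels, name="true"),
        columns=pd.Index(labels, name="predicted"),
    )


# Assumed inputs: the matrix shown earlier and the class labels 1 and 2.
frame = labelled_confusion([[0, 2], [0, 2]], labels=[1, 2])
print(frame)
```

The same helper accepts the NumPy array returned by `confusion_matrix` directly, and individual cells can then be looked up by label, for example `frame.loc[1, 2]` for items whose true label is 1 and predicted label is 2.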