Calculating the confusion matrix of two text files

Time: 2016-10-25 19:02:03

Tags: python bash shell confusion-matrix

I want to compute the confusion matrix of two text files. Does anyone know of a library or tool in Python or a shell script that can do this?

For example, I have two files.

File A:

1
1
2
2

File B:

2
2
2
2

from which I would get the confusion matrix:

0 2
0 2
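Without any third-party library, the counting itself can be sketched in a few lines of plain Python. This is a minimal sketch: the file reading is omitted, and the two label sequences are passed in as lists; `confusion_counts` is a hypothetical helper name, not from the original post.

```python
from collections import Counter

def confusion_counts(true_values, predictions):
    """Count (true, predicted) pairs and lay them out as a matrix."""
    labels = sorted(set(true_values) | set(predictions))
    pairs = Counter(zip(true_values, predictions))
    # Row = true label, column = predicted label
    return [[pairs[(t, p)] for p in labels] for t in labels]

# The label sequences from the two files in the question
print(confusion_counts([1, 1, 2, 2], [2, 2, 2, 2]))  # → [[0, 2], [0, 2]]
```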

Update: I want to point out that the original post contains row and column labels.

1 Answer:

Answer 0 (score: 4)

This may be overkill, but scikit-learn will do this easily:

from sklearn.metrics import confusion_matrix

# Read the data
with open('file1', 'r') as infile:
    true_values = [int(i) for i in infile]
with open('file2', 'r') as infile:
    predictions = [int(i) for i in infile]

# Make confusion matrix
confusion = confusion_matrix(true_values, predictions)

print(confusion)

with the output

[[0 2]
 [0 2]]

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
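One detail worth knowing: if a class never appears in either file, the matrix comes out smaller than expected. `confusion_matrix` accepts a `labels` argument that fixes the set and order of rows and columns. A small sketch using the same label sequences as above, with a third class added for illustration:

```python
from sklearn.metrics import confusion_matrix

# labels fixes the set and order of rows/columns, including classes
# that never occur in either label sequence (here, class 3)
confusion = confusion_matrix([1, 1, 2, 2], [2, 2, 2, 2], labels=[1, 2, 3])
print(confusion)
```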

Update: To print the matrix with row and column labels, you can write a small helper function, like this:

def print_confusion(confusion):
    # Header row of column indices, then one line per row,
    # prefixed with the row index
    print('   ' + '  '.join([str(n) for n in range(confusion.shape[1])]))
    for rownum in range(confusion.shape[0]):
        print(str(rownum) + '  ' + '  '.join([str(n) for n in confusion[rownum]]))

print_confusion(confusion)

which prints

   0  1
0  0  2
1  0  2
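As an alternative to the helper function, and assuming pandas is available, wrapping the matrix in a DataFrame gives labelled printing for free. The `index` and `columns` values here are the class labels from the question rather than 0-based positions:

```python
import pandas as pd

# A DataFrame prints its row index and column names automatically
df = pd.DataFrame([[0, 2], [0, 2]], index=[1, 2], columns=[1, 2])
print(df)
```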