I have written Python code to analyze a dataset with 299 rows and 62 columns. I am using logistic regression to predict flare_up. The columns I am considering are:

    Demo1, Demo5, Dis1, Dis1Treat, Dis2, Dis3, Dis4, Dis4Treat, Dis5, Dis5Treat, Dis6, Dis6Treat, Dis7, DisHis1, DisHis2, DisStage1, DisStage2, LungFun19

Using the columns above, I need to predict the binary value of Flare_Up.

This code gets the required data from the CSV:

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("datasource/FinalData/Scoring_dataset.csv")
    print(df.head(3))
    y = np.array(df.Demo1.tolist())
    df = df.drop('Dis2', axis=1)
    df = df.drop('Dis3', axis=1)
    # optionally rescale non-normalized column
    df['LungFun1'] = StandardScaler().fit_transform(df['LungFun1'].values.reshape(-1, 1)).ravel()
    X = df.to_numpy()
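
I am plotting the confusion matrix using the following code:

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    # pc is my plotting helper module; its import is not shown here
    X, X_val, y, y_val = train_test_split(X, y, test_size=0.2)
    print(X, X_val, y, y_val)
    lrn = LogisticRegression()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    lrn.fit(X_train, y_train)
    y_pred = lrn.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    if lrn.classes_[0] == 1:
        # make sure the ordering of the classes is correct
        cm = np.array([[cm[0, 1], cm[0, 0]], [cm[1, 1], cm[1, 0]]])
    pc.plot_confusion_matrix(cm, ['0', '1'])
    pr, tpr, fpr = pc.show_data(cm, print_res=1)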
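
Since the `pc` helper is not shown, here is a minimal sketch of how the three reported numbers are conventionally derived from a 2x2 confusion matrix. It assumes scikit-learn's layout (rows are true classes, columns are predicted classes, class 0 first). The function name `rates_from_confusion_matrix` is hypothetical and is not part of `pc`, so `pc.show_data` may compute things differently.

```python
import numpy as np

def rates_from_confusion_matrix(cm):
    """Return (precision, recall/TPR, fallout/FPR) for a 2x2 matrix.

    Assumes cm[i, j] counts samples of true class i predicted as class j,
    with class 0 (negative) first, as sklearn.metrics.confusion_matrix does.
    """
    cm = np.asarray(cm, dtype=float)
    tn, fp = cm[0]
    fn, tp = cm[1]
    # Guard each ratio against an empty denominator
    precision = float(tp / (tp + fp)) if (tp + fp) > 0 else 0.0
    recall = float(tp / (tp + fn)) if (tp + fn) > 0 else 0.0
    fallout = float(fp / (fp + tn)) if (fp + tn) > 0 else 0.0
    return precision, recall, fallout

# Made-up matrix with no misclassifications at all
print(rates_from_confusion_matrix([[50, 0], [0, 10]]))
```

Under these assumptions, precision and recall of exactly 1 together with a fallout of exactly 0 can only occur when both off-diagonal cells of the matrix are zero, that is, when every test sample is classified correctly.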

Here is the function I am using to compute the ROC:

    from sklearn.metrics import roc_auc_score, roc_curve

    def ROC(X, y, c, r):
        # makes cross_validation for given parameters c, r. Returns FPR, TPR (averaged)
        dic_weight = {1: len(y)/(r*np.sum(y)), 0: len(y)/(len(y)-r*np.sum(y))}  # specify class weights
        lrn = LogisticRegression(penalty='l2', C=c, class_weight=dic_weight)
        N_iter = 300  # repeat how often (taking the mean)
        mean_tpr = 0.0
        mean_thresh = 0.0
        mean_fpr = np.linspace(0, 1, 50000)
        mean_auc = 0
        for it in range(N_iter):
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
            lrn.fit(X_train, y_train)
            # column 1 holds the probability of lrn.classes_[1]
            y_prob = lrn.predict_proba(X_test)[:, 1]
            fpr, tpr, thresholds = roc_curve(y_test, y_prob)
            thresholds[0] = 1
            mean_tpr += np.interp(mean_fpr, fpr, tpr)
            mean_thresh += np.interp(mean_fpr, fpr, thresholds)
            mean_tpr[0] = 0.0
            mean_auc += roc_auc_score(y_test, y_prob)
        mean_tpr /= N_iter
        mean_thresh /= N_iter
        mean_tpr[-1] = 1.0
        return mean_fpr, mean_tpr, mean_auc/N_iter, mean_thresh

The following is the function to plot the ROC:

    import matplotlib.pyplot as plt

    def plot_roc(X, y, list_par_1, par_1='C', par_2=1):
        f = plt.figure(figsize=(8, 4))
        for p in list_par_1:
            if par_1 == 'C':
                c = p
                r = par_2
            else:
                r = p
                c = par_2
            list_FP, list_TP, AUC, mean_thresh = ROC(X, y, c, r)
            plt.plot(list_FP, list_TP, label='C = {}, r = {}, TPR(3e-4) = {:.4f}'.format(c, r, list_TP[15]))
            del list_FP, list_TP
        plt.legend(title='values', loc='lower right')
        plt.xlim(0, 0.001)  # we are only interested in small values of FPR
        plt.ylim(0.5, 0.9)
        plt.xlabel('FPR')
        plt.ylabel('TPR')
        plt.title('ROC detail')
        plt.axvline(3e-4, color='b', linestyle='dashed', linewidth=2)
        plt.show()
        plt.close()

    plot_roc(X, y, [1, 3, 10, 30, 100], 'r', 1)
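
When I run the code, I get this as output:

    Precision = 1.00000
    Recall (TPR) = 1.00000
    Fallout (FPR) = 0.00000e+00

I don't understand why the precision comes out as 1.0. Is the data normalized? What is missing in my code?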
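
One thing that might be worth ruling out, offered as an assumption rather than a confirmed diagnosis: in the data-loading code, `y` is taken from a column of `df` that is never dropped before `X` is built, so the label may still be sitting among the features. A classifier can then score perfectly on the test split. Below is a minimal sketch of a check, using a hypothetical helper `columns_equal_to_target` on small made-up arrays (not the real dataset):

```python
import numpy as np

def columns_equal_to_target(X, y):
    """Return indices of the columns of X whose values are identical to y."""
    X = np.asarray(X)
    y = np.asarray(y)
    return [j for j in range(X.shape[1]) if np.array_equal(X[:, j], y)]

# Made-up toy data: column 0 of X_toy is an exact copy of the label
y_toy = np.array([0, 1, 1, 0])
X_toy = np.array([[0, 3], [1, 5], [1, 2], [0, 4]])

leaks = columns_equal_to_target(X_toy, y_toy)
print(leaks)

# Dropping the leaking columns leaves only genuine features
X_clean = np.delete(X_toy, leaks, axis=1)
print(X_clean.shape)
```

If a check like this reports a column on the real `X`, removing the target column from `df` before converting it to an array would be the natural next experiment.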