如何在包含1300张图像的数据集上使用metrics.silouhette_score,这些图像具有ResNet50特征矢量(每个图像的长度为2048)以及介于1到9之间的离散类标签?
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn import cluster, datasets, preprocessing, metrics
from sklearn.cluster import KMeans
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
labels_reshaped = np.ndarray(labels).reshape(-1,1)
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels_reshaped,
metric='cosine'))
我收到此错误:
Traceback (most recent call last):
File "/dataset/silouhette_score.py", line 8, in <module>
labels_reshaped = np.ndarray(labels).reshape(-1,1)
ValueError: sequence too large; cannot be greater than 32
Process finished with exit code 1
对于其他代码:
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn import cluster, datasets, preprocessing, metrics
from sklearn.cluster import KMeans
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
labels_reshaped = np.ndarray(labels).reshape(1,-1)
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels_reshaped,
metric='cosine'))
我收到此错误:
Traceback (most recent call last):
File "/dataset/silouhette_score.py", line 8, in <module>
labels_reshaped = np.ndarray(labels).reshape(1,-1)
ValueError: sequence too large; cannot be greater than 32
Process finished with exit code 1
如果我运行其他代码:
import pandas as pd
from sklearn import metrics
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels,
metric='cosine'))
我将其作为输出:https://pastebin.com/raw/hk2axdWL
如何解决此代码,以便可以打印单个轮廓分数?
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Process finished with exit code 1
我在这里粘贴了特征向量文件(.txt文件)的一行:https://pastebin.com/raw/hk2axdWL(由2048个数字组成,用空格分隔)
答案 0 :(得分:0)
我最终能够弄清楚这一点。我需要创建与sklearn所需的格式完全相同的特征矢量:
import pandas as pd
from sklearn import metrics
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
X = open('entire_dataset__resnet50_feature_vectors.txt')
#X_Data = X.read()
fv = []
for line in X:
line = line.strip("\n")
tmp_arr = line.split(' ')
print(tmp_arr)
fv.append(tmp_arr)
print(fv)
print('Silhouette Score:', metrics.silhouette_score(fv, labels,
metric='cosine'))