Sales Discount Profit Product ID
0 0.050090 0.000000 0.262335 FUR-ADV-10000002
1 0.110793 0.000000 0.260662 FUR-ADV-10000108
2 0.309561 0.864121 0.241432 FUR-ADV-10000183
3 0.039217 0.591474 0.260687 FUR-ADV-10000188
4 0.070205 0.000000 0.263628 FUR-ADV-10000190
5 0.697873 0.000000 0.281162 FUR-ADV-10000571
6 0.064918 0.000000 0.261285 FUR-ADV-10000600
7 0.091950 0.000000 0.262946 FUR-ADV-10000847
8 0.056013 0.318384 0.257952 FUR-ADV-10001283
9 0.304472 0.318384 0.265739 FUR-ADV-10001440
10 0.046234 0.318384 0.261058 FUR-ADV-10001659
I am using the elbow method with k-means to find the right number of clusters.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def kelbow(final_df, k):
    # Collect the within-cluster sum of squares (WCSS) for 1..k-1 clusters
    x = []
    for i in range(1, k):
        kmeans = KMeans(n_clusters=i)
        kmeans.fit(final_df)
        x.append(kmeans.inertia_)
    # Plot WCSS against the number of clusters
    plt.plot(range(1, k), x)
    plt.title('The elbow method')
    plt.xlabel('The number of clusters')
    plt.ylabel('WCSS')
    plt.show()
    return x
I call the function with kelbow(final_df, 30),
but the code throws an error: ValueError: could not convert string to float: 'TEC-STA-10004927'. How can I find the clusters?
Answer 0 (score: 0)
Create dummy variables.
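import pandas as pd

# One-hot encode the product ID column. get_dummies already removes the
# original 'ProductID' column, so a separate drop() is not needed (it would
# raise a KeyError). Adjust the name if your column is called 'Product ID'.
final_df = pd.get_dummies(final_df, columns=['ProductID'], dtype='int64')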
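To make the effect of the dummy encoding concrete, here is a minimal sketch on a hypothetical three-row frame (the values and the `ProductID` column name are assumptions for illustration, not the asker's real data). It shows that `pd.get_dummies` replaces the string column with one indicator column per distinct value.

```python
import pandas as pd

# Hypothetical sample; only the shape of the data mirrors the question.
df = pd.DataFrame({
    "Sales": [0.05, 0.11, 0.31],
    "ProductID": ["FUR-ADV-10000002", "FUR-ADV-10000108", "FUR-ADV-10000183"],
})

# Replace the string column with integer indicator columns.
encoded = pd.get_dummies(df, columns=["ProductID"], dtype="int64")

# The original 'ProductID' column is gone; one column per product remains.
print(encoded.columns.tolist())
```

After this step every column is numeric, so `KMeans.fit` no longer hits the string-to-float conversion error.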
Answer 1 (score: 0)
This should work for you:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

def kelbow(df, k):
    x = []
    # One-hot encode every string (object) column so KMeans only sees numbers
    final_df = pd.get_dummies(df, columns=df.select_dtypes(['object']).columns)
    for i in range(1, k):
        kmeans = KMeans(n_clusters=i)
        kmeans.fit(final_df)
        x.append(kmeans.inertia_)
    # Plot WCSS against the number of clusters
    plt.plot(range(1, k), x)
    plt.title('The elbow method')
    plt.xlabel('The number of clusters')
    plt.ylabel('WCSS')
    plt.show()
    return x
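For a quick check without opening a plot window, the same idea can be sketched as a function that only returns the WCSS values. The name `elbow_wcss`, the sample data, and the fixed `n_init` and `random_state` are assumptions added here for reproducibility; they are not part of the original answers.

```python
import pandas as pd
from sklearn.cluster import KMeans

def elbow_wcss(df, k_max):
    """Return the WCSS (inertia) for 1..k_max clusters."""
    # Encode string/categorical columns; numeric columns pass through unchanged.
    features = pd.get_dummies(df, dtype="int64")
    wcss = []
    for n in range(1, k_max + 1):
        model = KMeans(n_clusters=n, n_init=10, random_state=0)
        model.fit(features)
        wcss.append(model.inertia_)
    return wcss

# Hypothetical sample shaped like the data in the question.
sample = pd.DataFrame({
    "Sales": [0.05, 0.11, 0.31, 0.04, 0.07, 0.70],
    "Discount": [0.0, 0.0, 0.86, 0.59, 0.0, 0.0],
    "Profit": [0.26, 0.26, 0.24, 0.26, 0.26, 0.28],
    "ProductID": ["A-1", "A-2", "A-3", "A-4", "A-5", "A-6"],
})

wcss = elbow_wcss(sample, 4)
print(wcss)
```

The returned list can be passed to `plt.plot(range(1, len(wcss) + 1), wcss)` to look for the elbow. Note that a column of unique IDs turns into one indicator column per row, which may dominate the distances, so it can be worth comparing against a run with that column dropped.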