TypeError: float() argument must be a string or a number, array = np.array(array, dtype=dtype, order=order, copy=copy)

Time: 2016-04-26 17:11:51

Tags: error-handling syntax-error k-means

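I am applying K-means clustering to data frames loaded from CSV and Excel files.

Reference: http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_iris.html#example-cluster-plot-cluster-iris-py

I tried running the code with data from a CSV file; the data looks like this: DataFile

But I get the following error: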

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('/Users/nadiastraton/Documents/workspacePython/02450Toolbox_Python/Thesis/Scripts/Clustering/cluster3.py', wdir='/Users/nadiastraton/Documents/workspacePython/02450Toolbox_Python/Thesis/Scripts/Clustering')

  File "/Applications/anaconda2/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "/Applications/anaconda2/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 81, in execfile
    builtins.execfile(filename, *where)

  File "/Users/cluster3.py", line 46, in <module>
    est.fit(x.as_matrix)

  File "/Applications/anaconda2/lib/python2.7/site-packages/sklearn/cluster/k_means_.py", line 812, in fit
    X = self._check_fit_data(X)

  File "/Applications/anaconda2/lib/python2.7/site-packages/sklearn/cluster/k_means_.py", line 786, in _check_fit_data
    X = check_array(X, accept_sparse='csr', dtype=np.float64)

  File "/Applications/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 373, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)

TypeError: float() argument must be a string or a number
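
The last frame shows np.array being asked to build a float64 array. One way to get this TypeError, offered as a sketch rather than a confirmed diagnosis, is to hand NumPy a method object instead of data, which is what est.fit(x.as_matrix) does when the parentheses are left off. The array `data` and the use of `tolist` below are stand-ins that are not part of the script; they only play the role of a method that is referenced but never called.

```python
import numpy as np

# Stand-in data; not taken from the original CSV file.
data = np.array([[1, 2, 3], [4, 5, 6]])

# Referencing the method without calling it passes a callable to NumPy,
# which cannot be converted to float64.
try:
    np.array(data.tolist, dtype=np.float64)
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)

# Calling the method returns real data, so the conversion succeeds.
converted = np.array(data.tolist(), dtype=np.float64)
```

If this is the cause, the error comes from what is passed to fit, not from the values in the file.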

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
from sklearn.cluster import KMeans

np.random.seed(5)

centers = [[1, 1], [-1, -1], [1, -1]]

data=pd.read_csv('/DataVisualisationSample.csv')
print(data.head())


x = pd.DataFrame(data,columns = ['Post_Share_Count','Post_Like_Count','Comment_Count'])
y = pd.DataFrame(data,columns = ['Comment_Like_Count'])

print(x.info())


estimators = {'k_means_data_3': KMeans(n_clusters=3),
              'k_means_data_8': KMeans(n_clusters=12),
              'k_means_data_bad_init': KMeans(n_clusters=3, n_init=1,
                                              init='random')}


fignum = 1
for name, est in estimators.items():
    fig = plt.figure(fignum, figsize=(4, 3))
    plt.clf()
    ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)

    plt.cla()
    est.fit(x.as_matrix)
    labels = est.labels_

    ax.scatter(x[:, 2], x[:, 0], x[:, 1], c=labels.astype(np.int))

    ax.w_xaxis.set_ticklabels([])
    ax.w_yaxis.set_ticklabels([])
    ax.w_zaxis.set_ticklabels([])
    ax.set_xlabel('Post_Share_Count')
    ax.set_ylabel('Post_Like_Count')
    ax.set_zlabel('Comment_Count')
    fignum = fignum + 1

# Plot the ground truth
fig = plt.figure(fignum, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)

plt.cla()

for name, label in [('Popular', 0),
                    ('Not Popular', 1),
                    ('Least Popular', 2)]:
    ax.text3D(x[y == label, 2].mean(),
              x[y == label, 0].mean() + 1.5,
              x[y == label, 1].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(np.int)
ax.scatter(x[:, 2], x[:, 0], x[:, 1], c=y).astype(np.int)

ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])
ax.set_xlabel('Post_Share_Count')
ax.set_ylabel('Post_Like_Count')
ax.set_zlabel('Comment_Count')
plt.show()

Attempts to fix the error:

I used est.fit(x.as_matrix) instead of est.fit(x), and c=labels.astype(np.int) instead of c=labels.astype(np.float), since all values in my file are int.

But changing from np.float to np.int does not solve the problem.
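
The dtype given to astype only affects the scatter colours, so it is probably unrelated to an error raised inside est.fit. A sketch of what may help, assuming the three feature columns are numeric: convert the DataFrame to a NumPy array once, then use that array both for fitting and for positional slicing such as X[:, 2]. The sample rows below are invented for illustration, and `.values` is used because `as_matrix` was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

# Invented rows standing in for DataVisualisationSample.csv.
data = pd.DataFrame({
    'Post_Share_Count': [3, 0, 12, 7],
    'Post_Like_Count': [10, 2, 40, 15],
    'Comment_Count': [1, 0, 5, 2],
    'Comment_Like_Count': [0, 1, 2, 0],
})

cols = ['Post_Share_Count', 'Post_Like_Count', 'Comment_Count']

# .values is an attribute that returns the underlying ndarray.
X = data[cols].values.astype(np.float64)
y = data['Comment_Like_Count'].values  # 1-D integer array

# Positional slicing works on the ndarray; x[:, 2] on a DataFrame does not.
comment_count = X[:, 2]

# A boolean mask built from the 1-D y selects rows of X.
mean_comments_label0 = X[y == 0, 2].mean()
```

With such an array, est.fit(X) and ax.scatter(X[:, 2], X[:, 0], X[:, 1], c=labels.astype(int)) would be the corresponding calls. The builtin int is used there because np.int was removed in NumPy 1.24.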

0 answers:

No answers