Question

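I saw that the TensorFlow contrib library has an implementation of k-means clustering. However, I cannot get it to do the simple task of estimating cluster centers for 2D points.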

Code:

    import numpy as np
    import tensorflow as tf

    ## Generate synthetic data
    N, D = 1000, 2  # number of points per cluster and dimensionality

    means = np.array([[0.5, 0.0],
                      [0, 0],
                      [-0.5, -0.5],
                      [-0.8, 0.3]])
    covs = np.array([np.diag([0.01, 0.01]),
                     np.diag([0.01, 0.01]),
                     np.diag([0.01, 0.01]),
                     np.diag([0.01, 0.01])])
    n_clusters = means.shape[0]

    points = []
    for i in range(n_clusters):
        x = np.random.multivariate_normal(means[i], covs[i], N)
        points.append(x)
    points = np.concatenate(points)

    ## construct model
    kmeans = tf.contrib.learn.KMeansClustering(num_clusters=n_clusters)
    kmeans.fit(points.astype(np.float32))

I get the following error:

    InvalidArgumentError (see above for traceback): Shape [-1,2] has negative dimensions [[Node: input = Placeholder[dtype=DT_FLOAT, shape=[?,2], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I think I am doing something wrong, but I cannot figure out what from the documentation.

Edit:

I solved it using input_fn, but it is really slow (I had to reduce the number of points in each cluster to 10 to see a result). Why is that, and how can I make it faster?

Solved:

It seems the relative tolerance should be set. So I changed only one line, and it works fine:

    def input_fn():
        return tf.constant(points, dtype=tf.float32), None

    ## construct model
    kmeans = tf.contrib.learn.KMeansClustering(num_clusters=n_clusters, relative_tolerance=0.0001)
    kmeans.fit(input_fn=input_fn)
    centers = kmeans.clusters()
    print(centers)
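To illustrate why a relative tolerance can matter, here is a minimal NumPy sketch of Lloyd's algorithm that stops once the loss improves by less than a given fraction. This is not TensorFlow's implementation; the function name `kmeans`, the `max_iters` cap, and the exact stopping rule are assumptions made for illustration. Without such a criterion, an iterative fit would presumably keep running until some step limit is reached, which may be what made the earlier attempt so slow.

```python
import numpy as np

def kmeans(points, n_clusters, relative_tolerance=1e-4, max_iters=300, seed=0):
    """Plain Lloyd's algorithm with a relative-tolerance stopping rule (illustrative)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers from distinct, randomly chosen data points.
    centers = points[rng.choice(len(points), size=n_clusters, replace=False)]
    prev_loss = None
    for iteration in range(1, max_iters + 1):
        # Squared distance from every point to every center: shape (n_points, n_clusters).
        sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        loss = sq_dists[np.arange(len(points)), labels].sum()
        # Stop once the loss drops by less than the requested fraction of its previous value.
        if prev_loss is not None and prev_loss - loss <= relative_tolerance * prev_loss:
            break
        prev_loss = loss
        # Move each center to the mean of its assigned points; leave empty clusters in place.
        for k in range(n_clusters):
            members = points[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return centers, iteration

# Synthetic data similar to the question's, with fewer points per cluster.
data_rng = np.random.default_rng(0)
means = np.array([[0.5, 0.0], [0.0, 0.0], [-0.5, -0.5], [-0.8, 0.3]])
points = np.concatenate(
    [data_rng.multivariate_normal(m, np.diag([0.01, 0.01]), 200) for m in means]
)

centers, n_iters = kmeans(points, n_clusters=4, relative_tolerance=1e-4)
print(centers)
print("iterations:", n_iters)
```

Setting `relative_tolerance=0` in this sketch makes the loop run until the loss stops improving entirely (or `max_iters` is hit), so a small positive tolerance trades a negligible amount of accuracy for an earlier stop.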

Answer 1

With TensorFlow 1.2, the original code produces the following warning:

    WARNING:tensorflow:From <stdin>:1: calling BaseEstimator.fit (from         
    tensorflow.contrib.learn.python.learn.estimators.estimator) with x 
    is deprecated and will be removed after 2016-12-01.
    Instructions for updating:
    Estimator is decoupled from Scikit Learn interface by moving into
    separate class SKCompat. Arguments x, y and batch_size are only
    available in the SKCompat class, Estimator will only accept input_fn.

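Based on your edit, it seems you have figured out that input_fn is the only accepted input. If you really want to use TF, I would upgrade to r1.2 and wrap the Estimator in the SKCompat class, as the message suggests. Otherwise, I would just use the SKLearn package. You can also implement your own clustering algorithm in TF by hand, as shown in this blog.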
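For the scikit-learn route, a minimal sketch might look like the following. It regenerates data similar to the question's (with fewer points per cluster) and uses `sklearn.cluster.KMeans`; the specific settings `n_init=10`, `tol=1e-4`, and `random_state=0` are choices made for this example, not values taken from the question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2D data: four Gaussian blobs, as in the question.
rng = np.random.default_rng(0)
means = np.array([[0.5, 0.0], [0.0, 0.0], [-0.5, -0.5], [-0.8, 0.3]])
points = np.concatenate(
    [rng.multivariate_normal(m, np.diag([0.01, 0.01]), 200) for m in means]
)

# Fit k-means; tol is scikit-learn's convergence tolerance.
model = KMeans(n_clusters=4, n_init=10, tol=1e-4, random_state=0).fit(points)
centers = model.cluster_centers_
print(centers)
```

The estimated centers come back in arbitrary order, so they need to be matched to the true means before comparing them.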

How does KMeans clustering work in TensorFlow?
