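I am trying to use k-means clustering from scipy, specifically the kmeans function from scipy.cluster.vq.
What I want to do is convert a list of lists like the following:
data_without_x = [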
[0, 0, 0, 0, 0, 0, 0, 20.0, 1.0, 48.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1224.0, 125.5, 3156.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 22.5, 56.0, 41.5, 85.5, 0, 0, 0, 0, 0, 0, 0, 0, 1495.0, 3496.5, 2715.0, 5566.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]
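into an ndarray so that I can use it with the kmeans method. When I try to convert the list of lists to an ndarray, I get what looks like an empty array, which invalidates the whole analysis. The length of the ndarray is variable; it depends on the number of samples collected, but I can easily get it with len(data_without_x).
Below is the code snippet that produces the empty array.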
import numpy as np
from scipy.cluster.vq import whiten, kmeans
# my own modules (data_preparation, result_som) are imported here as well
data, data_without_x = data_preparation.generate_sampled_pdf()
nodes_stats, k, list_of_list= result_som.get_number_k()
data_array = np.array(data_without_x)
whitened = whiten(data_array)
centroids, distortion = kmeans(whitened, int(k), iter=100000)
This is the output I get, saved to a simple log file:
___________________________
this is the data array[[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
___________________________
This is the whitened array[[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
...,
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]]
___________________________
Does anyone know what is going on when I try to convert the list of lists to a numpy.array?
Thanks for your help.
Answer 0 (score: 4)
That is exactly how you convert a list of lists to an ndarray in Python. Are you sure your data_without_x is populated correctly? On my machine:
data = [[1, 2, 3, 4], [5, 6, 7, 8]]
data_arr = np.array(data)
data_arr
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
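which I think is the behavior you are expecting.
Looking at your input, it contains a lot of zeros. Keep in mind that the printed output does not show everything: numpy abbreviates large arrays with "...", so you may simply be seeing the zeros from your input. Check specific nonzero elements to be sure.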
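One way to run that check is sketched below. The short data_without_x here is a made-up stand-in for the real data, and the choice of count_nonzero, max and nonzero is just one reasonable set of checks, not the only one.

```python
import numpy as np

# Hypothetical stand-in for data_without_x: mostly zeros, a few nonzero entries.
data_without_x = [
    [0, 0, 0, 20.0, 1.0, 48.0, 0, 0, 0, 1224.0, 0, 0],
    [0, 0, 22.5, 56.0, 41.5, 85.5, 0, 0, 1495.0, 3496.5, 0, 0],
]
data_array = np.array(data_without_x)

# Summary checks that do not depend on the abbreviated printout.
shape = data_array.shape
nonzero_count = np.count_nonzero(data_array)
largest = data_array.max()

# Row and column indices of every nonzero entry.
rows, cols = np.nonzero(data_array)

print(shape, nonzero_count, largest)
print(list(zip(rows.tolist(), cols.tolist()))[:5])
```

If nonzero_count is greater than zero, the conversion worked and the zeros in the log are only the abbreviated edges of the array.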
Answer 1 (score: 0)
vq.whiten and vq.kmeans expect an array of shape (M, N), where each row is an observation. Transpose your data_array:
import numpy as np
import scipy.cluster.vq as vq
np.random.seed(2013)
data_without_x = [
[0, 0, 0, 0, 0, 0, 0, 20.0, 1.0, 48.0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1224.0, 125.5, 3156.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 22.5, 56.0, 41.5, 85.5, 0, 0, 0, 0, 0, 0, 0, 0, 1495.0,
3496.5, 2715.0, 5566.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]
data_array = np.array(data_without_x).T
whitened = vq.whiten(data_array)
centroids, distortion = vq.kmeans(whitened, 5)
print(centroids)
which yields:
[[ 1.22649791e+00 2.69573144e+00]
[ 3.91943108e-03 5.57406434e-03]
[ 5.73668382e+00 4.83161524e+00]
[ 0.00000000e+00 1.29763133e+00]]
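A possible explanation for the nan values in the question's log: whitening divides each column by its standard deviation, so a column that is constant across all observations (all zeros, for example) has standard deviation 0, and 0/0 gives nan. The sketch below reproduces this with plain NumPy instead of calling vq.whiten, since how whiten treats zero-variance columns may differ between SciPy versions. The small array and the "drop constant columns" workaround are assumptions for illustration, not part of the original code.

```python
import numpy as np

# Made-up data: 3 observations (rows) x 4 features (columns).
# Columns 0 and 3 are constant zeros, so their standard deviation is 0.
obs = np.array([
    [0.0, 20.0, 1.0, 0.0],
    [0.0, 56.0, 41.5, 0.0],
    [0.0, 30.0, 10.0, 0.0],
])

std = obs.std(axis=0)
constant_cols = np.flatnonzero(std == 0)

# Manual whitening: divide each column by its standard deviation.
# The constant columns become 0/0, which is nan.
with np.errstate(divide="ignore", invalid="ignore"):
    manual_whitened = obs / std

# One possible workaround: drop the constant columns before whitening.
kept = obs[:, std > 0]
safe_whitened = kept / kept.std(axis=0)

print(constant_cols)
print(np.isnan(manual_whitened).any(axis=0))
print(safe_whitened.std(axis=0))
```

Checking data_array.std(axis=0) on the real data before clustering should show whether this is what happened.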
Answer 2 (score: 0)
Use numpy's asarray function. It is straightforward; see the reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html
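A minimal sketch of what that looks like, using a made-up list of lists. For a plain list, np.asarray and np.array give the same result; the difference only shows up when the input is already an ndarray.

```python
import numpy as np

# Made-up input: two rows of three values each.
rows = [[0, 20.0, 1.0], [22.5, 56.0, 41.5]]

arr = np.asarray(rows)    # list of lists -> new 2-D float array
same = np.asarray(arr)    # already an ndarray -> returned as-is, no copy
copied = np.array(arr)    # np.array copies by default

print(arr.shape, arr.dtype)
print(same is arr, copied is arr)
```

Note that asarray by itself would not change the output shown in the question, since np.array already converts a list of lists correctly.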