Using a subset of a pandas DataFrame with SciPy kmeans?

Asked: 2015-06-22 15:29:43

Tags: python pandas scipy

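I have a DataFrame that I import with df = pd.read_csv('my.csv',sep=','). In that CSV file, the first row holds the column names and the first column holds the observation names.

I know how to select a subset of a pandas DataFrame using: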

df.iloc[:,1::]

which gives only the numeric values. But when I try to use this selection with scipy.cluster.vq.kmeans,

kmeans(df.iloc[:,1::],3)

I get the error 'DataFrame' object has no attribute 'dtype'

Any suggestions?

1 Answer:

Answer 0: (score: 3)

Here is an example using KMeans from scikit-learn.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# try to simulate your data
# =====================================================
X, y = make_blobs(n_samples=1000, n_features=10, centers=3)

columns = ['feature' + str(x) for x in np.arange(1, 11, 1)]
d = {key: values for key, values in zip(columns, X.T)}
d['label'] = y
data = pd.DataFrame(d)

Out[72]: 
     feature1  feature10  feature2  ...    feature8  feature9  label
0      1.2324    -2.6588   -7.2679  ...      5.4166    8.9043      2
1      0.3569    -1.6880   -5.7671  ...     -2.2465   -1.7048      0
2      1.0177    -1.7145   -5.8591  ...     -0.5755   -0.6969      0
3      1.5735    -0.0597   -4.9009  ...      0.3235   -0.2400      0
4     -0.1042    -1.6703   -4.0541  ...      0.4456   -1.0406      0
..        ...        ...       ...  ...         ...       ...    ...
995   -0.0983    -1.4569   -3.5179  ...     -0.3164   -0.6685      0
996    1.3151    -3.3253   -7.0984  ...      3.7563    8.4052      2
997   -0.9177     0.7446   -4.8527  ...     -2.3793   -0.4038      0
998    2.0385    -3.9001   -7.7472  ...      5.2290    9.2281      2
999    3.9357    -7.2564    5.7881  ...      1.2288   -2.2305      1

[1000 rows x 11 columns]

# fit your data with KMeans
# =====================================================

kmeans = KMeans(n_clusters=3)
# .ix has been removed from pandas; .iloc selects every column except the last ('label')
kmeans.fit_predict(data.iloc[:, :-1].values)

Out[70]: array([1, 0, 0, ..., 0, 1, 2], dtype=int32)
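
To stay with scipy.cluster.vq.kmeans from the question, the likely fix is to pass a plain NumPy array rather than the DataFrame itself, since the error suggests the function is looking for array attributes on its input. The following is a minimal sketch under stated assumptions: the small random frame stands in for my.csv, and its column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical stand-in for my.csv: the first column holds observation names
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 4)), columns=['a', 'b', 'c', 'd'])
df.insert(0, 'name', ['obs%d' % i for i in range(len(df))])

# Drop the name column and convert the rest to a float ndarray
obs = df.iloc[:, 1:].to_numpy(dtype=float)

# whiten() rescales each feature to unit variance before clustering
obs_w = whiten(obs)

# kmeans returns (codebook, distortion); vq assigns each row to its nearest centroid
codebook, distortion = kmeans(obs_w, 3)
labels, _ = vq(obs_w, codebook)
```

On older pandas versions without to_numpy, df.iloc[:, 1:].values should give the same array. Note that SciPy may return fewer than 3 centroids if one of them ends up with no observations.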