转换Pandas dtype of dataframe

时间:2015-03-02 19:27:55

标签: python numpy pandas k-means

我有一个 Pandas 数据框,它存储为'对象',但我需要将数据框结构更改为'int',因为'object'dtype将不会在 kmeans() numpy 库的功能

我已设法将数据框的每一列转换为float64,基于此示例Pandas: change data type of columns,但我无法将整个内容更改为其他任何内容。

 #create subset of user variables
 user.posts = user.posts.astype('int')
 user.views = user.views.astype('int')
 user.kudos = user.kudos.astype('int')

 Y = user[['posts','views','kudos']]
 #convert dataframe into float
 X.convert_objects(convert_numeric=True).dtypes

Out[205]:
 posts    float64
 views    float64
 kudos    float64
 dtype: object

这会在我尝试运行时导致问题

K = range(1,10)

# scipy.cluster.vq.kmeans
KM = [kmeans(X,k) for k in K] # apply kmeans 1 to 10

我收到错误

  --->KM = [kmeans(X,k) for k in K] # apply kmeans 1 to 10
  ^

  AttributeError: 'DataFrame' object has no attribute 'dtype'

kmeans对K或X数据帧有什么问题,如何解决? 感谢

1 个答案:

答案 0 :(得分:4)

将其保存为值,而不是对象。根据这篇文章 How to convert a pandas DataFrame subset of columns AND rows into a numpy array?

user.posts = user.posts.astype('float')
user.views = user.views.astype('float')
user.kudos = user.kudos.astype('float')

Y = user[['posts','views','kudos']].values