Question

I have a DataFrame that I want to convert to an array. Here is my code:

>>> test.columns
Index(['OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath',
       'YearBuilt'],
      dtype='object')
>>> test.head()
   OverallQual  GrLivArea  GarageCars  TotalBsmtSF  FullBath  YearBuilt
0            5        896         1.0        882.0         1       1961
1            6       1329         1.0       1329.0         1       1958
2            5       1629         2.0        928.0         2       1997
3            6       1604         2.0        926.0         2       1998
4            8       1280         2.0       1280.0         2       1992

>>> test.as_matrix()
array([[  5.00000000e+00,   8.96000000e+02,   1.00000000e+00,
          8.82000000e+02,   1.00000000e+00,   1.96100000e+03],
       [  6.00000000e+00,   1.32900000e+03,   1.00000000e+00,
          1.32900000e+03,   1.00000000e+00,   1.95800000e+03],
       [  5.00000000e+00,   1.62900000e+03,   2.00000000e+00,
          9.28000000e+02,   2.00000000e+00,   1.99700000e+03],
       ..., 
       [  5.00000000e+00,   1.22400000e+03,   2.00000000e+00,
          1.22400000e+03,   1.00000000e+00,   1.96000000e+03],
       [  5.00000000e+00,   9.70000000e+02,   0.00000000e+00,
          9.12000000e+02,   1.00000000e+00,   1.99200000e+03],
       [  7.00000000e+00,   2.00000000e+03,   3.00000000e+00,
          9.96000000e+02,   2.00000000e+00,   1.99300000e+03]])

As the output above shows, the values returned by test.as_matrix() are all floats. Why does it convert everything to float, even the values that are not floats?

Answer 1

First, note that the documentation says DataFrame.as_matrix should no longer be used; use DataFrame.values instead.
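A minimal sketch of the conversion with `DataFrame.values` (and the equivalent `DataFrame.to_numpy()`, available in newer pandas releases). The small frame below is a hypothetical stand-in built from the first rows shown in the question. In that `head()` output, `GarageCars` and `TotalBsmtSF` print as `1.0` and `882.0`, which suggests those two columns are stored as floats. Under that assumption, a single NumPy array has to use one dtype for all columns, so the integer columns are upcast to float:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame (first three rows only).
test = pd.DataFrame({
    "OverallQual": [5, 6, 5],            # integer column
    "GrLivArea": [896, 1329, 1629],      # integer column
    "GarageCars": [1.0, 1.0, 2.0],       # float column (assumed from head())
    "TotalBsmtSF": [882.0, 1329.0, 928.0],
})

arr = test.values        # use this instead of the deprecated as_matrix()
same = test.to_numpy()   # equivalent call in newer pandas versions

print(test.dtypes)       # per-column dtypes: a mix of int64 and float64
print(arr.dtype)         # one common dtype for the whole array
```

Selecting only the integer columns first, for example `test[["OverallQual", "GrLivArea"]].values`, keeps the integer dtype because no float column takes part in the conversion.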

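Second, note that your original data appears to have dtype=object. That is inefficient! If the data are integers, you should load the original data with dtype=int. Then, when you call DataFrame.values, the dtype will already be int.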
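A sketch of making the data integer before converting, assuming every value is a whole number and none is missing (the cast raises an error on NaN). The frame and its values are hypothetical; `"int64"` is used rather than `int` so the result does not depend on the platform's default integer width:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one integer column and one float column holding whole numbers.
test = pd.DataFrame({
    "OverallQual": [5, 6, 5],
    "GarageCars": [1.0, 1.0, 2.0],
})

# Cast every column to a 64-bit integer, then convert to a NumPy array.
as_int = test.astype("int64")
int_arr = as_int.values

print(int_arr.dtype)
print(int_arr)
```

When reading from a file, the same effect can be had at load time, for example by passing a `dtype` argument to `pandas.read_csv`.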

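Using dtype=object is inefficient because the values are stored elsewhere, and the DataFrame only holds pointers to them. This can increase memory consumption by 100% and reduce performance.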
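The memory cost of object storage can be checked directly. The sketch below is illustrative only: it builds two hypothetical Series with the same integers, one stored as `int64` and one as `object`, and compares their footprint with `memory_usage(deep=True)`, which also counts the Python objects that an object column points to. The exact ratio depends on the platform and the Python version, so no specific figure is claimed here:

```python
import numpy as np
import pandas as pd

values = list(range(1000))

# Same numbers, two storage strategies.
s_int = pd.Series(values, dtype="int64")   # values stored inline in one buffer
s_obj = pd.Series(values, dtype=object)    # buffer of pointers to Python ints

# deep=True includes the size of the pointed-to Python objects.
int_bytes = s_int.memory_usage(index=False, deep=True)
obj_bytes = s_obj.memory_usage(index=False, deep=True)

print(int_bytes, obj_bytes)
print(s_obj.to_numpy().dtype)   # converting keeps the object dtype
```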

Why does `DataFrame.as_matrix` convert to float type?
