Question

I have a dataset like this:

state VDM  MDM  OM  
AP     1    2   5   
GOA    1    2   1   
GU     1    2   4   
KA     1    5   1   
MA     1    4   4

I tried the following code:

aMat=df1000.as_matrix()
print(aMat)

Here df1000 is the dataset.

However, the code above gives the following output, which still includes the state column:

[['AP' 1 2 5]
 ['GOA' 1 2 1]
 ['GU' 1 2 4]
 ['KA' 1 5 1]
 ['MA' 1 4 4]]

I want to create a 2D list or matrix like this:

[['1', '2', '5'],
 ['1', '2', '1'],
 ['1', '2', '4'],
 ['1', '5', '1'],
 ['1', '4', '4']]

Answer 1

You can use df.iloc[] to select every column after the first:

df.iloc[:,1:].to_numpy()

array([[1, 2, 5],
       [1, 2, 1],
       [1, 2, 4],
       [1, 5, 1],
       [1, 4, 4]], dtype=int64)
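
For anyone who wants to reproduce this, here is a minimal sketch. It assumes the question's data can be rebuilt as below; the DataFrame construction and the name int_matrix are mine, not from the question.

```python
import pandas as pd

# Rebuild the question's data (assumed to mirror df1000).
df = pd.DataFrame({
    "state": ["AP", "GOA", "GU", "KA", "MA"],
    "VDM": [1, 1, 1, 1, 1],
    "MDM": [2, 2, 2, 5, 4],
    "OM": [5, 1, 4, 1, 4],
})

# All rows, every column from position 1 onward (this skips "state").
int_matrix = df.iloc[:, 1:].to_numpy()
print(int_matrix.shape)  # (5, 3)
```

Because iloc selects by position, this relies on state being the first column.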

Or, for a matrix of strings:

df.astype(str).iloc[:,1:].to_numpy()

array([['1', '2', '5'],
       ['1', '2', '1'],
       ['1', '2', '4'],
       ['1', '5', '1'],
       ['1', '4', '4']], dtype=object)
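
As a rough check, the order of astype(str) and iloc should not matter for the result. The sketch below assumes the same reconstructed data; the names str_first and slice_first are illustrative only.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({
    "state": ["AP", "GOA", "GU", "KA", "MA"],
    "VDM": [1, 1, 1, 1, 1],
    "MDM": [2, 2, 2, 5, 4],
    "OM": [5, 1, 4, 1, 4],
})

# Convert everything to str, then drop the first column by position...
str_first = df.astype(str).iloc[:, 1:].to_numpy()
# ...or drop the first column, then convert only what is left.
slice_first = df.iloc[:, 1:].astype(str).to_numpy()

# Compare as nested lists to sidestep dtype differences between versions.
same = str_first.tolist() == slice_first.tolist()
print(same)  # True
```

Slicing first avoids converting a column that is about to be discarded, which may matter on wide frames.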

Note that as_matrix() is not used here because it is deprecated. pandas warns:

"Method .as_matrix will be removed in a future version. Use .values instead."
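
A small sketch of the two replacements side by side, assuming pandas 0.24 or newer so that to_numpy() is available (as far as I know, as_matrix() was removed entirely in pandas 1.0). The data and variable names are my own reconstruction.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({
    "state": ["AP", "GOA", "GU", "KA", "MA"],
    "VDM": [1, 1, 1, 1, 1],
    "MDM": [2, 2, 2, 5, 4],
    "OM": [5, 1, 4, 1, 4],
})

numeric = df.iloc[:, 1:]

# .values works on older pandas; .to_numpy() is the newer spelling.
via_values = numeric.values
via_to_numpy = numeric.to_numpy()

equal = np.array_equal(via_values, via_to_numpy)
print(equal)  # True
```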

Answer 2

Select all columns except the first with DataFrame.iloc, convert the integer values to strings with DataFrame.astype, and finally convert to a NumPy array with to_numpy or DataFrame.values:

# pandas 0.24+
aMat = df1000.iloc[:, 1:].astype(str).to_numpy()
# pandas below 0.24
aMat = df1000.iloc[:, 1:].astype(str).values

Or remove the first column by name with DataFrame.drop:

# pandas 0.24+
aMat = df1000.drop('state', axis=1).astype(str).to_numpy()
# pandas below 0.24
aMat = df1000.drop('state', axis=1).astype(str).values
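
Since the question also mentions a 2D list, here is a sketch that goes one step further and calls tolist() on the array. The df1000 construction is an assumed stand-in for the asker's data, and nested is an illustrative name.

```python
import pandas as pd

# Assumed stand-in for the asker's df1000.
df1000 = pd.DataFrame({
    "state": ["AP", "GOA", "GU", "KA", "MA"],
    "VDM": [1, 1, 1, 1, 1],
    "MDM": [2, 2, 2, 5, 4],
    "OM": [5, 1, 4, 1, 4],
})

# Drop "state" by name, cast to str, then turn the array into nested lists.
aMat = df1000.drop("state", axis=1).astype(str).to_numpy()
nested = aMat.tolist()

print(nested[0])  # ['1', '2', '5']
```

Dropping by name keeps working if the column order changes, whereas the iloc version depends on state being first.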