Pandas: Convert a column of arrays to a numpy matrix

Time: 2015-02-27 22:29:05

Tags: python csv numpy matrix pandas

I have data in the following format:

Col1   Col2       Col3
1,    1424549456, "3 4"
2,    1424549457, "2 3 4 5"

and have successfully read it into pandas.

How can I convert Col3 into a numpy matrix of the following form:

# each value needs to become a 1 in the index of the col
# i.e. in the above example 3 is the 4th value, thus
# it is [0 0 0 1]  [0 indexing is included]
mtx = [0 0 0 1 1 0    # corresponds to first row
       0 0 1 1 1 1];  # corresponds to second row

Thanks for any help!

2 answers:

Answer 0 (score: 3)

Since pandas 0.13.1 there is str.get_dummies:

In [11]: s = pd.Series(["3 4", "2 3 4 5"])

In [12]: s.str.get_dummies(sep=" ")
Out[12]:
   2  3  4  5
0  0  1  1  0
1  1  1  1  1

You have to make sure the columns are integers (rather than strings) and then reindex:

In [13]: df = s.str.get_dummies(sep=" ")

In [14]: df.columns = df.columns.map(int)

In [15]: df.reindex(columns=np.arange(6), fill_value=0)
Out[15]:
   0  1  2  3  4  5
0  0  0  0  1  1  0
1  0  0  1  1  1  1

To get the numpy values, use .values:

In [16]: df.reindex(columns=np.arange(6), fill_value=0).values
Out[16]:
array([[0, 0, 0, 1, 1, 0],
       [0, 0, 1, 1, 1, 1]])
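The steps above can be collected into one small helper. This is only a sketch: the function name `col_to_indicator_matrix` and its `width` parameter are made up for illustration, and it assumes every token in the column is a non-negative integer smaller than `width`. It uses `.to_numpy()`, which newer pandas versions offer as an alternative to `.values`.

```python
import numpy as np
import pandas as pd

def col_to_indicator_matrix(col, width):
    """Turn a Series of space-separated integer strings into a 0/1 array.

    `width` (hypothetical parameter) is the number of columns in the result;
    all indices in `col` are assumed to lie in range(width).
    """
    # One indicator column per distinct token; the labels are strings here.
    dummies = col.str.get_dummies(sep=" ")
    # Convert the labels to integers so they line up with np.arange(width).
    dummies.columns = dummies.columns.map(int)
    # Add the missing index columns (filled with 0) and put them in order.
    full = dummies.reindex(columns=np.arange(width), fill_value=0)
    return full.to_numpy()

s = pd.Series(["3 4", "2 3 4 5"])
mtx = col_to_indicator_matrix(s, 6)
print(mtx)
```

For the question's data this would be called as `col_to_indicator_matrix(df.Col3, 6)`.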

Answer 1 (score: 0)

If you don't have a lot of data, you can do something like:
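import numpy as np

res = []
def f(v):
    # start from a row of zeros, then set a 1 at each index named in the string
    r = np.zeros(6, dtype=int)  # np.int was removed in NumPy 1.24; use the builtin int
    r[[int(x) for x in v.split()]] = 1  # build a list: map() is lazy in Python 3
    res.append(r)
df.Col3.apply(f)  # called only for its side effect of filling res
mat = np.array(res)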

# if you really want it to be a matrix, you can do
mat = np.matrix(res)
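The same idea can be written without the global `res` list by preallocating the result and filling it row by row. This is a sketch rather than part of the original answer: the name `indicator_matrix` and the `width` argument are hypothetical, and it assumes each string holds only non-negative integers smaller than `width`.

```python
import numpy as np

def indicator_matrix(strings, width):
    """Build a 0/1 array with one row per string of space-separated indices."""
    mat = np.zeros((len(strings), width), dtype=int)
    for row, text in enumerate(strings):
        # Set a 1 at every column index listed in this row's string.
        mat[row, [int(tok) for tok in text.split()]] = 1
    return mat

mat = indicator_matrix(["3 4", "2 3 4 5"], 6)
print(mat)
```

With a DataFrame, something like `indicator_matrix(df.Col3.tolist(), 6)` should give the same array.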

See this link for details.