我有以下格式的数据:
Col1 Col2 Col3
1, 1424549456, "3 4"
2, 1424549457, "2 3 4 5"
&安培;已成功将其读入熊猫。
如何将Col3转换为以下形式的numpy矩阵:
# each value needs to become a 1 in the index of the col
# i.e. in the above example 3 is the 4th value, thus
# it is [0 0 0 1] [0 indexing is included]
mtx = [0 0 0 1 1 0 # corresponds to first row
0 0 1 1 1 1]; # corresponds to second row
感谢您提供的任何帮助!
答案 0 :(得分:3)
Since 0.13.1那里str.get_dummies
:
In [11]: s = pd.Series(["3 4", "2 3 4 5"])
In [12]: s.str.get_dummies(sep=" ")
Out[12]:
2 3 4 5
0 0 1 1 0
1 1 1 1 1
您必须确保列是整数(而不是字符串)和reindex:
In [13]: df = s.str.get_dummies(sep=" ")
In [14]: df.columns = df.columns.map(int)
In [15]: df.reindex(columns=np.arange(6), fill_value=0)
Out[15]:
0 1 2 3 4 5
0 0 0 0 1 1 0
1 0 0 1 1 1 1
要获取numpy值,请使用.values
:
In [16]: df.reindex(columns=np.arange(6), fill_value=0).values
Out[16]:
array([[0, 0, 0, 1, 1, 0],
[0, 0, 1, 1, 1, 1]])
答案 1 :(得分:0)
如果没有大量数据,您可以执行类似
的操作res = []
def f(v):
r = np.zeros(6, np.int)
r[map(int, v.split())] = 1
res.append(r)
df.Col3.apply(f)
mat = np.array(res)
# if you really want it to be a matrix, you can do
mat = np.matrix(res)
查看this link了解详情