Unexpected shape of a pandas column of NumPy arrays

Asked: 2019-03-19 00:38:05

Tags: python pandas dataframe

My DataFrame has two columns of NumPy arrays (l_cats and r_cats). Here is the sample data:

l_name,l_cats,l_gh,r_name,r_cats,r_gh,score
piggly wiggly,1|2|4|0|0,1,piggly wiggly,1|2|4|3|0,1,1
piggly wiggly,1|2|4|0|0,1,piggly wiggly,1|2|4|3|0,1,1
piggly wiggly,1|2|4|0|0,1,piggly wiggly,1|2|4|3|0,1,1
piggly wiggly,1|2|4|0|0,1,piggly wiggly,1|2|4|3|0,1,1
.................
.................
<79 rows>

Below is how I read the data into these two columns:

import numpy as np
import pandas as pd

# Split each '|'-separated cell into a NumPy array of strings
data = pd.read_csv(self.path, converters={'l_cats': lambda x: np.array(x.split('|')),
                                          'r_cats': lambda x: np.array(x.split('|'))})
flat = data['l_cats'].values
print(str(flat.shape))
# Output: (79,)
print(str(flat[0].shape))
# Output: (5,)
print(str(type(flat[0])))
# Output: <class 'numpy.ndarray'>

Shouldn't the output of print(str(flat.shape)) be (79, 5)?

1 answer:

Answer 0 (score: 0)

No, it shouldn't.
If you simply run print(flat), you will see why. This is what flat looks like:

[array(['1', '2', '4', '0', '0'], dtype='<U1')
 array(['1', '2', '4', '0', '0'], dtype='<U1')
 array(['1', '2', '4', '0', '0'], dtype='<U1')
 array(['1', '2', '4', '0', '0'], dtype='<U1')]

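As you can see, it is not a 2D matrix but a 1D array of 1D arrays: an object-dtype array whose elements are themselves arrays, so its shape only counts the rows. To convert it to a 2D matrix, you can do the following: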

mtx = np.stack(flat)
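
To make the difference concrete, here is a minimal, self-contained sketch. It assumes an in-memory copy of four of the sample rows in place of the real file (csv_text and split_cats are illustrative names, not part of the original code), and it also shows an optional cast to integers, since the converter produces strings:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV file: 4 of the sample rows instead of 79.
csv_text = """l_name,l_cats,l_gh,r_name,r_cats,r_gh,score
piggly wiggly,1|2|4|0|0,1,piggly wiggly,1|2|4|3|0,1,1
piggly wiggly,1|2|4|0|0,1,piggly wiggly,1|2|4|3|0,1,1
piggly wiggly,1|2|4|0|0,1,piggly wiggly,1|2|4|3|0,1,1
piggly wiggly,1|2|4|0|0,1,piggly wiggly,1|2|4|3|0,1,1
"""


def split_cats(cell):
    """Turn a '|'-separated cell into a 1D array of strings."""
    return np.array(cell.split('|'))


data = pd.read_csv(io.StringIO(csv_text),
                   converters={'l_cats': split_cats, 'r_cats': split_cats})

flat = data['l_cats'].values
print(flat.dtype)  # object: each element is a separate ndarray
print(flat.shape)  # (4,): only the number of rows is known to NumPy

# Stack the per-row arrays along a new first axis to get a real 2D array.
mtx = np.stack(flat)
print(mtx.shape)   # (4, 5)

# Optional: the values are still strings, so cast if numbers are needed.
mtx_int = mtx.astype(int)
print(mtx_int[0])  # [1 2 4 0 0]
```

Note that np.stack only works here because every row splits into the same number of items; rows of unequal length would raise an error.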