Question

I want to create a matrix from the table in the following CSV data:

COEFFICIENT MATRIX 
,0,1,2,3,4
0,0.00876623398408,0.525189723661,0.528495953628,0.94228622319,0.0379073884588
1,0.434693398364,0.77017930965,0.00847865052462,0.544319471939,0.858970329817
2,0.978091233581,0.900800004769,0.504567295427,0.65499490009,0.397203736755
3,0.671510258373,0.554713361673,0.377098128478,0.246977226206,0.535900353082
...
5000,0.791781572037,0.70262685963,0.218775600741,0.19802280762,0.68177855465

I use pandas to read the CSV and return a matrix. Instead of getting matrix.shape = 5001 * 5, I get 5002 * 1.

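How can I make the pandas DataFrame split the CSV on commas into the correct number of columns, without counting the header (the row after the table title) as the first row?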

import pandas as pd

input = pd.read_csv(coeff_file, skiprows=0)
input_mat = input.as_matrix()  # deprecated in pandas 0.23, removed in 1.0

print input.shape
print type(input)

print input_mat.shape
print type(input_mat)

Output:

(5002, 1)
<class 'pandas.core.frame.DataFrame'>
(5002, 1)
<type 'numpy.ndarray'>
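Why the shape is (5002, 1): with the default header=0, read_csv uses the title line COEFFICIENT MATRIX as the header row. That line has no commas, so the header has a single field, while every body line has six. When the body has more fields than the header, pandas uses the extra leading fields as an implicit index, which leaves one data column, and the ,0,1,2,3,4 line is counted as an ordinary data row (hence 5001 + 1 = 5002 rows). A small sketch reproducing this, assuming only the first rows of the file from the question, held in a string:

```python
import io

import pandas as pd

# First lines of the file from the question, kept in memory for the demo
csv_text = """COEFFICIENT MATRIX
,0,1,2,3,4
0,0.00876623398408,0.525189723661,0.528495953628,0.94228622319,0.0379073884588
1,0.434693398364,0.77017930965,0.00847865052462,0.544319471939,0.858970329817
"""

# Default header=0: the one-field title line becomes the only column name,
# and the five extra leading fields of each body line go into the index
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)          # (3, 1)
print(list(df.columns))  # ['COEFFICIENT MATRIX']
```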

Answer 1

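I think you need the skiprows=1, skiprows=[0] or header=1 parameter of read_csv, so that the COEFFICIENT MATRIX title line is skipped and ,0,1,2,3,4 becomes the header, plus index_col=0 to use the first column as the index: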

df = pd.read_csv(coeff_file, skiprows=1, index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779
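To get the NumPy array the question asks for, the parsed frame can be converted with DataFrame.to_numpy() (added in pandas 0.24); as_matrix() was deprecated and then removed in pandas 1.0. A sketch, again assuming only the first rows of the file held in a string instead of coeff_file:

```python
import io

import pandas as pd

csv_text = """COEFFICIENT MATRIX
,0,1,2,3,4
0,0.00876623398408,0.525189723661,0.528495953628,0.94228622319,0.0379073884588
1,0.434693398364,0.77017930965,0.00847865052462,0.544319471939,0.858970329817
"""

# skiprows=1 drops the title line, so ",0,1,2,3,4" is read as the header;
# index_col=0 turns the row labels 0, 1, ... into the index
df = pd.read_csv(io.StringIO(csv_text), skiprows=1, index_col=0)

mat = df.to_numpy()  # modern replacement for DataFrame.as_matrix()
print(mat.shape)  # (2, 5)
print(mat.dtype)  # float64
```

On the full file the same call should give the expected 5001 * 5 shape.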

df = pd.read_csv(coeff_file, header=1, index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779

df = pd.read_csv(coeff_file, skiprows=[0], index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779
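A quick check that the three variants parse to the same frame, using the same in-memory sample (the helper name read is hypothetical). It also shows that column labels taken from a header row stay strings, so columns are accessed as df['0'] rather than df[0]:

```python
import io

import pandas as pd

csv_text = """COEFFICIENT MATRIX
,0,1,2,3,4
0,0.00876623398408,0.525189723661,0.528495953628,0.94228622319,0.0379073884588
1,0.434693398364,0.77017930965,0.00847865052462,0.544319471939,0.858970329817
"""


def read(**kwargs):
    # Hypothetical helper: a fresh StringIO per call, since read_csv consumes it
    return pd.read_csv(io.StringIO(csv_text), index_col=0, **kwargs)


a = read(skiprows=1)
b = read(skiprows=[0])
c = read(header=1)

print(a.equals(b) and a.equals(c))  # True
print(list(a.columns))  # ['0', '1', '2', '3', '4']
```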

Pandas DataFrame does not split CSV columns on commas