I want to create a matrix from the table in this CSV data:
COEFFICIENT MATRIX
,0,1,2,3,4
0,0.00876623398408,0.525189723661,0.528495953628,0.94228622319,0.0379073884588
1,0.434693398364,0.77017930965,0.00847865052462,0.544319471939,0.858970329817
2,0.978091233581,0.900800004769,0.504567295427,0.65499490009,0.397203736755
3,0.671510258373,0.554713361673,0.377098128478,0.246977226206,0.535900353082
...
5000,0.791781572037,0.70262685963,0.218775600741,0.19802280762,0.68177855465
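I am using pandas to read the CSV and return a matrix. Instead of getting matrix.shape = 5001 * 5, I get 5002 * 1.
How can I make the pandas DataFrame split the CSV into the correct number of columns on the commas, and not count the header line (the one after the table title) as the first row?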
input = pd.read_csv(coeff_file, skiprows=0)
input_mat = input.as_matrix()
print input.shape
print type(input)
print input_mat.shape
print type(input_mat)
This returns:
(5002, 1)
<class 'pandas.core.frame.DataFrame'>
(5002, 1)
<type 'numpy.ndarray'>
Answer 0 (score: 1)
I think you need the skiprows=1, skiprows=[0], or header=1 parameter of read_csv:
df = pd.read_csv(coeff_file, skiprows=1, index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779
df = pd.read_csv(coeff_file, header=1, index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779
df = pd.read_csv(coeff_file, skiprows=[0], index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779
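For anyone who wants to try this without the original file, here is a minimal, self-contained sketch. It assumes the file looks like the sample in the question (a title line, then the header line, then the data), and it uses a hypothetical in-memory string, temp, in place of coeff_file. It also assumes a pandas version that provides DataFrame.to_numpy() (0.24 or later); as_matrix() from the question was deprecated and later removed, so to_numpy() is used for the final matrix.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the contents of coeff_file:
# a title line, then the column header, then the data rows.
temp = """COEFFICIENT MATRIX
,0,1,2,3,4
0,0.00876623398408,0.525189723661,0.528495953628,0.94228622319,0.0379073884588
1,0.434693398364,0.77017930965,0.00847865052462,0.544319471939,0.858970329817
2,0.978091233581,0.900800004769,0.504567295427,0.65499490009,0.397203736755
3,0.671510258373,0.554713361673,0.377098128478,0.246977226206,0.535900353082
5000,0.791781572037,0.70262685963,0.218775600741,0.19802280762,0.68177855465
"""

# Three ways to drop the title line and use the next line as the header.
df_a = pd.read_csv(StringIO(temp), skiprows=1, index_col=0)
df_b = pd.read_csv(StringIO(temp), skiprows=[0], index_col=0)
df_c = pd.read_csv(StringIO(temp), header=1, index_col=0)

# Check that all three calls produce the same frame.
same = df_a.equals(df_b) and df_a.equals(df_c)

# Convert the frame to a plain NumPy array (the "matrix" the question asks for).
# index_col=0 keeps the row labels out of the data columns.
coeff_mat = df_a.to_numpy()

print(df_a.shape, coeff_mat.shape, same)
```

With the five sample rows above, both shapes should come out as (5, 5); with the full file from the question, the same call should give the expected 5001 rows by 5 columns.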