Question

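I have a .dat file whose origin I am not sure of. I need to read this file in order to perform PCA. Assuming it is a whitespace-separated file, I was able to read its contents and drop the first column (since it is an index), but the first row is not being read. Here is the code: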

import numpy as np
import pandas as pd
from numpy import array

myarray = pd.read_csv('hand_postures.dat', delim_whitespace=True)
myarray = array(myarray)
print(myarray.shape)
myarray = np.delete(myarray,0,1)
print(myarray)
print(myarray.shape)

The file is shared at https://drive.google.com/open?id=0ByLV3kGjFP_zekN1U1c3OGFrUnM. Can someone help me spot my mistake?

Answer 1

You need a couple of extra arguments when calling pd.read_csv.

df = pd.read_csv('hand_postures.dat', header=None, delim_whitespace=True, index_col=[0])

df.head()

         1          2        3        4         5        6        7        8   \
0                                                                               
0 -65.55560   0.172413  44.4944  22.2472  0.000000  50.6723  34.3434  17.1717   
1 -65.55560   2.586210  43.8202  21.9101  0.277778  51.4286  34.3434  17.1717   
2 -45.55560   5.000000  43.8202  21.9101  0.833333  56.7227  42.4242  21.2121   
3   5.55556  -2.241380  46.5169  23.2584  1.111110  70.3361  85.8586  42.9293   
4  67.77780  20.689700  59.3258  29.6629  2.222220  80.9244  93.9394  46.9697   

         9        10       11       12        13       14        15       16  \
0                                                                              
0 -0.235294  54.6154  39.7849  19.8925  0.705883  37.2656   41.3043  20.6522   
1 -0.235294  55.3846  38.7097  19.3548  0.705883  38.6719   41.3043  20.6522   
2  0.000000  63.0769  47.3118  23.6559  0.000000  47.8125   54.3478  27.1739   
3 -0.117647  83.8462  90.3226  45.1613  0.352941  73.1250   92.3913  46.1957   
4  0.117647  93.8462  98.9247  49.4624 -0.352941  89.2969  100.0000  50.0000   

     17       18        19       20  
0                                    
0  15.0  34.6584   54.1270  27.0635  
1  14.4  35.2174   55.8730  27.9365  
2  14.4  43.6025   69.8413  34.9206  
3   3.6  73.7888   94.2857  47.1429  
4  -1.2  92.2360  106.5080  53.2540

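header=None specifies that the first row is part of the data (not a header). Without it, read_csv uses the first row as column names, which is why that row was missing from the array.
index_col=[0] specifies that the first column should be treated as the index, so there is no need to delete it afterwards.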
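
To see the effect of header=None without downloading the file, here is a minimal sketch that parses a small in-memory sample. The three-row SAMPLE string and the load_postures helper are made up for illustration and are not part of the original data or code. It also uses sep=r"\s+", which as far as I know behaves the same as delim_whitespace=True; recent pandas releases (2.2 and later) emit a deprecation warning for delim_whitespace, so the regex separator may be the safer choice there.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for hand_postures.dat:
# the first column is a row index, the rest are measurements.
SAMPLE = "0 -65.5 0.17 44.4\n1 -65.5 2.58 43.8\n2 -45.5 5.00 43.8\n"


def load_postures(source):
    """Read a whitespace-separated file with no header row,
    using the first column as the index (illustrative helper)."""
    return pd.read_csv(source, header=None, sep=r"\s+", index_col=0)


# Default behaviour: the first row is consumed as column names.
default_df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# With header=None the first row stays in the data.
fixed_df = load_postures(io.StringIO(SAMPLE))

# Plain NumPy array, ready to hand to a PCA routine.
data = fixed_df.to_numpy()

print(default_df.shape)  # one row short, index column still present
print(fixed_df.shape)    # all rows kept, index column moved to the index
```

For the real file, calling load_postures('hand_postures.dat') should give the same frame as the read_csv call above, and to_numpy() replaces both the array(...) conversion and the np.delete step from the question.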
