我有一个.dat文件,关于我的来源我不确定。我必须阅读此文件才能执行PCA。假设它是白色间隔文件,我成功地能够读取文件的内容并忽略第一列(因为它是一个索引),但是第一行。以下是代码:
import numpy as np
import pandas as pd
from numpy import array
myarray = pd.read_csv('hand_postures.dat', delim_whitespace=True)
myarray = array(myarray)
print(myarray.shape)
myarray = np.delete(myarray,0,1)
print(myarray)
print(myarray.shape)
该文件在链接https://drive.google.com/open?id=0ByLV3kGjFP_zekN1U1c3OGFrUnM处共享。有人能帮助我指出我的错误吗?
答案 0 :(得分:1)
调用pd.read_csv
时需要额外的参数。
df = pd.read_csv('hand_postures.dat', header=None, delim_whitespace=True, index_col=[0])
df.head()
1 2 3 4 5 6 7 8 \
0
0 -65.55560 0.172413 44.4944 22.2472 0.000000 50.6723 34.3434 17.1717
1 -65.55560 2.586210 43.8202 21.9101 0.277778 51.4286 34.3434 17.1717
2 -45.55560 5.000000 43.8202 21.9101 0.833333 56.7227 42.4242 21.2121
3 5.55556 -2.241380 46.5169 23.2584 1.111110 70.3361 85.8586 42.9293
4 67.77780 20.689700 59.3258 29.6629 2.222220 80.9244 93.9394 46.9697
9 10 11 12 13 14 15 16 \
0
0 -0.235294 54.6154 39.7849 19.8925 0.705883 37.2656 41.3043 20.6522
1 -0.235294 55.3846 38.7097 19.3548 0.705883 38.6719 41.3043 20.6522
2 0.000000 63.0769 47.3118 23.6559 0.000000 47.8125 54.3478 27.1739
3 -0.117647 83.8462 90.3226 45.1613 0.352941 73.1250 92.3913 46.1957
4 0.117647 93.8462 98.9247 49.4624 -0.352941 89.2969 100.0000 50.0000
17 18 19 20
0
0 15.0 34.6584 54.1270 27.0635
1 14.4 35.2174 55.8730 27.9365
2 14.4 43.6025 69.8413 34.9206
3 3.6 73.7888 94.2857 47.1429
4 -1.2 92.2360 106.5080 53.2540
header=None
指定第一行是数据的一部分(而不是标题)index_col=[0]
指定将第一列视为索引