Viewing and labeling specific columns from a txt file, and parsing the data

Time: 2014-01-06 21:19:52

Tags: python parsing csv pandas

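I have a dataset with many columns, and I am only interested in analyzing the data in six of them (0, 1, 2, 4, 6, 7). I want to label them with the headers (time, mode, event, xcoord, ycoord, phi). There are ten columns in total. Here is a sample of the data: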

1385940076332   3   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000
1385940076336   2   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000
1385940076339   3   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000
1385940076342   3   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000
1385940076346   3   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000
1385940076350   2   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000
1385940076353   3   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000
1385940076356   3   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000

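When I parse the data into columns with the code below, it only seems to count the data, but I want to be able to list the data for further analysis. This is the code I am using, from @alko: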

# Copy only the rows that mention 'subject_avatar' into filtered.txt
with open('/Users/Lab/Desktop/test.txt', 'r') as infile:
    f = infile.readlines()
with open('filtered.txt', 'w') as outfile:
    for line in f:
        if 'subject_avatar' in line:  # this line is just to take the relevant rows
            outfile.write(line)

import pandas as pd
# Read the whitespace-separated file, then keep columns 0, 1, 2, 4, 6 and 7
df = pd.read_csv('filtered.txt', header=None, sep=r'\s+')[[0, 1, 2, 4, 6, 7]]
df.columns = ['time', 'mode', 'event', 'xcoord', 'ycoord', 'phi']
print(df)

Here is what my code returns:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 115534 entries, 0 to 115533
Data columns (total 6 columns): 
time      115534  non-null values
mode      115534  non-null values
event     115534  non-null values
xcoord    115534  non-null values
ycoord    115534  non-null values
phi       115534  non-null values
dtypes: float64(3), int64(2), object(1)
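
For context, a summary like the one above is, as far as I know, what older pandas versions print when a frame is too large for the display limits; the rows themselves are still in df. Below is a minimal sketch of ways to look at the actual rows. The in-memory sample is an assumption standing in for filtered.txt, using three rows from the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for filtered.txt: three rows copied from the question.
sample = io.StringIO(
    "1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000\n"
    "1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000\n"
    "1385940076339 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000\n"
)

# Same parsing steps as in the question.
df = pd.read_csv(sample, header=None, sep=r"\s+")[[0, 1, 2, 4, 6, 7]]
df.columns = ["time", "mode", "event", "xcoord", "ycoord", "phi"]

# The first few rows are short enough to be printed as a table with headers.
print(df.head())

# The whole frame rendered as one string, regardless of the display limits.
# For 115534 rows this will be very long.
text = df.to_string()

# Plain Python lists, one per row, for analysis outside pandas.
rows = df.values.tolist()
```

With the real file, df.head(20) or a slice such as df[100:120] would show a manageable part of the data with the six headers in place.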

I looked through the pandas documentation and tried df.values and df.index, but neither prints the data I expect (six columns with the correct headers).
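
A rough sketch of why those two attributes do not give the labeled table, and of how the labeled columns can be reached instead. The in-memory sample is again an assumption standing in for filtered.txt, and the variable names are only illustrative.

```python
import io
import pandas as pd

# Hypothetical stand-in for filtered.txt: three rows copied from the question.
sample = io.StringIO(
    "1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000\n"
    "1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000\n"
    "1385940076339 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000\n"
)

df = pd.read_csv(sample, header=None, sep=r"\s+")[[0, 1, 2, 4, 6, 7]]
df.columns = ["time", "mode", "event", "xcoord", "ycoord", "phi"]

# df.values is a NumPy array holding only the cell values; the headers are not part of it.
values = df.values

# df.index holds the row labels (0, 1, 2 here), not the data.
index_labels = list(df.index)

# A single labeled column is a Series that can be analyzed directly.
xcoord = df["xcoord"]
mode_counts = df["mode"].value_counts()

# Several labeled columns, printed with their headers and without the row labels.
coords = df[["time", "xcoord", "ycoord"]]
print(coords.to_string(index=False))
```

Selecting columns by name keeps the headers attached, which is probably closer to what is wanted here than the bare array from df.values.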

0 answers:

No answers yet