Question

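I am trying to import a .dat file that my experiment outputs with metadata in the header lines, followed by the data from the experiment itself (after the line of dashes). My idea is to split it so that I have a list of string variables containing the metadata and another variable holding the results as a data frame (the part below the dashes). I am having trouble importing the data below as a data frame: the metadata at the top is read as a list of strings, so the whole file stays in that format. Is there a way to get the data as a data frame instead of a list of strings?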

Learned-Helplesness-Experiment  (TriplePlatform)  from      05.04.2017         13:41:24

software version:   DoublePlatform_1.3 04-Jun-2014

Setup of Experiment:    

Platform 1: 
ExpType:    M   M   M   M   M   M   M   M   M   M   

heated side:    right   right   right   right   right   right   right       right   right   right   

PIs:     n. def.     0   0   0   0   0   0   0   0   0  

Platform 2: 
ExpType:    Te  Te  Te  Y   Te  Y   Y   Y   Y   Y   

heated side:    right   right   right   ->M right   ->M ->M ->M ->M ->M 

PIs:     n. def.     0   0   0   0   0   0   0   0   0  

Platform 3: 
ExpType:    Y   Y   Y   Y   M_S Y   Y   Y   Y   Y   

heated side:    ->M ->M ->M ->M right   ->M ->M ->M ->M ->M 

PIs:     n. def.     0   0   0   0   0   0   0   0   0  


------------------------------------    ------------------------------------

 0   0   0   0   0
 1   47 -0.3759766   0.1123047   0.3710938
 2   97  0.01953125 -0.1318359   0.1123047
 3   157    -0.4150391   0.2246094   0.3369141
 4   207    -0.01953125 -0.2539063   0.1318359
 5   257    -0.3515625   0.3027344   0.3222656

Answer 1

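I guess you are using pandas? I don't think there is a "general" way to do this. You can open and parse the file manually up to the dashed line, keeping everything before it as a "list of strings". Then you tell pandas to import the rest, starting from line number x (where you found the dashes). The option is called skiprows.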

Edit1 (in reply to a comment):

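It depends on whether your header has a constant number of lines. If not, you may want to read the file line by line, looking for the dashes: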

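header_lines = []  # metadata lines, kept as a list of strings
line_no = 0
with open('filename', 'r') as file:
    for line in file:  # iterate over lines; file.read() would yield single characters
        line_no += 1
        if line.startswith('-' * 10):  # a short run of dashes is enough to spot the separator
            # dashed line found: the data starts after line_no lines
            break
        else:
            header_lines.append(line.rstrip('\n'))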

Edit2

To import the data part, you can use

pandas.read_csv(..., sep='\t', skiprows=line_no)

if tab is the field separator, or

pandas.read_csv(..., delim_whitespace=True, skiprows=line_no)

if the fields are separated by one (or more) blanks.
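Putting the two steps together, here is a minimal sketch of the whole approach, not a definitive implementation. The names `split_dat`, `SAMPLE` and `dash_prefix` are made up for this illustration, and the sample text is a shortened copy of the file from the question. It assumes the data block has no column names (hence `header=None`) and is whitespace-separated. It uses `sep=r"\s+"`, which behaves like `delim_whitespace=True`; recent pandas releases deprecate the latter.

```python
import io

import pandas as pd

# Shortened copy of the file from the question (hypothetical test input).
SAMPLE = """\
Learned-Helplesness-Experiment  (TriplePlatform)  from      05.04.2017         13:41:24

software version:   DoublePlatform_1.3 04-Jun-2014

------------------------------------    ------------------------------------

 0   0   0   0   0
 1   47 -0.3759766   0.1123047   0.3710938
 2   97  0.01953125 -0.1318359   0.1123047
"""


def split_dat(text, dash_prefix="-" * 10):
    """Return (metadata_lines, data_frame) for text laid out like SAMPLE."""
    metadata = []
    line_no = 0
    for line in text.splitlines():
        line_no += 1
        if line.startswith(dash_prefix):
            break  # line_no now counts every line up to and including the dashes
        if line.strip():
            metadata.append(line.strip())  # keep non-empty header lines as strings
    else:
        raise ValueError("no dashed separator line found")

    # Skip the header and the dashed line; blank lines are ignored by default.
    data = pd.read_csv(
        io.StringIO(text), sep=r"\s+", header=None, skiprows=line_no
    )
    return metadata, data


metadata, data = split_dat(SAMPLE)
print(metadata)
print(data)
```

For a file on disk, the same idea applies: read the header with the loop from Edit1, then pass the file name and `skiprows=line_no` to `pandas.read_csv`.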
