Question

I have a number of text files containing data. Each one starts with a header (with a varying number of lines), so a file begins like this:

machine: A, 6X
algorithm: AAA_15.6.03
add on: Open Field - 00
.
.
.
column legend: Field Size [mm]
row legend: Offaxis distance [mm]
data legend: Relative dose [%]

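Then comes a line describing the data block that follows, and after it the column header line (30.0, 40.0, etc. are the field sizes of the individual measurements and serve as column headers):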

Curves at depth [mm]:  15.000
,     30.0,     40.0,     50.0,     60.0,     80.0,    100.0,    120.0,    150.0,    200.0,    250.0,    300.0,    350.0,    400.0
-0.000,  100.003,   99.986,   99.967,   99.831,   99.961,  100.030, 100.142,   99.988,   99.984,   99.864,   99.810,   99.704,   99.660
 2.253,         ,         ,         ,         ,         ,         ,         ,         ,  100.007,         ,         ,         ,         
 2.278,         ,         ,         ,         ,         ,         ,         ,         ,         ,         ,   99.831,         ,         
 2.283,         ,         ,         ,         ,         ,         ,  100.155,         ,         ,         ,         ,         ,         
 2.324,         ,         ,         ,         ,   99.969,         ,         ,         ,         ,         ,         ,         ,         
 2.333,         ,         ,         ,         ,         ,  100.055,         ,         ,         ,         ,         ,         ,

The file then continues with the next data block:

Curves at depth [mm]:  50.000
,     30.0,     40.0,     50.0,     60.0,     80.0,    100.0,    120.0,    150.0,    200.0,    250.0,    300.0,    350.0,    400.0
-0.000,   81.892,   83.400,   83.239,   84.356,   85.458,   85.714,   86.253,   86.542,   87.198,   87.287,   87.895,   87.871,   87.980
2.253,         ,         ,   83.240,         ,         ,         ,         ,         ,         ,         ,         ,         ,         
2.278,         ,         ,         ,         ,         ,         ,   86.262,         ,         ,         ,         ,         ,         
2.282,         ,   83.387,         ,         ,         ,         ,         ,         ,         ,         ,         ,         ,         
2.294,         ,         ,         ,         ,         ,         ,         ,         ,         ,         ,         ,         ,   87.996

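At the moment I am manually copying each data block into a separate file and then using the pandas read_csv function, but I wonder whether there is a more elegant way to split the file.

I wrote a function that gives me the starting line number of each block and the depth at which that block was acquired: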

def datasetinfo(file):
    """Return the starting line numbers and the depths of all data blocks."""
    BeginLine = []
    Depth = []

    # The with-block closes the file even if reading fails.
    with open(file, 'r') as ifile:
        for lineNumber, line in enumerate(ifile):
            line = line.replace('\t', ',')      # replaces all the tabs with commas
            line = line.rstrip('\r\n')          # strips line-ending characters

            if "Curves at depth [mm]" in line:
                BeginLine.append(lineNumber)
                # The text after the colon is the depth, e.g. "15.000"
                Depth.append(line.split(':')[1].strip())

    return BeginLine, Depth

But I am not sure where to go from here. I would like to split the file into multiple datasets named "Depth_15", "Depth_50", and so on.
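
One possible next step, as a minimal sketch rather than a definitive solution: walk through the text once, start a new block at every "Curves at depth [mm]" line, and turn each block into a DataFrame stored in a dictionary under a key such as "Depth_15". The function names split_depth_blocks and _block_to_frame are hypothetical, and the sketch assumes that every block has the layout shown above (one header line of field sizes followed by rows of off-axis distance and dose values, with blank cells for missing values).

```python
import pandas as pd

MARKER = "Curves at depth [mm]"


def _block_to_frame(rows):
    """Build a DataFrame from the raw comma-separated lines of one block.

    The first line holds the field sizes (column labels); each following
    line holds an off-axis distance (row label) and the dose values.
    """
    cells = [[c.strip() for c in row.split(",")] for row in rows]
    header, body = cells[0], cells[1:]
    columns = [float(c) for c in header[1:] if c]
    width = len(columns)

    index, data = [], []
    for r in body:
        index.append(float(r[0]))
        # Pad short rows and cut long ones so every row matches the header.
        values = (r[1:] + [""] * width)[:width]
        # Blank cells mean "no measurement here" and become NaN.
        data.append([float(v) if v else float("nan") for v in values])
    return pd.DataFrame(data, index=index, columns=columns)


def split_depth_blocks(text):
    """Return a dict such as {"Depth_15": DataFrame, "Depth_50": DataFrame}."""
    raw_blocks = {}
    current = None  # stays None while we are still inside the file header
    for line in text.splitlines():
        line = line.replace("\t", ",").strip()
        if MARKER in line:
            depth = float(line.split(":")[1])
            # The :g format drops trailing zeros, so 15.000 becomes "15".
            current = f"Depth_{depth:g}"
            raw_blocks[current] = []
        elif current is not None and line:
            raw_blocks[current].append(line)
    # Blocks without any lines are skipped (assumption: they carry no data).
    return {name: _block_to_frame(rows) for name, rows in raw_blocks.items() if rows}
```

With the file contents read into a string (for example with open(file).read()), split_depth_blocks(text)["Depth_15"] would then give the block measured at 15 mm depth. A dictionary is used here instead of separate variables so that the number of blocks does not have to be known in advance.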

Splitting a data file into separate pandas DataFrames
