I have many text files containing data. Each one starts with a header of variable length, for example:
machine: A, 6X
algorithm: AAA_15.6.03
add on: Open Field - 00
.
.
.
column legend: Field Size [mm]
row legend: Offaxis distance [mm]
data legend: Relative dose [%]
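Then comes a line describing the data block that follows, and then the block itself (30.0, 40.0, etc. are the field sizes of each measurement and serve as the column headers):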
Curves at depth [mm]: 15.000
, 30.0, 40.0, 50.0, 60.0, 80.0, 100.0, 120.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0
-0.000, 100.003, 99.986, 99.967, 99.831, 99.961, 100.030, 100.142, 99.988, 99.984, 99.864, 99.810, 99.704, 99.660
2.253, , , , , , , , , 100.007, , , ,
2.278, , , , , , , , , , , 99.831, ,
2.283, , , , , , , 100.155, , , , , ,
2.324, , , , , 99.969, , , , , , , ,
2.333, , , , , , 100.055, , , , , , ,
The file then continues with the next data block:
Curves at depth [mm]: 50.000
, 30.0, 40.0, 50.0, 60.0, 80.0, 100.0, 120.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0
-0.000, 81.892, 83.400, 83.239, 84.356, 85.458, 85.714, 86.253, 86.542, 87.198, 87.287, 87.895, 87.871, 87.980
2.253, , , 83.240, , , , , , , , , ,
2.278, , , , , , , 86.262, , , , , ,
2.282, , 83.387, , , , , , , , , , ,
2.294, , , , , , , , , , , , , 87.996
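At the moment I manually copy each data block into a separate file and then use the pandas read_csv function, but I would like to know whether there is a more elegant way to split the file.
I have written a function that gives me the starting line number of each block and the depth that block was acquired at: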
def datasetinfo(file):
    """Return the starting line number and the depth string of each data block."""
    BeginLine = []
    Depth = []
    # The with-statement closes the file even if an error occurs
    with open(file, 'r') as ifile:
        for lineNumber, line in enumerate(ifile):
            line = line.replace('\t', ',')  # replace all tabs with commas
            line = line.rstrip('\r\n')      # strip line-ending characters
            if "Curves at depth [mm]" in line:
                BeginLine.append(lineNumber)
                Depth.append(line.split(':')[1])
    return BeginLine, Depth
But I am not sure where to go from here. I would like to split the file into multiple datasets named "Depth_15", "Depth_50", and so on.
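
One possible direction, as a minimal sketch rather than a definitive solution: instead of recording line numbers, collect the lines of each block while reading, then hand each block to pandas through an in-memory buffer. The function name split_by_depth and the dictionary keys are made up for illustration. The sketch assumes every block starts with a "Curves at depth [mm]" line, that the line right after it holds the column headers, and that the file ends after the last block.

```python
import io

import pandas as pd

MARKER = "Curves at depth [mm]"


def split_by_depth(lines):
    """Return a dict such as {"Depth_15": DataFrame, ...} from an iterable of text lines.

    split_by_depth is a hypothetical helper, not part of pandas.
    """
    blocks = {}      # dataset name -> raw CSV lines of that block
    current = None   # block being collected; None while still in the file header
    for line in lines:
        line = line.replace("\t", ",").rstrip("\r\n")
        if MARKER in line:
            # "Curves at depth [mm]: 15.000" -> 15.0 -> "Depth_15"
            depth = float(line.split(":")[1].strip(" ,"))
            current = f"Depth_{depth:g}"
            blocks[current] = []
        elif current is not None and line.strip():
            blocks[current].append(line)
    # index_col=0 uses the off-axis distance as the row index;
    # skipinitialspace=True drops the blank after each comma, so empty cells become NaN.
    return {
        name: pd.read_csv(io.StringIO("\n".join(rows)), index_col=0, skipinitialspace=True)
        for name, rows in blocks.items()
    }


# Usage (the file name is an assumption):
# with open("profiles.txt") as fh:
#     frames = split_by_depth(fh)
# frames["Depth_15"].to_csv("Depth_15.csv")
```

The column labels come back as strings ("30.0", "40.0", ...). If numeric labels are preferred, they can be converted afterwards, for example with df.columns = df.columns.astype(float).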