The data file shown here is a measurement record output by an instrument.
I uploaded it here, so anyone interested can download it.
Sample
RECORD-1
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925 284 1452 315 143 16653 He -28500
-924 281 1462 322 136 16641 He -28628
-920 281 1455 311 139 16649 He -28756
-923 279 1454 312 139 16636 He -28884
......
Sample
RECORD-2
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925 284 1452 315 143 16653 He -28500
......
......
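Usually the file holds records for several different samples, in the order of the test procedure. The records of all samples use the same format.
If the data file (*.txt format) contains only one sample, I can arrange it into a pandas.DataFrame and then process the data with further analysis steps in Python.
My code is shown here: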
import pandas as pd

# Whole data file, with the records of several samples inside
with open("record.txt") as f:
    mylist = f.read().splitlines()

## The record for each sample is 803 lines long
lines = mylist[0:803]
### The sample name is extracted from the third line
sample_name = lines[2]
### For each sample, the measurements are stored in several fields,
### which are treated as columns here
columns = lines[22].split()
### Create an empty list per column for saving the data records later
### ([:-1] drops the last character of each name, i.e. the trailing comma)
df = {columns[0][:-1]:[],columns[1][:-1]:[],columns[2][:-1]:[],columns[3][:-1]:[],columns[4][:-1]:[],
      columns[5][:-1]:[],columns[6][:-1]:[],}  #### I have only thought of this dumb method for now
## Data extraction
### The valid data records of sample 1 start at index 23
for i in range(0, len(lines[23:]), 1):
    for j in range(0, len(columns), 1):
        df[columns[j][:-1]].append(lines[23+i].split()[j])
pd.DataFrame(df)
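The result looks like this:
With the code above I can handle a data file that contains one sample. But when the record text contains several samples, I cannot find a clue for handling it efficiently.
Here is an illustration of my goal: generate a dict of DataFrames that holds the records of all samples.
Any advice would be appreciated!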
Answer 0 (score: 1)
I think you are looking for something like this:
import pandas as pd

# Whole data file, with the records of several samples inside
with open("record.txt", 'r') as f:
    mylist = f.read().splitlines()

dataset = []
while True:
    try:
        ## The record for each sample is 803 lines long
        lines, mylist = mylist[0:803], mylist[803:]  # this splits your list!
        ### The sample name is extracted from the third line
        sample_name = lines[2]
        ### For each sample, the measurements are stored in several fields,
        ### which are treated as columns here
        columns = lines[22].split()
        ### Create an empty list per column for saving the data records later
        df = {columns[0][:-1]:[],columns[1][:-1]:[],columns[2][:-1]:[],columns[3][:-1]:[],columns[4][:-1]:[],
              columns[5][:-1]:[],columns[6][:-1]:[],}  #### I have only thought of this dumb method for now
        ## Data extraction
        ### The valid data records of each sample start at index 23
        for i in range(0, len(lines[23:]), 1):
            for j in range(0, len(columns), 1):
                df[columns[j][:-1]].append(lines[23+i].split()[j])
    except IndexError:
        # Raised once mylist is exhausted (lines[2] no longer exists)
        break
    df = pd.DataFrame(df)
    dataset.append(df)
After the loop, the first element of `dataset` should contain the df of sample 1.
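Since the question asks for a dict of DataFrames rather than a list, the same fixed-length splitting could also be sketched as below. This is only a rough sketch under assumptions: the function name `parse_records` and its parameters `record_len`, `name_row` and `header_row` are made up for illustration, and their defaults (803, 2, 22) are simply the offsets used in the question's code. It also assumes that every record has exactly the same length and that sample names are unique (a repeated name would overwrite the earlier entry).

```python
import pandas as pd


def parse_records(lines, record_len=803, name_row=2, header_row=22):
    """Split `lines` into fixed-length records and return {sample_name: DataFrame}.

    Hypothetical helper: the defaults mirror the offsets in the question's code.
    """
    samples = {}
    for start in range(0, len(lines), record_len):
        block = lines[start:start + record_len]
        if len(block) <= header_row:
            break  # trailing partial block without a header line: stop

        sample_name = block[name_row].strip()
        # Column names may be separated by commas and/or whitespace
        columns = block[header_row].replace(",", " ").split()

        # Keep only as many fields per row as there are column names,
        # like the original loop does; skip blank lines
        rows = [
            line.split()[:len(columns)]
            for line in block[header_row + 1:]
            if line.strip()
        ]
        df = pd.DataFrame(rows, columns=columns)

        # Convert columns to numbers where possible; leave text columns alone
        for col in df.columns:
            try:
                df[col] = pd.to_numeric(df[col])
            except (ValueError, TypeError):
                pass  # e.g. a column holding "He" stays as strings

        samples[sample_name] = df
    return samples
```

With the file from the question this would be called as `samples = parse_records(mylist)`, after which `samples[sample_name]` gives the DataFrame of one sample. Building each DataFrame from a list of rows avoids writing out one dictionary entry per column by hand.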