Efficiently arranging multiple records of similar data

Time: 2016-07-28 10:05:46

Tags: python arrays pandas dataframe

The data file shown here is a measurement record output by an instrument.

I uploaded it here; anyone interested can download it.

Background

Sample
RECORD-1
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925    284 1452    315 143 16653   He  -28500
-924    281 1462    322 136 16641   He  -28628
-920    281 1455    311 139 16649   He  -28756
-923    279 1454    312 139 16636   He  -28884
......

Sample
RECORD-2
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925    284 1452    315 143 16653   He  -28500
......
......  

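Usually the file contains records for several different samples, in the order of the test procedure. All of these sample records use the same format.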
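Because every record in the excerpt starts with a `Sample` line followed by a `RECORD-n` line, one option is to split the file on that marker instead of on a fixed line count. This is a minimal sketch under the assumption that the full file really begins each record with a line reading exactly `Sample` (only the excerpt above shows this); `split_records` is a hypothetical helper name.

```python
from typing import List


def split_records(all_lines: List[str], marker: str = "Sample") -> List[List[str]]:
    """Split the file's lines into one list of lines per sample record.

    Assumes (hypothetically) that each record begins with a line equal to
    `marker` after stripping whitespace. Lines before the first marker are
    ignored.
    """
    records: List[List[str]] = []
    for line in all_lines:
        if line.strip() == marker:
            records.append([line])      # a marker line starts a new record
        elif records:
            records[-1].append(line)    # any other line belongs to the current record
    return records


text = """Sample
RECORD-1
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925    284 1452    315 143 16653   He  -28500

Sample
RECORD-2
FID1, FID2, front_temperature, laser, laserlow, pressure, mode
-925    284 1452    315 143 16653   He  -28500
"""
records = split_records(text.splitlines())
print(len(records))     # 2
print(records[1][1])    # RECORD-2
```

Unlike slicing every 803 lines, this keeps working if a sample has a different number of data rows.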

My attempt

If the data file (*.txt format) contains only one sample, I can arrange it into a pandas DataFrame and then process the data further in Python.

My code is shown here:

import pandas as pd

# Whole data file with several sample records inside
with open("record.txt") as f:
    mylist = f.read().splitlines()

## The record for each sample is 803 lines long
lines = mylist[0:803]

### The sample name is extracted from the third line
sample_name = lines[2]

### For each sample, the measurement record is saved in several aspects,
### which are treated as columns here (trailing commas are stripped,
### so the last name "mode" is not cut to "mod")
columns = [c.rstrip(',') for c in lines[22].split()]

### Generate empty columns for saving the data record later
df = {col: [] for col in columns}

## Data extraction
### The valid data record of sample 1 starts at line index 23
for row in lines[23:]:
    values = row.split()
    for j, col in enumerate(columns):
        df[col].append(values[j])
pd.DataFrame(df)

The result is shown below:

[Image: resulting DataFrame for one sample]
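The column-by-column loop can also be written by splitting each data row once and letting pandas build the frame, then converting the numeric columns with `pd.to_numeric` (the loop above keeps every value as a string). This is a sketch, not the author's method: `record_to_frame` and its `text_columns` parameter are hypothetical, and it assumes that `mode` is the only text column, as in the excerpt. Like the original loop, it keeps only as many values per row as there are column names, so the extra trailing value on each excerpt row is dropped.

```python
import pandas as pd


def record_to_frame(header, data_lines, text_columns=("mode",)):
    """Build one sample's DataFrame from its header line and data rows."""
    # "FID1, FID2, ..., mode" -> ["FID1", "FID2", ..., "mode"]
    columns = [name.strip() for name in header.split(",")]
    # Split each non-empty row on whitespace; keep one value per column name
    rows = [line.split()[:len(columns)] for line in data_lines if line.strip()]
    df = pd.DataFrame(rows, columns=columns)
    # Convert every column except the (assumed) text columns from str to numbers
    numeric = [c for c in columns if c not in text_columns]
    df[numeric] = df[numeric].apply(pd.to_numeric)
    return df


header = "FID1, FID2, front_temperature, laser, laserlow, pressure, mode"
data = [
    "-925    284 1452    315 143 16653   He  -28500",
    "-924    281 1462    322 136 16641   He  -28628",
]
frame = record_to_frame(header, data)
print(frame.shape)              # (2, 7)
print(frame["FID1"].tolist())   # [-925, -924]
```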

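My goal

With the code above I can process a data file that contains one sample. But when the record file contains several samples, I can't figure out how to handle it efficiently.

Below is an illustration of my goal: generate a dict of DataFrames that holds the records of all samples.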

[Image: illustration of the target dict of DataFrames]

Any suggestions are appreciated!
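The goal (one DataFrame per sample, keyed by sample name) can be sketched by walking the line list in fixed-size chunks. The defaults below mirror the offsets in the question's code (803 lines per record, sample name at index 2, header at index 22, data from index 23); they are assumptions about `record.txt`, and `records_to_dict` is a hypothetical helper. The demo shrinks the layout to the excerpt's shape so it can run without the real file.

```python
import pandas as pd


def records_to_dict(all_lines, record_length=803, name_idx=2,
                    header_idx=22, data_start=23):
    """Return {sample_name: DataFrame}, one entry per fixed-length record.

    The default offsets copy the question's code and are assumptions
    about the file layout; adjust them if the layout differs.
    """
    frames = {}
    for start in range(0, len(all_lines), record_length):
        lines = all_lines[start:start + record_length]
        if len(lines) <= header_idx:
            break  # trailing partial chunk, not a full record
        name = lines[name_idx].strip()
        # Header tokens such as "FID1," lose their trailing comma
        columns = [c.rstrip(",") for c in lines[header_idx].split()]
        rows = [row.split()[:len(columns)]
                for row in lines[data_start:] if row.strip()]
        # Note: a repeated sample name would overwrite the earlier entry
        frames[name] = pd.DataFrame(rows, columns=columns)
    return frames


# Demo with the excerpt's layout: 4 lines per record, name at index 1,
# header at index 2, one data row each (values copied from the excerpt)
demo = [
    "Sample", "RECORD-1",
    "FID1, FID2, front_temperature, laser, laserlow, pressure, mode",
    "-925    284 1452    315 143 16653   He  -28500",
    "Sample", "RECORD-2",
    "FID1, FID2, front_temperature, laser, laserlow, pressure, mode",
    "-925    284 1452    315 143 16653   He  -28500",
]
frames = records_to_dict(demo, record_length=4, name_idx=1,
                         header_idx=2, data_start=3)
print(list(frames))   # ['RECORD-1', 'RECORD-2']
```

Values stay strings here, as in the question's code; they can be converted with `pd.to_numeric` afterwards.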

1 Answer

Answer 0 (score: 1)

I think you are looking for something like this:


import pandas as pd

# Whole data file with several sample records inside
with open("record.txt", 'r') as f:
    mylist = f.read().splitlines()

dataset = []
while True:
    try:
        ## The record for each sample is 803 lines long
        lines, mylist = mylist[0:803], mylist[803:]  # this splits your list!!
        ### The sample name is extracted from the third line
        sample_name = lines[2]
        ### For each sample, the measurement record is saved in several aspects,
        ### which are treated as columns here (trailing commas stripped)
        columns = [c.rstrip(',') for c in lines[22].split()]
        ### Generate empty columns for saving the data record later
        df = {col: [] for col in columns}
        ## Data extraction
        ### The valid data record of each sample starts at line index 23
        for row in lines[23:]:
            values = row.split()
            for j, col in enumerate(columns):
                df[col].append(values[j])
    except IndexError:
        # lines[2] / lines[22] fail once no full record is left, ending the loop
        break
    df = pd.DataFrame(df)
    dataset.append(df)

`dataset` should now contain the df for sample 1 as its first element.
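The answer collects the frames in a list, while the question asked for a dict keyed by sample. Assuming the loop also appends each `sample_name` to a parallel list (the `sample_names` list below is hypothetical, as are the two tiny stand-in frames built from excerpt values), the list can be turned into that dict with `zip`, and, if one table is more convenient, stacked with `pd.concat`, which uses the dict keys as the outer index level.

```python
import pandas as pd

# Hypothetical stand-ins for the per-sample frames built in the answer's loop
dataset = [
    pd.DataFrame({"FID1": [-925, -924], "FID2": [284, 281]}),
    pd.DataFrame({"FID1": [-925], "FID2": [284]}),
]
# Hypothetical: collected with sample_names.append(sample_name) in the loop
sample_names = ["RECORD-1", "RECORD-2"]

# The dict of DataFrames the question asked for
by_name = dict(zip(sample_names, dataset))

# Optional: one long table with a (sample, row) MultiIndex
combined = pd.concat(by_name, names=["sample", "row"])
print(combined.loc["RECORD-1"].shape)   # (2, 2)
```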