Read a multi-line header into a dictionary and the data into a DataFrame

Time: 2018-02-02 19:08:58

Tags: python list pandas dictionary readline

I have a *.txt file with the following layout: a long, multi-line header followed by the data. See below:

field1, field2, field3
field4, field5, field 6, field7, field8
field9, fiel10
field11, field12

1, 1.1, 1o.1
2, 0.5, 15
3, 0, 8.3
4, 2.1, 7.8
.. 
..

This is the code I wrote. To store the values from the header, I created a dictionary named "header".

import pandas as pd

header={}
count=1
with open('file.txt') as f:
   # read the four header lines one at a time
   while count<=4:
      line = f.readline()
      if count==1:
         header['field1']=line.split(',')[0]
         header['field2']=line.split(',')[1]
         header['field3']=line.split(',')[2]
      if count==2:
         header['field4']=line.split(',')[0]
         header['field5']=line.split(',')[1]
         header['field6']=line.split(',')[2]
         header['field7']=line.split(',')[3]
         header['field8']=line.split(',')[4]
      if count==3:
         header['field9']=line.split(',')[0]
         header['field10']=line.split(',')[1]
      if count==4:
         header['field11']=line.split(',')[0]
         header['field12']=line.split(',')[1]
      count+=1   # advance to the next header line

#Read the full data into dataframe
df=  pd.read_csv('file.txt',skiprows=4,names=['Col1','Col2','Col3'])

However, I don't think this approach is very efficient or elegant. I would appreciate a simpler version that uses the file pointer or pandas. Thanks.

1 Answer:

Answer 0: (score: 0)

Iterate over the header lines, split each one, and then loop over the entries in the line:

header = {}
with open('file.txt') as fobj:
    counter = 1
    for line in fobj:
        # assuming empty line between multi-line header and data
        if not line.strip():
            break
        for entry in line.split(','):
            header['field{}'.format(counter)] = entry.strip()
            counter += 1

import pprint    
pprint.pprint(header)

Output:

{'field1': 'field1',
 'field10': 'fiel10',
 'field11': 'field11',
 'field12': 'field12',
 'field2': 'field2',
 'field3': 'field3',
 'field4': 'field4',
 'field5': 'field5',
 'field6': 'field 6',
 'field7': 'field7',
 'field8': 'field8',
 'field9': 'field9'}
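
The question also asks for the data in a DataFrame. As a rough sketch (not part of the answer above), the same loop can stop at the blank line and then pass the open file object straight to `pd.read_csv`, so the file is only read once. The helper name `read_header_and_data` is hypothetical, and the sketch assumes the header and data are separated by a blank line, as in the sample. The column names `Col1`..`Col3` come from the question. An `io.StringIO` stands in for `file.txt` here, and the first sample data row is left out because `1o.1` is not a valid number.

```python
import io

import pandas as pd


def read_header_and_data(fobj, names):
    """Hypothetical helper: parse header lines up to the first blank line,
    then let pandas read whatever is left in the same file object."""
    header = {}
    counter = 1
    # readline() returns '' at end of file, which stops the iteration
    for line in iter(fobj.readline, ''):
        # assuming an empty line separates the multi-line header from the data
        if not line.strip():
            break
        for entry in line.split(','):
            header['field{}'.format(counter)] = entry.strip()
            counter += 1
    # fobj is now positioned just after the blank line;
    # skipinitialspace drops the space that follows each comma
    df = pd.read_csv(fobj, names=names, skipinitialspace=True)
    return header, df


# In-memory stand-in for file.txt, built from the sample in the question
sample = io.StringIO(
    "field1, field2, field3\n"
    "field4, field5, field 6, field7, field8\n"
    "field9, fiel10\n"
    "field11, field12\n"
    "\n"
    "2, 0.5, 15\n"
    "3, 0, 8.3\n"
    "4, 2.1, 7.8\n"
)

header, df = read_header_and_data(sample, names=['Col1', 'Col2', 'Col3'])
print(header)
print(df)
```

With a real file, the call would presumably look like `with open('file.txt') as fobj: header, df = read_header_and_data(fobj, ['Col1', 'Col2', 'Col3'])`. Using `readline()` rather than a plain `for` loop is a cautious choice so that the file position is well defined when pandas takes over.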