I have a *.txt file with the following layout: a long multi-line header followed by the data. See below:
field1, field2, field3
field4, field5, field 6, field7, field8
field9, fiel10
field11, field12
1, 1.1, 1o.1
2, 0.5, 15
3, 0, 8.3
4, 2.1, 7.8
..
..
This is the code I wrote. To store the values from the header, I created a dictionary named "header".
import pandas as pd

header = {}
count = 1
with open('file.txt') as f:
    # Read the four header lines one at a time
    while count <= 4:
        line = f.readline()
        if count == 1:
            header['field1'] = line.split(',')[0]
            header['field2'] = line.split(',')[1]
            header['field3'] = line.split(',')[2]
        if count == 2:
            header['field4'] = line.split(',')[0]
            header['field5'] = line.split(',')[1]
            header['field6'] = line.split(',')[2]
            header['field7'] = line.split(',')[3]
            header['field8'] = line.split(',')[4]
        if count == 3:
            header['field9'] = line.split(',')[0]
            header['field10'] = line.split(',')[1]
        if count == 4:
            header['field11'] = line.split(',')[0]
            header['field12'] = line.split(',')[1]
        count += 1
# Read the full data into a DataFrame, skipping the four header lines
df = pd.read_csv('file.txt', skiprows=4, names=['Col1', 'Col2', 'Col3'])
However, I don't think this approach is very efficient or elegant. I would appreciate a simpler version that uses plain file I/O or Pandas. Thanks.
Answer 0 (score: 0)
Iterate over the header lines, split each one, and then loop over the entries in the line:
import pprint

header = {}
with open('file.txt') as fobj:
    counter = 1
    for line in fobj:
        # assuming empty line between multi-line header and data
        if not line.strip():
            break
        for entry in line.split(','):
            header['field{}'.format(counter)] = entry.strip()
            counter += 1
pprint.pprint(header)
Output:
{'field1': 'field1',
 'field10': 'fiel10',
 'field11': 'field11',
 'field12': 'field12',
 'field2': 'field2',
 'field3': 'field3',
 'field4': 'field4',
 'field5': 'field5',
 'field6': 'field 6',
 'field7': 'field7',
 'field8': 'field8',
 'field9': 'field9'}
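The sample file in the question has no empty line between the header and the data, so the loop above would not stop at the right place for it. A minimal sketch of a variant that instead assumes the header always spans exactly four lines is shown here. The function name `read_header_and_rows`, the constant `N_HEADER_LINES`, and the in-memory sample are made up for illustration, and the data rows are parsed with the standard `csv` module rather than Pandas to keep the example self-contained.

```python
import csv
import io
from itertools import islice

N_HEADER_LINES = 4  # assumption: the header is always exactly four lines long


def read_header_and_rows(fobj, n_header_lines=N_HEADER_LINES):
    """Return (header dict, list of float rows) from an open text file object."""
    header = {}
    counter = 1
    # islice consumes only the header lines, so the file object is left
    # positioned at the first data row afterwards
    for line in islice(fobj, n_header_lines):
        for entry in line.split(','):
            header['field{}'.format(counter)] = entry.strip()
            counter += 1
    # The csv reader continues from where the header loop stopped;
    # skipinitialspace drops the blank after each comma
    reader = csv.reader(fobj, skipinitialspace=True)
    rows = [[float(value) for value in row] for row in reader if row]
    return header, rows


# Hypothetical sample mirroring the layout from the question
sample = io.StringIO(
    "field1, field2, field3\n"
    "field4, field5, field 6, field7, field8\n"
    "field9, fiel10\n"
    "field11, field12\n"
    "2, 0.5, 15\n"
    "3, 0, 8.3\n"
    "4, 2.1, 7.8\n"
)
header, rows = read_header_and_rows(sample)
```

With a real file, `open('file.txt')` would take the place of the `io.StringIO` sample. If a DataFrame is wanted, passing the same, already advanced file object to `pd.read_csv` with `header=None` and explicit `names` should also work, since `read_csv` accepts file-like objects.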