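I am new to Python and am having trouble writing a function that reads a tab-delimited text file and builds a dictionary from the data. I mostly work with text files in the following format, which contain many tab-delimited columns of numeric data, each with a corresponding header: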
Time_(s) Mass_Flow_(kg/s) T_in_pipe(C) T_in_water(C) T_out_pipe(C) T_out_water(C)
0 1.2450 16.9029 16.8256 16.6234 16.6204
2.8700 1.2450 16.8873 16.8094 16.6237 19.6507
5.6600 1.2450 16.8889 16.8229 19.1406 29.1320
8.7800 1.2450 16.8875 16.8236 24.1325 34.9077
11.6200 1.2450 16.8794 16.8040 28.3927 38.5443
16.0600 1.2450 16.8615 16.7942 33.7205 42.4149
18.8900 1.2450 16.8512 16.7938 36.2797 44.1221
23.0200 1.2450 16.8319 16.7903 39.2102 46.1857
25.7600 1.2450 16.8380 16.7952 40.7243 47.2657
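Ideally, I would like to write code that stores each column of data as an array, but also stores each column's header in a separate array so that I can use the headers as keys in a dictionary. For example, if I looked up the dictionary key "Mass_Flow_(kg/s)", I would get back an array of all the values in the mass flow rate column (excluding the header).
So far I have tried using numpy.loadtxt to create such numeric arrays from the columns, but I have not managed to extract the header data and have had to skip that row. The code below produces the dictionary I want, but I would prefer something more flexible that does not require me to name every column by hand, given that the names are already in the .txt file.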
import numpy as np
# Assign each column in the file to its own array
time, m_flow, Tin_pipe, Tin_water, Tout_pipe, Tout_water = np.loadtxt("pipeData.txt", skiprows=1, unpack=True)
# Link the arrays to keywords and merge them into a dictionary
my_dict = {"Time": time, "Mass flow rate": m_flow, "Tin_pipe": Tin_pipe, "Tin_water": Tin_water, "Tout_pipe": Tout_pipe, "Tout_water": Tout_water}
I have tried not skipping the first row, but loadtxt then fails with:
ValueError: could not convert string to float: Time_(s)
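So if I want to read string data as well as numeric values, I suppose I need to use a different module. If anyone has suggestions on how I could do this, or knows of a better module for it, that would be much appreciated.
Keith
Answer 0 (score: 1)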
# This module kicks ass
import pandas as pd
pipe_data = pd.read_csv('pipeData.txt', sep='\t')
print(pipe_data.columns)  # prints the column labels: Time_(s), Mass_Flow_(kg/s), ...
print(pipe_data['Time_(s)'])  # prints the Time_(s) column
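To get from the DataFrame to the dictionary of arrays the question asks for, one option is a dict comprehension over the columns. The following is only a minimal sketch and not part of the original answer: the helper name `columns_to_dict` and the inlined sample rows are made up for illustration, and it assumes the file really is tab-separated.

```python
import io

import pandas as pd


def columns_to_dict(source):
    """Read a tab-separated table and return {header: NumPy array of values}.

    Hypothetical helper; `source` may be a file path or a file-like object.
    """
    frame = pd.read_csv(source, sep="\t")
    # Each column becomes one array; the header text is used as the key unchanged.
    return {name: frame[name].to_numpy() for name in frame.columns}


# Two rows of the question's data, inlined so the sketch runs without pipeData.txt.
sample = io.StringIO(
    "Time_(s)\tMass_Flow_(kg/s)\tT_in_pipe(C)\n"
    "0\t1.2450\t16.9029\n"
    "2.8700\t1.2450\t16.8873\n"
)
pipe_dict = columns_to_dict(sample)
print(pipe_dict["Mass_Flow_(kg/s)"])
```

With the real file, `columns_to_dict('pipeData.txt')` should behave the same way, since no column names are hard-coded.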
Answer 1 (score: 0)
An alternative might be to use the csv module that ships with Python itself.
import csv
with open('temp.txt') as csvfile:
    csvrows = csv.reader(csvfile, delimiter='\t')
    fieldnames = next(csvrows)
    print(fieldnames)
    for row in csvrows:
        print(row)
When I took the data you provided and replaced the runs of whitespace between columns with single tabs, this was the result.
['Time_(s)', 'Mass_Flow_(kg/s)', 'T_in_pipe(C)', 'T_in_water(C)', 'T_out_pipe(C)', 'T_out_water(C)']
['0', '1.2450', '16.9029', '16.8256', '16.6234', '16.6204']
[' 2.8700', '1.2450', '16.8873', '16.8094', '16.6237', '19.6507']
[' 5.6600', '1.2450', '16.8889', '16.8229', '19.1406', '29.1320']
[' 8.7800', '1.2450', '16.8875', '16.8236', '24.1325', '34.9077']
[' 11.6200', '1.2450', '16.8794', '16.8040', '28.3927', '38.5443']
[' 16.0600', '1.2450', '16.8615', '16.7942', '33.7205', '42.4149']
[' 18.8900', '1.2450', '16.8512', '16.7938', '36.2797', '44.1221']
[' 23.0200', '1.2450', '16.8319', '16.7903', '39.2102', '46.1857']
[' 25.7600', '1.2450', '16.8380', '16.7952', '40.7243', '47.2657']
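The main problem is probably that the leading whitespace is still present in the first column. Every field also comes back as a string, so the values still need converting with float(), which ignores surrounding whitespace.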
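Building on the csv approach, the rows can be collected into a dictionary keyed by header. This is a rough sketch under stated assumptions rather than a definitive implementation: the helper name `read_columns` and the inlined sample rows are invented for illustration, and every data cell is assumed to be numeric.

```python
import csv
import io


def read_columns(lines):
    """Return {header: list of floats} from tab-delimited text (hypothetical helper)."""
    rows = csv.reader(lines, delimiter="\t")
    # The first row holds the headers; strip stray whitespace around each name.
    fieldnames = [name.strip() for name in next(rows)]
    columns = {name: [] for name in fieldnames}
    for row in rows:
        if not row:
            continue  # skip blank lines
        for name, cell in zip(fieldnames, row):
            # float() accepts leading/trailing whitespace, so ' 2.8700' parses fine.
            columns[name].append(float(cell))
    return columns


# Sample rows inlined so the sketch runs without a file on disk;
# the second data row keeps the leading space seen in the output above.
sample = io.StringIO(
    "Time_(s)\tMass_Flow_(kg/s)\tT_in_pipe(C)\n"
    "0\t1.2450\t16.9029\n"
    " 2.8700\t1.2450\t16.8873\n"
)
pipe_dict = read_columns(sample)
print(pipe_dict["Time_(s)"])
```

For a real file, pass an open file object instead, for example `open('pipeData.txt', newline='')`, and wrap each list in `numpy.array` if arrays rather than lists are needed.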