Read the contents of an input file into a dictionary keyed by an id variable

Time: 2014-02-05 05:40:44

Tags: python python-2.7

I have a file, sampleLabs1.txt, that looks like this (it has many records, so I only list 5 rows):

visitid cdate ctime pqno test result unit range
OMHioJh8XEeq7152 6/15/2007 06:00 1181913408344759 CREAT 0.8 mg/dL 0.5-1.4
OMHioJh8XEeq7152 6/14/2007 07:10 1181827489130119 CREAT 0.8 mg/dL 0.5-1.4
OMHioJh8XEeq7152 6/11/2007 14:21 1181592540465036 CREAT 2.9 mg/dL 0.5-1.4
t2v0TjgroLTI6118 4/28/2006 14:18 1146257767528282 CREAT 8.7 mg/dL 0.5-1.4
t2v0TjgroLTI6118 5/1/2006 04:00 1146487572667772 CREAT 8.0 mg/dL 0.5-1.4

I want to read the contents of the input file into a dictionary keyed by "visitid". That is, I want something like this:

{OMHioJh8XEeq7152: 6/15/2007,06:00,1181913408344759,CREAT,0.8,mg/dL,0.5-1.4,
 OMHioJh8XEeq7152: 6/14/2007,07:10,1181827489130119,CREAT,0.8,mg/dL,0.5-1.4,
 OMHioJh8XEeq7152: 6/11/2007,14:21,1181592540465036,CREAT,2.9,mg/dL,0.5-1.4,
 t2v0TjgroLTI6118: 4/28/2006,14:18,1146257767528282,CREAT,8.7,mg/dL,0.5-1.4,
 t2v0TjgroLTI6118: 5/1/2006,04:00,1146487572667772,CREAT,8.0,mg/dL,0.5-1.4}

I wrote the following program:

import os
newdict = {}
with open(os.path.join("..","c:\work\python programming","sampleLabs1.txt"),"rU") as f:
    for line in f:
        splitLine = line.split()
        newdict[(splitLine[0])] = ",".join(splitLine[1:])
newdict

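It does give me a dictionary, but it seems to overwrite the previous record for each key ("visitid"), so only one record is kept per unique key. That is, I get something like this: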

{OMHioJh8XEeq7152: 6/15/2007,06:00,1181913408344759,CREAT,0.8,mg/dL,0.5-1.4,
 t2v0TjgroLTI6118: 5/1/2006,04:00,1146487572667772,CREAT,8.0,mg/dL,0.5-1.4}

But I want to keep all of the records for each "visitid", for example:

{OMHioJh8XEeq7152: 6/15/2007,06:00,1181913408344759,CREAT,0.8,mg/dL,0.5-1.4,
 OMHioJh8XEeq7152: 6/14/2007,07:10,1181827489130119,CREAT,0.8,mg/dL,0.5-1.4,
 OMHioJh8XEeq7152: 6/11/2007,14:21,1181592540465036,CREAT,2.9,mg/dL,0.5-1.4,
 t2v0TjgroLTI6118: 4/28/2006,14:18,1146257767528282,CREAT,8.7,mg/dL,0.5-1.4,
 t2v0TjgroLTI6118: 5/1/2006,04:00,1146487572667772,CREAT,8.0,mg/dL,0.5-1.4}

I would greatly appreciate your help. Could someone help me modify my code? Thanks, everyone.

2 answers:

Answer 0: (score: 0)

from collections import defaultdict, namedtuple
import os

WORKDIR = r"c:\work\python programming"  # raw string keeps the backslashes literal

Datum = namedtuple('Datum', ['visitid', 'cdate', 'ctime', 'pqno', 'test', 'result', 'unit', 'range'])

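def load_data(fname):
    fname = os.path.join(WORKDIR, fname)
    with open(fname, 'rU') as inf:
        # parse each whitespace-separated line into a Datum (lazy generator)
        data = (Datum(*(line.split())) for line in inf)
        # a missing key starts as an empty list, so records accumulate
        # under their visitid instead of overwriting one another
        newdict = defaultdict(list)
        for d in data:
            newdict[d.visitid].append(d)
    return newdict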

def main():
    data = load_data('sampleLabs1.txt')
    # now do something with it

if __name__=="__main__":
    main()
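
A minimal sketch of how the grouped result might be used, assuming every data row has the same eight whitespace-separated fields (so the unit is a single token such as mg/dL). The helper name group_by_visitid and the in-memory SAMPLE_LINES are illustrative stand-ins for the file and are not part of the answer above:

```python
from collections import defaultdict, namedtuple

Datum = namedtuple('Datum', ['visitid', 'cdate', 'ctime', 'pqno',
                             'test', 'result', 'unit', 'range'])

# Sample rows copied from the question (header line left out on purpose).
SAMPLE_LINES = [
    "OMHioJh8XEeq7152 6/15/2007 06:00 1181913408344759 CREAT 0.8 mg/dL 0.5-1.4",
    "OMHioJh8XEeq7152 6/14/2007 07:10 1181827489130119 CREAT 0.8 mg/dL 0.5-1.4",
    "OMHioJh8XEeq7152 6/11/2007 14:21 1181592540465036 CREAT 2.9 mg/dL 0.5-1.4",
    "t2v0TjgroLTI6118 4/28/2006 14:18 1146257767528282 CREAT 8.7 mg/dL 0.5-1.4",
    "t2v0TjgroLTI6118 5/1/2006 04:00 1146487572667772 CREAT 8.0 mg/dL 0.5-1.4",
]

def group_by_visitid(lines):
    """Hypothetical helper: collect every parsed row under its visitid."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != len(Datum._fields):
            continue  # skip blank or malformed lines (an assumption)
        datum = Datum(*fields)
        grouped[datum.visitid].append(datum)
    return grouped

grouped = group_by_visitid(SAMPLE_LINES)

# All three records for the first visit are kept, in file order.
results = [d.result for d in grouped["OMHioJh8XEeq7152"]]
print(results)  # ['0.8', '0.8', '2.9']
```

If the real file starts with the header row, that row also has eight tokens and would end up stored under the key 'visitid', so it may be worth skipping the first line before parsing.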

Answer 1: (score: 0)

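If your plan is to analyze all of the entries under a visitid, or to compare averages between visitids and so on, you may want to treat this as a database table. The pandas package is good for this: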

import pandas
nd = pandas.read_csv('sampleLabs1.txt',sep=' ')
nd['visitid'].unique()  # all visitid values
nd[nd['visitid'] == 'OMHioJh8XEeq7152']['cdate'] # all cdates for a given visitid

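To use a dictionary, you would need to make the value for each visitid some kind of collection of records (a tuple or list), as in Hugh Bothwell's example.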
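
The point about storing a collection per visitid can also be sketched with a plain dict and setdefault, without namedtuple. This is only an illustration under the same assumption of whitespace-separated rows; the function name read_records and the sample lines are made up for the example:

```python
def read_records(lines):
    """Hypothetical helper: map each visitid to a list of record tuples."""
    records = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore empty lines
        visitid, rest = parts[0], tuple(parts[1:])
        # setdefault returns the existing list for this key, or installs
        # a new empty one, so earlier records are never overwritten
        records.setdefault(visitid, []).append(rest)
    return records

# Two rows from the question that share a visitid.
sample = [
    "t2v0TjgroLTI6118 4/28/2006 14:18 1146257767528282 CREAT 8.7 mg/dL 0.5-1.4",
    "t2v0TjgroLTI6118 5/1/2006 04:00 1146487572667772 CREAT 8.0 mg/dL 0.5-1.4",
]

records = read_records(sample)
print(len(records["t2v0TjgroLTI6118"]))  # 2
```

Each value is a list of tuples, so the remaining fields of every row are available by position, for example records[visitid][0][0] for the first cdate.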