Question

I have a CSV file in the following format:

ComponentID subComponent    Measurement
X030        A1111111        784.26
X030        A2222222        784.26
X015        A1111111        997.35
X015        A2222222        997.35
X015        A3333333        997.35
X075        A1111111        673.2
X075        A2222222        673.2
X075        A3333333        673.2
X090        A1111111        1003.2
X090        A2222222        1003.2
X090        A3333333        1003.2
X105        A1111111        34.37
X105        A2222222        34.37
X105        A3333333        34.37
X105        A4444444        34.37

I would like to return the file as a Python dictionary in the following format:

my_dict = {'X030': ['A1111111', 'A2222222', 784.26],
           'X015': ['A1111111', 'A2222222', 'A3333333', 997.35],
           'X075': ['A1111111', 'A2222222', 'A3333333', 673.2],
           'X090': ['A1111111', 'A2222222', 'A3333333', 1003.2],
           'X105': ['A1111111', 'A2222222', 'A3333333', 'A4444444', 34.37]
          }

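Initially I looked at itertools.groupby, but that didn't get me anywhere. My confusion is about how to design this, because I'm not sure how to return items of the form ComponentID: [components, and only one measurement]

I'm not sure how to approach this task, so any guidance is appreciated.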

Answer 1

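I had some trouble understanding the data structure at first: is it guaranteed that all subcomponents of any given ComponentID have the same measurement? If so, neither the given TSV format nor the desired dict is a very sensible data structure for storing that information.

Nevertheless, here is some simple code that does exactly what you asked for: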

d = {}
with open('yourfile.tsv') as tsvfile:
  next(tsvfile)  # skip the header line
  for line in tsvfile:
    row = line.split()
    if not row:
      continue  # ignore blank lines
    componentid, subcomponent, measurement = row[0], row[1], float(row[2])
    if componentid not in d:
      d[componentid] = [subcomponent, measurement]
    else:
      # every row of a component is expected to carry the same measurement
      assert measurement == d[componentid][-1]
      # insert before the last element so the measurement stays at the end
      d[componentid].insert(-1, subcomponent)

And here is some code that puts it into a more sensible structure:

d = {}
with open('yourfile.tsv') as tsvfile:
  next(tsvfile)  # skip the header line
  for line in tsvfile:
    row = line.split()
    if not row:
      continue  # ignore blank lines
    componentid, subcomponent, measurement = row[0], row[1], float(row[2])
    if componentid not in d:
      d[componentid] = {'subcomponents': [subcomponent], 'measurement': measurement}
    else:
      # every row of a component is expected to carry the same measurement
      assert measurement == d[componentid]['measurement']
      d[componentid]['subcomponents'].append(subcomponent)

This gives you:

{
  'X105': {'measurement': 34.37, 'subcomponents': ['A1111111', 'A2222222', 'A3333333', 'A4444444']},
  'X015': {'measurement': 997.35, 'subcomponents': ['A1111111', 'A2222222', 'A3333333']},
  'X075': {'measurement': 673.2, 'subcomponents': ['A1111111', 'A2222222', 'A3333333']},
  'X030': {'measurement': 784.26, 'subcomponents': ['A1111111', 'A2222222']},
  'X090': {'measurement': 1003.2, 'subcomponents': ['A1111111', 'A2222222', 'A3333333']}
}
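
If the flat list layout from the question is still needed, the nested structure above can be converted back in one step. This is only a minimal sketch: the helper name flatten is made up for illustration, and the nested dict is a hand-copied excerpt of the output above rather than something read from a file.

```python
def flatten(nested):
    """Turn {'id': {'subcomponents': [...], 'measurement': m}} into {'id': [..., m]}."""
    return {cid: info['subcomponents'] + [info['measurement']]
            for cid, info in nested.items()}


# Hand-copied excerpt of the nested result shown above.
nested = {
    'X030': {'measurement': 784.26, 'subcomponents': ['A1111111', 'A2222222']},
    'X105': {'measurement': 34.37,
             'subcomponents': ['A1111111', 'A2222222', 'A3333333', 'A4444444']},
}

flat = flatten(nested)
print(flat['X030'])  # ['A1111111', 'A2222222', 784.26]
```

Keeping the nested form as the working structure and flattening only at the end avoids having to remember that the last list element is a number rather than a subcomponent.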

Answer 2

You can iterate over the CSV rows and use the dict.setdefault method to store the rows in a dictionary:

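>>> import csv
>>> d = {}
>>> with open('your_file.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter='\t')
...     next(spamreader)  # skip the header row
...     for row in spamreader:
...         # note: this also appends the measurement from every row
...         d.setdefault(row[0], []).extend(row[1:])
...     print(d)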
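
Note that extending with row[1:] appends the measurement once per row, so each list ends up with repeated measurement strings. A possible variant that keeps a single numeric measurement per ComponentID is sketched below. It assumes tab-separated input and uses an in-memory sample (the names sample, subcomponents and measurements are illustrative) in place of the real file.

```python
import csv
import io

# Stand-in for the real file; tab-separated columns are an assumption.
sample = (
    "ComponentID\tsubComponent\tMeasurement\n"
    "X030\tA1111111\t784.26\n"
    "X030\tA2222222\t784.26\n"
    "X015\tA1111111\t997.35\n"
)

subcomponents = {}
measurements = {}
reader = csv.reader(io.StringIO(sample), delimiter='\t')
next(reader)  # skip the header row
for cid, sub, msmt in reader:
    # collect subcomponents per ComponentID
    subcomponents.setdefault(cid, []).append(sub)
    # the last value wins; rows of one component are assumed to agree
    measurements[cid] = float(msmt)

# append the single measurement at the end of each list
result = {cid: subs + [measurements[cid]] for cid, subs in subcomponents.items()}
print(result)
```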

Answer 3

My approach is:

myData = {}
with open('p.csv') as inputfile:
    for line in inputfile:
        if 'ComponentID' not in line:  # skip the header line
            row = line.split()  # split on any run of whitespace
            if not row:
                continue  # ignore blank lines
            cid, sub, msmt = row[0], row[1], float(row[2])

            if cid in myData:
                # insert the new subcomponent before the trailing measurement
                myData[cid].insert(-1, sub)
            else:
                myData[cid] = [sub, msmt]
print(myData)
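
Since the question mentions itertools.groupby, here is a rough sketch of how it could be made to work. It relies on the rows of each ComponentID being adjacent, as they are in the sample, because groupby only merges consecutive items with the same key. The lines list is a shortened copy of the sample data, not read from a file.

```python
from itertools import groupby

# Shortened copy of the sample data from the question.
lines = [
    "ComponentID subComponent    Measurement",
    "X030        A1111111        784.26",
    "X030        A2222222        784.26",
    "X015        A1111111        997.35",
    "X015        A2222222        997.35",
    "X015        A3333333        997.35",
]

# Drop the header and split each remaining line on whitespace.
rows = [line.split() for line in lines[1:] if line.strip()]

my_dict = {}
# groupby only merges adjacent rows with the same key, so this assumes
# the rows of each ComponentID are contiguous (sort by row[0] first if not).
for cid, group in groupby(rows, key=lambda row: row[0]):
    group = list(group)
    # all subcomponents, followed by the measurement taken from the first row
    my_dict[cid] = [row[1] for row in group] + [float(group[0][2])]

print(my_dict)
```

Sorting first would change the key order relative to the file, so it is only worth doing when the input is not already grouped.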

Sorting and reorganizing a CSV file into a Python dictionary
