Question

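I am using Python to analyze a large amount of CSV data. The data contains 4 different types of metrics for a given timestamp and host pair, with the metric type indicated in the first field of each row. Here is a simplified example: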

metric,timestamp,hostname,value
metric1,1488063747,example01.net,12
metric2,1488063747,example01.net,23
metric3,1488063747,example01.net,34
metric4,1488063747,example01.net,45
metric1,1488063788,example02.net,56
metric2,1488063788,example02.net,67
metric3,1488063788,example02.net,78
metric4,1488063788,example02.net,89

So, for each row (actually a list within a list of lists), I create an index made up of the timestamp and hostname:

idx = row[1] + ',' + row[2]

Now, based on the contents of the first field (list element), I do something like:

if row[0] == 'metric1': metric_dict[idx] = row[3]

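I do this for each of the 4 metrics. It works, but it seems like there should be a better way. It seems I need to somehow implicitly or indirectly select which dictionary to use based on the contents of row[0], but my searches have not turned up anything. In this case, 4 if lines are not too bad, but it is not uncommon for a file to contain many more metric types. Is it possible to do this, and to end up with however many dictionaries are needed after reading the list of lists? Thank you.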

Answer 1

Problem: not enough dicts.

Solution:

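metric1_dict, metric2_dict = {}, {}
# Map each metric name to its dictionary (add one entry per metric type).
conversion_dict = {'metric1': metric1_dict, 'metric2': metric2_dict}

for row in rows:
    idx = row[1] + ',' + row[2]
    conversion_dict[row[0]][idx] = row[3]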
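
If the metric names are known up front, the lookup table above does not have to be typed out by hand. The following is only a sketch of that idea: the `rows` sample and the `metric_names` list are assumptions made for illustration, not something from the question.

```python
# Hypothetical sample rows, already split into fields (header row omitted).
rows = [
    ["metric1", "1488063747", "example01.net", "12"],
    ["metric2", "1488063747", "example01.net", "23"],
    ["metric3", "1488063747", "example01.net", "34"],
    ["metric4", "1488063747", "example01.net", "45"],
    ["metric1", "1488063788", "example02.net", "56"],
]

# Assumed list of known metric names; one empty dict is created per name.
metric_names = ["metric1", "metric2", "metric3", "metric4"]
conversion_dict = {name: {} for name in metric_names}

for row in rows:
    # Same index as in the question: "timestamp,hostname".
    idx = row[1] + "," + row[2]
    # row[0] selects the dictionary; an unknown metric raises KeyError.
    conversion_dict[row[0]][idx] = row[3]

print(conversion_dict["metric1"])
```

A metric name that is missing from `metric_names` raises a `KeyError` here, which may or may not be what you want for unexpected input.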

Answer 2

Why not something like:

output = {}
for row in rows:
    # assuming each row is already split into fields
    if row[0] not in output:
        output[row[0]] = {}
    idx = row[1] + ',' + row[2]
    output[row[0]][idx] = row[3]
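
The same nested-dictionary approach can probably be shortened with `collections.defaultdict`, which creates the inner dictionary the first time a metric name is seen. This is a sketch rather than a drop-in replacement: the helper name `group_by_metric` is made up for illustration, and it assumes the input has a header line followed by exactly four fields per row.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question.
raw = """metric,timestamp,hostname,value
metric1,1488063747,example01.net,12
metric2,1488063747,example01.net,23
metric3,1488063747,example01.net,34
metric4,1488063747,example01.net,45
metric1,1488063788,example02.net,56
metric2,1488063788,example02.net,67
metric3,1488063788,example02.net,78
metric4,1488063788,example02.net,89
"""


def group_by_metric(lines):
    """Hypothetical helper: return {metric: {"timestamp,hostname": value}}."""
    output = defaultdict(dict)  # inner dict is created on first access
    reader = csv.reader(lines)
    next(reader, None)  # skip the header line (assumed to be present)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        metric, timestamp, hostname, value = row
        output[metric][timestamp + "," + hostname] = value
    return output


output = group_by_metric(io.StringIO(raw))
print(sorted(output))
```

With a real file, passing the open file object instead of the `io.StringIO` wrapper should work the same way. Values stay strings, as in the original loop.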

Answer 3

If you are doing a lot of table manipulation, you may find the pandas library helpful. If I understand what you are trying to do:

import pandas as pd
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

s = StringIO("""metric,timestamp,hostname,value
metric1,1488063747,example01.net,12
metric2,1488063747,example01.net,23
metric3,1488063747,example01.net,34
metric4,1488063747,example01.net,45
metric1,1488063788,example02.net,56
metric2,1488063788,example02.net,67
metric3,1488063788,example02.net,78
metric4,1488063788,example02.net,89
""")

df = pd.read_csv(s)
df.pivot(index="timestamp", columns='metric',values='value')

This produces:

metric      metric1  metric2  metric3  metric4
timestamp                                     
1488063747       12       23       34       45
1488063788       56       67       78       89

Implicitly deciding which dictionary to use
