Question

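I'm trying to build one large result list made up of lists, where each inner list is selected by index. I can't know in advance how many inner lists the large list will contain.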

id   value
1     30
1     21
1     12
1     0
2     1
2     9
2     14
3     12
3     2
4     3
5     1

result = []
for id, dfs in df.groupby('id'):
    ....

    for i, row in dfs.iterrows():
        x = helper(row['value'])
        # If the list is found, append the element
        if (result[i]):
            result[i].append(x)
        # Dynamically make lists based on index
        else:
            result[i] = []

If the list is already defined, just append the value x to it.

Expected output:

    first index      second index  third index   fourth index
[[x1,x5,x10,x11,x14], [x2,x4,x9], [x3,x7],       [x20]]

The x values are computed by the helper function.

Answer 1

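It isn't clear to me whether you want the result as a DataFrame, as a dict keyed by 'index', or as a list with the items in the right order. By the way, Python lists start at index 0.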

In [705]: import collections

In [706]: result = collections.defaultdict(list)
     ...: for id, dfs in df.groupby('id'):
     ...:     result[id].extend(list(dfs['value'].values))
     ...:

In [707]: result  # this will be a dict
Out[707]:
defaultdict(list,
            {1: [30, 21, 12, 0], 2: [1, 9, 14], 3: [12, 2], 4: [3], 5: [1]})

In [708]: [result[k] for k in sorted(result.keys())]  # turn it into a list
Out[708]: [[30, 21, 12, 0], [1, 9, 14], [12, 2], [3], [1]]

If you want to do something to each item in a group, such as passing it through helper(), you can do the following:

In [714]: def helper(val):
     ...:     return 'x' + str(val)  # simplifying whatever helper does

In [715]: result = collections.defaultdict(list)
     ...: for id, dfs in df.groupby('id'):
     ...:     result[id].extend(map(helper, dfs['value'].values))  # pass each value to helper

In [716]: result
Out[716]:
defaultdict(list,
            {1: ['x30', 'x21', 'x12', 'x0'],
             2: ['x1', 'x9', 'x14'],
             3: ['x12', 'x2'],
             4: ['x3'],
             5: ['x1']})

In [717]: [result[k] for k in sorted(result.keys())]
Out[717]:
[['x30', 'x21', 'x12', 'x0'],
 ['x1', 'x9', 'x14'],
 ['x12', 'x2'],
 ['x3'],
 ['x1']]

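Note that result[id].extend(...) isn't actually needed, because groupby passes all the values for a given 'id' together as one group. So you don't need to check whether the id already exists in result. It could simply be: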

In [720]: result = collections.defaultdict(list)
     ...: for id, dfs in df.groupby('id'):
     ...:     result[id] = list(map(helper, dfs['value'].values))
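
To run this outside an IPython session, here is a minimal self-contained sketch. It assumes the DataFrame holds exactly the sample data from the question, and helper is the same simplified stand-in used above, not the asker's real function. Since each id arrives only once, a plain dict comprehension is enough.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    "value": [30, 21, 12, 0, 1, 9, 14, 12, 2, 3, 1],
})


def helper(val):
    # Simplified stand-in for whatever the real helper does.
    return "x" + str(val)


# groupby yields every id exactly once, so no existence check is needed.
result = {key: [helper(v) for v in group["value"]]
          for key, group in df.groupby("id")}

# Turn the dict into a list of lists ordered by id.
as_list = [result[k] for k in sorted(result)]
print(as_list)
```

Keep in mind that position 0 of as_list holds the values for id 1, since Python lists are 0-indexed.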

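Ideally, you would write helper so that it works on all the rows of dfs at once and can be used with pandas' apply() (for example Series.apply() or DataFrame.apply()).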
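
One way this could look, as a sketch only: apply helper to the whole 'value' column with Series.apply(), then collect the transformed values per id. This assumes helper works on one value at a time, as the simplified stand-in does. The column name 'x' is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    "value": [30, 21, 12, 0, 1, 9, 14, 12, 2, 3, 1],
})


def helper(val):
    # Simplified stand-in for whatever the real helper does.
    return "x" + str(val)


# Run helper over the entire column once ('x' is a hypothetical column name).
df["x"] = df["value"].apply(helper)

# Collect the transformed values of each id into a list, then into a list of lists.
per_id = df.groupby("id")["x"].apply(list)
result = per_id.tolist()
print(result)
```

Here per_id is a Series indexed by id, so the lists stay ordered by id without an explicit sort.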

Even better, build helper so that it can process the DataFrame of each group directly through GroupBy.apply(), i.e. df.groupby('id').apply(helper).
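
A rough sketch of that group-level variant follows. helper_group is a hypothetical name, and its body simply reuses the simplified 'x' + value logic from above; the real helper may need something quite different. The 'value' column is selected before apply() so that each call receives a small DataFrame containing only that column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    "value": [30, 21, 12, 0, 1, 9, 14, 12, 2, 3, 1],
})


def helper_group(group):
    # Hypothetical group-level helper: receives the sub-DataFrame of one id
    # and returns the list of computed x values for that id.
    return ["x" + str(v) for v in group["value"]]


# One call of helper_group per id; the result is a Series of lists indexed by id.
per_id = df.groupby("id")[["value"]].apply(helper_group)
result = per_id.tolist()
print(result)
```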
