Question

Suppose I have a DataFrame like the following:

import pandas as pd
df = pd.DataFrame(range(4), index=range(4))
df = pd.concat([df, df])  # DataFrame.append was removed in pandas 2.0

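The resulting df contains each index value from 0 to 3 twice. I want to combine the values that share an index into a list. The desired result is: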

0 [0,0]
1 [1,1]
2 [2,2]
3 [3,3]

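In a more realistic scenario, my index will be dates, and I want to aggregate the multiple observations for each date into a list. That way I can apply some function to the observations on each date.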

Answer 1

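"In a more realistic scenario, my index will be dates, and I want to aggregate the multiple observations for each date into a list. That way I can apply some function to the observations on each date."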

If that's your goal, then I don't think you want to actually build a list. What you want is to use groupby and then act on the groups. For example:

>>> df.groupby(level=0)
<pandas.core.groupby.DataFrameGroupBy object at 0xa861f6c>
>>> df.groupby(level=0)[0]
<pandas.core.groupby.SeriesGroupBy object at 0xa86630c>
>>> df.groupby(level=0)[0].sum()
0    0
1    2
2    4
3    6
Name: 0, dtype: int64
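
For the date-indexed case described in the question, the same pattern might look like the sketch below. The dates, the column name obs, and the values are all made up for illustration; only the groupby(level=0) approach comes from the example above.

```python
import pandas as pd

# Hypothetical data: several observations per date (names and values are invented).
dates = pd.to_datetime(
    ["2014-01-01", "2014-01-01", "2014-01-02", "2014-01-02", "2014-01-02"]
)
df_dates = pd.DataFrame({"obs": [1.0, 3.0, 2.0, 4.0, 6.0]}, index=dates)

# Group on the index (level 0) and select the column to work with.
grouped = df_dates.groupby(level=0)["obs"]

# A built-in reduction per date.
daily_mean = grouped.mean()

# An arbitrary function per date: here, the spread (max - min) of the observations.
daily_range = grouped.agg(lambda s: s.max() - s.min())
```

Each result is a Series indexed by date, so no intermediate lists are needed.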

You can also extract a list:

>>> df.groupby(level=0)[0].apply(list)
0    [0, 0]
1    [1, 1]
2    [2, 2]
3    [3, 3]
Name: 0, dtype: object

But it's usually better to act on the groups themselves. Series and DataFrames aren't really meant for storing lists of objects.
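
If plain Python lists really are needed, one option is to keep them outside pandas. The following is only a sketch of that idea, not something the answer prescribes; the name values_by_index is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(range(4), index=range(4))
df = pd.concat([df, df])

# Iterating over a SeriesGroupBy yields (key, sub-Series) pairs,
# so the lists can be collected in an ordinary dict instead of a Series.
values_by_index = {
    key: group.tolist() for key, group in df.groupby(level=0)[0]
}
```

Here values_by_index maps each index label to the list of values that share it.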

Answer 2

In [374]:

import pandas as pd
df = pd.DataFrame({'a':range(4)})
df = pd.concat([df, df])  # DataFrame.append was removed in pandas 2.0
df

Out[374]:
   a
0  0
1  1
2  2
3  3
0  0
1  1
2  2
3  3

[8 rows x 1 columns]

In [379]:
import numpy as np
# loop over the index values and flatten them using numpy.ravel and cast to a list
for index in df.index.values:
    # use loc to select the values at that index
    print(index, list((np.ravel(df.loc[index].values))))
    # handle condition where we have reached the max value of the index, otherwise we output the values twice
    if index == max(df.index.values):
        break
0 [0, 0]
1 [1, 1]
2 [2, 2]
3 [3, 3]
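
The break in the loop above only works because the largest label happens to appear after every other label has been seen once. A variant that avoids this assumption, sketched here with the same column name a, iterates over the unique labels instead; the name result is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": range(4)})
df = pd.concat([df, df])

result = {}
# index.unique() visits each label exactly once, so no early break is needed.
for index in df.index.unique():
    # Passing a list to loc returns every row with that label,
    # even when the label occurs only once.
    result[index] = df.loc[[index], "a"].tolist()
```

This does not depend on the order of the index labels.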

Combine items into lists based on index
