Question

I'm running into a problem when trying to use apply on the result of a groupby operation.

I have the following two DataFrames:

>>> df1.head()
         col1  col2  col3
id1                      
2001991   0.0     0     0
1501102   3.0     1     1
1701072   0.0     0     0
2001022   0.1    20    50
2001212   3.0     2     4
>>> df2.head()
     id2  value      id1
0  24400   6.28  2001022
1  24400   3.40  2001011
2  24037  12.30  2002011
3  24037   3.00  2001382
4  24037  20.00  1701071

I first did a groupby and a sum on df2:

>>> df2 = df2.groupby(['id2', 'id1']).sum()
>>> df2.head()
              value
id2 id1            
81  1701071   49.94
88  1701071  759.22
    2001011   73.26
    2001382  199.70
    2003071   25.00

I now want to use apply, but I need to pass it id1, which is now part of the index rather than a column, so I get an error when I try the following:

df2['new'] = df2.apply(lambda row: min(row['value'], df1.loc[row['id1'], 'col1']), axis=1)

What is the correct way to do this?

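[By the way, I also tried merging df1 and df2 into a single table (so that each row of df2 has fields holding the corresponding col1, col2 and col3 from df1), but when I then ran the groupby and sum() it summed the col1, col2 and col3 values as well (which I don't want)]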

Answer 1

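You can reset the index, and then id1 will be an ordinary column (reset_index returns a new DataFrame, so assign the result back if you want to keep it):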

df2.reset_index(level='id1')
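As a minimal sketch of how this fits the apply call from the question: the sample values below are made up, the name `grouped` is only illustrative, and it assumes every id1 in df2 also appears in df1's index.

```python
import pandas as pd

# Hypothetical sample data shaped like the frames in the question.
df1 = pd.DataFrame(
    {"col1": [0.1, 3.0], "col2": [20, 1], "col3": [50, 1]},
    index=pd.Index([2001022, 1501102], name="id1"),
)
df2 = pd.DataFrame(
    {
        "id2": [24400, 24400, 24037],
        "value": [6.28, 3.40, 2.00],
        "id1": [2001022, 2001022, 1501102],
    }
)

# Sum per (id2, id1), then move id1 out of the index into a regular column.
grouped = df2.groupby(["id2", "id1"]).sum().reset_index(level="id1")

# Each row is a float Series here (value is float), so cast id1 back to int
# before using it as a label in df1.
grouped["new"] = grouped.apply(
    lambda row: min(row["value"], df1.loc[int(row["id1"]), "col1"]), axis=1
)
print(grouped)
```

If you would rather keep the MultiIndex, row.name inside apply(..., axis=1) holds the (id2, id1) tuple for that row, so row.name[1] is another way to get at id1.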

Alternatively, pass as_index=False to groupby (on the original df2, before grouping):

df2.groupby(['id2', 'id1'], as_index=False).sum()
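With id1 kept as a column, one possible follow-up is to merge col1 in after the aggregation, which avoids the summed col1/col2/col3 mentioned in the question and replaces the row-wise apply with a vectorized minimum. This is a sketch under assumptions: the sample values are invented, and `grouped` and `merged` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the frames in the question.
df1 = pd.DataFrame(
    {"col1": [0.1, 3.0], "col2": [20, 1], "col3": [50, 1]},
    index=pd.Index([2001022, 1501102], name="id1"),
)
df2 = pd.DataFrame(
    {
        "id2": [24400, 24400, 24037],
        "value": [6.28, 3.40, 2.00],
        "id1": [2001022, 2001022, 1501102],
    }
)

# as_index=False keeps id2 and id1 as ordinary columns after the sum.
grouped = df2.groupby(["id2", "id1"], as_index=False).sum()

# Merge after aggregating, so col1 is looked up rather than summed.
merged = grouped.merge(df1[["col1"]], left_on="id1", right_index=True, how="left")

# Element-wise minimum of the summed value and the matching col1.
merged["new"] = np.minimum(merged["value"], merged["col1"])
print(merged)
```

With how="left", an id1 that is missing from df1 gets NaN in col1, and np.minimum then leaves NaN in new for that row.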
