Question

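My data is organized by year, with the year as the index. I have someFunc(), which does something on the grouped data (via groupby). However, it returns two values (two floats, not columns). I want to put these two values into two new columns of the original DataFrame. Using a simple demo function, this is what I came up with: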

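import pandas as pd

def someFunc(group):
    a = 1
    b = 2
    # use a list (not a set) so the column order is fixed; .iloc[0] takes the first year positionally
    return pd.DataFrame([[a, b]], columns=['colA', 'colB'], index=[group['year'].iloc[0]])
results = df.groupby(level=0).apply(someFunc)
pd.merge(df, results, left_index=True, right_index=True)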

However, this creates a doubled index: one level because I set an index myself, and one that comes from apply():

results
                colA        colB
year                            
1961 1961          1           2
1962 1962          1           2
1963 1963          1           2
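To see where the doubled index comes from, here is a minimal reproduction. It assumes the layout used in Answer 1 (`year` is an ordinary column and the frame keeps its default integer index), because then the index returned by the function always differs from the group's own index and pandas prepends the group key. The function name `some_func` and the placeholder values are illustrative only.

```python
import pandas as pd

# Layout as in Answer 1: "year" is a column, the index is the default 0..n-1
df = pd.DataFrame({
    "year": [1983, 1984, 1985, 1986, 1982],
    "c": [722, 722, 722, 722, 442110],
    "a": [1001, 1001, 1001, 1001, 1003],
    "b": [1.06300, 1.24225, 2.78925, 0.59600, 1.86300],
})

def some_func(group):
    a, b = 1, 2  # placeholder for the real per-group computation
    # A one-row DataFrame that carries its own index label (the year)
    return pd.DataFrame([[a, b]], columns=["colA", "colB"], index=[group["year"].max()])

results = df.groupby(level=0).apply(some_func)

# apply() puts the group key (0..4) in front of the year label returned above
print(results.index.nlevels)  # 2
print(list(results.index.get_level_values(1)))
```

The outer level is the group key supplied by apply(); the inner level is the label the function set itself.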

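So, of course, the merge doesn't work. I've tried various other approaches (including returning numpy arrays), but none of them is clean. What should I do? I know I could split the function and run the code twice, once per column, but that isn't really efficient. To be clear, my expected result (for the variable results) is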

results
                colA        colB
year                            
1961               1           2
1962               1           2
1963               1           2

Before this step, the data looks like:

           c      a        b  
year                                                                          
1983     722   1001  1.06300  
1984     722   1001  1.24225   
1985     722   1001  2.78925   
1986     722   1001  0.59600   
1982  442110   1003  1.86300

Intermediate result

return pd.DataFrame([[a, b]], columns=['colA', 'colB'], index=[group['year'].max()])

returns

           colA       colB
1961         30   2.434379

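So that's the key problem, right? The function returns something that already has an index, and apply() then stacks its own index on top. Since a DataFrame can't be returned without an index, I guess the solution has to change what apply() does.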
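One commonly used alternative, different from the solution chosen below, is to return a Series instead of a one-row DataFrame: the Series labels become the new columns and apply() supplies the row index (the group key) on its own, so no extra level appears. The sketch assumes `year` is the index, as in the question, and uses placeholder values in the hypothetical `some_func`.

```python
import pandas as pd

# Year as the index, as described in the question (values from the sample data)
df = pd.DataFrame(
    {"c": [722, 722, 722, 722, 442110], "b": [1.06300, 1.24225, 2.78925, 0.59600, 1.86300]},
    index=pd.Index([1983, 1984, 1985, 1986, 1982], name="year"),
)

def some_func(group):
    a, b = 1, 2  # placeholder for the real per-group computation
    # A Series: its labels become columns, apply() provides the row index
    return pd.Series({"colA": a, "colB": b})

results = df.groupby(level=0).apply(some_func)
print(results.index.nlevels)  # 1

# Both frames share the "year" index, so join can attach the new columns
merged = df.join(results)
print(merged)
```

This avoids building an index inside the function at all, which is why no reset_index call is needed afterwards.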

Solution

Posted in one of the comments:

results = df.groupby(level=0).apply(someFunc).reset_index(level=0, drop=True)
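A sketch of the whole workflow with that fix, ending with the new columns attached to the original rows. It assumes the layout from Answer 1 (`year` as a column, default integer index); since `results` is then indexed by year, the merge matches `df["year"]` against the index of `results` via `left_on`/`right_index`. `some_func` and its values are placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1983, 1984, 1985, 1986, 1982],
    "c": [722, 722, 722, 722, 442110],
})

def some_func(group):
    a, b = 1, 2  # placeholder for the real per-group computation
    return pd.DataFrame([[a, b]], columns=["colA", "colB"], index=[group["year"].max()])

# Drop the outer level added by apply(), keeping the year labels
results = df.groupby(level=0).apply(some_func).reset_index(level=0, drop=True)

# Attach the new columns by matching the "year" column to the results index
merged = pd.merge(df, results, left_on="year", right_index=True)
print(merged[["year", "colA", "colB"]].to_string(index=False))
```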

Answer 1

This works for me with your data:

In [57]:

import io
import pandas as pd

temp="""year           c      a        b
1983     722   1001  1.06300  
1984     722   1001  1.24225   
1985     722   1001  2.78925   
1986     722   1001  0.59600   
1982  442110   1003  1.86300 """

df = pd.read_csv(io.StringIO(temp), sep=r'\s+')
df
Out[57]:
   year       c     a        b
0  1983     722  1001  1.06300
1  1984     722  1001  1.24225
2  1985     722  1001  2.78925
3  1986     722  1001  0.59600
4  1982  442110  1003  1.86300

[5 rows x 4 columns]
In [66]:

def someFunc(group):
    a = 1
    b = 2
    #print(group['year'].values)
    # a list (not a set) guarantees the colA, colB column order
    return pd.DataFrame([[a, b]], columns=['colA', 'colB'], index=[group['year'].max()])
df.groupby(level=0).apply(someFunc)
Out[66]:
        colA  colB
0 1983     1     2
1 1984     1     2
2 1985     1     2
3 1986     1     2
4 1982     1     2

[5 rows x 2 columns]

Edit

After further discussion: the code above also reproduces the duplicated index you are facing, so you can call reset_index to drop the extra level:

In [91]:

def someFunc(group):
    a = 1
    b = 2
    return pd.DataFrame([[a, b]], columns=['colA', 'colB'], index=[group['year'].max()])
df.groupby(level=0).apply(someFunc).reset_index(level=0, drop=True)
Out[91]:
      colA  colB
1983     1     2
1984     1     2
1985     1     2
1986     1     2
1982     1     2

[5 rows x 2 columns]
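For reference, dropping the outer level can also be spelled with `DataFrame.droplevel` (available since pandas 0.24). The sketch below checks that it gives the same frame as the reset_index call above; the data and the `some_func` placeholder follow the answer's layout and are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"year": [1983, 1984, 1985, 1986, 1982]})

def some_func(group):
    a, b = 1, 2  # placeholder values, as in the answer
    return pd.DataFrame([[a, b]], columns=["colA", "colB"], index=[group["year"].max()])

stacked = df.groupby(level=0).apply(some_func)

via_reset = stacked.reset_index(level=0, drop=True)
via_droplevel = stacked.droplevel(0)  # removes the outer (group key) level

print(via_reset.equals(via_droplevel))  # True
```

Which spelling to use is a matter of taste; droplevel states the intent (remove one index level) a little more directly.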

Pandas: apply() returning multiple values
