Question

Suppose I have a DataFrame that looks like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Week' : [1, 2, 1, 2, 1, 2, 1, 2],
                           'Rabbits' : np.random.randn(8),
                           'Donkeys' : np.random.randn(8) * 4,
                           'Mice'   :  np.random.randn(8) * 4})

This gives df:

I then want to group by week and run a basic corr test for each week:

week_group = df.groupby('Week')
week_group = week_group[df.columns.difference(["Week"])]
week_cor = week_group.corr()

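This gives week_cor a MultiIndex, with week 1 and week 2 on the outer level.

So now I want to do the following: I want to create one DataFrame based on the "two" DataFrames. To elaborate: let's treat week 1 as df1 and week 2 as df2. Now consider an entry entry1 in df1 and the entry entry2 at the same position in df2. The resulting DataFrame is constructed as follows: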

def collapse(entry1, entry2):
    if abs(entry1) >= 0.6 and abs(entry2) >= 0.6:
        return 1
    else:
        return 0

So in this case, I would want something like:

         Donkeys   Mice      Rabbits                              
Donkeys  1.000000  0.000000  0.000000
Mice     0.000000  1.000000  0.000000
Rabbits  0.000000  0.000000  1.000000

In Python I would normally run reduce over a nested list, but that does not work here:

from functools import reduce

def collapse(entry1, entry2):
    if abs(entry1) >= 0.6 and abs(entry2) >= 0.6:
        return 1
    else:
        return 0

reduce(collapse, week_cor)

Which gives:

TypeError: bad operand type for abs(): 'str'

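This makes sense, since the DataFrame behaves somewhat like an array with string keys.

I may be misunderstanding the purpose of pandas, but I feel that performing a reduce-like operation on a MultiIndex would be a common need, and that pandas would have a way to do it. If I am wrong in that assumption, please correct me; otherwise, what is the standard way to reduce a MultiIndex?

In general: I take a DataFrame and group the data by some point in time. I then perform an operation (corr in this example) to get a MultiIndex keyed by time. I want to "collapse" or reduce the MultiIndex, similar to how reduce works on a list in Python. As a result, the MultiIndex is reduced to a single DataFrame.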
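To make the cause concrete, here is a minimal sketch. The small frame is a made-up stand-in for week_cor, not the real data. Iterating over a DataFrame yields its column labels, so reduce hands strings to collapse rather than correlation values.

```python
import pandas as pd

# Hypothetical two-column frame, used only to show iteration behaviour.
frame = pd.DataFrame({"Donkeys": [1.0, 0.2], "Mice": [0.2, 1.0]})

# Iterating a DataFrame yields its column labels, like iterating a dict's keys.
labels = list(frame)
print(labels)  # ['Donkeys', 'Mice']

# reduce(collapse, frame) would therefore start with collapse('Donkeys', 'Mice'),
# and abs() on a string raises a TypeError.
try:
    abs(labels[0])
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```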

Answer 1

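In this case, I think you can do another groupby on week_cor, this time on level=1 of its index (the animal names), and check whether all absolute values are greater than or equal to 0.6: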

print(week_cor)

               Donkeys      Mice   Rabbits
Week                                      
1    Donkeys  1.000000 -0.118953 -0.235307
     Mice    -0.118953  1.000000  0.803987
     Rabbits -0.235307  0.803987  1.000000
2    Donkeys  1.000000  0.229929 -0.593603
     Mice     0.229929  1.000000 -0.645369
     Rabbits -0.593603 -0.645369  1.000000

Code:

week_cor.groupby(level=1).apply(lambda x: x.abs().ge(0.6).all())  

         Donkeys   Mice  Rabbits
Donkeys     True  False    False
Mice       False   True     True
Rabbits    False   True     True
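If the 0/1 form from the question is wanted rather than booleans, the same idea can be sketched as follows. This is a variant under assumptions, not the answerer's exact code: week_cor is rebuilt by hand from the values printed above so the result is reproducible, and the names strong and collapsed are illustrative.

```python
import pandas as pd

animals = ["Donkeys", "Mice", "Rabbits"]

# week_cor rebuilt by hand from the printed output (outer level: Week).
index = pd.MultiIndex.from_product([[1, 2], animals], names=["Week", None])
week_cor = pd.DataFrame(
    [
        [1.000000, -0.118953, -0.235307],
        [-0.118953, 1.000000, 0.803987],
        [-0.235307, 0.803987, 1.000000],
        [1.000000, 0.229929, -0.593603],
        [0.229929, 1.000000, -0.645369],
        [-0.593603, -0.645369, 1.000000],
    ],
    index=index,
    columns=animals,
)

# Flag every entry whose absolute correlation reaches the threshold.
strong = week_cor.abs().ge(0.6)

# Group rows by animal (level=1) and require the flag in every week,
# then convert True/False to 1/0.
collapsed = strong.groupby(level=1).all().astype(int)
print(collapsed)
```

Applying the threshold before grouping avoids the apply/lambda step, and it works unchanged for any number of weeks.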

Answer 2

So, I think the simplest solution is to remove the MultiIndex with pandas.DataFrame.reset_index, like so:

week_cor = week_cor.reset_index()

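Now you can select the relevant subset through the Week column. This makes it easier to perform further operations on the two of them. Here is a numpy solution that you may be able to build on.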

cols = ['Donkeys', 'Mice', 'Rabbits']
df1 = week_cor[week_cor['Week'] == 1][cols].values  # ndarray, week 1 correlations
df2 = week_cor[week_cor['Week'] == 2][cols].values  # ndarray, week 2 correlations

def collapse(A, B):
    # Compare absolute values, matching the collapse() in the question
    return np.where((np.abs(A) >= 0.6) & (np.abs(B) >= 0.6), 1, 0)

new_df = pd.DataFrame(collapse(df1, df2), index=cols, columns=cols)

Let me know if you manage to get reduce to work, as I would be interested to know.
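One way reduce could be made to work is to reduce over the per-week blocks instead of over the DataFrame itself. The following is only a sketch under assumptions: the seeded random data, the three-week layout, and the names masks and collapsed are illustrative and not from the original post.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

# Illustrative data: three weeks, four observations per week, fixed seed.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Week": [1, 2, 3] * 4,
    "Rabbits": rng.standard_normal(12),
    "Donkeys": rng.standard_normal(12) * 4,
    "Mice": rng.standard_normal(12) * 4,
})
week_cor = df.groupby("Week")[["Donkeys", "Mice", "Rabbits"]].corr()

# One boolean mask per week; droplevel(0) removes the Week level so that
# every mask shares the same animal-by-animal index and columns.
masks = [
    block.droplevel(0).abs().ge(0.6)
    for _, block in week_cor.groupby(level=0)
]

# Fold the masks together with elementwise AND, then convert to 1/0.
collapsed = reduce(operator.and_, masks).astype(int)
print(collapsed)
```

Because the fold is over a list of equally shaped frames, it handles any number of weeks, which is the behaviour reduce would give on a nested list.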

Answer 3

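Note: I posted this answer before seeing Ben.T's comment; his approach is more concise and should be used.

I am extending Dascienz's answer to make it more general.

As Dascienz said:

So, I think the simplest solution is to remove the MultiIndex with
pandas.DataFrame.reset_index

So, from: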

animal_group = week_cor.reset_index()

We get:

This can then be grouped again by "level_1"; for illustration, a single group looks like a slice:

animal_group = week_cor.reset_index().groupby("level_1")
animal_group.get_group("Donkeys")

Which gives:

This can be reduced using agg (though I am not sure it is the best choice), and the "Week" column can be dropped at the end:

from math import floor

def collapse(x):
    # 1 where |elem| >= 0.6 (the threshold used in the question), else 0
    x = x.map(lambda elem: 1 if abs(elem) >= 0.6 else 0)
    # With exactly two weeks, floor(sum / 2) is 1 only when both flags are 1
    return floor(x.sum() / 2)

animal_group.agg(collapse).drop("Week", axis=1)

This still seems a bit verbose (or perhaps I expect too much of Python). But finally:

As desired.
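A note on the "math trick", sketched in plain Python with a hypothetical helper both_flagged: halving the sum of two 0/1 flags and flooring acts as a logical AND, but only for exactly two weeks. With more groups, min() (or all()) is a safer reduction.

```python
from math import floor

def both_flagged(flags):
    """floor(sum / 2), the trick used in collapse(), on a list of 0/1 flags."""
    return floor(sum(flags) / 2)

# For exactly two flags the trick matches a logical AND.
for a in (0, 1):
    for b in (0, 1):
        assert both_flagged([a, b]) == (a and b)

# With three weeks it no longer does: two passing weeks are enough to give 1.
print(both_flagged([1, 1, 0]))  # 1, although one week failed the threshold
print(min([1, 1, 0]))           # 0, min() requires every week to pass
```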

Is there a function to reduce a MultiIndex?