Question

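I would like to aggregate columns of a Pandas DataFrame into a single column, given a certain condition. The idea is to save space in the DF by merging some columns into one whenever they satisfy that condition. An example will probably explain it better: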

import pandas as pd
import seaborn as sns     # for sample data set

# load some sample data
titanic = sns.load_dataset('titanic')

# round the age to an integer for convenience
titanic['age_round'] = titanic['age'].round(0)

# crosstabulate
crtb = pd.crosstab(titanic['embark_town'], titanic['age_round'], margins=True)
crtb

This yields:

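What I would like to do is, for instance, aggregate all columns >= 20 into a single column named '20+', whose values are the row-wise sums of the aggregated columns. Columns whose header is < 20 would stay separate and unaffected. One way to go about it is to create another column in the original DF that holds the original value of age_round if it is < 20 and '20+' otherwise, or to use .cut, and then crosstabulate on that column.

I was wondering whether there is a more elegant way to do this without creating a new column. Thanks!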

Answer 1

For this specific example, I don't think you need to add a column at all; just update the values in the column you already have:

import pandas as pd
import seaborn as sns     # for sample data set

# load some sample data
titanic = sns.load_dataset('titanic')

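# round the age to an integer for convenience; cast to object so the
# column can hold both the numeric ages and the '20+' label
titanic['age_round'] = titanic['age'].round(0).astype(object)
# relabel every age of 20 or above in place (missing ages stay NaN)
titanic.loc[titanic['age_round']>=20, 'age_round'] = '20+'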

# crosstabulate
crtb = pd.crosstab(titanic['embark_town'], titanic['age_round'], margins=True)
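
If you would rather leave the source DataFrame untouched, one possible alternative is to build the crosstab first and collapse its columns afterwards. The sketch below is only an illustration under stated assumptions: the small `df` is a made-up stand-in for the titanic data (so it runs without downloading anything), and `margins=True` is left out on purpose, because the extra 'All' label would no longer compare against 20.

```python
import pandas as pd

# Hypothetical stand-in for the titanic sample data (values are made up)
df = pd.DataFrame({
    "embark_town": ["Southampton", "Cherbourg", "Southampton",
                    "Queenstown", "Cherbourg", "Southampton"],
    "age": [18.0, 22.4, 35.0, 19.2, 40.0, 2.0],
})

# crosstabulate on the rounded age without storing it in df
crtb = pd.crosstab(df["embark_town"], df["age"].round(0))

# boolean mask over the column labels: True for ages of 20 and above
over = crtb.columns >= 20

# keep the columns below 20 as they are ...
collapsed = crtb.loc[:, ~over].copy()
# ... and add one column holding the row-wise sum of the others
collapsed["20+"] = crtb.loc[:, over].sum(axis=1)

print(collapsed)
```

Row and column totals can then be added by summing `collapsed` along either axis, if they are needed.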

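As for your question in general: there are many different ways to aggregate data in pandas, the most standard being the .groupby() construct. A crosstab is basically a shortcut for grouping by the two variables and then calling .unstack().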
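
To make the relationship between crosstab and groupby concrete, here is a minimal sketch that builds the same count table both ways. The `df` below is a hypothetical, made-up sample rather than the real titanic data, and counting with `.size()` is an assumption about which aggregation matches the default crosstab.

```python
import pandas as pd

# Hypothetical stand-in for the titanic sample data (values are made up)
df = pd.DataFrame({
    "embark_town": ["Southampton", "Cherbourg", "Southampton",
                    "Queenstown", "Cherbourg", "Southampton"],
    "age": [18.0, 22.4, 35.0, 19.2, 40.0, 2.0],
})
df["age_round"] = df["age"].round(0)

# the shortcut: count occurrences of each (town, age) pair
via_crosstab = pd.crosstab(df["embark_town"], df["age_round"])

# the long form: group by both variables, count rows per group,
# then pivot the inner index level (age_round) into columns
via_groupby = (
    df.groupby(["embark_town", "age_round"])
      .size()
      .unstack(fill_value=0)
)

print(via_groupby)
```

The groupby form is more verbose, but it lets you swap `.size()` for any other aggregation before unstacking.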
