Question

So I made a DataFrame:

dfgrp=df.groupby(['CCS_Category_ICD9','Gender'])['f0_'].sum()
ndf=pd.DataFrame(dfgrp)
ndf
                            f0_
CCS_Category_ICD9   Gender  
1                      F    889
                       M    796
                       U    2
2                      F    32637
                       M    33345
                       U    34

where f0_ is the total count by gender. What I really want is a simple single-level DataFrame, similar to what I get with:

ndf=ndf.unstack(level=1)
ndf
                   f0_
   Gender          F        M        U
CCS_Category_ICD9           
1                    889.0     796.0    2.0
2                    32637.0   33345.0  34.0
3                    2546.0    1812.0   NaN
4                   347284.0   213782.0 34.0

But what I want is:

CCS_Category_ICD9    F         M         U      
1                    889.0     796.0    2.0
2                    32637.0   33345.0  34.0
3                    2546.0    1812.0   NaN
4                   347284.0   213782.0 34.0

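I can't figure out how to flatten or get rid of the levels associated with f0_ and Gender. All I need is the "M", "F", "U" column headers, so that I have a simple single-level DataFrame. I have tried reset_index and set_index along with several other variations, with no luck...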

In the end I want a simple crosstab with row and column totals (which my example doesn't show).

I did (as shown in one answer):

ndf = ndf.f0_.unstack()
ndf

Which gave me:

Gender  F      M            U
CCS_Category_ICD9           
1   889.0     796.0     2.0
2   32637.0   33345.0   34.0
3   2546.0    1812.0    NaN
4   347284.0  213782.0  34.0

Followed by:

 nndf=ndf.reset_index(['CCS_Category_ICD9','F','M','U'])
 nndf
 Gender CCS_Category_ICD9   F     M         U
  0     1                889.0    796.0     2.0
  1     2                32637.0  33345.0   34.0
  2     3                2546.0   1812.0    NaN
  3     4                347284.0 213782.0  34.0
  4     5                3493.0   7964.0    1.0
  5     6                12295.0  9998.0    4.0

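That's about it, but I can't change the index label from Gender to something like Idx. Whatever I do, I end up adding an extra row with the new name, i.e. a row titled Idx below Gender. Is there a more direct solution?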

Answer 1

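You can use

df.loc[:, 'f0_']

on the DataFrame that results from .unstack(), i.e. select the first level of the MultiIndex columns, which leaves only the Gender level, or alternatively

df.columns = df.columns.droplevel()

See the MultiIndex.droplevel docs.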
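Both options can be compared side by side. The following is a minimal sketch, assuming a small hypothetical frame that reuses a few figures from the question; the names wide, by_selection and by_droplevel are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data reusing a few figures from the question.
df = pd.DataFrame({
    "CCS_Category_ICD9": [1, 1, 1, 2, 2, 2],
    "Gender": ["F", "M", "U", "F", "M", "U"],
    "f0_": [889, 796, 2, 32637, 33345, 34],
})

grouped = df.groupby(["CCS_Category_ICD9", "Gender"])["f0_"].sum()

# Unstacking a DataFrame keeps 'f0_' as a top column level above Gender.
wide = pd.DataFrame(grouped).unstack(level=1)

# Option 1: select the top-level label, which leaves only the Gender level.
by_selection = wide.loc[:, "f0_"]

# Option 2: drop the first column level instead of selecting it.
by_droplevel = wide.copy()
by_droplevel.columns = by_droplevel.columns.droplevel()

print(by_selection)
print(by_droplevel)
```

Either way the columns end up as a single level (F, M, U). The column index still carries the name Gender, which is a separate matter from the extra f0_ level.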

Answer 2

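Since ndf is a pd.DataFrame, it has a column index. When you call unstack(), it appends the last level of the row index to the column index. Since the columns already contain f0_, you end up with a second level. To flatten it the way you'd like, call unstack() on the column instead.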

ndf = ndf.f0_.unstack()

The text Gender is the name of the column index. If you want to get rid of it, you have to overwrite the name attribute of that object.

ndf.columns.name = None

Use this right after ndf.f0_.unstack().
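Put together, the steps might look like the sketch below. The sample data is hypothetical (a few figures borrowed from the question), and the final reset_index() call is an assumption about the layout the asker is after, with CCS_Category_ICD9 as an ordinary column.

```python
import pandas as pd

# Hypothetical long-format data reusing a few figures from the question.
df = pd.DataFrame({
    "CCS_Category_ICD9": [1, 1, 1, 2, 2, 2],
    "Gender": ["F", "M", "U", "F", "M", "U"],
    "f0_": [889, 796, 2, 32637, 33345, 34],
})

ndf = pd.DataFrame(df.groupby(["CCS_Category_ICD9", "Gender"])["f0_"].sum())

# Unstack the f0_ column (a Series), so no extra column level is created.
ndf = ndf.f0_.unstack()

# Remove the 'Gender' label that sits above the row index in the printout.
ndf.columns.name = None

# Assumption: turn the row index into a regular column for a flat frame.
flat = ndf.reset_index()
print(flat)
```

Clearing columns.name before reset_index() is what keeps the word Gender from showing up in the top-left corner of the output.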

Answer 3

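In general, if you want to use one column as the row index and another column as the column index, use df.pivot. If you need to aggregate values because some rows share the same (row, column) pair, use df.pivot_table.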

In this case, instead of df.groupby(...)[...].sum().unstack() you could use df.pivot_table:

import numpy as np
import pandas as pd
N = 100
df = pd.DataFrame({'CCS': np.random.choice([1,2], size=N),
                   'Gender':np.random.choice(['F','M','U'], size=N),
                   'f0':np.random.randint(10, size=N)})
result = df.pivot_table(index='CCS', columns='Gender', values='f0', aggfunc='sum')
result.columns.name = None
result = result.reset_index()

yields

   CCS   F    M   U
0    1  89  104  90
1    2  66   65  65

Note that after calling pivot_table(), the DataFrame result has named Indexes for both the index and the columns:

In [176]: result = df.pivot_table(index='CCS', columns='Gender', values='f0', aggfunc='sum'); result
Out[176]:
Gender   F    M   U
CCS
1       89  104  90
2       66   65  65

The index is named CCS:

In [177]: result.index
Out[177]: Int64Index([1, 2], dtype='int64', name='CCS')

and the column index is named Gender:

In [178]: result.columns
Out[178]: Index(['F', 'M', 'U'], dtype='object', name='Gender')  # <-- notice the name='Gender'

To remove the name from an Index, assign None to the name attribute:

In [179]: result.columns.name = None

In [180]: result
Out[180]:
      F   M   U
CCS
1    95  68  67
2    82  63  68

Although it isn't needed here, to remove the names from the levels of a MultiIndex, assign a list of Nones to the names (plural) attribute:

result.columns.names = [None]*numlevels
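The question also mentions wanting row and column totals. One possible way to get them, sketched here with hypothetical data, is the margins option of pivot_table; the label "Total" is an arbitrary choice passed via margins_name.

```python
import pandas as pd

# Hypothetical long-format data reusing a few figures from the question.
df = pd.DataFrame({
    "CCS_Category_ICD9": [1, 1, 1, 2, 2, 2],
    "Gender": ["F", "M", "U", "F", "M", "U"],
    "f0_": [889, 796, 2, 32637, 33345, 34],
})

# margins=True appends a totals row and a totals column to the pivot.
table = df.pivot_table(
    index="CCS_Category_ICD9",
    columns="Gender",
    values="f0_",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

# Drop the 'Gender' label so only F, M, U and Total appear as headers.
table.columns.name = None
print(table)
```

The totals are computed with the same aggfunc as the cells, so with aggfunc="sum" the bottom-right cell is the grand total.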

Getting rid of excess labels on Pandas DataFrames
