Question

I want to reshape a DataFrame so that it shows the counts of combinations of two columns. Here is an example DataFrame:

import pandas as pd

my_df = pd.DataFrame({'a': ['first', 'second', 'first', 'first', 'third', 'first'],
                      'b': ['foo', 'foo', 'bar', 'bar', 'baz', 'baz'],
                      'c': ['do', 're', 'mi', 'do', 're', 'mi'],
                      'e': ['this', 'this', 'that', 'this', 'those', 'this']})

It looks like this:

        a    b   c      e
0   first  foo  do   this
1  second  foo  re   this
2   first  bar  mi   that
3   first  bar  do   this
4   third  baz  re  those
5   first  baz  mi   this

I would like to create a new DataFrame that counts the combinations of columns a and c, like this:

c        do   mi   re
a                    
first   2.0  2.0  NaN
second  NaN  NaN  1.0
third   NaN  NaN  1.0

I can do this with pivot_table if I set the values argument to some other column:

my_pivot_count1 = my_df.pivot_table(values='b', index='a', columns='c', aggfunc='count')

The problem is that column 'b' may contain NaN values, in which case that combination is not counted. For example, if my_df looks like this:

        a    b   c      e
0   first  foo  do   this
1  second  foo  re   this
2   first  bar  mi   that
3   first  bar  do   this
4   third  baz  re  those
5   first  NaN  mi   this

then my call to my_df.pivot_table gives this:

c        do   mi   re
a                    
first   2.0  1.0  NaN
second  NaN  NaN  1.0
third   NaN  NaN  1.0
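
The undercount comes from 'count' tallying only the non-null entries of the values column in each group. Here is a minimal sketch that reproduces it, assuming the NaN sits in row 5 of column b as shown above (the names counted and rows_per_pair are only illustrative):

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'a': ['first', 'second', 'first', 'first', 'third', 'first'],
                      'b': ['foo', 'foo', 'bar', 'bar', 'baz', np.nan],
                      'c': ['do', 're', 'mi', 'do', 're', 'mi'],
                      'e': ['this', 'this', 'that', 'this', 'those', 'this']})

# 'count' only counts non-null entries of b within each (a, c) group,
# so the row with b = NaN does not contribute to ('first', 'mi').
counted = my_df.pivot_table(values='b', index='a', columns='c', aggfunc='count')
print(counted)

# For comparison, the number of rows per (a, c) pair, ignoring b entirely.
rows_per_pair = my_df.groupby(['a', 'c']).size()
print(rows_per_pair)
```

The ('first', 'mi') pair appears in two rows, but only one of them has a non-null b, which is why the pivot reports 1.0 there.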

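For now I am using b as the values argument. The alternative is to set values to a new column that is guaranteed to have a value in every row, created with either my_df['count'] = 1 or my_df.reset_index(). Is there a way to get what I want using only columns a and c, without having to add a column?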

Answer 1

pandas.crosstab has a dropna parameter, which defaults to True, but in your case you can pass False:

pd.crosstab(my_df['a'], my_df['c'], dropna=False)
# c       do  mi  re
# a                 
# first    2   2   0
# second   0   0   1
# third    0   0   1
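
As a rough check that this does not depend on column b, here is a self-contained sketch that runs crosstab on the version of the frame with a NaN in b. The frame is rebuilt from the question; the variable name counts is just for illustration.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'a': ['first', 'second', 'first', 'first', 'third', 'first'],
                      'b': ['foo', 'foo', 'bar', 'bar', 'baz', np.nan],
                      'c': ['do', 're', 'mi', 'do', 're', 'mi'],
                      'e': ['this', 'this', 'that', 'this', 'those', 'this']})

# crosstab only sees the two Series passed to it, so the NaN in b has no effect.
counts = pd.crosstab(my_df['a'], my_df['c'], dropna=False)
print(counts)
```

Combinations that never occur come back as 0 rather than NaN, so the result stays an integer table.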

Answer 2

I would simply use groupby / unstack:

my_df.groupby(by=['a', 'c']).size().unstack(level='c')

c        do   mi   re
a                    
first   2.0  2.0  NaN
second  NaN  NaN  1.0
third   NaN  NaN  1.0

You can get fancy with fillna and astype to turn the NaN cells into integer zeros:

N = (
    my_df.groupby(by=['a', 'c'])
         .size()
         .unstack(level='c')
         .fillna(0)
         .astype(int)
)

c       do  mi  re
a                 
first    2   2   0
second   0   0   1
third    0   0   1
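
A possible shortcut, offered as a sketch rather than as part of the original answer: unstack accepts a fill_value argument, so the missing combinations can be filled while reshaping and the counts never pass through float. Only columns a and c are needed, so the frame below keeps just those two.

```python
import pandas as pd

my_df = pd.DataFrame({'a': ['first', 'second', 'first', 'first', 'third', 'first'],
                      'c': ['do', 're', 'mi', 'do', 're', 'mi']})

# size() counts rows per (a, c) pair; fill_value=0 fills the pairs that never
# occur during the reshape, so no NaN is introduced.
counts = my_df.groupby(['a', 'c']).size().unstack(level='c', fill_value=0)
print(counts)
```

This should give the same table as the fillna(0).astype(int) chain with one fewer step.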

Answer 3

You can add .fillna('x') after my_df, which does not change the underlying DataFrame itself.

my_pivot_count1 = my_df.fillna('x').pivot_table(values='b', index='a', columns='c', aggfunc='count')
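
To illustrate, here is a small sketch, assuming the frame from the question with a NaN in row 5 of column b. It checks that the placeholder restores the count and that my_df still holds the NaN afterwards. The placeholder 'x' is arbitrary.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'a': ['first', 'second', 'first', 'first', 'third', 'first'],
                      'b': ['foo', 'foo', 'bar', 'bar', 'baz', np.nan],
                      'c': ['do', 're', 'mi', 'do', 're', 'mi'],
                      'e': ['this', 'this', 'that', 'this', 'those', 'this']})

# fillna returns a new DataFrame; my_df itself is left as it was.
my_pivot_count1 = my_df.fillna('x').pivot_table(values='b', index='a', columns='c',
                                                aggfunc='count')
print(my_pivot_count1)

# The original frame still contains its NaN.
print(my_df['b'].isna().sum())
```

One thing to keep in mind: fillna('x') fills every column, so a NaN in a or c would turn into its own 'x' label in the result.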

Count combinations between two DataFrame columns
