Question

I have a DataFrame df like this:

df = pd.DataFrame({'c':[1,1,2,2,3,3],'L0':['a','a','b','c','d','e'],'L1':['a','b','c','e','f','e']})

For each value of c, I am trying to get the frequency of each value that appears in columns L0 and L1. The expected output is:

c    a    b    c    d    e    f
1    3    1    0    0    0    0
2    0    1    2    0    1    0
3    0    0    0    1    2    1

I thought I could use something like:

df.pivot_table(index='c', columns=np.unique(['L0','L1']), aggfunc=f)

But I can't figure out how to write f, which should be a function that applies value_counts() across multiple columns.

Answer 1

You can use the crosstab method, which by default computes a frequency table of the factors.
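
As a minimal sketch of the crosstab approach, assuming L0 and L1 are first stacked into one long column with melt (the names long_df and counts are illustrative, not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3],
                   'L0': ['a', 'a', 'b', 'c', 'd', 'e'],
                   'L1': ['a', 'b', 'c', 'e', 'f', 'e']})

# Stack L0 and L1 into a single 'value' column, keeping c as the identifier.
long_df = df.melt(id_vars='c', value_vars=['L0', 'L1'])

# crosstab counts how often each (c, value) pair occurs.
counts = pd.crosstab(long_df['c'], long_df['value'])
print(counts)
```

The result is indexed by c with one integer column per distinct letter, so no fillna step is needed.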

Answer 2

I didn't try to write f, but here is another way to solve your problem:

In [356]: df.set_index('c').stack().reset_index().groupby(['c', 0]).count().unstack().fillna(0)
Out[356]: 
  level_1                         
0       a    b    c    d    e    f
c                                 
1     3.0  1.0  0.0  0.0  0.0  0.0
2     0.0  1.0  2.0  0.0  1.0  0.0
3     0.0  0.0  0.0  1.0  2.0  1.0
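
The output above is float because fillna(0) runs after NaN has already been introduced. A possible variant of the same chain, sketched here under the assumption that integer counts are wanted, uses size() and unstack(fill_value=0) instead (long_df and counts_int are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3],
                   'L0': ['a', 'a', 'b', 'c', 'd', 'e'],
                   'L1': ['a', 'b', 'c', 'e', 'f', 'e']})

# Same reshaping as above, but give the stacked values a readable column name.
long_df = df.set_index('c').stack().reset_index(name='value')

# size() counts rows per (c, value) pair; fill_value=0 avoids NaN,
# so the counts stay integers.
counts_int = long_df.groupby(['c', 'value']).size().unstack(fill_value=0)
print(counts_int)
```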

Answer 3

Edit: this is slightly simpler:

In[48]: df.groupby('c').apply(lambda df1: 
            df1.drop('c', axis=1).unstack().value_counts().to_frame().transpose()
        ).reset_index(level=1, drop=True).fillna(0)

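See below for an explanation.

The function you are looking for is groupby, not pivot. You can then call value_counts on each sub-DataFrame, one per value of c.

This gets close to what you want: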

In[39] : df.groupby('c').apply(lambda df1: 
             df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose()
         )
Out[39]: 
       a   b   c   d   e   f
c                           
1 L0   2 NaN NaN NaN NaN NaN
  L1   1   1 NaN NaN NaN NaN
2 L0 NaN   1   1 NaN NaN NaN
  L1 NaN NaN   1 NaN   1 NaN
3 L0 NaN NaN NaN   1   1 NaN
  L1 NaN NaN NaN NaN   1   1

Summing these values to get the final result is rather convoluted:

In[46]: df.groupby('c').apply(lambda df1: 
            df1.drop('c', axis=1).apply(pd.Series.value_counts).transpose().sum().to_frame().transpose()
        ).reset_index(level=1, drop=True).fillna(0)
Out[46]: 
   a  b  c  d  e  f
c                  
1  3  1  0  0  0  0
2  0  1  2  0  1  0
3  0  0  0  1  2  1
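
The same groupby plus value_counts idea can also be written without apply. This is only a sketch, assuming the L0 and L1 values are first gathered into one column with melt (the name per_group is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 3, 3],
                   'L0': ['a', 'a', 'b', 'c', 'd', 'e'],
                   'L1': ['a', 'b', 'c', 'e', 'f', 'e']})

per_group = (
    df.melt(id_vars='c')          # columns: c, variable, value
      .groupby('c')['value']      # one group of letters per value of c
      .value_counts()             # counts indexed by (c, value)
      .unstack(fill_value=0)      # letters become columns, missing pairs become 0
      .sort_index(axis=1)         # make the column order explicit
)
print(per_group)
```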
