Find the most frequent string in a dataframe

Time: 2018-08-24 12:54:39

Tags: python string pandas nlp

I'm new to Python programming. I have a pandas dataframe with two string columns.

The dataframe looks like this:

Case    Action
Create   Create New Account
         Create New Account
         Create New Account
         Create New Account
         Create Old Account
Delete   Delete New Account
         Delete New Account
         Delete Old Account
         Delete Old Account
         Delete Old Account

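Here, Create has 5 actions, 4 of which are Create New Account, so that action's share is 4/5 (= 80%). Similarly, for Delete the most frequent action is Delete Old Account, with 3/5 (= 60%). So my requirement is: the next time I encounter the case Create, the output should be Create New Account together with its frequency score.

Expected output: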

Case    Action              Score
Create  Create New Account  80
Delete  Delete Old Account  60

1 answer:

Answer 0 (score: 1)

Use crosstab, then sort the values and take tail(1) of each group with groupby:

pd.crosstab(df.Case,df.Action,normalize='index').stack().sort_values().groupby(level=0).tail(1)
Out[769]: 
Case    Action          
Delete  DeleteOldAccount    0.6
Create  CreateNewAccount    0.8
dtype: float64
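
To make the one-liner above easier to try, here is a minimal self-contained sketch. It rebuilds the sample data from the question, assuming the Case value is filled in on every row (the question's table leaves repeated values blank); the names shares and top are illustrative only.

```python
import pandas as pd

# Sample data from the question; assumes Case is present on every row.
df = pd.DataFrame({
    "Case": ["Create"] * 5 + ["Delete"] * 5,
    "Action": ["Create New Account"] * 4 + ["Create Old Account"]
              + ["Delete New Account"] * 2 + ["Delete Old Account"] * 3,
})

# Row-normalised contingency table: each row (Case) sums to 1.
shares = pd.crosstab(df["Case"], df["Action"], normalize="index")

# Flatten to a (Case, Action) Series, sort ascending, and keep the
# last (largest) entry of each Case.
top = shares.stack().sort_values().groupby(level=0).tail(1)
print(top)
```

Because the values are sorted in ascending order before tail(1), the last entry kept for each Case is its most frequent Action.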

Or use where:

pdf=pd.crosstab(df.Case,df.Action,normalize='index')
pdf.where(pdf.eq(pdf.max(1),axis=0)).stack()
Out[781]: 
Case    Action          
Create  CreateNewAccount    0.8
Delete  DeleteOldAccount    0.6
dtype: float64
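
Neither snippet above produces the exact three-column table asked for in the question. One possible way to get there, sketched here rather than taken from the answer, is to combine idxmax and max on the same crosstab. The sample data again assumes Case is filled in on every row, and the names result and lookup are hypothetical.

```python
import pandas as pd

# Sample data from the question; assumes Case is present on every row.
df = pd.DataFrame({
    "Case": ["Create"] * 5 + ["Delete"] * 5,
    "Action": ["Create New Account"] * 4 + ["Create Old Account"]
              + ["Delete New Account"] * 2 + ["Delete Old Account"] * 3,
})

# Share of each Action within its Case (rows sum to 1).
shares = pd.crosstab(df["Case"], df["Action"], normalize="index")

# idxmax picks the column label with the largest share per row;
# max gives that share, scaled here to a percentage.
result = pd.DataFrame({
    "Action": shares.idxmax(axis=1),
    "Score": shares.max(axis=1) * 100,
}).reset_index()
print(result)

# Look up the most frequent action for a given case.
lookup = result.set_index("Case")
print(lookup.loc["Create", "Action"])
```

For the sample data this yields Create New Account with a score of 80 for Create, and Delete Old Account with a score of 60 for Delete. Note one difference in behaviour: when two actions tie for the maximum, idxmax keeps only the first, whereas the where approach keeps every tied action.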