Pandas groupby: conditional frequency for each user

Date: 2018-07-07 18:38:13

Tags: python pandas

I have a dataframe like this:


I want to get, for each user, the count of every country and product combination, as shown below.

First split the countries, then combine each country with the product and count the combinations.
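The three steps above (split, combine, count) can be illustrated without pandas at all. This is only a minimal sketch: the rows below are assumed sample data (the same rows the answers use), and the `country_product` key format such as `India_x` is an assumption borrowed from the first answer.

```python
from collections import Counter

# Assumed sample rows: (User, Product, comma-separated Country list)
rows = [
    ('101', 'x', 'India,Brazil'),
    ('101', 'x', 'India'),
    ('102', 'x', 'India,Brazil,Japan'),
    ('102', 'z', 'India,Brazil'),
    ('102', 'z', 'Brazil'),
]

counts = Counter()
for user, product, countries in rows:
    # Step 1: split the comma-separated country list
    for country in countries.split(','):
        # Step 2: combine the country with the product, e.g. 'India_x'
        # Step 3: count each (user, combination) pair
        counts[(user, f'{country}_{product}')] += 1

# Print the pairs in sorted order: user, combination, count
for (user, combo), n in sorted(counts.items()):
    print(user, combo, n)
```

For user 101 this counts `India_x` twice (once from each of that user's rows) and `Brazil_x` once.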

Desired output:


2 answers:

Answer 0 (score: 3)

Here is one approach that combines other answers on SO (which just shows the power of searching :D):


import pandas as pd

df = pd.DataFrame({
    'User':['101','101','102','102','102'],
    'Product':['x','x','x','z','z'],
    'Country':['India,Brazil','India','India,Brazil,Japan','India,Brazil','Brazil']
})

# Making use of: https://stackoverflow.com/a/37592047/7386332
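# Split each Country string into one row per country, keeping the original row index
j = (df.Country.str.split(',', expand=True)
       .stack()
       .dropna()   # drop the None padding from expand=True; not every pandas version's stack() removes it
       .reset_index(drop=True, level=1)
       .rename('Country'))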
df = df.drop('Country', axis=1).join(j)

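# Replace Country and Product with a combined Country_Product key, e.g. 'India_x'
# (drop(columns=...) instead of the positional axis argument, which pandas 2.0 no longer accepts)
df = (df.drop(columns=['Country', 'Product'])
      .assign(Country_Product=['_'.join(i) for i in zip(df['Country'], df['Product'])]))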

df2 = df.groupby(['User','Country_Product'])['User'].count().rename('Count').reset_index()

print(df2)
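On pandas 0.25 or later, `DataFrame.explode` can replace the split/stack/join step with a single call. The sketch below is an alternative to the code above, not the answerer's method; it assumes the same sample data and builds the same `Country_Product` key.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'User': ['101', '101', '102', '102', '102'],
    'Product': ['x', 'x', 'x', 'z', 'z'],
    'Country': ['India,Brazil', 'India', 'India,Brazil,Japan', 'India,Brazil', 'Brazil'],
})

# str.split gives a list per row; explode turns each list element into its own row
exploded = df.assign(Country=df['Country'].str.split(',')).explode('Country')

# Combined key, e.g. 'India_x'
exploded['Country_Product'] = exploded['Country'] + '_' + exploded['Product']

# size() counts rows per (User, Country_Product) group
df2 = (exploded.groupby(['User', 'Country_Product'])
               .size()
               .reset_index(name='Count'))
print(df2)
```

`size()` counts every row in a group. Because exploding leaves no missing values here, it gives the same numbers as `count()` would.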

Answer 1 (score: 3)

Using str.get_dummies:

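import numpy as np

(df.set_index(['User', 'Product']).Country.str.get_dummies(sep=',')  # one 0/1 column per country
   .replace(0, np.nan).stack().dropna()   # keep only the countries actually present
   .groupby(level=[0, 1, 2]).sum())       # Series.sum(level=...) was removed in pandas 2.0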
Out[658]: 
User  Product        
101   x        Brazil    1.0
               India     2.0
102   x        Brazil    1.0
               India     1.0
               Japan     1.0
      z        Brazil    2.0
               India     1.0
dtype: float64
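If a wide table with one column per country is easier to read, the same dummies can be summed per (User, Product) without stacking. This is a minimal sketch that assumes the sample data from the first answer. It shows a different layout of the same counts and is not part of the original answer.

```python
import pandas as pd

# Assumed sample data (same rows as in the first answer)
df = pd.DataFrame({
    'User': ['101', '101', '102', '102', '102'],
    'Product': ['x', 'x', 'x', 'z', 'z'],
    'Country': ['India,Brazil', 'India', 'India,Brazil,Japan', 'India,Brazil', 'Brazil'],
})

# One 0/1 column per country; the index keeps (User, Product) for every original row
dummies = df.set_index(['User', 'Product'])['Country'].str.get_dummies(sep=',')

# Summing within each (User, Product) group gives one count column per country
wide = dummies.groupby(level=['User', 'Product']).sum()
print(wide)
```

Zeros mark combinations that never occur, such as Japan for user 101. Stacking `wide` and filtering out the zeros gives back the long form shown above, with integer rather than float counts.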