Question

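I want to filter my dataset down to users who have listened to a minimum number of unique artists. My goal is to focus on users with relatively high listenership as a function of artist selection.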

Below is a sample of the data and my initial code approach:

full_df.head()

   user artist              plays  gender  age    Country
0   a   devendra banhart    456    m       28.0   United States
1   b   boards of canada    407    m       28.0   United States
2   a   cocorosie           386    m       28.0   United States
3   c   aphex twin          213    m       28.0   United States
4   d   animal collective   203    m       28.0   United States

Code:

eda_df = full_df.groupby('users')['artist'].filter(lambda x: len(x) >= 20)

In this case, user a would show the highest artist count.

Answer 1

You can use groupby.nunique together with pd.DataFrame.transform.

This example uses your data to filter for users with a minimum unique artist count of 1:
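The answer's original code did not survive on this page, so the following is only a minimal sketch of the approach it names, not the answerer's exact code. It assumes the grouping column is `user`, as in the sample output. The sample frame is rebuilt from the `full_df.head()` rows above, and the threshold `min_artists = 2` is a value chosen purely for illustration.

```python
import pandas as pd

# Sample data rebuilt from the full_df.head() output in the question.
full_df = pd.DataFrame({
    "user": ["a", "b", "a", "c", "d"],
    "artist": ["devendra banhart", "boards of canada", "cocorosie",
               "aphex twin", "animal collective"],
    "plays": [456, 407, 386, 213, 203],
    "gender": ["m"] * 5,
    "age": [28.0] * 5,
    "Country": ["United States"] * 5,
})

# Illustrative threshold (an assumption, not taken from the original answer).
min_artists = 2

# Count distinct artists per user and broadcast that count back onto every row.
artist_counts = full_df.groupby("user")["artist"].transform("nunique")

# Keep only the rows belonging to users who meet the threshold.
eda_df = full_df[artist_counts >= min_artists]

# Alternative sketch: groupby.filter on the whole frame keeps complete rows.
alt_df = full_df.groupby("user").filter(
    lambda g: g["artist"].nunique() >= min_artists
)

print(eda_df)
```

With the five sample rows, only user a has two distinct artists, so both variants keep just that user's rows. Note that calling `filter` on `groupby(...)['artist']` with `len(x)` returns only the artist column and counts rows rather than distinct artists, which is why the sketch uses `nunique` on the full frame.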


pandas - Filter for users with a minimum number of artists listened to

1 answer: