Question

In a Python pandas DataFrame "df", I have the following three columns:

song_id | user_id | play_count
X232    | u8347   | 2
X987    | u3701   | 50
X271    | u9327   | 10
X523    | u1398   | 175

I have a rating table that I made up based on play_count (the number of times a user has listened to a song):

play_count | rating
1-33       | 1
34-66      | 2
67-99      | 3
100-199    | 4
>200       | 5

I am trying to add a "rating" column to df based on the play count. For example, if play_count = 2, the rating would be 1.

So it would look like this:

song_id | user_id | play_count | rating
X232    | u8347   | 2          | 1
X987    | u3701   | 50         | 2
X271    | u9327   | 10         | 1
X523    | u1398   | 175        | 4

In Excel I would do this with MATCH / INDEX, but I don't know how to do it in Python / pandas.

Would it be a combination of an if/else loop and isin?

Answer 1

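You need the endpoints of those ranges, just as you would in Excel:

import numpy as np
import pandas as pd
# Bin edges: (1..33], (33..66], (66..99], (99..199], (199..inf)
bins = [1, 33, 66, 99, 199, np.inf]

Then you can use pd.cut to find the corresponding rating:

df['rating'] = pd.cut(df['play_count'], bins=bins, include_lowest=True, labels=[1, 2, 3, 4, 5]).astype(int)

I added astype(int) at the end because pd.cut returns a categorical Series, so you can't do arithmetic on it directly.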
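To see this run end to end, here is a minimal sketch that rebuilds the four sample rows from the question and applies the same bins. The variable names `labels` and `rating` are only for illustration. Note one assumption: every play count is at least 1. A count of 0 would fall outside the bins and become NaN, and the cast to int would then fail.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "song_id": ["X232", "X987", "X271", "X523"],
    "user_id": ["u8347", "u3701", "u9327", "u1398"],
    "play_count": [2, 50, 10, 175],
})

# Right-closed bins: [1, 33], (33, 66], (66, 99], (99, 199], (199, inf]
bins = [1, 33, 66, 99, 199, np.inf]
labels = [1, 2, 3, 4, 5]

# pd.cut returns a categorical Series holding the label of each bin.
rating = pd.cut(df["play_count"], bins=bins, labels=labels, include_lowest=True)

# Convert the categories to plain integers so arithmetic works.
df["rating"] = rating.astype(int)
print(df)
```

With integer play counts, 33 lands in the first bin and 34 in the second, which matches the rating table in the question.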

Answer 2

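I think it would work if you changed your play_count table to use min/max values, like this:

play_count:

min | max    | rating
1   | 33     | 1
34  | 66     | 2
67  | 99     | 3
100 | 199    | 4
200 | np.inf | 5

Of course, you need import numpy as np.

Then you can do something like this: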

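# For each play count, take the rating of the row whose min/max range contains it
df['rating'] = df['play_count'].apply(lambda c: play_count.loc[(play_count['min'] <= c) & (c <= play_count['max']), 'rating'].iloc[0])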
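As a rough, self-contained sketch of this min/max lookup, the snippet below builds both tables from the values shown above and checks each play count against every range. The helper name `lookup_rating` is hypothetical, and the sketch assumes every count falls inside exactly one range. A count with no matching row would raise an IndexError at `.iloc[0]`.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "song_id": ["X232", "X987", "X271", "X523"],
    "user_id": ["u8347", "u3701", "u9327", "u1398"],
    "play_count": [2, 50, 10, 175],
})

# Rating table with explicit inclusive lower and upper bounds.
play_count = pd.DataFrame({
    "min": [1, 34, 67, 100, 200],
    "max": [33, 66, 99, 199, np.inf],
    "rating": [1, 2, 3, 4, 5],
})

def lookup_rating(count):
    """Return the rating of the row whose [min, max] range contains count."""
    match = play_count[(play_count["min"] <= count) & (count <= play_count["max"])]
    return match["rating"].iloc[0]

# Look up each play count row by row.
df["rating"] = df["play_count"].apply(lookup_rating)
print(df)
```

This does one scan of the rating table per row of df, which is fine for a small table. For large frames the pd.cut approach is likely to be faster.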
