movie_id user_id rating
0 1 [5, 2, 1, 6] [4, 4, 5, 4]
1 2 [5, 1] [3, 3]
2 3 [1] [4]
3 4 [1] [3]
4 5 [1] [3]
5 6 [1] [5]
6 7 [6, 1] [2, 4]
7 8 [1, 6] [1, 4]
8 9 [1, 6] [5, 4]
I am trying to count, for each row, how many numbers in "rating" are greater than 3. For example, [4,4,5,5] => 4 / [3,3] => 0.
This is what I have done so far:
from collections import Counter

appr = df.copy()
# Count how often each rating value occurs within each row's list
appr['approval'] = appr['rating'].map(Counter)
appr
It outputs:
movie_id user_id rating approval
0 1 [5, 2, 1, 6] [4, 4, 5, 4] {4: 3, 5: 1}
1 2 [5, 1] [3, 3] {3: 2}
2 3 [1] [4] {4: 1}
3 4 [1] [3] {3: 1}
4 5 [1] [3] {3: 1}
5 6 [1] [5] {5: 1}
6 7 [6, 1] [2, 4] {2: 1, 4: 1}
7 8 [1, 6] [1, 4] {1: 1, 4: 1}
8 9 [1, 6] [5, 4] {5: 1, 4: 1}
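My goal is to filter out the numbers in each row's "rating" that are not greater than 3, and to sum the occurrences of the ones that remain: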
movie_id user_id rating approval appr_sum
0 1 [5, 2, 1, 6] [4, 4, 5, 4] {4: 3, 5: 1} 4
1 2 [5, 1] [3, 3] {3: 2} 0
2 3 [1] [4] {4: 1} 1
3 4 [1] [3] {3: 1} 0
4 5 [1] [3] {3: 1} 0
5 6 [1] [5] {5: 1} 1
6 7 [6, 1] [2, 4] {2: 1, 4: 1} 1
7 8 [1, 6] [1, 4] {1: 1, 4: 1} 1
8 9 [1, 6] [5, 4] {5: 1, 4: 1} 2
I tried:
s = appr['rating'].map
t = [x for x in s if x > 3]
t
But this raises TypeError: 'method' object is not iterable, and even if this part of the code worked, it would not sum the occurrences.
Answer 0 (score: 0)
Use a nested list comprehension to filter and sum.
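The code for this answer is not shown, so the following is only a minimal sketch of what such a nested comprehension could look like, assuming the `rating` column holds plain Python lists as in the question. The three-row sample frame is an illustrative assumption, not the asker's full data.

```python
import pandas as pd

# Illustrative sample mirroring the first rows of the question's data (assumed).
appr = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "rating": [[4, 4, 5, 4], [3, 3], [4]],
})

# The outer comprehension walks the rows; the inner generator keeps only
# ratings greater than 3, and sum() counts them (each match contributes 1).
appr["appr_sum"] = [sum(1 for x in ratings if x > 3) for ratings in appr["rating"]]

print(appr["appr_sum"].tolist())  # [4, 0, 1]
```

Working directly on the lists makes the intermediate `approval` column of `Counter` objects unnecessary for this count.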
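Answer 1 (score: 0)
The reason your expression does not work is that you are iterating over the pandas Series incorrectly. A simpler way to get this done is: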
import pandas as pd
df = pd.DataFrame({'A': [1, 3, 4]})
a = [x for _, x in df.iterrows() if x['A'] > 3]
print(a)
> [A    4
Name: 2, dtype: int64]
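Applied to the question's data, the immediate cause of the `TypeError` appears to be that `appr['rating'].map` is referenced without being called, so `s` is a bound method rather than a Series. The sketch below, using an assumed two-row sample, reproduces the error and then iterates over the lists themselves:

```python
import pandas as pd

# Assumed two-row sample in the same shape as the question's 'rating' column.
appr = pd.DataFrame({"rating": [[4, 4, 5, 4], [3, 3]]})

s = appr["rating"].map  # no parentheses: s is a bound method, not a Series
try:
    [x for x in s]
    error = None
except TypeError as exc:
    error = str(exc)
print(error)  # 'method' object is not iterable

# Iterating over the Series itself yields one list per row;
# the filter then has to run inside each list.
counts = [len([x for x in ratings if x > 3]) for ratings in appr["rating"]]
print(counts)  # [4, 0]
```

Note that even with a real Series, `x > 3` in the original attempt would compare a whole list with an integer, so the filter needs the inner loop.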
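Answer 2 (score: 0)
A better idea is to avoid keeping lists inside a Series. Instead:
Both options enable vectorised computation. Taking the first option, which here means expanding the lists into separate numeric columns: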
rats = pd.DataFrame(df.pop('rating').values.tolist()).add_suffix('rat')
appr = appr.join(rats).assign(appr_sum=rats.gt(3).sum(1))
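A self-contained variant of the same idea may make the mechanics easier to follow. It is a sketch under assumptions: the sample frame is illustrative, `add_prefix("rat_")` is used only to get readable column names, and the `rating` column is kept rather than popped.

```python
import pandas as pd

# Illustrative sample in the shape of the question's data (assumed).
df = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "rating": [[4, 4, 5, 4], [3, 3], [4]],
})

# Expand each list into its own numeric column; shorter lists are padded with NaN.
rats = pd.DataFrame(df["rating"].tolist(), index=df.index).add_prefix("rat_")

# NaN > 3 evaluates to False, so the padding never contributes to the count.
df["appr_sum"] = rats.gt(3).sum(axis=1)

print(df["appr_sum"].tolist())  # [4, 0, 1]
```

The comparison and the row-wise sum both run on whole columns, which is what makes this approach vectorised.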
Answer 3 (score: 0)
You can also use the apply method on the rating column:
appr['appr_sum'] = \
appr['rating'].apply(lambda ratings: len([x for x in ratings if x > 3]))
print(appr)
movie_id user_id rating appr_sum
0 1 [5, 2, 1, 6] [4, 4, 5, 4] 4
1 2 [5, 1] [3, 3] 0
2 3 [1] [4] 1
3 4 [1] [3] 0
4 5 [1] [3] 0
5 6 [1] [5] 1
6 7 [6, 1] [2, 4] 1
7 8 [1, 6] [1, 4] 1
8 9 [1, 6] [5, 4] 2