Question

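In the DataFrame DF below, users have different numbers of values in the 'Movie' and 'Exist' columns. For example, user 2 has 10 values and user 5 has 9 values. For each user, I want the position of the first True value in the 'Exist' column (counted within that user's own vector) divided by the length of that user's vector, and I want the result placed in a separate DataFrame together with the user ID. Suppose this is the DataFrame: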

    User    Movie       Exist
0   2       172         False
1   2       2717        False
2   2       150         False
3   2       2700        False
4   2       2699        True
5   2       2616        False
6   2       112         False
7   2       2571        True
8   2       2657        True
9   2       2561        False
10  5       3471        False
11  5       187         False
12  5       2985        False
13  5       3388        False
14  5       3418        False
15  5       32          False
16  5       1673        False
17  5       3740        True
18  5       1693        False

The target DataFrame should therefore look like this:

5/10 = 0.5
8/9 = 0.88


User  Location
 2      0.5
 5      0.88

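User 2's first True value is at relative index 5 (the 5th value in user 2's vector), and user 5's first True value is at relative index 8 (the 8th value in user 5's vector). Note that I do not want the actual DataFrame indices, which are 4 and 17.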
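
To make the example reproducible, the table can be rebuilt as shown here. This is only a minimal sketch: the names `with_pos`, `Pos` and `first_true` are introduced for illustration and are not part of the original data. It shows the difference between the global row index (4 and 17) and the 1-based position inside each user's group (5 and 8).

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "User": [2] * 10 + [5] * 9,
    "Movie": [172, 2717, 150, 2700, 2699, 2616, 112, 2571, 2657, 2561,
              3471, 187, 2985, 3388, 3418, 32, 1673, 3740, 1693],
    "Exist": [False, False, False, False, True, False, False, True, True, False,
              False, False, False, False, False, False, False, True, False],
})

# 1-based position of every row inside its own user's group
# ("Pos" is a helper column added only for this illustration).
with_pos = df.assign(Pos=df.groupby("User").cumcount() + 1)

# Smallest within-group position among the rows where Exist is True.
first_true = with_pos[with_pos["Exist"]].groupby("User")["Pos"].min()

# Number of rows per user, i.e. the denominators 10 and 9.
sizes = df.groupby("User").size()
```

Dividing `first_true` by `sizes` gives the ratios 5/10 and 8/9 described above.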

Answer 1

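Option 1

def first_ratio(x):
    # Renumber the group's rows 0..n-1 so that idxmax returns a relative position.
    x = x.reset_index(drop=True)
    # idxmax finds the first True; x.any() turns the result into 0 if there is no True.
    i = x.any() * (x.idxmax() + 1.)
    l = len(x)
    return i / l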

df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()

      Location
User
2     0.500000
5     0.888889
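
The `x.any()` factor matters when a user has no True value at all: on an all-False group `idxmax` still returns the first position, so without the guard the ratio would be 1/len rather than 0. A small sketch of that edge case follows; user 7 and the shortened data are hypothetical and do not come from the question.

```python
import pandas as pd

def first_ratio(x):
    # Same logic as Option 1: relative position of the first True over group length.
    x = x.reset_index(drop=True)
    i = x.any() * (x.idxmax() + 1.)
    return i / len(x)

# Hypothetical data: user 2 has its first True in position 2 of 4,
# user 7 has no True values at all.
df = pd.DataFrame({
    "User": [2, 2, 2, 2, 7, 7, 7],
    "Exist": [False, True, False, True, False, False, False],
})

result = df.groupby("User")["Exist"].apply(first_ratio)
```

Here user 2 gets 2/4 and user 7 gets 0, which can serve as a marker for "no True found".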

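Option 2

def first_ratio(x):
    # Work on the underlying NumPy array, so no index reset is needed.
    v = x.values
    # argmax returns the position of the first True; v.any() guards the all-False case.
    i = v.any() * (v.argmax() + 1.)
    l = v.shape[0]
    return i / l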

df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()
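
For a complete run, Option 2 can be applied to the question's data as sketched below. Two things here are assumptions rather than part of the answer: the 'Exist' column is rebuilt from the row numbers of its True values (4, 7, 8 and 17), and `reset_index()` replaces `to_frame()` so that 'User' becomes an ordinary column, as in the requested layout. `out` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Rebuild the relevant columns: 10 rows for user 2, 9 rows for user 5,
# with True at global rows 4, 7, 8 and 17 as in the question's table.
users = np.repeat([2, 5], [10, 9])
exist = np.zeros(19, dtype=bool)
exist[[4, 7, 8, 17]] = True
df = pd.DataFrame({"User": users, "Exist": exist})

def first_ratio(x):
    # Option 2: 1-based position of the first True divided by the group length.
    v = x.to_numpy()
    i = v.any() * (v.argmax() + 1.)
    return i / v.shape[0]

out = (
    df.groupby("User")["Exist"]
      .apply(first_ratio)
      .rename("Location")
      .reset_index()  # turn the 'User' index into a regular column
)
```

`out` then has the two columns 'User' and 'Location', with values 5/10 for user 2 and 8/9 for user 5.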
