Question

I currently have a df that looks like this:

Word     Score    Other
This      10       1
is        10       2    
an        20       5
example   50       3
great     20       2

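What I am doing is creating permutations of the words in the Word column and summing the scores of the permuted words. Because my dataset is fairly large, I want to create only those permutations whose total score is above a set threshold (50 in this example), to limit the number of possible permutations.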

Expected output:

**Permutations**         **Score**
an example                  70
example great               70
This example                60
etc...

The problem is how to add up the scores of the words in each permutation.

My code is missing this step:

import itertools
word = exact['Word']
score = exact['Score']
perm = list(itertools.permutations(word, 3))


removal = perm[perm['Score'] >= 50]

Any ideas?

Edit, based on Garrett's help:

exact = stuff[stuff['Other'] < 6 ]
def find_perms(df, min_score):
    perm = itertools.permutations(df.Word.unique(), 2)
    # Key the lookup by word; df.Score.to_dict() alone is keyed by row index
    score = df.set_index('Word').Score.to_dict()
    for p in perm:
        s = sum(score[w] for w in p)
        if s >= min_score:
            yield p, s

df = pd.DataFrame(list(find_perms(exact, 50000)),
              columns=['Permutations', 'Score'])

Answer 1

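To avoid allocating memory for permutations that do not meet the required threshold, you could compute the scores "on the fly", before building the pandas object: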

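import itertools
import pandas as pd

def find_perms(df, min_score):
    # Lazily generate every ordered pair of distinct words
    perm = itertools.permutations(df.Word.unique(), 2)
    # Map each word to its score (assumes Word is a column, not the index)
    score = df.set_index('Word').Score.to_dict()
    for p in perm:
        s = sum(score[w] for w in p)
        # Only permutations that reach the threshold are ever stored
        if s >= min_score:
            yield p, s

df = pd.DataFrame(list(find_perms(df, 50)),
                  columns=['Permutations', 'Score'])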
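
Because addition does not depend on word order, every permutation has the same total as its reverse. If only one row per pair of words is wanted, as the expected output in the question seems to suggest, itertools.combinations could be used instead. The following is a minimal sketch run on the sample data from the question; the names find_combos and size are hypothetical, and it assumes each word appears only once in the Word column.

```python
import itertools

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Word": ["This", "is", "an", "example", "great"],
    "Score": [10, 10, 20, 50, 20],
    "Other": [1, 2, 5, 3, 2],
})


def find_combos(df, min_score, size=2):
    """Yield (joined words, total) for unordered groups scoring >= min_score."""
    # Map each word to its score; assumes every word appears only once.
    score = dict(zip(df["Word"], df["Score"]))
    for combo in itertools.combinations(score, size):
        total = sum(score[w] for w in combo)
        # Groups below the threshold are discarded before anything is stored.
        if total >= min_score:
            yield " ".join(combo), total


result = pd.DataFrame(list(find_combos(df, 50)),
                      columns=["Permutations", "Score"])
print(result)
```

With this sample data, four pairs reach the threshold of 50, and each of them contains "example". Switching back to itertools.permutations would yield the same pairs plus their reversed orderings.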

Create permutations of Word and add their individual scores together (Pandas, Python 3)
