Question

I have two data frames:

In [6]: df1 = pd.DataFrame({'word':['laugh','smile','frown','cry'],'score':[7,2,-3,-8]}, columns=['word','score'])
        df1

Out[6]:     word    score
        0   laugh   7
        1   smile   2
        2   frown   -3
        3   cry     -8

In [8]: df2 = pd.DataFrame({'word':['frown','laugh','play']})
        df2

Out[8]:
            word
        0   frown
        1   laugh
        2   play

I know I can merge them to get the score for each word:

In [10]: pd.merge(df1,df2)

Out[10]:    word    score
         0  laugh   7
         1  frown   -3

However, I can't quite work out how to:

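i) Automatically assign a score of zero to words that have no score. For example, 'play' is in df2 but is dropped by the merge, and I would like to keep it in the result. I expect df2 to contain many words with no score, so simply adding those words to df1 with a score of zero is not practical. So I would like the merge to produce this instead: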

Out[10]:    word    score
         0  laugh   7
         1  frown   -3
         2  play    0

ii) Get the average score of several words. So, if my data frame looks like this:

In [14]: df3 = pd.DataFrame({'words':['frown cry','laugh smile','play laugh', 'cry laugh play smile']})
         df3

Out[14]:    words
        0   frown cry
        1   laugh smile
        2   play laugh
        3   cry laugh play smile

I would like to cross-reference df3 against df1 to get:

Out[14]:    words                 average_score
        0   frown cry              -5.5
        1   laugh smile            4.5
        2   play laugh             3.5
        3   cry laugh play smile   0.25

Hopefully I did the math right! I'm guessing there may be other or better ways to do this in Pandas?

Answer 1

For (i), you just need to specify a right join and fill the null values:

>>> pd.merge(df1, df2, how='right').fillna(0)
    word  score
0  laugh      7
1  frown     -3
2   play      0
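
Depending on the pandas version, the right join above may return rows in df2's order and show score as float, because the missing value forces a float column before it is filled. A minimal sketch of one way to pin both down, assuming a left join starting from df2 and an explicit cast back to int are acceptable (the name merged is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'word': ['laugh', 'smile', 'frown', 'cry'],
                    'score': [7, 2, -3, -8]})
df2 = pd.DataFrame({'word': ['frown', 'laugh', 'play']})

# A left join from df2 keeps every word in df2, in df2's order.
merged = df2.merge(df1, on='word', how='left')

# Unmatched words get NaN, which makes the column float;
# fill with 0 and cast back to int.
merged['score'] = merged['score'].fillna(0).astype(int)
print(merged)
```

The rows come out in the order frown, laugh, play, which differs from the output shown above but contains the same word/score pairs.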

For (ii), you could do this:

>>> def grpavg(ws):
...     i = df1['word'].isin(ws)
...     return df1.loc[i, 'score'].sum() / len(ws)
... 
>>> df3['avg-score'] = df3['words'].str.split().map(grpavg)
>>> df3
                  words  avg-score
0             frown cry      -5.50
1           laugh smile       4.50
2            play laugh       3.50
3  cry laugh play smile       0.25
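
As an alternative sketch, not part of the answer above, the same averages can probably be computed without a Python-level function by exploding the word lists and mapping them through a word-to-score lookup. This assumes a pandas version that provides Series.explode; the names scores and tokens are illustrative. One behavioural difference: a word repeated within a row is counted once per occurrence here, whereas isin counts it only once in the numerator.

```python
import pandas as pd

df1 = pd.DataFrame({'word': ['laugh', 'smile', 'frown', 'cry'],
                    'score': [7, 2, -3, -8]})
df3 = pd.DataFrame({'words': ['frown cry', 'laugh smile', 'play laugh',
                              'cry laugh play smile']})

# Lookup table: word -> score.
scores = df1.set_index('word')['score']

# One row per word; the original row label is repeated in the index.
tokens = df3['words'].str.split().explode()

# Words missing from df1 count as 0, then average per original row.
df3['avg-score'] = tokens.map(scores).fillna(0).groupby(level=0).mean()
print(df3)
```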

Edit: in response to the comment, pass the frame explicitly and then bind it with a lambda or functools.partial:

>>> def grpavg(ws, df):
...     i = df['word'].isin(ws)
...     return df.loc[i, 'score'].sum() / len(ws)
... 
>>> from functools import partial
>>> f = partial(grpavg, df=df1)
>>> df3['avg-score'] = df3['words'].str.split().map(f)
>>> df3
                  words  avg-score
0             frown cry      -5.50
1           laugh smile       4.50
2            play laugh       3.50
3  cry laugh play smile       0.25
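
The lambda form mentioned in the edit is not shown above; a minimal sketch of it, reusing the same two-argument grpavg and the sample frames from the question, might look like this:

```python
import pandas as pd

def grpavg(ws, df):
    # Sum the scores of the words in ws that appear in df,
    # then divide by the total number of words in ws.
    i = df['word'].isin(ws)
    return df.loc[i, 'score'].sum() / len(ws)

df1 = pd.DataFrame({'word': ['laugh', 'smile', 'frown', 'cry'],
                    'score': [7, 2, -3, -8]})
df3 = pd.DataFrame({'words': ['frown cry', 'laugh smile', 'play laugh',
                              'cry laugh play smile']})

# The lambda closes over df1, which has the same effect as partial(grpavg, df=df1).
df3['avg-score'] = df3['words'].str.split().map(lambda ws: grpavg(ws, df1))
print(df3)
```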

Python Pandas lookup/cross-reference
