How to implement an inner join in pandas

Asked: 2017-09-28 14:51:26

Tags: python pandas join

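I need to implement an inner join efficiently in Python.

I have two datasets that come from different sources but share a common key.

Let's say (for the sake of argument) they look like this: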

person_likes = [{'person_id': '1', 'food': 'ice_cream', 'pastimes': 'swimming'},
                {'person_id': '2', 'food': 'paella', 'pastimes': 'banjo'}]

person_accounts = [{'person_id': '1', 'blogs': ['swimming digest', 'cooking puddings']},
                   {'person_id': '2', 'blogs': ['learn flamenca']}]

What is the best way to join these two sets of data? I have something like this:

joins = []
for like in person_likes:
    for acc in person_accounts:
        if like['person_id'] == acc['person_id']:
            join = {}
            join.update(like)
            join.update(acc)
            joins.append(join)

print(joins)

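This seems to work fine (I haven't tested it extensively), and at first glance it looks like the best we can do. But I wonder whether there is a more efficient known algorithm, and whether there is a more idiomatic or Pythonic way to do this?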

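1 Answer:

Answer 0: (score: 2)

Pandas seems like the obvious answer. pd.merge performs an inner join by default (how='inner'), so only rows whose person_id appears in both frames are kept.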

import pandas as pd
accounts = pd.DataFrame(person_accounts)
likes = pd.DataFrame(person_likes)
pd.merge(accounts, likes, on='person_id')

#                                  blogs person_id       food  pastimes
# 0  [swimming digest, cooking puddings]         1  ice_cream  swimming
# 1                     [learn flamenca]         2     paella     banjo
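
If pulling in pandas is not an option, a hash join in plain Python avoids the nested loop from the question. The following is only a minimal sketch, not a tuned implementation: inner_join is a hypothetical helper name, and it assumes every row contains the key and that key values are hashable. It builds a dictionary index over one list and probes it once per row of the other, so the work is roughly proportional to the combined input size plus the number of matches, rather than to the product of the two list lengths.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Inner-join two lists of dicts on `key` (hypothetical helper, not a library API)."""
    # Index the right-hand rows by key so each lookup is O(1) on average.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        # Keys with no match on the right produce no output row (inner join).
        for match in index.get(row[key], []):
            # On a shared field name, the right-hand value wins.
            joined.append({**row, **match})
    return joined


person_likes = [{'person_id': '1', 'food': 'ice_cream', 'pastimes': 'swimming'},
                {'person_id': '2', 'food': 'paella', 'pastimes': 'banjo'}]

person_accounts = [{'person_id': '1', 'blogs': ['swimming digest', 'cooking puddings']},
                   {'person_id': '2', 'blogs': ['learn flamenca']}]

joins = inner_join(person_likes, person_accounts, 'person_id')
print(joins)
```

As with the loop in the question, values from the second list overwrite values from the first when both rows share a field name other than the key.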