How to join on lists in pandas?

Time: 2021-03-03 12:15:06

Tags: python pandas dataframe join merge

Suppose I have these two DataFrames:

df1:

ID    Strings
1     'hello, how are you?'
2     'I like the red one.'
3     'You? I think so.'

df2:

range      Strings
[1]        'hello, how are you?'
[2,3]      'I like the red one. You? I think so.'

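My goal is to take the sentences in df1 and group them so that they match df2. To that end, I have managed to label the groups I want them joined into: in this example, sentence 1 stands alone, but sentences 2 and 3 need to be merged.

Can I do this with a join?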

2 answers:

Answer 0 (score: 1)

Assuming you have the groups to join in a list, you can do the following:

import pandas as pd

df = pd.DataFrame(['hello, how are you?', 'I like the red one.', 'You? I think so.'], columns=['sentence'])

# rows at index 1 and 2 (0-based) are to be merged; row 0 stays on its own
join = [[0], [1, 2]]

# for each row, record which groups contain its index; cast to str so the value is hashable for groupby
df['joincol'] = pd.Series(df.index).apply(lambda x: [x in j for j in join]).astype(str)

df

              sentence        joincol
0  hello, how are you?  [True, False]   # this is your grouping column
1  I like the red one.  [False, True]
2     You? I think so.  [False, True]


# group by and keep uniques
df.groupby('joincol')['sentence'].transform(lambda x: ' '.join(x)).drop_duplicates()

# result

0                     hello, how are you?
1    I like the red one. You? I think so.
Name: sentence, dtype: object
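
As a variation on the same idea, the stringified boolean list can be replaced by a plain integer group label. The sketch below is only an illustration under the same assumptions as above (a 0-based default index and a `join` list of index groups); the names `group_of` and `merged` are made up for this example and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    ['hello, how are you?', 'I like the red one.', 'You? I think so.'],
    columns=['sentence'],
)

# each inner list holds the row indexes that belong to one group
join = [[0], [1, 2]]

# hypothetical helper: map every row index to the position of its group in `join`
group_of = {idx: grp for grp, members in enumerate(join) for idx in members}

# group rows by that label and concatenate the sentences within each group
merged = df.groupby(df.index.map(group_of))['sentence'].agg(' '.join)
```

Using `agg` instead of `transform` followed by `drop_duplicates` yields one row per group directly, and it does not collapse two different groups that happen to produce the same joined text.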

Answer 1 (score: 1)

You can take an intermediate step and create a group column to join on.

Let's use explode and pd.merge:

s = df2['range'].explode().reset_index().rename(columns={'index' : 'grp'})

df1a = pd.merge(df1,s,left_on=['ID'],
                right_on=['range'],
                how='left')\
       .groupby('grp')['Strings'].agg(' '.join).to_frame('strings')

Then you can simply join on the index:

final = pd.merge(df2,df1a,left_index=True,right_index=True)

print(final)


print(s)
  grp range
0    0     1
1    1     2
2    1     3

print(df1a)

                                      strings
grp                                          
0                       'hello, how are you?'
1    'I like the red one.' 'You? I think so.'
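
For anyone who wants to run the steps above end to end, here is a self-contained sketch. It assumes that df1 and df2 are built as in the question, with `range` holding Python lists of IDs; the `astype('int64')` cast and the final `matches` check are additions for this illustration, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Strings': ['hello, how are you?', 'I like the red one.', 'You? I think so.'],
})
df2 = pd.DataFrame({
    'range': [[1], [2, 3]],
    'Strings': ['hello, how are you?', 'I like the red one. You? I think so.'],
})

# one row per ID; the original df2 row number becomes the group label `grp`
s = (
    df2['range']
    .explode()
    .astype('int64')  # assumption: cast so the key dtype matches df1['ID']
    .reset_index()
    .rename(columns={'index': 'grp'})
)

# attach the group label to each sentence, then concatenate per group
df1a = (
    df1.merge(s, left_on='ID', right_on='range', how='left')
    .groupby('grp')['Strings']
    .agg(' '.join)
    .to_frame('strings')
)

# df1a is indexed by grp, which lines up with df2's row index
final = df2.merge(df1a, left_index=True, right_index=True)

# check that the rebuilt text equals the text already stored in df2
matches = (final['Strings'] == final['strings']).tolist()
```

Because the rebuilt column is named `strings` in lowercase, it does not collide with the `Strings` column in df2, so the final merge needs no suffixes.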