Question

Suppose I have these two DataFrames:

df1:

ID    Strings
1     'hello, how are you?'
2     'I like the red one.'
3     'You? I think so.'

df2:

range      Strings
[1]        'hello, how are you?'
[2,3]      'I like the red one. You? I think so.'

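My goal is to take the sentences in df1 and group them so that they match df2. To that end, I have managed to find a way to label which group each sentence should be joined into. In this example, sentence 1 stands alone, but sentences 2 and 3 need to be merged.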

Can I do this with a join?

Answer 1

Assuming you have the groups to join in a list, you can do the following:

import pandas as pd

df = pd.DataFrame(['hello, how are you?','I like the red one.', 'You? I think so.'], columns=['sentence'])
 
# rows 1 and 2 are to be merged
join = [[0], [1,2]]

# check if the indexes are in the list items
df['joincol'] = pd.Series(df.index).apply(lambda x: [x in j for j in join]).astype(str)

df

              sentence        joincol
0  hello, how are you?  [True, False]  # this is your grouping column
1  I like the red one.  [False, True]
2     You? I think so.  [False, True]


# group by and keep uniques
df.groupby('joincol')['sentence'].transform(lambda x: ' '.join(x)).drop_duplicates()

# result

0                     hello, how are you?
1    I like the red one. You? I think so.
Name: sentence, dtype: object
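
As a variation on the above, here is a minimal sketch that maps each row index to a group number and aggregates with `groupby(...).agg(' '.join)`. This avoids relying on `drop_duplicates`, which would also collapse two different groups that happen to produce identical text. The name `group_of` is hypothetical, and the sketch assumes every row index appears in exactly one sublist of `join`.

```python
import pandas as pd

df = pd.DataFrame(
    ['hello, how are you?', 'I like the red one.', 'You? I think so.'],
    columns=['sentence'],
)

# rows 1 and 2 are to be merged (same list as in the answer)
join = [[0], [1, 2]]

# hypothetical helper: row index -> position of its group in `join`
# (assumes each index belongs to exactly one sublist)
group_of = {idx: grp for grp, members in enumerate(join) for idx in members}

# a Series of group labels aligned with df's index
groups = df.index.to_series().map(group_of)

# one joined string per group, in row order within each group
merged = df.groupby(groups)['sentence'].agg(' '.join)
print(merged)
```

The result has one row per entry of `join`, indexed by that entry's position in the list, so it can be lined up with another frame whose rows follow the same order.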

Answer 2

You can take an intermediate step and create a group column to join on.

Let's use explode and pd.merge:

s = df2['range'].explode().reset_index().rename(columns={'index' : 'grp'})

df1a = pd.merge(df1,s,left_on=['ID'],
                right_on=['range'],
                how='left')\
       .groupby('grp')['Strings'].agg(' '.join).to_frame('strings')

Then you can simply join on the index:

final = pd.merge(df2,df1a,left_index=True,right_index=True)

print(final)

print(s)
  grp range
0    0     1
1    1     2
2    1     3

print(df1a)

                                      strings
grp                                          
0                       'hello, how are you?'
1    'I like the red one.' 'You? I think so.'
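
For anyone who wants to run the steps end to end, here is a self-contained sketch. The sample frames are reconstructed from the question and are an assumption about its layout: `range` holds Python lists of IDs, and the strings are stored without the surrounding quote characters. The `astype(int)` call is an addition to keep the merge keys the same dtype.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout)
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Strings': ['hello, how are you?', 'I like the red one.', 'You? I think so.'],
})
df2 = pd.DataFrame({
    'range': [[1], [2, 3]],
    'Strings': ['hello, how are you?', 'I like the red one. You? I think so.'],
})

# One row per ID; the original df2 row number becomes the group label
s = (df2['range'].explode()
     .astype(int)  # explode leaves object dtype; cast so it matches df1['ID']
     .reset_index()
     .rename(columns={'index': 'grp'}))

# Attach the group label to each sentence, then join sentences per group
df1a = (pd.merge(df1, s, left_on='ID', right_on='range', how='left')
        .groupby('grp')['Strings'].agg(' '.join)
        .to_frame('strings'))

# Group labels are df2's row numbers, so an index-on-index merge lines them up
final = pd.merge(df2, df1a, left_index=True, right_index=True)

# Check that the rebuilt text agrees with df2's own column
print((final['Strings'] == final['strings']).all())
```

Because the group label is taken from df2's index, this approach keeps working if df2 is not in ID order, as long as every ID in df1 appears in exactly one `range` list.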

How do I join on a list in pandas?
