Suppose I have these two DataFrames:
df1:
ID Strings
1 'hello, how are you?'
2 'I like the red one.'
3 'You? I think so.'
df2:
range Strings
[1] 'hello, how are you?'
[2,3] 'I like the red one. You? I think so.'
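My goal is to take the sentences in df1 and group them so that they match df2. I have already found a way to label which group each sentence should be joined into: in this example, sentence 1 stands alone, while sentences 2 and 3 need to be merged.
Can I do this with a join?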
Answer 0 (score: 1)
Assuming you have the groups to join in a list, you can do the following:
import pandas as pd
df = pd.DataFrame(['hello, how are you?','I like the red one.', 'You? I think so.'], columns=['sentence'])
# the rows at index 1 and 2 are to be merged; index 0 stays on its own
join = [[0], [1,2]]
# for each row index, record which group lists contain it, stored as a string so it can be used as a groupby key
df['joincol'] = pd.Series(df.index).apply(lambda x: [x in j for j in join]).astype(str)
df
sentence joincol
0 hello, how are you? [True, False] # this is your grouping column
1 I like the red one. [False, True]
2 You? I think so. [False, True]
# group by and keep uniques
df.groupby('joincol')['sentence'].transform(lambda x: ' '.join(x)).drop_duplicates()
# result
0 hello, how are you?
1 I like the red one. You? I think so.
Name: sentence, dtype: object
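The same grouping can also be expressed with an explicit integer label per row instead of a stringified boolean list. The following is a minimal sketch of that variant, not the answer's exact method; the names index_to_group, grp and merged are hypothetical and introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    ['hello, how are you?', 'I like the red one.', 'You? I think so.'],
    columns=['sentence'],
)

# Same grouping list as above: each inner list holds the row indexes to merge.
join = [[0], [1, 2]]

# Map every row index to the position of the group that contains it
# (hypothetical helper dict, assumes each index appears in exactly one group).
index_to_group = {idx: grp for grp, members in enumerate(join) for idx in members}

# Label the rows, then concatenate the sentences within each group.
df['grp'] = df.index.map(index_to_group)
merged = df.groupby('grp')['sentence'].agg(' '.join)
print(merged)
```

Aggregating with agg returns one row per group directly, so no drop_duplicates step is needed, and two groups that happen to produce identical text are not collapsed into one.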
Answer 1 (score: 1)
You can take an intermediate step and create a group column to join on.
Let's use explode and pd.merge:
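# one row per ID in df2['range'], labelled with the df2 row (grp) it came from
s = df2['range'].explode().reset_index().rename(columns={'index': 'grp'})
# attach grp to each sentence in df1, then concatenate the sentences per group
df1a = (
    pd.merge(df1, s, left_on=['ID'], right_on=['range'], how='left')
    .groupby('grp')['Strings']
    .agg(' '.join)
    .to_frame('strings')
)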
Then you can simply join on the index:
final = pd.merge(df2,df1a,left_index=True,right_index=True)
print(final)
print(s)
grp range
0 0 1
1 1 2
2 1 3
print(df1a)
strings
grp
0 'hello, how are you?'
1 'I like the red one.' 'You? I think so.'
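For reference, here is a self-contained sketch of the explode and merge steps above that can be run end to end. It assumes df2['range'] holds real Python lists of IDs (if the column holds strings such as '[2,3]', they would need to be parsed first), and it casts the exploded values to int so the merge keys have matching dtypes. The frames are rebuilt from the question's example, without the literal quote characters.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Strings': ['hello, how are you?', 'I like the red one.', 'You? I think so.'],
})
# Assumption: 'range' contains lists of IDs, not their string representation.
df2 = pd.DataFrame({'range': [[1], [2, 3]]})

# One row per ID, tagged with the df2 row (grp) it belongs to.
s = (
    df2['range']
    .explode()
    .astype(int)
    .reset_index()
    .rename(columns={'index': 'grp'})
)

# Attach grp to each sentence, then concatenate the sentences per group.
df1a = (
    pd.merge(df1, s, left_on='ID', right_on='range', how='left')
    .groupby('grp')['Strings']
    .agg(' '.join)
    .to_frame('strings')
)

# df1a is indexed by grp, which lines up with df2's index.
final = df2.join(df1a)
print(final)
```

DataFrame.join is used here in place of pd.merge with left_index=True and right_index=True; both align the two frames on their indexes.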