Question

I have two dataframes like the ones below, but with many more rows:

import pandas as pd

text1 = {'first_text': ['she is cool', 'they are nice', 'he is good', 'we are friendly'],
         'change_adj': ['she is neat', 'NaN', 'NaN', 'we are nice'],
         'change_pro': ['NaN', 'she is nice', 'NaN', 'she is friendly'],
         'change_verb': ['she was cool', 'they were nice', 'he was good', 'NaN'], }

df1 = pd.DataFrame(text1, columns=['first_text', 'change_adj', 'change_pro', 'change_verb'])

text2 = {
    'Domain': ['change_adj', 'change_pro', 'change_verb', 'change_adj', 'change_pro', 'change_verb', 'change_verb'],
    'info': ['she is neat', 'she is nice', 'she was cool', 'we are nice', 'she is friendly', 'they were nice',
             'he was good']}

df2 = pd.DataFrame(text2, columns=['Domain', 'info'])

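So the second dataframe is essentially a stacked version of the first one, minus the 'first_text' column. What I want to do is add the 'first_text' column to the second dataframe, so that each sentence in 'first_text' lines up with the matching entry in the info column, like this:

Desired output: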

        first_text       Domain             info
0      she is cool   change_adj      she is neat
1    they are nice   change_pro      she is nice
2      she is cool  change_verb     she was cool
3  we are friendly   change_adj      we are nice
4  we are friendly   change_pro  she is friendly
5    they are nice  change_verb   they were nice
6       he is good  change_verb      he was good

Answer 1

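You can combine pandas.melt with pandas.merge: melt reshapes df1 into long format with one row per (first_text, Domain, info) combination, and a left merge on Domain and info then attaches first_text to each row of df2.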

melt = df1.melt(id_vars='first_text', var_name="Domain", value_name="info")

df2.merge(melt, on=['Domain', 'info'], how='left')

        Domain             info       first_text
0   change_adj      she is neat      she is cool
1   change_pro      she is nice    they are nice
2  change_verb     she was cool      she is cool
3   change_adj      we are nice  we are friendly
4   change_pro  she is friendly  we are friendly
5  change_verb   they were nice    they are nice
6  change_verb      he was good       he is good
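
To see why the merge lines up, it can help to look at the intermediate long frame and to check that every row of df2 actually found a partner. The following is only a sketch that reuses the question's data; the names long_df, merged, unmatched and result are illustrative, and the indicator=True check is an addition that is not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'first_text': ['she is cool', 'they are nice', 'he is good', 'we are friendly'],
    'change_adj': ['she is neat', 'NaN', 'NaN', 'we are nice'],
    'change_pro': ['NaN', 'she is nice', 'NaN', 'she is friendly'],
    'change_verb': ['she was cool', 'they were nice', 'he was good', 'NaN'],
})
df2 = pd.DataFrame({
    'Domain': ['change_adj', 'change_pro', 'change_verb', 'change_adj',
               'change_pro', 'change_verb', 'change_verb'],
    'info': ['she is neat', 'she is nice', 'she was cool', 'we are nice',
             'she is friendly', 'they were nice', 'he was good'],
})

# Long format: one row per (first_text, Domain, info) combination.
long_df = df1.melt(id_vars='first_text', var_name='Domain', value_name='info')
print(len(long_df))  # 4 sentences x 3 change columns

# Left merge keeps the rows of df2 in their original order.
# indicator=True adds a "_merge" column that marks rows without a match.
merged = df2.merge(long_df, on=['Domain', 'info'], how='left', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched.empty)

# Drop the helper column and reorder to match the desired output.
result = merged.drop(columns='_merge')[['first_text', 'Domain', 'info']]
print(result)
```

Note that the 'NaN' entries in df1 are literal strings, so they survive the melt as ordinary values. They cause no trouble here because df2 contains no 'NaN' entries in info.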

Answer 2

One way is to use pandas.DataFrame.query together with itertuples:

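res = []
# Each row of df2 arrives as a plain (Domain, info) tuple.
for x, y in df2.itertuples(index=False, name=None):
    # Select the df1 row whose column x equals y, then take its first_text.
    res.append(df1.query("%s == '%s'" % (x, y))["first_text"].iloc[0])
df2["first_text"] = res
print(df2)

Output: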

        Domain             info       first_text
0   change_adj      she is neat      she is cool
1   change_pro      she is nice    they are nice
2  change_verb     she was cool      she is cool
3   change_adj      we are nice  we are friendly
4   change_pro  she is friendly  we are friendly
5  change_verb   they were nice    they are nice
6  change_verb      he was good       he is good
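
The loop above builds a query string for every row, which would fail for a sentence containing a single quote, and .iloc[0] raises an IndexError when nothing matches. As a hedged alternative (a different technique from query, not the answerer's method), the same row-by-row idea can be sketched with a plain dictionary lookup. It assumes each (Domain, info) pair maps to at most one first_text; the names long_df and lookup are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'first_text': ['she is cool', 'they are nice', 'he is good', 'we are friendly'],
    'change_adj': ['she is neat', 'NaN', 'NaN', 'we are nice'],
    'change_pro': ['NaN', 'she is nice', 'NaN', 'she is friendly'],
    'change_verb': ['she was cool', 'they were nice', 'he was good', 'NaN'],
})
df2 = pd.DataFrame({
    'Domain': ['change_adj', 'change_pro', 'change_verb', 'change_adj',
               'change_pro', 'change_verb', 'change_verb'],
    'info': ['she is neat', 'she is nice', 'she was cool', 'we are nice',
             'she is friendly', 'they were nice', 'he was good'],
})

# Melted columns come out in the order first_text, Domain, info.
long_df = df1.melt(id_vars='first_text', var_name='Domain', value_name='info')

# Map each (Domain, info) pair to its first_text, skipping the 'NaN' placeholder strings.
lookup = {
    (domain, info): first_text
    for first_text, domain, info in long_df.itertuples(index=False, name=None)
    if info != 'NaN'
}

# dict.get returns None for a pair that is absent, instead of raising.
df2['first_text'] = [
    lookup.get(pair) for pair in df2.itertuples(index=False, name=None)
]
print(df2)
```

With this variant a row of df2 that has no counterpart in df1 ends up with None in first_text, so missing matches can be inspected afterwards rather than stopping the loop.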

Pandas: if a column name matches a cell value in the second dataframe, add the corresponding value to the second dataframe
