Question

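I have two DataFrames: one contains the main data of interest, and the other is a lookup containing columns I want to append to the first. For example: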

df1

    Name   Date_1       Date_2
0   John   2019-11-13   2019-12-28
1   Amy    2019-11-13   2019-12-28
2   Sarah  2019-11-14   2019-12-29
3   Dennis 2019-11-14   2019-12-29
4   Austin 2019-11-15   2019-12-30
5   Jenn   2019-11-08   2019-12-23

df2

    Var_1  Var_2
0   1x2    test_1
1   3x4    test_2
2   5x6    test_3

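For each row in df2, I want to append Var_1 and Var_2 to df1 as new columns filled with that row's values. This is repeated for every unique row in df2, and the results are concatenated into a single DataFrame like this: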

df_final

    Name   Date_1       Date_2      Var_1  Var_2
0   John   2019-11-13   2019-12-28  1x2    test_1
1   Amy    2019-11-13   2019-12-28  1x2    test_1
2   Sarah  2019-11-14   2019-12-29  1x2    test_1
3   Dennis 2019-11-14   2019-12-29  1x2    test_1
4   Austin 2019-11-15   2019-12-30  1x2    test_1
5   Jenn   2019-11-08   2019-12-23  1x2    test_1
6   John   2019-11-13   2019-12-28  3x4    test_2
7   Amy    2019-11-13   2019-12-28  3x4    test_2
8   Sarah  2019-11-14   2019-12-29  3x4    test_2
9   Dennis 2019-11-14   2019-12-29  3x4    test_2
10  Austin 2019-11-15   2019-12-30  3x4    test_2
11  Jenn   2019-11-08   2019-12-23  3x4    test_2
12  John   2019-11-13   2019-12-28  5x6    test_3
13  Amy    2019-11-13   2019-12-28  5x6    test_3
14  Sarah  2019-11-14   2019-12-29  5x6    test_3
15  Dennis 2019-11-14   2019-12-29  5x6    test_3
16  Austin 2019-11-15   2019-12-30  5x6    test_3
17  Jenn   2019-11-08   2019-12-23  5x6    test_3
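Put differently, df_final is the Cartesian product (cross join) of df2 and df1, with df2's rows as the outer loop. As a sketch, assuming pandas 1.2 or later (where `DataFrame.merge` accepts `how='cross'`), a cross merge can produce it without an explicit Python loop. This is a swapped-in technique, not the approach described in the question or the answer. Putting df2 on the left keeps the rows grouped by df2 row, matching the order of df_final above:

```python
import pandas as pd

# Data copied from the df1 and df2 tables above
df1 = pd.DataFrame({
    'Name': ['John', 'Amy', 'Sarah', 'Dennis', 'Austin', 'Jenn'],
    'Date_1': ['2019-11-13', '2019-11-13', '2019-11-14',
               '2019-11-14', '2019-11-15', '2019-11-08'],
    'Date_2': ['2019-12-28', '2019-12-28', '2019-12-29',
               '2019-12-29', '2019-12-30', '2019-12-23'],
})
df2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3'],
})

# Cross join with df2 on the left: each df2 row is paired with every df1 row,
# in df1's order, before moving on to the next df2 row (requires pandas >= 1.2)
df_final = df2.merge(df1, how='cross')

# Reorder so df1's columns come first, then df2's, as in df_final above
df_final = df_final[list(df1.columns) + list(df2.columns)]
print(df_final)
```

The whole product is built in one `merge` call, so no temporary per-row DataFrames are created and concatenated.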

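My initial solution iterates over each row in df2, appends the Var_1 and Var_2 columns to df1 with that row's values, and then concatenates the resulting DataFrames to create df_final.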
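A minimal sketch of that loop, written as a hypothetical reconstruction since the question does not show its code: `iterrows` walks df2, each row's values are broadcast into a copy of df1, and the pieces are concatenated once at the end.

```python
import pandas as pd

# Data copied from the df1 and df2 tables above
df1 = pd.DataFrame({
    'Name': ['John', 'Amy', 'Sarah', 'Dennis', 'Austin', 'Jenn'],
    'Date_1': ['2019-11-13', '2019-11-13', '2019-11-14',
               '2019-11-14', '2019-11-15', '2019-11-08'],
    'Date_2': ['2019-12-28', '2019-12-28', '2019-12-29',
               '2019-12-29', '2019-12-30', '2019-12-23'],
})
df2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3'],
})

pieces = []
for _, row in df2.iterrows():
    # Copy df1 and broadcast this df2 row's scalar values into new columns
    piece = df1.copy()
    piece['Var_1'] = row['Var_1']
    piece['Var_2'] = row['Var_2']
    pieces.append(piece)

# Concatenate once; ignore_index=True renumbers the rows 0..17 as in df_final
df_final = pd.concat(pieces, ignore_index=True)
print(df_final)
```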

Although this solution works, the DataFrames will eventually get much larger, so I suspect a more efficient solution exists.
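One loop-free alternative, swapped in here and not taken from the question or the answer, is to build positional indexers with NumPy: `np.tile` repeats df1's row positions once per df2 row, `np.repeat` repeats each df2 row position `len(df1)` times, and the two aligned frames are placed side by side. A sketch, assuming NumPy is available alongside pandas:

```python
import numpy as np
import pandas as pd

# Data copied from the df1 and df2 tables above
df1 = pd.DataFrame({
    'Name': ['John', 'Amy', 'Sarah', 'Dennis', 'Austin', 'Jenn'],
    'Date_1': ['2019-11-13', '2019-11-13', '2019-11-14',
               '2019-11-14', '2019-11-15', '2019-11-08'],
    'Date_2': ['2019-12-28', '2019-12-28', '2019-12-29',
               '2019-12-29', '2019-12-30', '2019-12-23'],
})
df2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3'],
})

n1, n2 = len(df1), len(df2)

# df1 row positions 0..n1-1, repeated as a block n2 times: [0..5, 0..5, 0..5]
left = df1.iloc[np.tile(np.arange(n1), n2)].reset_index(drop=True)

# Each df2 row position repeated n1 times: [0]*6 + [1]*6 + [2]*6
right = df2.iloc[np.repeat(np.arange(n2), n1)].reset_index(drop=True)

# Both halves now share a 0..n1*n2-1 index, so axis=1 concat aligns them row by row
df_final = pd.concat([left, right], axis=1)
print(df_final)
```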

Answer 1

I would make a small change.

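Instead of iterating over df1 (the larger DataFrame), I iterate over df2 (the smaller one).
Rather than working with df2's rows directly, I zip its individual columns together and iterate over the paired values from those columns.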
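To illustrate what zipping the columns yields: `zip` over the two columns produces one plain tuple per row of df2, the same pairs that `iterrows` gives, but without building a pandas Series for every row (which is typically what makes `iterrows` slow). A small sketch using the df2 data from the question:

```python
import pandas as pd

df2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3'],
})

# zip pairs the i-th value of Var_1 with the i-th value of Var_2,
# i.e. it yields one plain tuple per row of df2
zipped_pairs = list(zip(df2['Var_1'], df2['Var_2']))

# iterrows yields (index, Series) for each row; the per-row Series is extra overhead
iterrows_pairs = [(row['Var_1'], row['Var_2']) for _, row in df2.iterrows()]

print(zipped_pairs)  # [('1x2', 'test_1'), ('3x4', 'test_2'), ('5x6', 'test_3')]
```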

It might be interesting to try both approaches and time the difference.
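A minimal timing sketch using the standard-library `timeit`, assuming a hypothetical 10,000-row df1 built only for the benchmark; the function names `with_iterrows` and `with_zip` are illustrative. No timings are quoted here because they depend on the machine and the pandas version:

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: a larger df1 plus the df2 from the question
N = 10_000
df1 = pd.DataFrame({
    'Name': [f'name_{i}' for i in range(N)],
    'Date_1': ['2019-11-13'] * N,
    'Date_2': ['2019-12-28'] * N,
})
df2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3'],
})

def with_iterrows():
    # The question's approach: walk df2 row by row
    pieces = []
    for _, row in df2.iterrows():
        piece = df1.copy()
        piece['Var_1'] = row['Var_1']
        piece['Var_2'] = row['Var_2']
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)

def with_zip():
    # The answer's approach: zip df2's columns and walk the value pairs
    pieces = []
    for v1, v2 in zip(df2['Var_1'], df2['Var_2']):
        piece = df1.copy()
        piece['Var_1'] = v1
        piece['Var_2'] = v2
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)

# Both approaches must agree before their timings are worth comparing
pd.testing.assert_frame_equal(with_iterrows(), with_zip())

for fn in (with_iterrows, with_zip):
    seconds = timeit.timeit(fn, number=10)
    print(f'{fn.__name__}: {seconds:.3f} s for 10 runs')
```

With only three rows in df2, most of the time is likely spent copying df1 and concatenating, so the two loops may time similarly; the gap between them would mainly grow with the number of rows in df2.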

import pandas as pd

# step 1: create dataframe 1
df_1 = pd.DataFrame({
    'Name': ['John', 'Amy', 'Sarah'],
    'Date_1': ['2019-11-13', '2019-11-13', '2019-11-13'],
    'Date_2': ['2019-12-28', '2019-12-28', '2019-12-28'],
})

print('df_1: ')
print(df_1)
print()

# step 2: create dataframe 2
df_2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3']
})

print('df_2: ')
print(df_2)
print()

# step 3: collect one tagged copy of df_1 per row of df_2
frames = []

# zip the two columns of df_2 and loop over their paired values (one pair per row)
for each_var1, each_var2 in zip(df_2['Var_1'], df_2['Var_2']):

    # create a copy of df_1
    temp_df = df_1.copy()

    # add 2 new columns holding this pair's Var_1 and Var_2 (broadcast to every row)
    temp_df['Var_1'] = each_var1
    temp_df['Var_2'] = each_var2

    frames.append(temp_df)

# concatenate once at the end (repeated concat inside the loop copies the data
# each time); ignore_index=True gives a fresh 0..n-1 index like df_final
df_new = pd.concat(frames, ignore_index=True)

print('new master dataframe: ')
print(df_new)
print()
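A more compact variant of the same loop, offered as a sketch rather than the answerer's code: `DataFrame.assign` returns a new DataFrame with the added columns, so the explicit `copy()` is unnecessary, and a list comprehension feeds a single `pd.concat` call.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'Name': ['John', 'Amy', 'Sarah'],
    'Date_1': ['2019-11-13', '2019-11-13', '2019-11-13'],
    'Date_2': ['2019-12-28', '2019-12-28', '2019-12-28'],
})
df_2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3'],
})

# assign() leaves df_1 untouched and returns a new frame with the scalar
# values broadcast into Var_1 and Var_2
df_new = pd.concat(
    [df_1.assign(Var_1=v1, Var_2=v2) for v1, v2 in zip(df_2['Var_1'], df_2['Var_2'])],
    ignore_index=True,
)
print(df_new)
```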

Duplicate an entire DataFrame for each unique value in another DataFrame
