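I have two DataFrames: one contains the main data of interest, and the other is a lookup table of columns I want to append to the first. For example: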
df1
Name Date_1 Date_2
0 John 2019-11-13 2019-12-28
1 Amy 2019-11-13 2019-12-28
2 Sarah 2019-11-14 2019-12-29
3 Dennis 2019-11-14 2019-12-29
4 Austin 2019-11-15 2019-12-30
5 Jenn 2019-11-08 2019-12-23
df2
Var_1 Var_2
0 1x2 test_1
1 3x4 test_2
2 5x6 test_3
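For each row in df2, I want to append Var_1 and Var_2 to df1 as new columns, filled with that row's values in every row of df1. This is repeated for every unique row in df2, and the results are stacked into a single DataFrame like this: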
df_final
Name Date_1 Date_2 Var_1 Var_2
0 John 2019-11-13 2019-12-28 1x2 test_1
1 Amy 2019-11-13 2019-12-28 1x2 test_1
2 Sarah 2019-11-14 2019-12-29 1x2 test_1
3 Dennis 2019-11-14 2019-12-29 1x2 test_1
4 Austin 2019-11-15 2019-12-30 1x2 test_1
5 Jenn 2019-11-08 2019-12-23 1x2 test_1
6 John 2019-11-13 2019-12-28 3x4 test_2
7 Amy 2019-11-13 2019-12-28 3x4 test_2
8 Sarah 2019-11-14 2019-12-29 3x4 test_2
9 Dennis 2019-11-14 2019-12-29 3x4 test_2
10 Austin 2019-11-15 2019-12-30 3x4 test_2
11 Jenn 2019-11-08 2019-12-23 3x4 test_2
12 John 2019-11-13 2019-12-28 5x6 test_3
13 Amy 2019-11-13 2019-12-28 5x6 test_3
14 Sarah 2019-11-14 2019-12-29 5x6 test_3
15 Dennis 2019-11-14 2019-12-29 5x6 test_3
16 Austin 2019-11-15 2019-12-30 5x6 test_3
17 Jenn 2019-11-08 2019-12-23 5x6 test_3
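My initial solution was to loop over each row in df2, append the Var_1 and Var_2 columns to df1 with that row's values, and then concatenate the resulting DataFrames to create df_final.
While this solution works, the DataFrames will eventually be much larger, so I suspect a more efficient solution exists.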
Answer 0 (score: 1)
I would change it slightly.
It might be interesting to try both approaches and time the difference.
import pandas as pd
# step 1: create dataframe 1
df_1 = pd.DataFrame({
    'Name': ['John', 'Amy', 'Sarah'],
    'Date_1': ['2019-11-13', '2019-11-13', '2019-11-13'],
    'Date_2': ['2019-12-28', '2019-12-28', '2019-12-28']
})
print('df_1: ')
print(df_1)
print()
# step 2: create dataframe 2
df_2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3']
})
print('df_2: ')
print(df_2)
print()
# step 3: create empty master dataframe to store results
df_new = pd.DataFrame()
# loop through the rows of df_2, one (Var_1, Var_2) pair at a time
for each_var1, each_var2 in zip(df_2['Var_1'], df_2['Var_2']):
    # create a copy of df_1
    temp_df = df_1.copy()
    # add 2 new columns to the dataframe with Var_1 and Var_2
    temp_df['Var_1'] = each_var1
    temp_df['Var_2'] = each_var2
    # concatenate the temp dataframe to master
    df_new = pd.concat([df_new, temp_df])
print('new master dataframe: ')
print(df_new)
print()
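Since the desired result pairs every row of df_1 with every row of df_2, it can also be expressed as a cross join (Cartesian product). The sketch below is not part of the original answer; it assumes pandas 1.2 or later, where `merge` accepts `how='cross'`, and the helper names `loop_concat` and `cross_join` are made up for illustration. It builds the result both ways, checks that they agree, and times each on the toy data.

```python
import timeit

import pandas as pd

# Same toy data as in the answer above.
df_1 = pd.DataFrame({
    'Name': ['John', 'Amy', 'Sarah'],
    'Date_1': ['2019-11-13', '2019-11-13', '2019-11-13'],
    'Date_2': ['2019-12-28', '2019-12-28', '2019-12-28'],
})
df_2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3'],
})


def loop_concat(left, lookup):
    """Loop approach (hypothetical helper): one copy of `left` per lookup row."""
    # Collect the pieces in a list and concatenate once at the end,
    # instead of growing a DataFrame inside the loop.
    frames = [left.assign(**record) for record in lookup.to_dict('records')]
    return pd.concat(frames, ignore_index=True)


def cross_join(left, lookup):
    """Cross-join approach (hypothetical helper): pair every row with every row."""
    # Putting `lookup` on the left keeps rows grouped by lookup row,
    # which matches the row order of df_final in the question.
    out = lookup.merge(left, how='cross')  # assumes pandas >= 1.2
    # Restore the column order: df_1 columns first, then the lookup columns.
    return out[list(left.columns) + list(lookup.columns)]


looped = loop_concat(df_1, df_2)
joined = cross_join(df_1, df_2)
print(joined)
print('same contents:', looped.values.tolist() == joined.values.tolist())

# Rough timing of both approaches; numbers depend on machine and data size.
for fn in (loop_concat, cross_join):
    seconds = timeit.timeit(lambda: fn(df_1, df_2), number=100)
    print(f'{fn.__name__}: {seconds:.4f} s for 100 runs')
```

On data this small the timings say little; the comparison becomes more meaningful with DataFrames closer to the real sizes. Note that either way the result has len(df_1) * len(df_2) rows, so memory use grows with the product of the two sizes.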