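I have 2 input dataframes (df1 and df2) with the same structure, and I want to create a 3rd dataframe (output_df) that contains every combination of rows from the two input dataframes: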
import pandas as pd
df1 = pd.DataFrame([["John","18","a"],["Jane","19","b"],["Jim","20","c"]],columns=['Name','Age','Function'])
df2 = pd.DataFrame([["Don","21","d"],["Diana","22","e"],["Dave","23","f"]],columns=['Name','Age','Function'])
output_df=pd.DataFrame([["John_Don","18_21","a_d"],
                        ["John_Diana","18_22","a_e"],
                        ["John_Dave","18_23","a_f"],
                        ["Jane_Don","19_21","b_d"],
                        ["Jane_Diana","19_22","b_e"],
                        ["Jane_Dave","19_23","b_f"],
                        ["Jim_Don","20_21","c_d"],
                        ["Jim_Diana","20_22","c_e"],
                        ["Jim_Dave","20_23","c_f"]],columns=['Name','Age','Function'])
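Each column of the new dataframe should be the "sum" ("+") of the corresponding columns of the input dataframes, joined with an underscore as in the expected output above. (I know that for strings "+" means concatenation; if the inputs are strings, that is exactly what I'm after.)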
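The following code creates output_df, but it ends up empty, and it takes far too long to run. The sample code below only runs on 2x10 records; eventually I will be processing thousands of records from each dataframe.
Q1: What am I missing when populating the output dataframe?
Q2: How can I make the code more efficient?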
output_df=pandas.DataFrame(columns=['Name','Age','Function'])
i=0
for lendf1 in range (10):
    for lendf2 in range(10):
        output_df=output_df.append(pandas.Series(),ignore_index=True)
        i=i+1
        for column in output_df:
            output_df[column][i]=df1[column][lendf1:lendf1+1]+df2[column][lendf2:lendf2+1]
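Several details in the loop above likely explain the missing values (Q1). `output_df[column][i] = ...` is chained assignment, which may write into a temporary Series instead of output_df (pandas warns about this, and with Copy-on-Write enabled it never updates the frame). `df1[column][lendf1:lendf1+1]` is a one-row Series, not a scalar, and its index label differs from the df2 slice, so `+` aligns the two on the index and yields NaN instead of concatenating. `i` is incremented before the assignment, so it points one row past the row just appended, and `range(10)` assumes 10 rows. Finally, `DataFrame.append` copies the whole frame on every call (and was removed in pandas 2.0). A minimal sketch of the same nested loop with those issues avoided, assuming the `_` separator from the expected output, might look like this:

```python
import pandas as pd

df1 = pd.DataFrame([["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
                   columns=["Name", "Age", "Function"])
df2 = pd.DataFrame([["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
                   columns=["Name", "Age", "Function"])

rows = []  # collect plain Python lists; build the DataFrame once at the end
for i in range(len(df1)):          # use the real row counts instead of a fixed 10
    for j in range(len(df2)):
        # .iloc[k] returns a scalar, so "+" concatenates the strings instead of
        # aligning two one-row Series on their (different) index labels
        rows.append([df1[col].iloc[i] + "_" + df2[col].iloc[j] for col in df1.columns])

output_df = pd.DataFrame(rows, columns=df1.columns)
print(output_df)
```

This is still a pure-Python double loop, so it mainly addresses Q1; for thousands of rows per frame, a vectorized approach is usually much faster.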
Answer 0 (score: 3)
I believe this is what you're looking for:
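import numpy as np
import pandas as pd
first = pd.Series(['a', 'b', 'c', 'd', 'e'])
second = pd.Series(['f', 'g', 'h', 'i', 'j'])
# convert to NumPy object arrays first; newer pandas versions may not support np.add.outer on a Series directly
pd.DataFrame(np.add.outer(first.to_numpy(dtype=object), second.to_numpy(dtype=object)))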
Output:
    0   1   2   3   4
0  af  ag  ah  ai  aj
1  bf  bg  bh  bi  bj
2  cf  cg  ch  ci  cj
3  df  dg  dh  di  dj
4  ef  eg  eh  ei  ej
Note that the inputs should be individual columns (pd.Series), not whole DataFrames.
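To get the exact output_df from the question, the same `np.add.outer` idea can be applied column by column and the resulting grid flattened. The sketch below is one way to do that; the helper name `combine_all_rows` and its `sep` parameter are hypothetical, and appending the separator to the left values before the outer "add" is an assumption made to produce the `_`-joined format:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
                   columns=["Name", "Age", "Function"])
df2 = pd.DataFrame([["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
                   columns=["Name", "Age", "Function"])


def combine_all_rows(left, right, sep="_"):
    """Hypothetical helper: pair every row of `left` with every row of `right`,
    joining each column's values with `sep`."""
    data = {}
    for col in left.columns:
        # Append the separator to the left values so one outer "add" yields "left_right".
        a = (left[col].astype(str) + sep).to_numpy(dtype=object)
        b = right[col].astype(str).to_numpy(dtype=object)
        # np.add.outer builds a len(left) x len(right) grid; ravel() flattens it
        # row by row, so all pairings for left row 0 come first, then row 1, etc.
        data[col] = np.add.outer(a, b).ravel()
    return pd.DataFrame(data, columns=left.columns)


output_df = combine_all_rows(df1, df2)
print(output_df)
```

Each column is combined in a single NumPy call rather than one Python iteration per row pair, which is what matters once both frames have thousands of rows (the result still has len(df1) * len(df2) rows, so memory grows accordingly).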
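Answer 1 (score: 1)
I think you are trying to concatenate the corresponding columns of the two dataframes for every pair of rows. Try the following code; it should work for you.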
import pandas as pd
df1 = pd.DataFrame([["John","18","a"],["Jane","19","b"],["Jim","20","c"]],columns=['Name','Age','Function'])
df2 = pd.DataFrame([["Don","21","d"],["Diana","22","e"],["Dave","23","f"]],columns=['Name','Age','Function'])
cols = list(df1)
out_list = []
# pair every row of df1 with every row of df2
for ind1, row1 in df1.iterrows():
    for ind2, row2 in df2.iterrows():
        in_list = []
        # join the values of each column with "_"
        for col in cols:
            in_list.append(row1[col] + '_' + row2[col])
        out_list.append(in_list)
outdf = pd.DataFrame(out_list, columns=cols)
print(outdf)
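`iterrows()` builds a new Series for every row, so the nested loop above gets slow for thousands of rows per frame. A vectorized alternative, offered here as a sketch rather than as part of the original answer, is a cross join with `DataFrame.merge(how="cross")` (available since pandas 1.2) followed by column-wise string concatenation; the suffixes `_1`/`_2` are an arbitrary choice:

```python
import pandas as pd

df1 = pd.DataFrame([["John", "18", "a"], ["Jane", "19", "b"], ["Jim", "20", "c"]],
                   columns=["Name", "Age", "Function"])
df2 = pd.DataFrame([["Don", "21", "d"], ["Diana", "22", "e"], ["Dave", "23", "f"]],
                   columns=["Name", "Age", "Function"])

# how="cross" builds the Cartesian product: each df1 row is paired with
# every df2 row, keeping df1's row order.
pairs = df1.merge(df2, how="cross", suffixes=("_1", "_2"))

# Overlapping column names get the suffixes (Name_1 / Name_2, ...);
# join each pair with "_" using vectorized Series string concatenation.
output_df = pd.DataFrame(
    {col: pairs[f"{col}_1"] + "_" + pairs[f"{col}_2"] for col in df1.columns}
)
print(output_df)
```

The concatenation runs once per column over all pairs, instead of once per cell inside Python loops.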