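I have two data frames (the one on the left is 1, the other is 2), and I want to build a final data frame according to the following conditions:
1. Group Col2 by Col1, e.g. A: 566, 788, 888, 999, 1212
2. In data frame 2 I can find (for group A) 788, 888, 999 and 1212, so I want to keep those; 566 is not in data frame 2, so ignore it.
3. Sum the totals of all retained members across data frames 1 and 2, e.g. A-788 (2), A-888 (3), A-999 (4), A-1212 (5), 788-888 (12), 999-1212 (13), so 2 + 3 + 4 + 5 + 12 + 13 = 39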
Col1  Col2  Total    Col3  Col4  Total
A     566   1        788   888   12
A     788   2        999   1212  13
A     888   3        700   707   14
A     999   4        701   702   15
A     1212  5
B     700   6
B     701   7
B     702   8
B     703   9
B     704   10
B     705   11
Expected result
ResultCol1  ResultCol2         ResultTotal
A           788,888,999,1212   39
B           700,701,702        50
Answer 0 (score: 0)
Is this what you want?
using DataFrames
m1 = ["A" 566 1
"A" 788 2
"A" 888 3
"A" 999 4
"A" 1212 5
"B" 700 6
"B" 701 7
"B" 702 8
"B" 703 9
"B" 704 10
"B" 705 11]
m2 = [788 888 12
999 1212 13
700 707 14
701 702 15]
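df1 = DataFrame(m1, [:Col1, :Col2, :Total])
df2 = DataFrame(m2, [:Col3, :Col4, :Total])
# Keep only the rows of df1 whose Col2 value appears in Col3 or Col4 of df2.
df1f = filter(x -> x.Col2 in df2.Col3 || x.Col2 in df2.Col4, df1)
# One row per Col1 group: the retained Col2 values as a tuple, plus the sum of their totals.
# (`by` is no longer available in current DataFrames.jl releases; `combine(groupby(...))` replaces it.)
df3 = combine(groupby(df1f, :Col1)) do x
    DataFrame(Col2=Tuple(x.Col2), Total=sum(x.Total))
end
# Add the total of every df2 row that has Col3 or Col4 among a group's members.
for r3 in eachrow(df3), r2 in eachrow(df2)
    if any(in.([r2.Col3, r2.Col4], [r3.Col2]))
        r3.Total += r2.Total
    end
end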
Now df3 holds the data you asked for. I have not optimized this for performance yet. Does execution speed matter to you?
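If execution speed does turn out to matter, the same three steps can be sketched in plain Julia without DataFrames, using a `Set` for the membership test. This is only a rough sketch under my own assumptions: the names `rows1`, `rows2`, `known`, `members` and `totals` are made up for illustration, and the data is typed in as tuples rather than read from your real frames.

```julia
# Illustrative data only: (Col1, Col2, Total) for frame 1, (Col3, Col4, Total) for frame 2.
rows1 = [("A", 566, 1), ("A", 788, 2), ("A", 888, 3), ("A", 999, 4), ("A", 1212, 5),
         ("B", 700, 6), ("B", 701, 7), ("B", 702, 8), ("B", 703, 9), ("B", 704, 10),
         ("B", 705, 11)]
rows2 = [(788, 888, 12), (999, 1212, 13), (700, 707, 14), (701, 702, 15)]

# Every value that appears in Col3 or Col4, stored in a Set for fast lookups.
known = Set{Int}()
for (c3, c4, _) in rows2
    push!(known, c3, c4)
end

# Steps 1 and 2: group Col2 by Col1, keeping only members found in frame 2,
# and sum the frame 1 totals of the retained members.
members = Dict{String,Vector{Int}}()
totals = Dict{String,Int}()
for (g, v, t) in rows1
    v in known || continue
    push!(get!(members, g, Int[]), v)
    totals[g] = get(totals, g, 0) + t
end

# Step 3: add each frame 2 total to every group that contains Col3 or Col4.
for (c3, c4, t) in rows2, (g, vs) in members
    if c3 in vs || c4 in vs
        totals[g] += t
    end
end

# Print one line per group, sorted by group name.
for g in sort(collect(keys(members)))
    println(g, " ", join(members[g], ","), " ", totals[g])
end
```

With the sample data from the question this gives 39 for A and 50 for B (for B: 6 + 7 + 8 from frame 1, plus 14 and 15 from frame 2). As in the DataFrames version, a frame 2 row counts as soon as either of its two values belongs to the group, which is why the 700-707 row is added even though 707 never appears in frame 1.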