I ran into a problem while converting SAS code to Python. Suppose I have two DataFrames like these:
import pandas as pd

df = pd.DataFrame({"monthkey": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"name": ['foo', 'foo', 'bar']})
I want the resulting table to look like this:
monthkey name
1 foo
2 foo
3 foo
4 foo
5 foo
1 bar
2 bar
3 bar
4 bar
5 bar
I wrote the SAS code below for reference. How can I produce the same result in Python?
proc sql;
create table want as select a.*, b.* from
df as a left join df2 as b on a.monthkey;
quit;
Any suggestions? Thanks.
Answer 0 (score: 0)
You can try the following:
df.assign(key=1).merge(df2.drop_duplicates().assign(key=1), on='key').drop(columns='key')
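The dummy-key merge above is the classic way to get a Cartesian product. On pandas 1.2 or later, `DataFrame.merge` also accepts `how='cross'`, which does the same thing without a helper column. A minimal sketch, assuming pandas >= 1.2; putting the deduplicated `df2` on the left is a choice made here so the rows come out in the order shown in the question (all `foo` rows, then all `bar` rows):

```python
import pandas as pd

df = pd.DataFrame({"monthkey": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"name": ['foo', 'foo', 'bar']})

# how='cross' (pandas >= 1.2) pairs every left row with every right row,
# so no dummy key column is needed. Each left row is combined with all
# right rows in turn, so every 'foo' row comes before every 'bar' row.
want = (
    df2.drop_duplicates()          # keep one row per name
       .merge(df, how='cross')     # Cartesian product
       [['monthkey', 'name']]      # column order as in the question
)
print(want)
```

Dropping the duplicates first matters: without it, `foo` would appear twice for every `monthkey`.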
Answer 1 (score: 0)
You could also try `from_product` from `pd.MultiIndex`:
pd.DataFrame(index=pd.MultiIndex.from_product(
    [df2['name'].drop_duplicates(), df['monthkey']],
    names=['name', 'monthkey'])).reset_index()
Output:
name monthkey
0 foo 1
1 foo 2
2 foo 3
3 foo 4
4 foo 5
5 bar 1
6 bar 2
7 bar 3
8 bar 4
9 bar 5
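The same product can be built without a MultiIndex by using the standard library's `itertools.product`, which yields tuples with the last iterable varying fastest, the same order `MultiIndex.from_product` uses. A minimal sketch; using `unique()` instead of `drop_duplicates()` is a choice made here (both keep names in order of first appearance), and the final column selection only reorders the columns to match the question:

```python
import itertools

import pandas as pd

df = pd.DataFrame({"monthkey": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"name": ['foo', 'foo', 'bar']})

names = df2['name'].unique()   # ['foo', 'bar'], in order of first appearance
months = df['monthkey']

# itertools.product yields (name, monthkey) tuples: ('foo', 1), ('foo', 2), ...
pairs = list(itertools.product(names, months))
want = pd.DataFrame(pairs, columns=['name', 'monthkey'])[['monthkey', 'name']]
print(want)
```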
Answer 2 (score: 0)
import pandas as pd

df = pd.DataFrame({'monthkey': list(range(1, 6)) * 2,
                   'name': ['foo'] * 5 + ['bar'] * 5})
Building a single DataFrame directly from a range and list repetition is more straightforward. The Python documentation on lists and sequence operations covers the techniques used here.
Output:
monthkey name
1 foo
2 foo
3 foo
4 foo
5 foo
1 bar
2 bar
3 bar
4 bar
5 bar
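The answer above hardcodes the values. The same list-repetition idea can be derived from the two input DataFrames with NumPy's `np.tile` (repeat the whole sequence) and `np.repeat` (repeat each element), so it still works if `df` or `df2` changes. A minimal sketch, assuming NumPy is available alongside pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"monthkey": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"name": ['foo', 'foo', 'bar']})

names = df2['name'].unique()        # distinct names: 'foo', 'bar'
months = df['monthkey'].to_numpy()  # 1, 2, 3, 4, 5

want = pd.DataFrame({
    # repeat the whole month sequence once per name: 1..5, then 1..5
    'monthkey': np.tile(months, len(names)),
    # repeat each name once per month: 'foo' x5, then 'bar' x5
    'name': np.repeat(names, len(months)),
})
print(want)
```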