I have a DataFrame named df in the following format:
match_up result
0 1985_1116_1234 1
1 1985_1120_1345 1
2 1985_1207_1250 1
3 1985_1229_1425 1
I have another DataFrame named df1:
team win percentage sum_of_last_six seed_frequency
0 1116 0.700 5 7
1 1234 0.667 3 10
2 1120 0.636 4 9
3 1207 0.615 2 11
4 1229 0.345 2 3
5 1345 0.621 5 11
6 1425 0.572 1 2
7 1250 0.968 4 12
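I need to build two new DataFrames named df2 and df3. df2 should contain the df1 rows for all the left-hand values in the match_up column of df (the IDs right after 1985_), i.e. 1116, 1120, 1207, 1229. df3 should contain the df1 rows for the right-hand values of match_up.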
Expected df2:
team_df2 win_df2 sum_df2 seed_df2
0 1116 0.700 5 7
1 1120 0.636 4 9
2 1207 0.615 2 11
3 1229 0.345 2 3
Expected df3:
team_df3 win_df3 sum_df3 seed_df3
1 1234 0.667 3 10
5 1345 0.621 5 11
7 1250 0.968 4 12
6 1425 0.572 1 2
Finally, I need a new DataFrame named combi that combines the three DataFrames (df, df2 and df3).
How can I do this in pandas?
Answer 0 (score: 1)
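You can call the vectorized str methods on the 'match_up' column to split the strings, map the pieces to int and collect them into lists, which we can then use to filter the second df and create df2 and df3. Note that in the session below df1 was read in with the team IDs in a column named win and the row numbers in team, as the outputs show; if your team IDs are in team, filter on df1.team instead.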
In [90]:
left = list(map(int,(df['match_up'].str.split('_').str[1])))
right = list(map(int,(df['match_up'].str.split('_').str[2])))
print(left)
right
[1116, 1120, 1207, 1229]
Out[90]:
[1234, 1345, 1250, 1425]
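As a variation on the split above, here is a minimal sketch that splits once with expand=True and converts all pieces to int in a single step. The column labels season, left and right are hypothetical names chosen for illustration; they do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345",
                 "1985_1207_1250", "1985_1229_1425"],
    "result": [1, 1, 1, 1],
})

# expand=True returns one column per piece instead of a Series of lists.
parts = df["match_up"].str.split("_", expand=True)
parts.columns = ["season", "left", "right"]  # hypothetical labels
parts = parts.astype(int)

left = parts["left"].tolist()
right = parts["right"].tolist()
print(left)
print(right)
```

This avoids splitting the same strings twice and keeps the left and right IDs row-aligned with df.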
In [91]:
df2 = df1[df1.win.isin(left)]
df2
Out[91]:
team win percentage sum_of_last_six seed_frequency
0 0 1116 0.700 5 7
2 2 1120 0.636 4 9
3 3 1207 0.615 2 11
4 4 1229 0.345 2 3
In [92]:
df3 = df1[df1.win.isin(right)]
df3
Out[92]:
team win percentage sum_of_last_six seed_frequency
1 1 1234 0.667 3 10
5 5 1345 0.621 5 11
6 6 1425 0.572 1 2
7 7 1250 0.968 4 12
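One thing to watch: isin returns rows in df1's own order, so df3 above lists 1425 before 1250, whereas the question expects the match_up order (1250 before 1425). A minimal sketch of an order-preserving lookup follows. It assumes the four-column layout from the question, with the team IDs in team and a hypothetical column name win_percentage for the "win percentage" column.

```python
import pandas as pd

# df1 as laid out in the question; "win_percentage" is an assumed column name.
df1 = pd.DataFrame({
    "team": [1116, 1234, 1120, 1207, 1229, 1345, 1425, 1250],
    "win_percentage": [0.700, 0.667, 0.636, 0.615, 0.345, 0.621, 0.572, 0.968],
    "sum_of_last_six": [5, 3, 4, 2, 2, 5, 1, 4],
    "seed_frequency": [7, 10, 9, 11, 3, 11, 2, 12],
})
left = [1116, 1120, 1207, 1229]
right = [1234, 1345, 1250, 1425]

# Index by team ID, then select labels in the order they appear in match_up.
by_team = df1.set_index("team")
df2 = by_team.loc[left].reset_index()
df3 = by_team.loc[right].reset_index()
print(df3)
```

.loc raises a KeyError if an ID is missing from df1; by_team.reindex(right) would be the more forgiving choice, filling missing teams with NaN.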
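If needed, you can rename the columns by calling rename.
Concatenating with the renamed columns gives the combined output. Note that pd.concat aligns on the index, so the rows below line up by df1's original row labels rather than by match-up, which is where the NaN entries come from: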
In [95]:
df2 = df2.rename(columns={'team':'team_df2', 'win':'win_df2', 'sum_of_last_six':'sum_df2', 'seed_frequency':'seed_df2'})
df3 = df3.rename(columns={'team':'team_df3', 'win':'win_df3', 'sum_of_last_six':'sum_df3', 'seed_frequency':'seed_df3'})
In [101]:
pd.concat([df,df2,df3],axis=1)
Out[101]:
match_up result team_df2 win_df2 percentage sum_df2 seed_df2 \
0 1985_1116_1234 1 0 1116 0.700 5 7
1 1985_1120_1345 1 NaN NaN NaN NaN NaN
2 1985_1207_1250 1 2 1120 0.636 4 9
3 1985_1229_1425 1 3 1207 0.615 2 11
4 NaN NaN 4 1229 0.345 2 3
5 NaN NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN NaN
team_df3 win_df3 percentage sum_df3 seed_df3
0 NaN NaN NaN NaN NaN
1 1 1234 0.667 3 10
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 5 1345 0.621 5 11
6 6 1425 0.572 1 2
7 7 1250 0.968 4 12
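If the goal is one row per match-up with both teams' statistics side by side, a pair of left merges is one way to avoid the index misalignment seen above. This is a sketch under assumptions, not the approach used in the session: it assumes the four-column df1 from the question with team IDs in team, uses a hypothetical win_percentage column name, and relies on add_suffix, so the resulting names (for example win_percentage_df2) differ slightly from the ones requested in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345",
                 "1985_1207_1250", "1985_1229_1425"],
    "result": [1, 1, 1, 1],
})
# "win_percentage" is an assumed column name.
df1 = pd.DataFrame({
    "team": [1116, 1234, 1120, 1207, 1229, 1345, 1425, 1250],
    "win_percentage": [0.700, 0.667, 0.636, 0.615, 0.345, 0.621, 0.572, 0.968],
    "sum_of_last_six": [5, 3, 4, 2, 2, 5, 1, 4],
    "seed_frequency": [7, 10, 9, 11, 3, 11, 2, 12],
})

# Temporary key columns holding the left and right team IDs.
ids = df["match_up"].str.split("_", expand=True)
keyed = df.assign(left=ids[1].astype(int), right=ids[2].astype(int))

# how="left" keeps one row per match-up, in df's original order.
combi = (
    keyed
    .merge(df1.add_suffix("_df2"), left_on="left", right_on="team_df2", how="left")
    .merge(df1.add_suffix("_df3"), left_on="right", right_on="team_df3", how="left")
    .drop(columns=["left", "right"])
)
print(combi)
```

Because the join is keyed on the team IDs rather than on row labels, each match-up row picks up the matching team rows regardless of where they sit in df1.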