How to combine and form complex dataframes in pandas

Time: 2015-03-10 15:27:12

Tags: python python-2.7 pandas

I have a dataframe named df in the following format:

       match_up     result
0   1985_1116_1234      1
1   1985_1120_1345      1
2   1985_1207_1250      1
3   1985_1229_1425      1

I have another dataframe named df1:
  team       win percentage     sum_of_last_six  seed_frequency
0  1116           0.700                5               7
1  1234           0.667                3              10
2  1120           0.636                4               9
3  1207           0.615                2              11
4  1229           0.345                2               3
5  1345           0.621                5              11
6  1425           0.572                1               2
7  1250           0.968                4              12

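I need to form two new dataframes named df2 and df3. df2 should contain the df1 rows for all the left-hand values (the ones succeeding 1985_) in the match_up column of df, i.e. 1116, 1120, 1207 and 1229. df3 should contain the rows for the right-hand values of the match_up column:

   team_df2  win_df2  sum_df2  seed_df2
0      1116    0.700        5         7
1      1120    0.636        4         9
2      1207    0.615        2        11
3      1229    0.345        2         3

   team_df3  win_df3  sum_df3  seed_df3
1      1234    0.667        3        10
5      1345    0.621        5        11
7      1250    0.968        4        12
6      1425    0.572        1         2

Finally, I need a new dataframe named combi that combines the three dataframes (df, df2 and df3).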

How can I do this in pandas?

1 answer:

Answer 0 (score: 1)

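You can call the vectorised str methods on the 'match_up' column to split the strings, map the pieces to int and build lists, which we can then use to filter the second df to create df2 and df3: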

In [90]:

left = list(map(int,(df['match_up'].str.split('_').str[1])))
right = list(map(int,(df['match_up'].str.split('_').str[2])))
print(left)
right
[1116, 1120, 1207, 1229]
Out[90]:
[1234, 1345, 1250, 1425]
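As an alternative to splitting the column twice, here is a minimal sketch that splits match_up once with `expand=True` (one column per `_`-separated piece) and converts every piece to int in a single step. The column names `season`, `left` and `right` are hypothetical labels chosen for this example, not part of the question.

```python
import pandas as pd

# The df from the question
df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345",
                 "1985_1207_1250", "1985_1229_1425"],
    "result": [1, 1, 1, 1],
})

# expand=True returns a DataFrame with one column per "_"-separated piece;
# astype(int) converts all three pieces to integers at once.
parts = df["match_up"].str.split("_", expand=True).astype(int)
parts.columns = ["season", "left", "right"]  # hypothetical column names

left = parts["left"].tolist()
right = parts["right"].tolist()
print(left)   # [1116, 1120, 1207, 1229]
print(right)  # [1234, 1345, 1250, 1425]
```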
In [91]:

df2 = df1[df1.win.isin(left)]
df2
Out[91]:
   team   win  percentage  sum_of_last_six  seed_frequency
0     0  1116       0.700                5               7
2     2  1120       0.636                4               9
3     3  1207       0.615                2              11
4     4  1229       0.345                2               3
In [92]:

df3 = df1[df1.win.isin(right)]
df3
Out[92]:
   team   win  percentage  sum_of_last_six  seed_frequency
1     1  1234       0.667                3              10
5     5  1345       0.621                5              11
6     6  1425       0.572                1               2
7     7  1250       0.968                4              12
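In the session output above, the team IDs show up under `win` and the `team` column just repeats the row index. This appears to be because the header "win percentage" was split into two columns when the data was loaded. With df1 laid out as in the question, the filter should be applied to `team`. The sketch below assumes the second column is a single column named `win_percentage`, which is an assumption about the question's data.

```python
import pandas as pd

# df1 from the question; the second column is assumed to be one column
# named "win_percentage" (shown as "win percentage" in the question).
df1 = pd.DataFrame({
    "team": [1116, 1234, 1120, 1207, 1229, 1345, 1425, 1250],
    "win_percentage": [0.700, 0.667, 0.636, 0.615, 0.345, 0.621, 0.572, 0.968],
    "sum_of_last_six": [5, 3, 4, 2, 2, 5, 1, 4],
    "seed_frequency": [7, 10, 9, 11, 3, 11, 2, 12],
})

left = [1116, 1120, 1207, 1229]   # left-hand IDs from match_up
right = [1234, 1345, 1250, 1425]  # right-hand IDs from match_up

# Filter on the team IDs themselves
df2 = df1[df1["team"].isin(left)]
df3 = df1[df1["team"].isin(right)]
print(df2.index.tolist())  # [0, 2, 3, 4]
print(df3.index.tolist())  # [1, 5, 6, 7]
```

Note that `isin` keeps df1's row order and index labels, not the order of the IDs in match_up. That matters as soon as these frames are combined with df.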

If needed, you can rename the columns by calling rename.

To get the combined output with the renamed columns:

In [95]:

df2 = df2.rename(columns={'team':'team_df2', 'win':'win_df2', 'sum_of_last_six':'sum_df2', 'seed_frequency':'seed_df2'})
df3 = df3.rename(columns={'team':'team_df3', 'win':'win_df3', 'sum_of_last_six':'sum_df3', 'seed_frequency':'seed_df3'})
In [101]:

pd.concat([df,df2,df3],axis=1)
Out[101]:
         match_up  result  team_df2  win_df2  percentage  sum_df2  seed_df2  \
0  1985_1116_1234       1         0     1116       0.700        5         7   
1  1985_1120_1345       1       NaN      NaN         NaN      NaN       NaN   
2  1985_1207_1250       1         2     1120       0.636        4         9   
3  1985_1229_1425       1         3     1207       0.615        2        11   
4             NaN     NaN         4     1229       0.345        2         3   
5             NaN     NaN       NaN      NaN         NaN      NaN       NaN   
6             NaN     NaN       NaN      NaN         NaN      NaN       NaN   
7             NaN     NaN       NaN      NaN         NaN      NaN       NaN   

   team_df3  win_df3  percentage  sum_df3  seed_df3  
0       NaN      NaN         NaN      NaN       NaN  
1         1     1234       0.667        3        10  
2       NaN      NaN         NaN      NaN       NaN  
3       NaN      NaN         NaN      NaN       NaN  
4       NaN      NaN         NaN      NaN       NaN  
5         5     1345       0.621        5        11  
6         6     1425       0.572        1         2  
7         7     1250       0.968        4        12
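`pd.concat(..., axis=1)` aligns rows on index labels. Because df2 and df3 keep df1's labels (0, 2, 3, 4 and 1, 5, 6, 7), their rows do not line up with the match_up rows, which is where the NaNs above come from. Joining on the team IDs instead gives one row per match_up. This is a minimal sketch under the same assumption about a `win_percentage` column. The short names `win`, `sum` and `seed` follow the format in the question, and the helper columns `left` and `right` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345",
                 "1985_1207_1250", "1985_1229_1425"],
    "result": [1, 1, 1, 1],
})
df1 = pd.DataFrame({
    "team": [1116, 1234, 1120, 1207, 1229, 1345, 1425, 1250],
    "win_percentage": [0.700, 0.667, 0.636, 0.615, 0.345, 0.621, 0.572, 0.968],
    "sum_of_last_six": [5, 3, 4, 2, 2, 5, 1, 4],
    "seed_frequency": [7, 10, 9, 11, 3, 11, 2, 12],
})

# Hypothetical helper columns holding the two team IDs as integers
parts = df["match_up"].str.split("_", expand=True)
df["left"] = parts[1].astype(int)
df["right"] = parts[2].astype(int)

# Shorten the names, then add one suffix per side (team_df2, win_df2, ...)
short = df1.rename(columns={"win_percentage": "win",
                            "sum_of_last_six": "sum",
                            "seed_frequency": "seed"})
df2 = short.add_suffix("_df2")
df3 = short.add_suffix("_df3")

# how="left" keeps df's rows in their original order
combi = (
    df.merge(df2, how="left", left_on="left", right_on="team_df2")
      .merge(df3, how="left", left_on="right", right_on="team_df3")
      .drop(columns=["left", "right"])
)
print(combi)
```

Here each team's statistics land in the same row as its match_up. If df2 and df3 are also needed as separate frames, they can still be built with `isin` as in the answer.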