Pandas: merging DataFrames of different sizes, shapes, and column names, with different frequencies (no duplicates)

Time: 2019-01-17 15:50:27

Tags: python pandas dataframe

I am trying to merge two DataFrames that share a two-column index: team name (32 teams) and Year (2015-2018). DF1 holds 9 columns of annual NFL team average stats; DF2 has the same index (32 teams, 2015-2018) but 11 different stat columns, with one row for each of the 17 games ("Weeks") in a season. What I want from the merge: for each team and year, first that team/year's annual-average row (DF1's 9 columns), then the 17 game rows ("Weeks") of DF2's 11 weekly stat columns for the same team and year.
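To make the target layout concrete, here is a tiny toy sketch with made-up mini frames (df_year and df_week are just stand-ins for my real frames, with a few of the real values copied in); tagging the annual rows as "week 0" makes each year's averages sort above its games:

import pandas as pd

# Toy stand-ins: two annual rows and four game rows for team 1.
df_year = pd.DataFrame({"Tm_name": [1, 1], "Year": [2015, 2016],
                        "W": [13.0, 7.0], "SRS": [12.3, 1.6]})
df_week = pd.DataFrame({"Tm_name": [1, 1, 1, 1],
                        "Year": [2015, 2015, 2016, 2016],
                        "Week": [1, 2, 1, 2],
                        "Tm_Pnts": [31.0, 48.0, 21.0, 40.0]})

# Tag the annual rows as "week 0" so each year's averages sort above its games.
combined = (pd.concat([df_year.assign(Week=0), df_week], sort=False)
            .set_index(["Tm_name", "Year", "Week"])
            .sort_index())
print(combined)

The real frames (and what I have actually tried) are below.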

import pandas as pd

# Build a (Tm_name, Year) MultiIndex for the annual-averages frame (result)
# and drop the two columns once they have moved into the index.
lvl0 = result.Tm_name.values
lvl1 = result.Year.values
newidx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=["Tm_name", "Year"])
result.set_index(newidx, inplace=True)
result.drop(["Year", "Tm_name"], axis=1, inplace=True)
print(result)


                W     L   W_L_Pct   PD     MoV  SoS   SRS  OSRS  DSRS
Tm_name Year                                                         
1       2015  13.0   3.0   0.813   176.0  11.0  1.3  12.3   9.0   3.4
        2016   7.0   8.0   0.469    56.0   3.5 -1.9   1.6   2.4  -0.8
        2017   8.0   8.0   0.500   -66.0  -4.1  0.4  -3.7  -4.0   0.2
        2018   3.0  13.0   0.188  -200.0 -12.5  1.0 -11.5  -9.6  -1.9
2       2015   8.0   8.0   0.500    -6.0  -0.4 -3.4  -3.8  -4.0   0.3
        2016  11.0   5.0   0.688   134.0   8.4  0.1   8.5  10.5  -2.0
        2017  10.0   6.0   0.625    38.0   2.4  1.9   4.3   1.1   3.2
        2018   7.0   9.0   0.438    -9.0  -0.6  0.4  -0.1   2.5  -2.6
3       2015   5.0  11.0   0.313   -73.0  -4.6  2.6  -1.9  -0.7  -1.2
        2016   8.0   8.0   0.500    22.0   1.4  0.2   1.5  -1.1   2.6
        2017   9.0   7.0   0.563    92.0   5.8 -2.4   3.4   2.2   1.2
        2018  10.0   6.0   0.625   102.0   6.4  0.6   7.0   0.6   6.4
4       2015   8.0   8.0   0.500    20.0   1.3 -1.2   0.0   0.3  -0.2
        2016   7.0   9.0   0.438    21.0   1.3 -1.6  -0.3   1.8  -2.2
        2017   9.0   7.0   0.563   -57.0  -3.6 -0.5  -4.0  -3.0  -1.0
        2018   6.0  10.0   0.375  -105.0  -6.6 -0.3  -6.9  -6.3  -0.6
5       2015  15.0   1.0   0.938   192.0  12.0 -3.9   8.1   6.0   2.1
        2016   6.0  10.0   0.375   -33.0  -2.1  1.1  -1.0  -0.2  -0.8
        2017  11.0   5.0   0.688    36.0   2.3  2.1   4.3   1.7   2.7
        2018   7.0   9.0   0.438    -6.0  -0.4  1.3   0.9   0.1   0.8
6       2015   6.0  10.0   0.375   -62.0  -3.9  2.6  -1.3  -0.1  -1.2
        2016   3.0  13.0   0.188  -120.0  -7.5  0.0  -7.5  -5.2  -2.3
        2017   5.0  11.0   0.313   -56.0  -3.5  2.2  -1.3  -4.6   3.3
        2018  12.0   4.0   0.750   138.0   8.6 -2.3   6.3   1.5   4.8
7       2015  12.0   4.0   0.750   140.0   8.8  1.9  10.6   4.8   5.8


# Same thing for the per-game frame (result2).
lvl_0 = result2.Tm_name.values
lvl_1 = result2.Year.values
newidx_2 = pd.MultiIndex.from_arrays([lvl_0, lvl_1], names=["Tm_name", "Year"])
result2.set_index(newidx_2, inplace=True)
result2.drop(["Year", "Tm_name"], axis=1, inplace=True)
print(result2)

             Week    Date     win_loss  home_away  Opp1_team  Tm_Pnts  \
Tm_name Year                                                            
1       2018   1  2018-09-09     0.0       1.0       32.0       6.0     
        2018   2  2018-09-16     0.0       0.0       18.0       0.0     
        2018   3  2018-09-23     0.0       1.0        6.0      14.0     
        2018   4  2018-09-30     0.0       1.0       28.0      17.0     
        2018   5  2018-10-07     1.0       0.0       29.0      28.0     
        2018   6  2018-10-14     0.0       0.0       20.0      17.0     
        2018   7  2018-10-18     0.0       1.0       10.0      10.0     
        2018   8  2018-10-28     1.0       1.0       29.0      18.0     
        2018  10  2018-11-11     0.0       0.0       16.0      14.0     
        2018  11  2018-11-18     0.0       1.0       25.0      21.0     
        2018  12  2018-11-25     0.0       0.0       17.0      10.0     
        2018  13  2018-12-02     1.0       0.0       12.0      20.0     
        2018  14  2018-12-09     0.0       1.0       11.0       3.0     
        2018  15  2018-12-16     0.0       0.0        2.0      14.0     
        2018  16  2018-12-23     0.0       1.0       18.0       9.0     
        2018  17  2018-12-30     0.0       0.0       28.0      24.0     
        2017   1  2017-09-10     0.0       0.0       11.0      23.0     
        2017   2  2017-09-17     1.0       0.0       14.0      16.0     
        2017   3  2017-09-25     0.0       1.0        9.0      17.0     
        2017   4  2017-10-01     1.0       1.0       29.0      18.0     
        2017   5  2017-10-08     0.0       0.0       26.0       7.0     
        2017   6  2017-10-15     1.0       1.0       30.0      38.0     
        2017   7  2017-10-22     0.0       0.0       18.0       0.0     
        2017   9  2017-11-05     1.0       0.0       29.0      20.0     
        2017  10  2017-11-09     0.0       1.0       28.0      16.0     
        2017  11  2017-11-19     0.0       0.0       13.0      21.0     
        2017  12  2017-11-26     1.0       1.0       15.0      27.0     
        2017  13  2017-12-03     0.0       1.0       18.0      16.0     
        2017  14  2017-12-10     1.0       1.0       31.0      12.0     
        2017  15  2017-12-17     0.0       0.0       32.0      15.0     
...           ...        ...       ...        ...        ...      ...   
        2016   5  2016-10-06     1.0       0.0       29.0      33.0     
        2016   6  2016-10-17     1.0       1.0       24.0      28.0     
        2016   7  2016-10-23     NaN       1.0       28.0       6.0     
        2016   8  2016-10-30     0.0       0.0        5.0      20.0     
        2016  10  2016-11-13     1.0       1.0       29.0      23.0     
        2016  11  2016-11-20     0.0       0.0       20.0      24.0     
        2016  12  2016-11-27     0.0       0.0        2.0      19.0     
        2016  13  2016-12-04     1.0       1.0       32.0      31.0     
        2016  14  2016-12-11     0.0       0.0       19.0      23.0     
        2016  15  2016-12-18     0.0       1.0       22.0      41.0     
        2016  16  2016-12-24     1.0       0.0       28.0      34.0     
        2016  17  2016-01-01     1.0       0.0       18.0      44.0     
        2015   1  2015-09-13     1.0       1.0       22.0      31.0     
        2015   2  2015-09-20     1.0       0.0        6.0      48.0     
        2015   3  2015-09-27     1.0       1.0       29.0      47.0     
        2015   4  2015-10-04     0.0       1.0        NaN      22.0     
        2015   5  2015-10-11     1.0       0.0       11.0      42.0     
        2015   6  2015-10-18     0.0       0.0       27.0      13.0     
        2015   7  2015-10-26     1.0       1.0        3.0      26.0     
        2015   8  2015-11-01     1.0       0.0        8.0      34.0     
        2015  10  2015-11-15     1.0       0.0       28.0      39.0     
        2015  11  2015-11-22     1.0       1.0        7.0      34.0     
        2015  12  2015-11-29     1.0       0.0       29.0      19.0     
        2015  13  2015-12-06     1.0       0.0        NaN      27.0     
        2015  14  2015-12-10     1.0       1.0       20.0      23.0     
        2015  15  2015-12-20     1.0       0.0       26.0      40.0     
        2015  16  2015-12-27     1.0       1.0       12.0      38.0     
        2015  17  2015-01-03     0.0       1.0       28.0       6.0     
2       2018   1  2018-09-06     0.0       0.0       26.0      12.0     
        2018   2  2018-09-16     1.0       1.0        5.0      31.0     

              Opp2_pnts  Off_1stD  Off_TotYd  Def_1stD_All  Def_TotYd_All  
Tm_name Year                                                               
1       2018    24.0       14.0      213.0        30.0          429.0      
        2018    34.0        5.0      137.0        24.0          432.0      
        2018    16.0       13.0      221.0        21.0          316.0      
        2018    20.0       18.0      263.0        19.0          331.0      
        2018    18.0       10.0      220.0        33.0          447.0      
        2018    27.0       16.0      268.0        20.0          411.0      
        2018    45.0       14.0      223.0        15.0          309.0      
        2018    15.0       20.0      321.0        16.0          267.0      
        2018    26.0       21.0      260.0        20.0          330.0      
        2018    23.0       13.0      282.0        19.0          325.0      
        2018    45.0       10.0      149.0        30.0          414.0      
        2018    17.0       18.0      315.0        22.0          325.0      
        2018    17.0       22.0      279.0        16.0          218.0      
        2018    40.0       18.0      253.0        23.0          435.0      
        2018    31.0       15.0      263.0        33.0          461.0      
        2018    27.0       12.0      198.0        16.0          291.0      
        2017    35.0       24.0      308.0        19.0          367.0      
        2017    13.0       17.0      389.0        18.0          266.0      
        2017    28.0       22.0      332.0        15.0          273.0      
        2017    15.0       25.0      368.0        20.0          305.0      
        2017    34.0       16.0      307.0        19.0          419.0      
        2017    33.0       23.0      432.0        21.0          412.0      
        2017    33.0       10.0      196.0        28.0          425.0      
        2017    10.0       20.0      368.0        17.0          329.0      
        2017    22.0       24.0      290.0        14.0          287.0      
        2017    31.0       17.0      292.0        22.0          357.0      
        2017    24.0       20.0      344.0        19.0          219.0      
        2017    32.0       19.0      305.0        18.0          303.0      
        2017     7.0       16.0      261.0        14.0          204.0      
        2017    20.0       19.0      286.0        14.0          218.0      
...                 ...       ...        ...           ...            ...  
        2016    21.0       17.0      288.0        25.0          286.0      
        2016     3.0       28.0      396.0        11.0          230.0      
        2016     6.0       23.0      443.0        11.0          257.0      
        2016    30.0       22.0      340.0        19.0          349.0      
        2016    20.0       26.0      443.0        15.0          281.0      
        2016    30.0       24.0      290.0        16.0          217.0      
        2016    38.0       23.0      332.0        28.0          360.0      
        2016    23.0       24.0      369.0        19.0          333.0      
        2016    26.0       21.0      300.0        15.0          314.0      
        2016    48.0       26.0      425.0        33.0          488.0      
        2016    31.0       21.0      370.0        24.0          391.0      
        2016     6.0       21.0      344.0         9.0          123.0      
        2015    19.0       25.0      427.0        18.0          408.0      
        2015    23.0       21.0      300.0        18.0          335.0      
        2015     7.0       28.0      446.0        10.0          156.0      
        2015    24.0       26.0      447.0        13.0          328.0      
        2015    17.0       15.0      345.0        29.0          435.0      
        2015    25.0       21.0      469.0        14.0          310.0      
        2015    18.0       21.0      414.0        18.0          276.0      
        2015    20.0       25.0      491.0        16.0          254.0      
        2015    32.0       30.0      451.0        18.0          343.0      
        2015    31.0       21.0      383.0        24.0          377.0      
        2015    13.0       26.0      337.0        17.0          368.0      
        2015     3.0       29.0      524.0         9.0          212.0      
        2015    20.0       22.0      393.0        23.0          389.0      
        2015    17.0       28.0      493.0        19.0          424.0      
        2015     8.0       19.0      381.0        16.0          178.0      
        2015    36.0       16.0      232.0        22.0          354.0      
2       2018    18.0       16.0      299.0        18.0          232.0      
        2018    24.0       23.0      442.0        27.0          439.0

I can't just merge them. I have tried many different approaches, but it never comes out right, so I thought I could collect the rows into a list and build a DataFrame from that array. The array looks like what I want, but when I put it into a DataFrame it fails (see below).

app = []

# For every (Tm_name, Year) row in the annual frame, append that row first,
# then append every row of the per-game frame whose (Tm_name, Year) index matches.
for row in result.itertuples():
    app.append(row)
    for row_1 in result2.itertuples():
        if row[0] == row_1[0]:      # row[0] is the (Tm_name, Year) index tuple
            app.append(row_1)

...
Pandas(Index=(1, 2015), Week='1', Date=Timestamp('2015-09-13 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=22.0, Tm_Pnts=31.0, Opp2_pnts=19.0, Off_1stD=25.0, Off_TotYd=427.0, Def_1stD_All=18.0, Def_TotYd_All=408.0)
Pandas(Index=(1, 2015), Week='2', Date=Timestamp('2015-09-20 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=6.0, Tm_Pnts=48.0, Opp2_pnts=23.0, Off_1stD=21.0, Off_TotYd=300.0, Def_1stD_All=18.0, Def_TotYd_All=335.0)
Pandas(Index=(1, 2015), Week='3', Date=Timestamp('2015-09-27 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=29.0, Tm_Pnts=47.0, Opp2_pnts=7.0, Off_1stD=28.0, Off_TotYd=446.0, Def_1stD_All=10.0, Def_TotYd_All=156.0)
Pandas(Index=(1, 2015), Week='4', Date=Timestamp('2015-10-04 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=nan, Tm_Pnts=22.0, Opp2_pnts=24.0, Off_1stD=26.0, Off_TotYd=447.0, Def_1stD_All=13.0, Def_TotYd_All=328.0)
Pandas(Index=(1, 2015), Week='5', Date=Timestamp('2015-10-11 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=11.0, Tm_Pnts=42.0, Opp2_pnts=17.0, Off_1stD=15.0, Off_TotYd=345.0, Def_1stD_All=29.0, Def_TotYd_All=435.0)
Pandas(Index=(1, 2015), Week='6', Date=Timestamp('2015-10-18 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=27.0, Tm_Pnts=13.0, Opp2_pnts=25.0, Off_1stD=21.0, Off_TotYd=469.0, Def_1stD_All=14.0, Def_TotYd_All=310.0)
Pandas(Index=(1, 2015), Week='7', Date=Timestamp('2015-10-26 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=3.0, Tm_Pnts=26.0, Opp2_pnts=18.0, Off_1stD=21.0, Off_TotYd=414.0, Def_1stD_All=18.0, Def_TotYd_All=276.0)
Pandas(Index=(1, 2015), Week='8', Date=Timestamp('2015-11-01 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=8.0, Tm_Pnts=34.0, Opp2_pnts=20.0, Off_1stD=25.0, Off_TotYd=491.0, Def_1stD_All=16.0, Def_TotYd_All=254.0)
Pandas(Index=(1, 2015), Week='10', Date=Timestamp('2015-11-15 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=28.0, Tm_Pnts=39.0, Opp2_pnts=32.0, Off_1stD=30.0, Off_TotYd=451.0, Def_1stD_All=18.0, Def_TotYd_All=343.0)
Pandas(Index=(1, 2015), Week='11', Date=Timestamp('2015-11-22 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=7.0, Tm_Pnts=34.0, Opp2_pnts=31.0, Off_1stD=21.0, Off_TotYd=383.0, Def_1stD_All=24.0, Def_TotYd_All=377.0)
Pandas(Index=(1, 2015), Week='12', Date=Timestamp('2015-11-29 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=29.0, Tm_Pnts=19.0, Opp2_pnts=13.0, Off_1stD=26.0, Off_TotYd=337.0, Def_1stD_All=17.0, Def_TotYd_All=368.0)
Pandas(Index=(1, 2015), Week='13', Date=Timestamp('2015-12-06 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=nan, Tm_Pnts=27.0, Opp2_pnts=3.0, Off_1stD=29.0, Off_TotYd=524.0, Def_1stD_All=9.0, Def_TotYd_All=212.0)
Pandas(Index=(1, 2015), Week='14', Date=Timestamp('2015-12-10 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=20.0, Tm_Pnts=23.0, Opp2_pnts=20.0, Off_1stD=22.0, Off_TotYd=393.0, Def_1stD_All=23.0, Def_TotYd_All=389.0)
Pandas(Index=(1, 2015), Week='15', Date=Timestamp('2015-12-20 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=26.0, Tm_Pnts=40.0, Opp2_pnts=17.0, Off_1stD=28.0, Off_TotYd=493.0, Def_1stD_All=19.0, Def_TotYd_All=424.0)
Pandas(Index=(1, 2015), Week='16', Date=Timestamp('2015-12-27 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=12.0, Tm_Pnts=38.0, Opp2_pnts=8.0, Off_1stD=19.0, Off_TotYd=381.0, Def_1stD_All=16.0, Def_TotYd_All=178.0)
Pandas(Index=(1, 2015), Week='17', Date=Timestamp('2015-01-03 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=28.0, Tm_Pnts=6.0, Opp2_pnts=36.0, Off_1stD=16.0, Off_TotYd=232.0, Def_1stD_All=22.0, Def_TotYd_All=354.0)
Pandas(Index=(1, 2016), W=7.0, L=8.0, W_L_Pct=0.469, PD=56.0, MoV=3.5, SoS=-1.9, SRS=1.6, OSRS=2.4, DSRS=-0.8)
Pandas(Index=(1, 2016), Week=1, Date=Timestamp('2016-09-11 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=21.0, Tm_Pnts=21.0, Opp2_pnts=23.0, Off_1stD=21.0, Off_TotYd=344.0, Def_1stD_All=19.0, Def_TotYd_All=363.0)
Pandas(Index=(1, 2016), Week=2, Date=Timestamp('2016-09-18 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=30.0, Tm_Pnts=40.0, Opp2_pnts=7.0, Off_1stD=20.0, Off_TotYd=416.0, Def_1stD_All=21.0, Def_TotYd_All=306.0)
Pandas(Index=(1, 2016), Week=3, Date=Timestamp('2016-09-25 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=4.0, Tm_Pnts=18.0, Opp2_pnts=33.0, Off_1stD=25.0, Off_TotYd=348.0, Def_1stD_All=16.0, Def_TotYd_All=297.0)
Pandas(Index=(1, 2016), Week=4, Date=Timestamp('2016-10-02 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=18.0, Tm_Pnts=13.0, Opp2_pnts=17.0, Off_1stD=26.0, Off_TotYd=420.0, Def_1stD_All=12.0, Def_TotYd_All=288.0)
Pandas(Index=(1, 2016), Week=5, Date=Timestamp('2016-10-06 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=29.0, Tm_Pnts=33.0, Opp2_pnts=21.0, Off_1stD=17.0, Off_TotYd=288.0, Def_1stD_All=25.0, Def_TotYd_All=286.0)
Pandas(Index=(1, 2016), Week=6, Date=Timestamp('2016-10-17 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=24.0, Tm_Pnts=28.0, Opp2_pnts=3.0, Off_1stD=28.0, Off_TotYd=396.0, Def_1stD_All=11.0, Def_TotYd_All=230.0)
Pandas(Index=(1, 2016), Week=7, Date=Timestamp('2016-10-23 00:00:00'), win_loss=nan, home_away=1.0, Opp1_team=28.0, Tm_Pnts=6.0, Opp2_pnts=6.0, Off_1stD=23.0, Off_TotYd=443.0, Def_1stD_All=11.0, Def_TotYd_All=257.0)
Pandas(Index=(1, 2016), Week=8, Date=Timestamp('2016-10-30 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=5.0, Tm_Pnts=20.0, Opp2_pnts=30.0, Off_1stD=22.0, Off_TotYd=340.0, Def_1stD_All=19.0, Def_TotYd_All=349.0)
Pandas(Index=(1, 2016), Week=10, Date=Timestamp('2016-11-13 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=29.0, Tm_Pnts=23.0, Opp2_pnts=20.0, Off_1stD=26.0, Off_TotYd=443.0, Def_1stD_All=15.0, Def_TotYd_All=281.0)
Pandas(Index=(1, 2016), Week=11, Date=Timestamp('2016-11-20 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=20.0, Tm_Pnts=24.0, Opp2_pnts=30.0, Off_1stD=24.0, Off_TotYd=290.0, Def_1stD_All=16.0, Def_TotYd_All=217.0)
Pandas(Index=(1, 2016), Week=12, Date=Timestamp('2016-11-27 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=2.0, Tm_Pnts=19.0, Opp2_pnts=38.0, Off_1stD=23.0, Off_TotYd=332.0, Def_1stD_All=28.0, Def_TotYd_All=360.0)
Pandas(Index=(1, 2016), Week=13, Date=Timestamp('2016-12-04 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=32.0, Tm_Pnts=31.0, Opp2_pnts=23.0, Off_1stD=24.0, Off_TotYd=369.0, Def_1stD_All=19.0, Def_TotYd_All=333.0)
Pandas(Index=(1, 2016), Week=14, Date=Timestamp('2016-12-11 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=19.0, Tm_Pnts=23.0, Opp2_pnts=26.0, Off_1stD=21.0, Off_TotYd=300.0, Def_1stD_All=15.0, Def_TotYd_All=314.0)
Pandas(Index=(1, 2016), Week=15, Date=Timestamp('2016-12-18 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=22.0, Tm_Pnts=41.0, Opp2_pnts=48.0, Off_1stD=26.0, Off_TotYd=425.0, Def_1stD_All=33.0, Def_TotYd_All=488.0)
Pandas(Index=(1, 2016), Week=16, Date=Timestamp('2016-12-24 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=28.0, Tm_Pnts=34.0, Opp2_pnts=31.0, Off_1stD=21.0, Off_TotYd=370.0, Def_1stD_All=24.0, Def_TotYd_All=391.0)
Pandas(Index=(1, 2016), Week=17, Date=Timestamp('2016-01-01 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=18.0, Tm_Pnts=44.0, Opp2_pnts=6.0, Off_1stD=21.0, Off_TotYd=344.0, Def_1stD_All=9.0, Def_TotYd_All=123.0)....

This is the layout I want for each year: it starts with that year's annual-averages row, followed by the 17 rows of game stats for that year. Now, when I try to put it into a DataFrame:

new = pd.DataFrame(app, index=["Tm_name", "Year"])
AssertionError: 10 cols passed, passed data had 12 cols
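As far as I can tell, the error happens because app mixes two record widths: the annual namedtuples from result have 10 fields (the (Tm_name, Year) index plus 9 stat columns) while the weekly ones from result2 have 12 (index plus 11 columns), so pandas can't infer a single column set. A rough sketch of pulling them back apart (the length check is just my own guess at a workaround):

# Split the collected rows by field count: 10-field tuples came from result,
# 12-field tuples came from result2; each group then has a consistent width.
annual_rows = [r for r in app if len(r) == 10]
weekly_rows = [r for r in app if len(r) == 12]

annual_df = pd.DataFrame(annual_rows)   # columns taken from the namedtuple fields
weekly_df = pd.DataFrame(weekly_rows)   # the "Index" column still holds the (Tm_name, Year) tuples

But that just lands me back at two frames shaped like result and result2, which makes me think the whole list detour is the wrong direction.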

Can someone help? I have been playing with this for two weeks, trying multi-indexes and different merges and joins, and it just never ends up looking like the app array, with no duplicates.
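For reference, the closest I have gotten so far (a rough, not fully tested sketch along the same lines as the toy example above, so it may well not be the right approach) is to skip the app list entirely and stack the two real frames on their shared (Tm_name, Year) index:

# Tag the annual rows as "week 0" so they sort ahead of games 1-17.
annual = result.assign(Week=0)
# Week in result2 prints as strings ('1', '2', ...), so cast it before sorting.
weekly = result2.assign(Week=pd.to_numeric(result2["Week"], errors="coerce"))

combined = (pd.concat([annual, weekly], sort=False)   # union of the two column sets
            .set_index("Week", append=True)
            .sort_index())
print(combined.loc[(1, 2015)])   # the 2015 averages row, then that year's 17 games

Is something like this the right direction, or is there a cleaner/idiomatic way?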

Thanks again

0 Answers:

No answers