Create a column that sums the last x occurrences of another column

Time: 2017-12-25 19:50:04

Tags: python pandas pandas-groupby

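I am trying to create a new column, let's call it "HomeForm", that holds the sum of the last 5 values of "FTHG" for each entry in the "HomeTeam" column.

For team 0, for example, the idea is to fill the cell in the new column with the sum of the last 5 "FTHG" values that belong to team 0. The table is sorted by date.

How can this be done in Python?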

        HomeTeam  FTHG   HomeForm
Date                
136          0     4
135          2     0
135          4     2
135          5     0
135          6     1
135         13     0
135         17     3
135         18     1
134         11     4
134         12     0
128          1     0
128          3     0
128          8     2
128          9     1
128         13     3
128         14     1
128         15     0
127          7     1
127         16     1
126         10     1

Thanks.

1 Answer:

Answer 0: (score: 2)

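Group by HomeTeam and perform a rolling sum on FTHG, summing over a minimum of 1 period and a maximum of 5.

First, define a function:

    def f(x):
        return x.shift().rolling(window=5, min_periods=1).sum()

This function computes the rolling sum of the previous 5 games (hence the shift, which excludes the current row). Pass this function to dfGroupBy.transform:

    df['HomeForm'] = df.groupby('HomeTeam', sort=False).FTHG.transform(f)
    df

          HomeTeam  FTHG  HomeForm
    Date
    136          0     4       NaN
    135          2     0       NaN
    135          4     2       NaN
    135          5     0       NaN
    135          6     1       NaN
    135         13     0       NaN
    135         17     3       NaN
    135         18     1       NaN
    134         11     4       NaN
    134         12     0       NaN
    128          1     0       NaN
    128          3     0       NaN
    128          8     2       NaN
    128          9     1       NaN
    128         13     3       0.0
    128         14     1       NaN
    128         15     0       NaN
    127          7     1       NaN
    127         16     1       NaN
    126         10     1       NaN

Only team 13 appears twice in this sample, so it is the only row with an earlier value to sum. If needed, fill the NaNs with zero and convert to integer:

    df['HomeForm'] = df['HomeForm'].fillna(0).astype(int)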
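To see the window actually fill up, here is a minimal self-contained sketch of the same groupby / shift / rolling idea. The data set is made up for illustration (it is not the asker's table), and the helper name `previous_goals_sum` and its `window` parameter are hypothetical. It assumes the rows are ordered so that each team's earlier matches come first.

```python
import pandas as pd

# Hypothetical miniature data set: two teams, rows ordered oldest match first.
df = pd.DataFrame({
    "HomeTeam": [0, 1, 0, 0, 1, 0, 0, 0, 0],
    "FTHG":     [1, 2, 3, 0, 1, 2, 1, 4, 2],
})

def previous_goals_sum(goals, window=5):
    """Sum of up to `window` previous values within one team's series.

    shift() drops the current match from its own total, and
    min_periods=1 yields a sum as soon as one earlier match exists.
    """
    return goals.shift().rolling(window=window, min_periods=1).sum()

df["HomeForm"] = (
    df.groupby("HomeTeam", sort=False)["FTHG"]
      .transform(previous_goals_sum)   # applied separately to each team
      .fillna(0)                       # first match of a team has no history
      .astype(int)
)

print(df)
```

For team 0 the last row gets 3 + 0 + 2 + 1 + 4 = 10: the five matches before it, with the oldest one (1 goal) already outside the window. If the current match should count toward its own total, dropping the `shift()` call would be the change to try.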
