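I am trying to create a new column, call it "HomeForm", that holds the sum of the last 5 values of "FTHG" for each entry in the "HomeTeam" column.
For team 0, for example, the idea is to fill the cells of the new column with the sum of the last 5 "FTHG" values that correspond to team 0. The table is sorted by date.
How can this be done in Python?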
      HomeTeam  FTHG  HomeForm
Date
136          0     4
135          2     0
135          4     2
135          5     0
135          6     1
135         13     0
135         17     3
135         18     1
134         11     4
134         12     0
128          1     0
128          3     0
128          8     2
128          9     1
128         13     3
128         14     1
128         15     0
127          7     1
127         16     1
126         10     1
Thanks.
Answer 0 (score: 2)
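You'll want to groupby HomeTeam and perform a rolling sum here, summing over a minimum of 1 period and a maximum of 5.
First, define a function -
    def f(x):
        return x.shift().rolling(window=5, min_periods=1).sum()
This function computes the rolling sum over the previous 5 games. The shift is what excludes the current game from its own total, so a team's first row has nothing to sum and comes out as NaN. Pass this function to dfGroupBy.transform -
    df['HomeForm'] = df.groupby('HomeTeam', sort=False).FTHG.transform(f)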
df
      HomeTeam  FTHG  HomeForm
Date
136          0     4       NaN
135          2     0       NaN
135          4     2       NaN
135          5     0       NaN
135          6     1       NaN
135         13     0       NaN
135         17     3       NaN
135         18     1       NaN
134         11     4       NaN
134         12     0       NaN
128          1     0       NaN
128          3     0       NaN
128          8     2       NaN
128          9     1       NaN
128         13     3       0.0
128         14     1       NaN
128         15     0       NaN
127          7     1       NaN
127         16     1       NaN
126         10     1       NaN
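Almost every row above is NaN because, in this sample, only team 13 appears more than once. To see what the window does for a team with a longer history, here is a minimal sketch on a single made-up series; the goal values are hypothetical and are not taken from the question.

```python
import pandas as pd

def f(x):
    # shift() moves every value down one row, so the window never
    # includes the current game; rolling() then sums up to 5 prior games.
    return x.shift().rolling(window=5, min_periods=1).sum()

# Hypothetical goals for one team, oldest game first (illustration only).
goals = pd.Series([1, 0, 2, 3, 1, 4, 0])
form = f(goals)

print(form.tolist())
# [nan, 1.0, 1.0, 3.0, 6.0, 7.0, 10.0]
```

The first game has no history, so it is NaN. From the sixth game on, the window is full and the oldest game drops out of the sum.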
If needed, fill the NaNs with zero and convert to integers -
    df['HomeForm'] = df['HomeForm'].fillna(0).astype(int)
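For anyone who wants to run the whole thing, the steps can be put together roughly as follows. This is a sketch, not the only way to do it: the rows are copied from the question, Date is kept as an ordinary column rather than as the index (an assumption made here for simplicity), and the name raw is introduced only for illustration.

```python
import pandas as pd

# Rows copied from the question: (Date, HomeTeam, FTHG).
rows = [
    (136, 0, 4), (135, 2, 0), (135, 4, 2), (135, 5, 0), (135, 6, 1),
    (135, 13, 0), (135, 17, 3), (135, 18, 1), (134, 11, 4), (134, 12, 0),
    (128, 1, 0), (128, 3, 0), (128, 8, 2), (128, 9, 1), (128, 13, 3),
    (128, 14, 1), (128, 15, 0), (127, 7, 1), (127, 16, 1), (126, 10, 1),
]
df = pd.DataFrame(rows, columns=["Date", "HomeTeam", "FTHG"])

def f(x):
    # Sum of up to 5 previous games, excluding the current one.
    return x.shift().rolling(window=5, min_periods=1).sum()

# Per-team rolling sum; NaN where a team has no earlier row.
raw = df.groupby("HomeTeam", sort=False)["FTHG"].transform(f)

# Optional clean-up: treat "no history" as 0 and store integers.
df["HomeForm"] = raw.fillna(0).astype(int)

print(df)
```

Note that shift() works on row order, not on the Date values. The sample lists Date in descending order, so "previous" here means the rows printed above each game. If earlier games should count instead, sort the frame accordingly before grouping.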
CREATE TABLE Teachers (
BSN int primary key,
Surname varchar(40) NOT NULL,
Name varchar(40) NOT NULL
);
CREATE TABLE Courses (
CourseCode varchar(10) primary key,
Name varchar(20) NOT NULL
);
CREATE TABLE Assignments (
CourseCode varchar(10) REFERENCES Courses ON DELETE CASCADE,
AssCode varchar(10),
primary key(CourseCode,AssCode),
DependOn varchar(10),
Year date,
week int
);
CREATE TABLE WorkOn (
BSN int REFERENCES Teachers(BSN),
CourseCode varchar(10) REFERENCES Assignments(CourseCode),
AssCode varchar(10) REFERENCES Assignments(AssCode),
primary key (CourseCode,BSN,AssCode)
);