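My DataFrame has 13 columns and 900 rows. I set one of the columns as the index; its value is the same across multiple events. What I want to do is add two new rows above the existing first row, covering all 13 columns plus the index, and copy all the values from the current columns into them.
How can I add them?
I also want to add two new rows after the last row of each gsm_id.
In the example, I want to add a new row before the first row and after the last row. gsm_id is set as the index, and the new rows should be added before and after each gsm_id group.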
Answer 0 (score: 1)
Use:
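import numpy as np
import pandas as pd
#create a helper column with each row's position inside its gsm_id group, shifted by 2 to leave room for the two prepended rows
df['sort'] = df.groupby('gsm_id').cumcount() + 2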
#get first 2 rows per each group
df1 = df.groupby('gsm_id').head(2).copy()
#modify values
df1[['PreviousEventTime','Goal_Flag','Union_level']] = np.nan
df1[['Run_score','Run_sum']] = 0
df1['Match_sta'] = 'Started'
#subtract 2 so the copied rows get sort values 0 and 1 and land before each group's original rows
df1['sort'] -= 2
#print (df1)
#get last 2 rows per groups
df2 = df.groupby('gsm_id').tail(2).copy()
#change datetimes
df2['eventdatetime'] = df2['matchdatetime'] + pd.Timedelta(90, unit='m')
#add 2 so the copied rows sort after each group's original last row
df2['sort'] += 2
#print (df2)
#join all together and sort for correct ordering
df = (pd.concat([df1, df2, df])
.sort_values(['gsm_id','sort'])
.reset_index(drop=True)
.drop('sort', axis=1))
print (df)
gsm_id comp ht at team matchdatetime \
0 2462794 EngPr Arsenal Leicester A 2017-08-11 18:45:00
1 2462794 EngPr Arsenal Leicester L 2017-08-11 18:45:00
2 2462794 EngPr Arsenal Leicester A 2017-08-11 18:45:00
3 2462794 EngPr Arsenal Leicester L 2017-08-11 18:45:00
4 2462794 EngPr Arsenal Leicester A 2017-08-11 18:45:00
5 2462794 EngPr Arsenal Leicester L 2017-08-11 18:45:00
6 2462794 EngPr Arsenal Leicester A 2017-08-11 18:45:00
7 2462794 EngPr Arsenal Leicester L 2017-08-11 18:45:00
8 2462794 EngPr Arsenal Leicester A 2017-08-11 18:45:00
9 2462795 EngPr1 Arsenal Leicester A 2017-08-11 18:45:00
10 2462795 EngPr1 Arsenal Leicester L 2017-08-11 18:45:00
11 2462795 EngPr1 Arsenal Leicester A 2017-08-11 18:45:00
12 2462795 EngPr1 Arsenal Leicester L 2017-08-11 18:45:00
13 2462795 EngP1r Arsenal Leicester A 2017-08-11 18:45:00
14 2462795 EngP1r Arsenal Leicester L 2017-08-11 18:45:00
15 2462795 EngPr1 Arsenal Leicester A 2017-08-11 18:45:00
16 2462795 EngP1r Arsenal Leicester L 2017-08-11 18:45:00
17 2462795 EngPr1 Arsenal Leicester A 2017-08-11 18:45:00
eventdatetime PreviousEventTime Goal_Flag Union_level Team_SR \
0 2017-08-11 18:46:00 NaT NaN NaN A
1 2017-08-11 18:49:00 NaT NaN NaN L
2 2017-08-11 18:46:00 2017-08-11 18:45:00 First Goal Scored A
3 2017-08-11 18:49:00 2017-08-11 18:46:00 First Goal Conceded L
4 2017-08-11 19:13:00 2017-08-11 18:49:00 Other Goal Scored A
5 2017-08-11 19:31:00 2017-08-11 19:13:00 Last Goal Scored A
6 2017-08-11 19:40:00 2017-08-11 19:31:00 Last Goal Conceded L
7 2017-08-11 20:15:00 2017-08-11 19:13:00 Last Goal Scored A
8 2017-08-11 20:15:00 2017-08-11 19:31:00 Last Goal Conceded L
9 2017-08-11 18:46:00 NaT NaN NaN A
10 2017-08-11 18:49:00 NaT NaN NaN L
11 2017-08-11 18:46:00 2017-08-11 18:45:00 First Goal Scored A
12 2017-08-11 18:49:00 2017-08-11 18:46:00 First Goal Conceded L
13 2017-08-11 19:13:00 2017-08-11 18:49:00 Other Goal Scored A
14 2017-08-11 19:31:00 2017-08-11 19:13:00 Last Goal Scored A
15 2017-08-11 19:40:00 2017-08-11 19:31:00 Last Goal Conceded L
16 2017-08-11 20:15:00 2017-08-11 19:13:00 Last Goal Scored A
17 2017-08-11 20:15:00 2017-08-11 19:31:00 Last Goal Conceded L
Run_score Run_sum Match_sta
0 0 0 Started
1 0 0 Started
2 1 1 Winning
3 -1 -1 Losing
4 1 1 Winning
5 1 1 Winning
6 -1 -1 Losing
7 1 1 Winning
8 -1 -1 Losing
9 0 0 Started
10 0 0 Started
11 1 1 Winning
12 -1 -1 Losing
13 1 1 Winning
14 1 1 Winning
15 -1 -1 Losing
16 1 1 Winning
17 -1 -1 Losing
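The question says gsm_id is set as the index, while the code above treats it as an ordinary column and ends with reset_index(drop=True), which would discard an index holding gsm_id. A minimal sketch of one way to handle that, assuming a tiny hypothetical frame (the columns `event` and `score` are made up for illustration) and only one added row on each side: move the index into a column first, apply the same sort-key idea, then set the index again.

```python
import pandas as pd

# Hypothetical toy data: gsm_id is the (non-unique) index, as in the question.
df = pd.DataFrame(
    {
        "gsm_id": [1, 1, 1, 2, 2],
        "event": ["a", "b", "c", "d", "e"],
        "score": [1, -1, 1, 1, -1],
    }
).set_index("gsm_id")

# Move gsm_id back into a column so it survives the later steps.
tmp = df.reset_index()

# Position of each row inside its group, shifted by 1 to leave room
# for one new row in front of the group.
tmp["sort"] = tmp.groupby("gsm_id").cumcount() + 1

# Copy of the first row per group, placed in front (sort key 0).
first = tmp.groupby("gsm_id").head(1).copy()
first["score"] = 0
first["sort"] = 0

# Copy of the last row per group, placed behind (sort key + 1).
last = tmp.groupby("gsm_id").tail(1).copy()
last["sort"] += 1

# Combine, order by group and position, then restore gsm_id as the index.
out = (
    pd.concat([first, tmp, last])
    .sort_values(["gsm_id", "sort"])
    .drop(columns="sort")
    .set_index("gsm_id")
)
print(out)
```

Each group gains exactly two rows here; changing the shift and the head/tail sizes to 2 gives the two-row variant used in the answer.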
Sample data:
c = ['gsm_id', 'comp', 'ht', 'at', 'team', 'matchdatetime','eventdatetime', 'PreviousEventTime', 'Goal_Flag', 'Union_level', 'Team_SR', 'Run_score', 'Run_sum', 'Match_sta']
df = pd.DataFrame({'Team_SR': ['A', 'L', 'A', 'A', 'L', 'A', 'L', 'A', 'A', 'L'],
'team': ['A', 'L', 'A', 'L', 'A', 'A', 'L', 'A', 'L', 'A'],
'matchdatetime': [pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:45:00')],
'at': ['Leicester', 'Leicester', 'Leicester', 'Leicester', 'Leicester', 'Leicester', 'Leicester', 'Leicester', 'Leicester', 'Leicester'],
'Union_level': ['Scored', 'Conceded', 'Scored', 'Scored', 'Conceded', 'Scored', 'Conceded', 'Scored', 'Scored', 'Conceded'],
'Run_score': [1, -1, 1, 1, -1, 1, -1, 1, 1, -1],
'eventdatetime': [pd.Timestamp('2017-08-11 18:46:00'), pd.Timestamp('2017-08-11 18:49:00'), pd.Timestamp('2017-08-11 19:13:00'), pd.Timestamp('2017-08-11 19:31:00'), pd.Timestamp('2017-08-11 19:40:00'), pd.Timestamp('2017-08-11 18:46:00'), pd.Timestamp('2017-08-11 18:49:00'), pd.Timestamp('2017-08-11 19:13:00'), pd.Timestamp('2017-08-11 19:31:00'), pd.Timestamp('2017-08-11 19:40:00')],
'ht': ['Arsenal', 'Arsenal', 'Arsenal', 'Arsenal', 'Arsenal', 'Arsenal', 'Arsenal', 'Arsenal', 'Arsenal', 'Arsenal'],
'Match_sta': ['Winning', 'Losing', 'Winning', 'Winning', 'Losing', 'Winning', 'Losing', 'Winning', 'Winning', 'Losing'],
'gsm_id': [2462794, 2462794, 2462794, 2462794, 2462794, 2462795, 2462795, 2462795, 2462795, 2462795],
'Goal_Flag': ['First Goal', 'First Goal', 'Other Goal', 'Last Goal', 'Last Goal', 'First Goal', 'First Goal', 'Other Goal', 'Last Goal', 'Last Goal'], 'Run_sum': [1, -1, 1, 1, -1, 1, -1, 1, 1, -1],
'comp': ['EngPr', 'EngPr', 'EngPr', 'EngPr', 'EngPr', 'EngPr1', 'EngPr1', 'EngP1r', 'EngP1r', 'EngPr1'], 'PreviousEventTime': [pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:46:00'), pd.Timestamp('2017-08-11 18:49:00'), pd.Timestamp('2017-08-11 19:13:00'), pd.Timestamp('2017-08-11 19:31:00'), pd.Timestamp('2017-08-11 18:45:00'), pd.Timestamp('2017-08-11 18:46:00'), pd.Timestamp('2017-08-11 18:49:00'), pd.Timestamp('2017-08-11 19:13:00'), pd.Timestamp('2017-08-11 19:31:00')]}, columns=c)
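The answer hard-codes two rows and specific column names. As a rough sketch, the same steps could be wrapped in a function; `pad_groups`, `head_fill` and `tail_fill` are hypothetical names introduced here, not part of pandas, and the demo frame is a cut-down stand-in for the sample data.

```python
import pandas as pd


def pad_groups(df, key, n=2, head_fill=None, tail_fill=None):
    """Copy the first and last n rows of every group and place the copies
    before and after that group. head_fill / tail_fill map a column name
    to the value written into the copies. Hypothetical helper."""
    work = df.copy()
    # Within-group position, shifted by n to leave room for prepended copies.
    work["_sort"] = work.groupby(key).cumcount() + n

    head = work.groupby(key).head(n).copy()
    head["_sort"] -= n  # copies take positions 0..n-1
    for col, val in (head_fill or {}).items():
        head[col] = val

    tail = work.groupby(key).tail(n).copy()
    tail["_sort"] += n  # copies land after the group's original last row
    for col, val in (tail_fill or {}).items():
        tail[col] = val

    return (
        pd.concat([head, work, tail])
        .sort_values([key, "_sort"])
        .drop(columns="_sort")
        .reset_index(drop=True)
    )


# Cut-down stand-in for the sample data (assumed values).
demo = pd.DataFrame(
    {
        "gsm_id": [1, 1, 1, 2, 2, 2],
        "Run_score": [1, -1, 1, -1, 1, 1],
        "Match_sta": ["Winning", "Losing", "Winning", "Losing", "Winning", "Winning"],
    }
)
padded = pad_groups(
    demo, "gsm_id", n=2, head_fill={"Run_score": 0, "Match_sta": "Started"}
)
print(padded.groupby("gsm_id").size())
```

With n=2 every group should grow by four rows. Groups shorter than n contribute fewer copies, since head(n) and tail(n) return at most the rows that exist.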