Work out from time differences whether the date has rolled over, and insert the updated date if it has

Time: 2018-09-30 18:08:11

Tags: python pandas datetime python-datetime

Edit: This turned out to be impossible to solve this way; a better approach is needed.

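I am scraping this page (http://www.oddsportal.com/american-football/usa/nfl-2017-2018/results/#/page/6/) and trying to insert the game date (the grey rows on the page) into each corresponding game-time row.

I am struggling to implement this logic.

The list of dates scraped from this page is as follows...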

file_days=[['17 Sep 2017'],['15 Sep 2017'],['12 Sep 2017'], ['11 Sep 2017'],['10 Sep 2017'], ['08 Sep 2017'],['01 Sep 2017'],['31 Aug 2017'],
           ['28 Aug 2017'],['27 Aug 2017'],['26 Aug 2017'],['25 Aug 2017'],['24 Aug 2017']]

file_days=file_days[::-1]
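
Because each scraped date sits in its own one-element list, it can help to flatten the list and parse the strings before using them. The following is only a minimal sketch on a shortened copy of the list; `flat_days` and `parsed_days` are names introduced here for illustration, and the `%d %b %Y` format assumes English month abbreviations.

```python
from itertools import chain

import pandas as pd

# A shortened copy of the scraped list, newest date first as on the page.
file_days = [['27 Aug 2017'], ['26 Aug 2017'], ['25 Aug 2017'], ['24 Aug 2017']]

# Flatten the one-element lists and reverse into chronological order.
flat_days = list(chain.from_iterable(file_days[::-1]))

# Parse with an explicit format; %b assumes English month abbreviations.
parsed_days = pd.to_datetime(flat_days, format='%d %b %Y')

print(flat_days)
print(parsed_days.is_monotonic_increasing)
```

Parsed dates make it easy to check that the list really is in chronological order before matching it against the game times.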

I am trying to insert these dates into the following DataFrame, which contains the start time of each scraped game.

import pandas as pd
data = {'game_time': ['23:00','23:30','23:00','00:00','23:00','23:00','23:00','23:30','23:30','00:00','00:00','00:00','01:00','17:00','20:30','00:00','23:00','23:00','23:00','23:00',
                      '23:00','23:30','23:30','23:30','00:00','00:00','00:00','00:00','00:30','01:00','02:00','02:00','00:30','17:00','17:00','17:00','17:00','17:00','17:00','17:00','17:00','20:05','20:25','20:25','00:30','23:10','02:20','00:25','17:00','17:00']}
df = pd.DataFrame.from_dict(data)

So far I have the following code, but I can't figure out the logic for inserting a new date when the time rolls over into a new day.

# Parse the HH:MM strings; only the time of day is used below
df['game_time'] = pd.to_datetime(df['game_time'], format='%H:%M')
df['game'] = df['game_time'].dt.strftime('%H:%M')
# Time of the previous row; the first row has none, so default to midnight
df['previous_game'] = df['game'].shift(1).fillna('00:00')

matchup_day = []

for a,b in zip(df['game'],df['previous_game']):
    if a >= b:
        # current game is not earlier than the previous one: keep the current date
        matchup_day.append(file_days[0])
    else:
        # current game is earlier than the previous one: use the next date
        # and drop the date that was just used
        matchup_day.append(file_days[1])
        file_days.pop(0)

The output is as follows...

 matchup_day
 [['24 Aug 2017'],
 ['24 Aug 2017'],
 ['25 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['08 Sep 2017'],
 ['08 Sep 2017'],
 ['10 Sep 2017'],
 ['11 Sep 2017'],
 ['11 Sep 2017'],
 ['11 Sep 2017']]

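This output is clearly incorrect, because it goes wrong at row 15 of the DataFrame, which corresponds to 28 Aug on the website. Does anyone have ideas on how to improve this logic?

I am also open to completely different ideas on how to achieve this. Thanks in advance, I am quite stumped by this.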
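
One general limitation of any rule based only on clock times, which may or may not be what goes wrong here: a day change is visible only when the time decreases from one row to the next. The sketch below uses made-up times and made-up "true" dates (not taken from the site) to show a day change that such a rule cannot see.

```python
import pandas as pd

# Hypothetical data: three games whose true dates are known.
games = pd.DataFrame({
    'game': ['17:00', '20:30', '23:00'],
    'true_date': ['27 Aug 2017', '27 Aug 2017', '28 Aug 2017'],
})

# Flag a rollover only where the time is earlier than the previous row's.
# Zero-padded HH:MM strings compare correctly as plain strings.
rollover = games['game'] < games['game'].shift()

# Count the real day changes from the known dates (the first row is skipped).
true_changes = (games['true_date'] != games['true_date'].shift()).iloc[1:].sum()

print(rollover.sum())   # rollovers detected from times alone
print(true_changes)     # day changes that actually occurred
```

In this made-up case the times never decrease, so no rollover is detected even though the last game falls on a later date.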

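1 answer:

Answer 0 (score: 1)

There is no need for a manual loop here. You can compare the series with a shifted version of itself, then use pd.Series.cumsum and map the result through a dictionary.

Here is a demo: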

from itertools import chain

file_days = [['17 Sep 2017'], ['15 Sep 2017'], ['12 Sep 2017'], ['11 Sep 2017'],
             ['10 Sep 2017'], ['08 Sep 2017'], ['01 Sep 2017'], ['31 Aug 2017'],
             ['28 Aug 2017'], ['27 Aug 2017'], ['26 Aug 2017'], ['25 Aug 2017'],
             ['24 Aug 2017']]

d = dict(enumerate(chain.from_iterable(file_days[::-1])))

df['date'] = (df['game'] < df['game'].shift()).cumsum().map(d)

Result:

print(df[['game', 'date']])

     game         date
0   23:00  24 Aug 2017
1   23:30  24 Aug 2017
2   23:00  25 Aug 2017
3   00:00  26 Aug 2017
4   23:00  26 Aug 2017
5   23:00  26 Aug 2017
6   23:00  26 Aug 2017
7   23:30  26 Aug 2017
8   23:30  26 Aug 2017
9   00:00  27 Aug 2017
10  00:00  27 Aug 2017
11  00:00  27 Aug 2017
12  01:00  27 Aug 2017
13  17:00  27 Aug 2017
14  20:30  27 Aug 2017
15  00:00  28 Aug 2017
16  23:00  28 Aug 2017
17  23:00  28 Aug 2017
18  23:00  28 Aug 2017
19  23:00  28 Aug 2017
20  23:00  28 Aug 2017
21  23:30  28 Aug 2017
22  23:30  28 Aug 2017
23  23:30  28 Aug 2017
24  00:00  31 Aug 2017
25  00:00  31 Aug 2017
26  00:00  31 Aug 2017
27  00:00  31 Aug 2017
28  00:30  31 Aug 2017
29  01:00  31 Aug 2017
30  02:00  31 Aug 2017
31  02:00  31 Aug 2017
32  00:30  01 Sep 2017
33  17:00  01 Sep 2017
34  17:00  01 Sep 2017
35  17:00  01 Sep 2017
36  17:00  01 Sep 2017
37  17:00  01 Sep 2017
38  17:00  01 Sep 2017
39  17:00  01 Sep 2017
40  17:00  01 Sep 2017
41  20:05  01 Sep 2017
42  20:25  01 Sep 2017
43  20:25  01 Sep 2017
44  00:30  08 Sep 2017
45  23:10  08 Sep 2017
46  02:20  10 Sep 2017
47  00:25  11 Sep 2017
48  17:00  11 Sep 2017
49  17:00  11 Sep 2017
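
To see why the one-liner works, the intermediate steps can be sketched on the first five times and the first three dates from the question. The names `sample`, `days`, `rolled` and `day_index` are introduced here purely for illustration.

```python
import pandas as pd

# First five game times from the question, in page order.
sample = pd.DataFrame({'game': ['23:00', '23:30', '23:00', '00:00', '23:00']})
# First three dates from the question, oldest first.
days = ['24 Aug 2017', '25 Aug 2017', '26 Aug 2017']

# Step 1: True where the time is earlier than the previous row's time.
# The first row compares against a missing value and comes out False.
rolled = sample['game'] < sample['game'].shift()

# Step 2: the running count of rollovers is an index into the date list.
day_index = rolled.cumsum()

# Step 3: look each index up in a position -> date dictionary.
sample['date'] = day_index.map(dict(enumerate(days)))

print(sample)
```

If there are more rollovers than dates in the dictionary, `map` leaves the unmatched rows as NaN, which is a quick way to spot a mismatch between the times and the scraped dates.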