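numpy: getting rid of loops

Time: 2015-12-11 18:53:48

Tags: python arrays numpy

I am learning numpy through exercises and ran into this problem. I have to write a function that takes an np_array as its argument and returns a new np_array. The argument looks like this: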

>> log
array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
       ['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
       ['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
       ['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
       ['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
       ['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
       ['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
       ['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
       ['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
       ['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')

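log contains information about connections to a server. The first element of each row is the login date and the second is the corresponding logoff date.

The new np_array should hold the time spent connected to the server for each week, from the Monday before the first connection to the Monday after the last connection.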

>> func(log)
array([[time_connected_week1,
        time_connected_week2,
        time_connected_week3,

               ...
        time_connected_weekn]], dtype='timedelta64[s]')

week1 (weekn) must correspond to the first (last) week of the log array.

I wrote the following code:

import numpy as np

def func(log):
    begin = np.datetime64("2015-05-04")    # first Monday (hard-coded)
    end = np.datetime64("2015-12-07")      # last Monday (hard-coded)

    week_td64 = np.timedelta64(1, 'W')
    nbWeek_td64 = int((end - begin) / week_td64)

    week = begin + np.arange(nbWeek_td64) * week_td64    # start of each week, week1..weekn

    weekHours = []       # list to store the return values

    for w in week:
        mask1 = log[:,0] > w
        mask2 = log[:,0] < w + week_td64
        l = log[mask1 & mask2]     # log rows whose login falls in the current week

        totalweek = (l[:,1] - l[:,0]).sum()    # total connection time for that week

        weekHours.append(totalweek)     # save the value

    return np.array(weekHours)

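I have two questions about my code:
1/ How can I find the first Monday automatically? np.datetime64 does not support weekday(). Do I have to use datetime.datetime?
2/ How can I get rid of the loop? I have been told that numpy is largely about avoiding loops, and I believe this can be done with fancy slicing.

2 Answers:

Answer 0: (score: 1)

For your first question, finding the first Monday automatically, you can use busday_offset with a weekmask that treats only Monday as a business day: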

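import numpy as np

def first_monday(firstDay):
    firstEntry = firstDay.astype('M8[D]')    # truncate to whole days
    # With only Monday valid: roll forward onto a Monday, then step back one Monday
    beforeMonday = np.busday_offset(firstEntry, -1, 'forward', [1,0,0,0,0,0,0])
    if firstEntry - beforeMonday == np.timedelta64(7, 'D'):
        return firstEntry       # firstDay was already a Monday
    else:
        return beforeMonday

# define the function first, then call it
firstDay = np.min(log[:, 0])
firstMonday = first_monday(firstDay)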
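
As an alternative to busday_offset, the weekday can also be derived from integer arithmetic on the day count. The sketch below is only an illustration; the helper names weekday_index and monday_on_or_before are made up here and are not part of numpy. It relies on the fact that 1970-01-01, day 0 of datetime64[D], was a Thursday.

```python
import numpy as np

def weekday_index(dates):
    """Hypothetical helper: 0 for Monday ... 6 for Sunday."""
    days = np.asarray(dates).astype('datetime64[D]').astype('int64')
    # Day 0 (1970-01-01) was a Thursday, i.e. weekday index 3
    return (days + 3) % 7

def monday_on_or_before(dates):
    """Hypothetical helper: the Monday starting the week of each date."""
    d = np.asarray(dates).astype('datetime64[D]')
    return d - weekday_index(d).astype('timedelta64[D]')

first_login = np.datetime64('2015-05-08T13:46:06')   # a Friday
print(weekday_index(first_login))
print(monday_on_or_before(first_login))
```

Both helpers also accept whole arrays, so the week start of every login can be computed in one call.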

Hint: you can remove the loop by applying np.tile() to the log and np.repeat() to the weeks.
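
To illustrate the hint on toy data (the values are made-up stand-ins, not real timestamps): np.tile repeats the whole array, np.repeat repeats each element, and together they enumerate every (week, login) pair.

```python
import numpy as np

logins = np.array([1, 2, 3])     # stand-ins for login timestamps
weeks = np.array([10, 20])       # stand-ins for week start dates

tiled = np.tile(logins, len(weeks))        # [1 2 3 1 2 3]
repeated = np.repeat(weeks, len(logins))   # [10 10 10 20 20 20]

# Each row is one (week, login) combination
pairs = np.column_stack([repeated, tiled])
print(pairs)
```

A boolean mask over these flat arrays can then pick out the pairs where the login falls inside the week.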

Final answer: don't read this unless you give up.

First, define a GetMonday function:

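def GetMonday(firstDay, forward=False):
    firstEntry = firstDay.astype('M8[D]')
    # Only Monday is a valid day. Roll onto a Monday in the opposite direction
    # first, so the one-Monday offset lands on the nearest Monday in the
    # requested direction (before if forward=False, after if forward=True).
    roll = 'backward' if forward else 'forward'
    beforeMonday = np.busday_offset(firstEntry, forward*2-1, roll, [1,0,0,0,0,0,0])
    if abs(firstEntry-beforeMonday) == np.timedelta64(7, 'D'):
        return firstEntry.astype('M8[s]')    # firstDay was already a Monday
    else:
        return beforeMonday.astype('M8[s]')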

Then you can write:

log = np.array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
   ['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
   ['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
   ['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
   ['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
   ['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
   ['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
   ['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
   ['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
   ['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')

login = log[:,0]
logoff = log[:,1]
begin = GetMonday(np.min(login))
end = GetMonday(np.max(logoff), True)

n_logs = log.shape[0]    # must be an integer for np.repeat
week_td64 = np.timedelta64(1, 'W')
nbWeek_td64 = int((end - begin) / week_td64)

week = begin + np.arange(nbWeek_td64) * week_td64

# Pair every week with every login
tiledLogin = np.tile(login, nbWeek_td64)
repeatedWeek = np.repeat(week, n_logs)
repeatedWeek_order = np.repeat(np.arange(nbWeek_td64), n_logs)

# True where the login falls inside the paired week
loginWeekMask = (tiledLogin >= repeatedWeek) & (tiledLogin < repeatedWeek+np.timedelta64(1,'W'))

hours_spent = (logoff-login).astype('timedelta64[h]')
weeks_entry = repeatedWeek_order[loginWeekMask]
# Keep the durations in the same (week, log) order as weeks_entry
hours_entry = np.tile(hours_spent, nbWeek_td64)[loginWeekMask]

print(np.bincount(weeks_entry.astype('int64'), hours_entry.astype('float64')))
#[ 14.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   9.  36.
#   0.   0.   0.   0.   0.   0.   0.   0.   0.   7.   0.   0.   0.   0.   0.
#   8.]

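This gives you an array with the hours per week. It is not the correct final answer, because a single login/logoff session may span more than one week, but I will leave it to you to work that out.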
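
One possible way to handle sessions that cross a week boundary is to compute, for every (week, session) pair, the overlap between the session and the week, and sum those overlaps per week. This is only a sketch under stated assumptions: the function name hours_per_week and its parameters are hypothetical, the inputs are assumed to be datetime64[s], and the sample timestamps are invented for the example.

```python
import numpy as np

def hours_per_week(login, logoff, begin, n_weeks):
    """Hypothetical helper: hours connected in each week, splitting
    sessions that span several weeks."""
    week = np.timedelta64(7 * 24 * 3600, 's')
    starts = begin + np.arange(n_weeks) * week    # shape (n_weeks,)
    ends = starts + week

    # Broadcast (n_weeks, 1) against (1, n_logs) to get every pair
    overlap_start = np.maximum(login[None, :], starts[:, None])
    overlap_end = np.minimum(logoff[None, :], ends[:, None])

    # Negative values mean the session does not touch that week
    seconds = (overlap_end - overlap_start).astype('int64')
    seconds = np.clip(seconds, 0, None)
    return seconds.sum(axis=1) / 3600.0

# One session from Sunday 22:00 to Monday 02:00 (invented data)
login = np.array(['2015-08-09T22:00:00'], dtype='datetime64[s]')
logoff = np.array(['2015-08-10T02:00:00'], dtype='datetime64[s]')
begin = np.datetime64('2015-08-03T00:00:00')     # a Monday

result = hours_per_week(login, logoff, begin, 2)
print(result)
```

The intermediate arrays have n_weeks × n_logs elements, so for very large logs this trades memory for the absence of a Python loop.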

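Answer 1: (score: 0)

Sorry, I missed that. In fact, there is a simpler way to find out which week a log entry belongs to, without np.tile and np.repeat.

All you need to do is compute the timedelta64 between each login and the first Monday, divide it by one week, and you get the index of the week the entry belongs to: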

log = np.array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
   ['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
   ['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
   ['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
   ['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
   ['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
   ['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
   ['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
   ['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
   ['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')

login = log[:,0]
logoff = log[:,1]
begin = GetMonday(np.min(login))
end = GetMonday(np.max(logoff), True)

n_logs = log.shape[0]*1.0
week_td64 = np.timedelta64(1, 'W')

weeks_entry = np.floor((login-begin)/week_td64)
hours_spent = (logoff-login).astype('timedelta64[h]')

print(np.bincount(weeks_entry.astype('int64'), hours_spent.astype('float64')))
#[ 14.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   9.  36.
#   0.   0.   0.   0.   0.   0.   0.   0.   0.   7.   0.   0.   0.   0.   0.
#   8.]
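
Note that np.bincount only returns bins up to the largest week index it sees, so trailing empty weeks before end are dropped. A minimal sketch of one way to cover the full range, using the minlength argument of np.bincount; the function name weekly_hours and the sample data are assumptions made for this example, and fractional hours are kept instead of truncating to whole hours:

```python
import numpy as np

def weekly_hours(login, logoff, begin, end):
    """Hypothetical helper: hours per week, attributed to the login week,
    with one bin for every week between begin and end."""
    week = np.timedelta64(1, 'W')
    n_weeks = int((end - begin) / week)

    # Week index of each login, counted from the first Monday
    weeks_entry = np.floor((login - begin) / week).astype('int64')
    # Session length in (fractional) hours
    hours_spent = (logoff - login) / np.timedelta64(1, 'h')

    return np.bincount(weeks_entry, weights=hours_spent, minlength=n_weeks)

# Invented sample data: two sessions, three weeks
login = np.array(['2015-05-08T10:00:00', '2015-05-12T08:00:00'], dtype='datetime64[s]')
logoff = np.array(['2015-05-08T11:30:00', '2015-05-12T10:00:00'], dtype='datetime64[s]')
begin = np.datetime64('2015-05-04T00:00:00')
end = np.datetime64('2015-05-25T00:00:00')

totals = weekly_hours(login, logoff, begin, end)
print(totals)
```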