Date comparison / grouping consecutive dates

Time: 2012-04-12 16:23:22

Tags: python date compare iteration

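I'm trying to write a function that identifies groups of consecutive dates and measures the size of each group.

The function takes a list of elements sorted by date (the elements are rows from a CSV file, each containing a date). The list can be 0 to n elements long. I want to write the list out as it was input, with the size of its date group added to each row.

For example, the list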

Bill 01/01/2011

Bill 02/01/2011

Bill 03/01/2011

Bill 05/01/2011

Bill 07/01/2011

should output (ideally printed to a file):

Bill 01/01/2011 3

Bill 02/01/2011 3

Bill 03/01/2011 3

Bill 05/01/2011 1

Bill 07/01/2011 1

I have a function called isBeside(string1, string2) that returns the difference between the two dates.
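The question does not show isBeside itself. As a minimal sketch of what such a helper might look like, assuming the dates are day/month/year strings as in the sample data and that the return value is the difference in whole days (both are assumptions, and the name is_beside and the format string are illustrative only):

```python
from datetime import datetime

# Assumption: dates are day/month/year strings, as in the sample data.
DATE_FORMAT = "%d/%m/%Y"

def is_beside(string1, string2, fmt=DATE_FORMAT):
    """Return the number of days from string1 to string2 (hypothetical helper)."""
    first = datetime.strptime(string1, fmt).date()
    second = datetime.strptime(string2, fmt).date()
    return (second - first).days

print(is_beside("01/01/2011", "02/01/2011"))  # 1: the dates are adjacent
print(is_beside("03/01/2011", "05/01/2011"))  # 2: there is a gap
```

With a helper like this, a result of exactly 1 means the second date directly follows the first.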

My attempt so far is below (an iterative mess; I'm sure Python can be more elegant than this).

Note that coll[i][1] contains the date element of the CSV row.

def printSet(coll):
    setSize = len(coll)
    if setSize == 0:
        pass  # don't need to do anything
    elif setSize == 1:
        for i in coll:
            print i, 1
    elif setSize > 1:
        # new buffer list which will hold sequential dates,
        # until a non-sequential one is found
        printBuffer = []
        printBuffer.append(coll[0])  # add the first item
        print 'Adding ' + str(coll[0])

        for i in range(0, len(coll)-1):
            print 'Comparing ', coll[i][1], coll[i+1][1], isBeside(coll[i][1], coll[i+1][1])

            if isBeside(coll[i][1], coll[i+1][1]) == 1:
                printBuffer.append(coll[i+1])
                print 'Adding ' + str(coll[i+1])
            else:
                for j in printBuffer:
                    print j, len(printBuffer)
                printBuffer = []
                printBuffer.append(coll[i])

    return

2 answers:

Answer 0 (score: 1)

Something like this?

from datetime import date, timedelta

coll = [['Bill', date(2011,1,1)],
        ['Bill', date(2011,1,2)],
        ['Bill', date(2011,1,3)],
        ['Bill', date(2011,1,5)],
        ['Bill', date(2011,1,7)]]

res = []
group = [coll[0]]
i = 1

while i < len(coll):
    row = coll[i]
    last_in_group = group[-1]

    # use your isBeside() function here...
    if row[1] - last_in_group[1] == timedelta(days=1):
        # consecutive, append to current group..
        group.append(row)
    else:
        # not consecutive, start new group.
        res.append(group)
        group = [row]
    i += 1

res.append(group)

for group in res:
    for row in group:
        for item in row:
            print item,
        print len(group)

Prints:

Bill 2011-01-01 3
Bill 2011-01-02 3
Bill 2011-01-03 3
Bill 2011-01-05 1
Bill 2011-01-07 1
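The question allows an empty list, in which case group = [coll[0]] would raise an IndexError, and it asks for output to a file. The following is only a sketch of one way to cover both, written in Python 3 syntax. The names group_sizes and write_groups are made up for illustration. It replaces the explicit loop with itertools.groupby, keyed on each date's ordinal minus its position in the list. That key stays constant within a run of consecutive days. It assumes the rows are sorted by date and contain no duplicate dates.

```python
import sys
from datetime import date
from itertools import groupby

def group_sizes(rows):
    """Yield (row, size) pairs; rows must be sorted by date, with the date at row[1]."""
    # Within a run of consecutive days, ordinal - index is constant,
    # so it can serve as the grouping key.
    def run_key(pair):
        index, row = pair
        return row[1].toordinal() - index

    for _, run in groupby(enumerate(rows), key=run_key):
        group = [row for _, row in run]
        for row in group:
            yield row, len(group)

def write_groups(rows, out=sys.stdout):
    """Write each row followed by its group size to a file-like object."""
    for row, size in group_sizes(rows):
        out.write("%s %s %d\n" % (row[0], row[1].strftime("%d/%m/%Y"), size))

coll = [['Bill', date(2011, 1, 1)],
        ['Bill', date(2011, 1, 2)],
        ['Bill', date(2011, 1, 3)],
        ['Bill', date(2011, 1, 5)],
        ['Bill', date(2011, 1, 7)]]

write_groups(coll)
```

To write to a file instead of standard output, pass an open file object as the second argument, for example inside a with open(...) block. An empty list simply produces no output.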

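Answer 1 (score: 0)

The datetime module is great for working with dates, and it is much cleaner than the string comparisons you are currently using.

Here is an example: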

from datetime import datetime

def add_month(dt):
    # Normally you would use timedelta, but timedelta doesn't work with months
    return dt.replace(year=dt.year + (dt.month==12), month=(dt.month%12) + 1)

data = ['Bill 01/01/2011', 'Bill 02/01/2011', 'Bill 03/01/2011', 'Bill 05/01/2011', 'Bill 07/01/2011']
dates = [datetime.strptime(line.split(' ')[1], '%m/%d/%Y') for line in data]
buffer = [data[0]]
for i, date in enumerate(dates[1:]):
    if add_month(dates[i]) == date:
        buffer.append(data[i+1])
    else:
        print '\n'.join(line + ' ' + str(len(buffer)) for line in buffer)
        buffer = [data[i+1]]

print '\n'.join(line + ' ' + str(len(buffer)) for line in buffer)

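I am assuming your dates are in month/day/year format. If they are actually day/month/year, you can add from datetime import timedelta to the top, change the format passed to datetime.strptime() to '%d/%m/%Y', and use dates[i] + timedelta(days=1) == date instead of add_month(dates[i]) == date.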
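For day/month/year data, the timedelta variant might look like the following sketch. It uses Python 3 print syntax, and the function name group_consecutive_days is made up for illustration and is not part of the original answer. It also returns an empty list for empty input, since the question allows zero elements.

```python
from datetime import datetime, timedelta

def group_consecutive_days(lines, fmt="%d/%m/%Y"):
    """Return each line with the size of its run of consecutive days appended."""
    if not lines:
        return []
    # Assumption: every line has the form "<name> <date>".
    dates = [datetime.strptime(line.split(" ")[1], fmt) for line in lines]
    result = []
    buffer = [lines[0]]
    for i in range(1, len(lines)):
        if dates[i - 1] + timedelta(days=1) == dates[i]:
            # The date follows the previous one: extend the current run.
            buffer.append(lines[i])
        else:
            # Gap found: flush the finished run and start a new one.
            result.extend("%s %d" % (line, len(buffer)) for line in buffer)
            buffer = [lines[i]]
    # Flush the last run.
    result.extend("%s %d" % (line, len(buffer)) for line in buffer)
    return result

data = ['Bill 01/01/2011', 'Bill 02/01/2011', 'Bill 03/01/2011',
        'Bill 05/01/2011', 'Bill 07/01/2011']
print("\n".join(group_consecutive_days(data)))
```

Read as day/month/year, the sample data gives group sizes 3, 3, 3, 1, 1, matching the output requested in the question.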