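I have a "start date" and an "end date" for each customer. For any given time period, my goal is to find how many customers I have. A customer is active if their start date is before x and their end date is after x. I wrote a brute-force version: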
from datetime import datetime
import pandas as pd
#dates of interest
dates = ['2016-01-31','2016-02-29','2016-03-31','2016-04-30','2016-05-31']
dates = [datetime.strptime(x, '%Y-%m-%d') for x in dates]
#sample records
df = pd.DataFrame( [['A','2016-01-01','2016-04-23'],['B','2016-02-05','2016-04-30'],['C','2016-02-02','2016-05-25']],columns = ['customerId','startDate','endDate'])
df['startDate'] = pd.to_datetime(df['startDate'])
df['endDate'] = pd.to_datetime(df['endDate'])
output = []
#is there a better way to do this?
for currDate in dates:
    record_count = len(df[(df['startDate'] <= currDate) & (df['endDate'] >= currDate)])
    output.append([currDate, record_count])
output = pd.DataFrame(output, columns = ['date','active count'])
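Is there a better way to find how many customers were active on each date of interest? Right now I'm just iterating over all the dates, but that doesn't feel very "pythonic" to me.
Any ideas or help would be greatly appreciated!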
Answer 0 (score: 1)
One approach is:
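A minimal sketch of one vectorized alternative to the loop, assuming NumPy broadcasting is acceptable here (this is an illustration, not necessarily the approach the answer intended). It compares every customer's start and end date against every date of interest in one step, using the same inclusive `<=` / `>=` comparison as the question's code. The intermediate names `starts`, `ends`, `query`, and `active` are introduced only for this example.

```python
import pandas as pd

# Dates of interest, parsed directly into a DatetimeIndex
dates = pd.to_datetime(['2016-01-31', '2016-02-29', '2016-03-31',
                        '2016-04-30', '2016-05-31'])

# Sample records from the question
df = pd.DataFrame(
    [['A', '2016-01-01', '2016-04-23'],
     ['B', '2016-02-05', '2016-04-30'],
     ['C', '2016-02-02', '2016-05-25']],
    columns=['customerId', 'startDate', 'endDate'],
)
df['startDate'] = pd.to_datetime(df['startDate'])
df['endDate'] = pd.to_datetime(df['endDate'])

# Shape (n_customers, 1) against shape (1, n_dates) broadcasts
# to a boolean matrix of shape (n_customers, n_dates)
starts = df['startDate'].to_numpy()[:, None]
ends = df['endDate'].to_numpy()[:, None]
query = dates.to_numpy()[None, :]

# True where a customer is active on a given date (inclusive bounds)
active = (starts <= query) & (ends >= query)

# Summing down each column counts the active customers per date
output = pd.DataFrame({'date': dates, 'active count': active.sum(axis=0)})
print(output)
```

The boolean matrix has one cell per (customer, date) pair, so memory grows with the product of the two sizes. For very large inputs, the original loop or a sorted-array approach may be the more practical choice.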