Given a data frame of start and end dates, how do I calculate the total occupancy for each day of the year?

Time: 2017-09-18 14:53:56

Tags: python pandas

I have a CSV file, and therefore a list or data frame, containing the start and end dates of visits to a campsite.


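I would like to build a data frame with one row for each day in a period, with a column for the cumulative number of visitors, a column for the number of visitors staying on that day, and a column for the cumulative visitor days.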

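The visitor data looks like this:

   start_date    end_date
0  2016-01-21  2016-01-24
1  2016-01-28  2016-01-29
2  2016-02-02  2016-02-10
3  2016-02-08  2016-02-12
...

I currently have some hacky code that reads the visitor data into a plain Python list, visitor_array, and creates another list, year_array, with one entry for each date in the period/year. It then loops over each date in year_array, with an inner loop over visitor_array, and appends the number of new visitors and the number of resident visitors on that day to the current element of year_array.

import datetime

temp_day = datetime.date(2016, 1, 1)
# One single-element list [date] per day; the counts are appended below
year_array = [[temp_day + datetime.timedelta(days=d)] for d in range(365)]
for day in year_array:
    new_visitors = 0
    occupancy = 0
    for visitor in visitor_array:
        # visitor[0] is the start date, visitor[1] is the end date
        if visitor[0] == day[0]:
            new_visitors += 1
        if (visitor[0] <= day[0]) and (day[0] <= visitor[1]):
            occupancy += 1
    day.append(new_visitors)
    day.append(occupancy)

I then convert year_array into a pandas data frame, create some cumsum columns, and get busy plotting and so on.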

Is there a more elegant, Pythonic / pandasic way of doing this in pandas?

1 answer:

Answer 0: (score: 0)

Taking df as the data frame of start/end values and d as the final data frame, I would do something like this:

Code:

import numpy as np
import pandas as pd
import datetime

# ---- Create df sample
df = pd.DataFrame([['21/01/2016','24/01/2016'],
                    ['28/01/2016','29/01/2016'],
                    ['02/02/2016','10/02/2016'],
                    ['08/02/2016','12/02/2016']], columns=['start','end'] )
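# dayfirst=True: the sample strings are DD/MM/YYYY, so '08/02/2016' is 8 February
df['start'] = pd.to_datetime(df['start'], dayfirst=True)
df['end'] = pd.to_datetime(df['end'], dayfirst=True)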

# ---- Create day index
temp_day = datetime.date(2016,1,1)
index = [(temp_day + datetime.timedelta(days=d)) for d in range(365)]

# ---- Create empty result df
# initialize df, set days as datetime in index
d = pd.DataFrame(np.zeros((365,3)),
                 index=pd.to_datetime(index),
                 columns=['new_visitor','occupancy','occupied_day'])

# ---- Iterate over df to fill d (final df)
for i, row in df.iterrows():
    # Add 1 on the first day of each new visit
    d.loc[row.start,'new_visitor'] += 1
    # Set to 1 on every day from start to end (inclusive) of a visit
    d.loc[row.start:row.end,'occupied_day'] = 1
    # Add 1 to the occupancy on every day of the visit
    d.loc[row.start:row.end,'occupancy'] += 1

# cumulated days = running sum of occupied days
d['cumul_days'] = d.occupied_day.cumsum()
# cumul_visitors = running sum of occupancy (i.e. visitor days)
d['cumul_visitors'] = d.occupancy.cumsum()

An excerpt of the resulting output, from print(d.loc['2016-01-21':'2016-01-29']):

index         new_visitor  occupancy  occupied_day  cumul_days  cumul_visitors
2016-01-21          1.0        1.0           1.0         1.0             1.0
2016-01-22          0.0        1.0           1.0         2.0             2.0
2016-01-23          0.0        1.0           1.0         3.0             3.0
2016-01-24          0.0        1.0           1.0         4.0             4.0
2016-01-25          0.0        0.0           0.0         4.0             4.0
2016-01-26          0.0        0.0           0.0         4.0             4.0
2016-01-27          0.0        0.0           0.0         4.0             4.0
2016-01-28          1.0        1.0           1.0         5.0             5.0
2016-01-29          0.0        1.0           1.0         6.0             6.0
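
As a possible alternative to the row-by-row loop, here is a minimal sketch that counts arrivals and departures per day and takes a running sum. It is not part of the original answer. It assumes that start and end dates are both inclusive and that every stay falls inside the chosen date range. The column names cumul_visitors (running count of arrivals) and cumul_visitor_days (running sum of occupancy) are my own choice, picked to match the wording of the question, and differ from the column names used above.

```python
import pandas as pd

# Sample stays from the question (start and end dates are both inclusive).
df = pd.DataFrame({
    "start": pd.to_datetime(["2016-01-21", "2016-01-28", "2016-02-02", "2016-02-08"]),
    "end": pd.to_datetime(["2016-01-24", "2016-01-29", "2016-02-10", "2016-02-12"]),
})

# One row per calendar day of 2016 (a leap year, so 366 days).
days = pd.date_range("2016-01-01", "2016-12-31", freq="D")

# Arrivals per day; departures are counted on the day after the last occupied day.
arrivals = df["start"].value_counts().reindex(days, fill_value=0)
departures = (df["end"] + pd.Timedelta(days=1)).value_counts().reindex(days, fill_value=0)

result = pd.DataFrame({"new_visitor": arrivals}, index=days)
# Running arrivals minus running departures = visitors present on each day.
result["occupancy"] = (arrivals - departures).cumsum()
result["cumul_visitors"] = result["new_visitor"].cumsum()
result["cumul_visitor_days"] = result["occupancy"].cumsum()

print(result.loc["2016-01-21":"2016-01-29"])
```

This avoids iterrows entirely, so the work grows with the number of days plus the number of stays rather than with their product. Stays that begin before the first day of the range would be dropped by reindex, so the range should cover the whole data set.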

Hope this code helps!