Create new rows in a df based on a date range

Time: 2020-05-21 07:21:57

Tags: python pandas

Below is a script that builds a simplified version of my df:

import pandas as pd

# One row per purchase: 'date' is the purchase date, 'end_date' the last day of use
df = pd.DataFrame({'date': pd.date_range(start='2020-01-01', end='2020-01-07'),
                   'id': range(1, 8),
                   'product': ['list_3', 'list_1', 'list_2', 'list_3', 'list_2', 'list_1', 'list_1'],
                   'duration': [3, 1, 2, 3, 2, 1, 1],
                   'product_in_use': 'true',
                   'end_date': ['2020-01-03', '2020-01-02', '2020-01-04', '2020-01-06',
                                '2020-01-06', '2020-01-06', '2020-01-07']})

# Convert the duration to timedelta and the date columns to datetime
df['duration'] = pd.to_timedelta(df['duration'], unit='D')
df['date'] = pd.to_datetime(df['date'])
df['end_date'] = pd.to_datetime(df['end_date'])
df

df:

    date       id   product duration product_in_use end_date
0   2020-01-01  1   list_3  3 days     true        2020-01-03
1   2020-01-02  2   list_1  1 days     true        2020-01-02
2   2020-01-03  3   list_2  2 days     true        2020-01-04
3   2020-01-04  4   list_3  3 days     true        2020-01-06
4   2020-01-05  5   list_2  2 days     true        2020-01-06
5   2020-01-06  6   list_1  1 days     true        2020-01-06
6   2020-01-07  7   list_1  1 days     true        2020-01-07

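As you can see in the df above, each id uses one product, and each product is used for a specific duration. There are no rows for the days on which the product is in use; there is only a row for the date on which the user bought the product.

So I want to create new rows for every date on which each id is using its product.

My desired df would look like this: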

    date       id   product  duration   product_in_use
0   2020-01-01  1   list_3   3 days         true
1   2020-01-02  1   list_3   3 days         true
2   2020-01-03  1   list_3   3 days         true
3   2020-01-02  2   list_1   1 days         true
4   2020-01-03  3   list_2   2 days         true
5   2020-01-04  3   list_2   2 days         true
6   2020-01-04  4   list_3   3 days         true
7   2020-01-05  4   list_3   3 days         true
8   2020-01-06  4   list_3   3 days         true
9   2020-01-05  5   list_2   2 days         true
10  2020-01-06  5   list_2   2 days         true
11  2020-01-06  6   list_1   1 days         true
12  2020-01-07  7   list_1   1 days         true

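4 answers:

Answer 0 (score: 2)

Use starmap and chain to create a date range from the start date to the end date for each id, repeat each row according to its duration, and then assign the new dates as the index of the dataframe.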

from itertools import starmap,chain

#create date ranges from date to end_date for each id
start_end = zip(df.date.array,df.end_date.array)
date_ranges = starmap(pd.date_range,start_end)
date_ranges = chain.from_iterable(date_ranges)

#get all columns except date and end_date
res = df.filter(['id','product','duration','product_in_use'])

#expand the dataframe by repeating the indexes based on the duration
#so index 0 will be repeated 3 times, 1 once, 2 twice, ...
res = res.reindex(res.index.repeat(res.duration.dt.days))

#assign the new date_ranges to the dataframe
res.index = date_ranges
res

           id   product duration    product_in_use
2020-01-01  1   list_3    3 days    true
2020-01-02  1   list_3    3 days    true
2020-01-03  1   list_3    3 days    true
2020-01-02  2   list_1    1 days    true
2020-01-03  3   list_2    2 days    true
2020-01-04  3   list_2    2 days    true
2020-01-04  4   list_3    3 days    true
2020-01-05  4   list_3    3 days    true
2020-01-06  4   list_3    3 days    true
2020-01-05  5   list_2    2 days    true
2020-01-06  5   list_2    2 days    true
2020-01-06  6   list_1    1 days    true
2020-01-07  7   list_1    1 days    true
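The result above carries the dates in the index, while the desired output in the question has date as a regular column. A minimal sketch of a variant that keeps the column is shown below. It is not the answerer's code: it reuses the same index.repeat idea but derives each date by adding a per-row day offset to the purchase date, and it assumes that duration always equals the inclusive number of days between date and end_date, as it does in the sample data. The name out is only illustrative.

```python
import pandas as pd

# Sample data from the question, rebuilt so the sketch runs on its own
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', end='2020-01-07'),
    'id': range(1, 8),
    'product': ['list_3', 'list_1', 'list_2', 'list_3', 'list_2', 'list_1', 'list_1'],
    'duration': pd.to_timedelta([3, 1, 2, 3, 2, 1, 1], unit='D'),
    'product_in_use': 'true',
    'end_date': pd.to_datetime(['2020-01-03', '2020-01-02', '2020-01-04', '2020-01-06',
                                '2020-01-06', '2020-01-06', '2020-01-07']),
})

# Repeat each row once per day of use (assumes duration matches the date span)
out = df.loc[df.index.repeat(df['duration'].dt.days)].copy()

# Number the copies of each original row 0, 1, 2, ... and shift the date by that many days
offset = out.groupby(level=0).cumcount()
out['date'] = out['date'] + pd.to_timedelta(offset, unit='D')

# Drop the helper column and renumber the rows
out = out.drop(columns='end_date').reset_index(drop=True)
print(out)
```

If duration and end_date can disagree in the real data, the repeat count would need to be computed from the dates instead.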

Answer 1 (score: 1)

Create another DataFrame, then perform an outer join to add the new rows.
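This answer gives no code, so the following is only one possible reading of it, sketched under assumptions: a helper frame (here called days, a made-up name) holds one (id, date) pair per day of use, and it is joined back to the original columns on id. The choice of id as the join key is also an assumption.

```python
import pandas as pd

# Sample data from the question, rebuilt so the sketch runs on its own
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', end='2020-01-07'),
    'id': range(1, 8),
    'product': ['list_3', 'list_1', 'list_2', 'list_3', 'list_2', 'list_1', 'list_1'],
    'duration': pd.to_timedelta([3, 1, 2, 3, 2, 1, 1], unit='D'),
    'product_in_use': 'true',
    'end_date': pd.to_datetime(['2020-01-03', '2020-01-02', '2020-01-04', '2020-01-06',
                                '2020-01-06', '2020-01-06', '2020-01-07']),
})

# Hypothetical helper frame: one (id, date) row for every day the product is in use
days = pd.DataFrame(
    [(row.id, day)
     for row in df.itertuples(index=False)
     for day in pd.date_range(row.date, row.end_date)],
    columns=['id', 'date'],
)

# Outer join on id brings the remaining columns onto every generated day
expanded = days.merge(df.drop(columns=['date', 'end_date']), on='id', how='outer')
expanded = expanded[['date', 'id', 'product', 'duration', 'product_in_use']]
print(expanded)
```

Because every id appears in both frames here, an inner or left join would give the same rows; the outer join only matters if some ids are missing from one side.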

Answer 2 (score: 1)

If you do not convert the 'duration' field to timedelta, this works for me:

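from datetime import timedelta

# Assumes df['duration'] still holds plain integers (not converted to timedelta)
pieces = []
for idx in df.index:
    for i in range(df['duration'][idx]):
        # Copy the single-row slice so the assignment below does not touch df
        temp_df = df[idx:idx+1].copy()
        temp_df['date'] = pd.to_datetime(temp_df['date']) + timedelta(days=i)
        pieces.append(temp_df)

# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once
df1 = pd.concat(pieces)
df1.reset_index(inplace=True)
df1.drop(['end_date', 'index'], axis=1, inplace=True)

print(df1)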

Output:

         date  id product  duration product_in_use
0  2020-01-01   1  list_3         3           true
1  2020-01-02   1  list_3         3           true
2  2020-01-03   1  list_3         3           true
3  2020-01-02   2  list_1         1           true
4  2020-01-03   3  list_2         2           true
5  2020-01-04   3  list_2         2           true
6  2020-01-04   4  list_3         3           true
7  2020-01-05   4  list_3         3           true
8  2020-01-06   4  list_3         3           true
9  2020-01-05   5  list_2         2           true
10 2020-01-06   5  list_2         2           true
11 2020-01-06   6  list_1         1           true
12 2020-01-07   7  list_1         1           true
