Below is a script that builds a simplified version of my df:
import pandas as pd
import numpy as np
from datetime import date
from datetime import datetime
df = pd.DataFrame({'date':pd.date_range(start='2020-01-01', end='2020/01/07'),
'id' : range(1,8),
'product': ['list_3','list_1','list_2', 'list_3','list_2','list_1','list_1'],
'duration' : [3,1,2,3,2,1,1],
'product_in_use': ('true'),
'end_date':['2020-01-03','2020-01-02','2020-01-04','2020-01-06','2020-01-06','2020-01-06',
'2020-01-07']})
df['duration']= pd.to_timedelta(df['duration'], unit='D')
df['date'] = pd.to_datetime(df['date'])
df['end_date'] = pd.to_datetime(df['end_date'])
df
df:
date id product duration product_in_use end_date
0 2020-01-01 1 list_3 3 days true 2020-01-03
1 2020-01-02 2 list_1 1 days true 2020-01-02
2 2020-01-03 3 list_2 2 days true 2020-01-04
3 2020-01-04 4 list_3 3 days true 2020-01-06
4 2020-01-05 5 list_2 2 days true 2020-01-06
5 2020-01-06 6 list_1 1 days true 2020-01-06
6 2020-01-07 7 list_1 1 days true 2020-01-07
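As you can see in the df above, each id uses one product, and each product is used for a specific duration. There are no rows for the days on which the product is in use; there is only a row for the day the user bought the product.
So I want to create new rows for every date on which each id is using the product.
The df I want looks like this: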
date id product duration product_in_use
0 2020-01-01 1 list_3 3 days true
1 2020-01-02 1 list_3 3 days true
2 2020-01-03 1 list_3 3 days true
3 2020-01-02 2 list_1 1 days true
4 2020-01-03 3 list_2 2 days true
5 2020-01-04 3 list_2 2 days true
6 2020-01-04 4 list_3 3 days true
7 2020-01-05 4 list_3 3 days true
8 2020-01-06 4 list_3 3 days true
9 2020-01-05 5 list_2 2 days true
10 2020-01-06 5 list_2 2 days true
11 2020-01-06 6 list_1 1 days true
12 2020-01-07 7 list_1 1 days true
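Answer 0 (score: 2)
Use starmap and chain to create the date range from date to end_date for each id, expand the dataframe by repeating each row according to its duration, and then assign the new dates as the index of the dataframe.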
from itertools import starmap,chain
#create date ranges from date to end_date for each id
start_end = zip(df.date.array,df.end_date.array)
date_ranges = starmap(pd.date_range,start_end)
date_ranges = chain.from_iterable(date_ranges)
#get all columns except date and end_date
res = df.filter(['id','product','duration','product_in_use'])
#expand the dataframe by repeating the indexes based on the duration
#so index 0 will be repeated 3 times, 1 once, 2 twice, ...
res = res.reindex(res.index.repeat(res.duration.dt.days))
#assign the new date_ranges to the dataframe
res.index = date_ranges
res
id product duration product_in_use
2020-01-01 1 list_3 3 days true
2020-01-02 1 list_3 3 days true
2020-01-03 1 list_3 3 days true
2020-01-02 2 list_1 1 days true
2020-01-03 3 list_2 2 days true
2020-01-04 3 list_2 2 days true
2020-01-04 4 list_3 3 days true
2020-01-05 4 list_3 3 days true
2020-01-06 4 list_3 3 days true
2020-01-05 5 list_2 2 days true
2020-01-06 5 list_2 2 days true
2020-01-06 6 list_1 1 days true
2020-01-07 7 list_1 1 days true
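The result above keeps the dates in the index, while the desired df in the question has date as a regular column. One way to get that shape, as a minimal sketch that rebuilds the same data and repeats the steps from this answer, is to name the index and reset it. The name `out` is only illustrative. Note that the approach assumes the number of days in duration equals the inclusive number of days between date and end_date for every row; otherwise the index assignment would fail with a length mismatch.

```python
import pandas as pd
from itertools import chain, starmap

# Same sample data as in the question.
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', end='2020-01-07'),
    'id': range(1, 8),
    'product': ['list_3', 'list_1', 'list_2', 'list_3', 'list_2', 'list_1', 'list_1'],
    'duration': pd.to_timedelta([3, 1, 2, 3, 2, 1, 1], unit='D'),
    'product_in_use': 'true',
    'end_date': pd.to_datetime(['2020-01-03', '2020-01-02', '2020-01-04', '2020-01-06',
                                '2020-01-06', '2020-01-06', '2020-01-07']),
})

# One date range per row, flattened into a single sequence of dates.
date_ranges = chain.from_iterable(
    starmap(pd.date_range, zip(df.date.array, df.end_date.array))
)

# Repeat each row once per day of use, then attach the dates as the index.
res = df.filter(['id', 'product', 'duration', 'product_in_use'])
res = res.reindex(res.index.repeat(res.duration.dt.days))
res.index = pd.DatetimeIndex(list(date_ranges))

# Move the dates from the index into a 'date' column with a fresh 0..n-1 index.
out = res.rename_axis('date').reset_index()
print(out)
```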
Answer 1 (score: 1)
Create another DataFrame, then perform an outer join to add the new rows.
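This answer gives no code, so the following is only one possible reading of it, not necessarily what the answerer had in mind. The sketch assumes the second DataFrame (called `days` here, a hypothetical name) holds one row per id per day of use, joins it to the original with an outer merge on id and date, and then fills the attributes of the new rows from the purchase row of the same id.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', end='2020-01-07'),
    'id': range(1, 8),
    'product': ['list_3', 'list_1', 'list_2', 'list_3', 'list_2', 'list_1', 'list_1'],
    'duration': pd.to_timedelta([3, 1, 2, 3, 2, 1, 1], unit='D'),
    'product_in_use': 'true',
    'end_date': pd.to_datetime(['2020-01-03', '2020-01-02', '2020-01-04', '2020-01-06',
                                '2020-01-06', '2020-01-06', '2020-01-07']),
})

# Second DataFrame: one (id, date) row for every day the product is in use.
days = pd.DataFrame(
    [(row.id, day)
     for row in df.itertuples(index=False)
     for day in pd.date_range(row.date, row.end_date)],
    columns=['id', 'date'],
)

# Outer join: purchase-day rows match, every other day arrives as a new row
# whose product, duration and product_in_use are missing.
merged = df.drop(columns='end_date').merge(days, on=['id', 'date'], how='outer')

# Within each id the purchase day is the earliest date, so after sorting,
# a forward fill copies its attributes onto the new rows.
merged = merged.sort_values(['id', 'date']).reset_index(drop=True)
cols = ['product', 'duration', 'product_in_use']
merged[cols] = merged.groupby('id')[cols].ffill()
print(merged)
```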
Answer 2 (score: 1)
If you do not convert the 'duration' field to a timedelta, this works for me:
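from datetime import timedelta

frames = []
for idx in df.index:
    # duration must still be a plain integer here, one pass per day of use
    for i in range(df['duration'][idx]):
        # copy the single-row slice so the assignment does not touch df
        temp_df = df[idx:idx+1].copy()
        temp_df['date'] = pd.to_datetime(temp_df['date']) + timedelta(days=i)
        frames.append(temp_df)
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concat
df1 = pd.concat(frames)
df1.reset_index(inplace=True)
df1.drop(['end_date', 'index'], axis=1, inplace=True)
print(df1)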
Output:
date id product duration product_in_use
0 2020-01-01 1 list_3 3 true
1 2020-01-02 1 list_3 3 true
2 2020-01-03 1 list_3 3 true
3 2020-01-02 2 list_1 1 true
4 2020-01-03 3 list_2 2 true
5 2020-01-04 3 list_2 2 true
6 2020-01-04 4 list_3 3 true
7 2020-01-05 4 list_3 3 true
8 2020-01-06 4 list_3 3 true
9 2020-01-05 5 list_2 2 true
10 2020-01-06 5 list_2 2 true
11 2020-01-06 6 list_1 1 true
12 2020-01-07 7 list_1 1 true
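The nested loop builds one small DataFrame per output row, which can get slow on larger inputs. As a possible alternative (a different technique from the loop above, sketched under the same assumption that duration is kept as a plain integer), the rows can be repeated with Index.repeat and the day offset taken from a per-row counter. The names `out` and `offset` are only illustrative.

```python
import pandas as pd

# Same sample data as in the question, with duration left as integers.
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', end='2020-01-07'),
    'id': range(1, 8),
    'product': ['list_3', 'list_1', 'list_2', 'list_3', 'list_2', 'list_1', 'list_1'],
    'duration': [3, 1, 2, 3, 2, 1, 1],
    'product_in_use': 'true',
    'end_date': pd.to_datetime(['2020-01-03', '2020-01-02', '2020-01-04', '2020-01-06',
                                '2020-01-06', '2020-01-06', '2020-01-07']),
})

# Repeat each row `duration` times; the original row label is kept in the index.
out = df.loc[df.index.repeat(df['duration'])].drop(columns='end_date')

# Position of each copy within its original row: 0, 1, 2, ... per row label.
offset = out.groupby(level=0).cumcount()

# Shift the purchase date by that many days.
out['date'] = out['date'] + pd.to_timedelta(offset.to_numpy(), unit='D')
out = out.reset_index(drop=True)
print(out)
```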