This is what I have:
ID PRICE VOLUME PRODUC FROM_DATE TO_DATE NUMDAYS
1 20.5 15.0 prod_1 2018-08-06 2018-08-13 7
2 15.6 10.0 prod_2 2018-08-06 2018-08-08 2
This is what I want to achieve:
ID PRICE VOLUME PRODUC FROM_DATE TO_DATE NUMDAYS
1 20.5 15.0 prod_1 2018-08-06 2018-08-07 1
1 20.5 15.0 prod_1 2018-08-07 2018-08-08 1
1 20.5 15.0 prod_1 2018-08-08 2018-08-09 1
1 20.5 15.0 prod_1 2018-08-09 2018-08-10 1
1 20.5 15.0 prod_1 2018-08-10 2018-08-11 1
1 20.5 15.0 prod_1 2018-08-11 2018-08-12 1
1 20.5 15.0 prod_1 2018-08-12 2018-08-13 1
2 15.6 10.0 prod_2 2018-08-06 2018-08-07 1
2 15.6 10.0 prod_2 2018-08-07 2018-08-08 1
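So I have a dataframe that contains information about products, where each row covers a range of dates, and I want to split every row into one row per day.
How can I do this?
I tried a for loop over each row of the dataframe: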
df_results = pd.DataFrame(columns=df.columns)
for index, row in df.iterrows():
    day = row.to_dict()
    for i in range(0,int(row['numdays'])):
        day['NUMDAYS'] = 1
        day['FROM_DATE'] = row['FROM_DATE']+datetime.timedelta(days=i)
        day['TO_DATE'] = day['FROM_DATE'] + datetime.timedelta(days=1)
        df_aux = pd.DataFrame.from_dict(day)
        df_results.append(df_aux)
But I could not get it to work.
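For reference, the loop above appears to fail for three separate reasons: `row['numdays']` uses a lowercase key while the column is named `NUMDAYS`; `pd.DataFrame.from_dict(day)` raises an error when every value in the dict is a scalar; and the result of `df_results.append(...)` is never assigned (the method returned a new frame, and was removed in pandas 2.0). The following is only a minimal sketch of a loop that does run, assuming the sample data shown above; the name `rows` is introduced here for illustration.

```python
import datetime

import pandas as pd

# Sample data copied from the tables above.
df = pd.DataFrame({
    "ID": [1, 2],
    "PRICE": [20.5, 15.6],
    "VOLUME": [15.0, 10.0],
    "PRODUC": ["prod_1", "prod_2"],
    "FROM_DATE": pd.to_datetime(["2018-08-06", "2018-08-06"]),
    "TO_DATE": pd.to_datetime(["2018-08-13", "2018-08-08"]),
    "NUMDAYS": [7, 2],
})

rows = []  # hypothetical name: collects one dict per output row
for _, row in df.iterrows():
    for i in range(int(row["NUMDAYS"])):
        day = row.to_dict()  # fresh dict for every day, so rows do not share state
        day["FROM_DATE"] = row["FROM_DATE"] + datetime.timedelta(days=i)
        day["TO_DATE"] = day["FROM_DATE"] + datetime.timedelta(days=1)
        day["NUMDAYS"] = 1
        rows.append(day)

# Build the result once, instead of appending frame by frame.
df_results = pd.DataFrame(rows, columns=df.columns)
print(df_results)
```

Collecting plain dicts and building the frame once at the end avoids repeated copying, but this is still a row-by-row loop and will be slow on large inputs.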
Answer 0 (score: 1)
In pandas it is best to avoid loops, because they are slow:
import numpy as np
import pandas as pd

# convert columns to datetimes if necessary
df['FROM_DATE'] = pd.to_datetime(df['FROM_DATE'])
df['TO_DATE'] = pd.to_datetime(df['TO_DATE'])
# repeat each row NUMDAYS times
df = df.loc[np.repeat(df.index, df['NUMDAYS'])]
# add a per-ID counter (0, 1, 2, ...) as a day offset
df['FROM_DATE'] += pd.to_timedelta(df.groupby('ID').cumcount(), unit='D')
# add one day
df['TO_DATE'] = df['FROM_DATE'] + pd.Timedelta(1, unit='D')
# assign scalar
df['NUMDAYS'] = 1
# create default unique index
df = df.reset_index(drop=True)
print(df)
ID PRICE VOLUME PRODUC FROM_DATE TO_DATE NUMDAYS
0 1 20.5 15.0 prod_1 2018-08-06 2018-08-07 1
1 1 20.5 15.0 prod_1 2018-08-07 2018-08-08 1
2 1 20.5 15.0 prod_1 2018-08-08 2018-08-09 1
3 1 20.5 15.0 prod_1 2018-08-09 2018-08-10 1
4 1 20.5 15.0 prod_1 2018-08-10 2018-08-11 1
5 1 20.5 15.0 prod_1 2018-08-11 2018-08-12 1
6 1 20.5 15.0 prod_1 2018-08-12 2018-08-13 1
7 2 15.6 10.0 prod_2 2018-08-06 2018-08-07 1
8 2 15.6 10.0 prod_2 2018-08-07 2018-08-08 1
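The same steps can be wrapped in a function that does not modify the input. The sketch below is one possible variant, not the answer's exact code: `split_into_days` is a hypothetical name, and it counts days by grouping on the repeated index labels rather than on `ID`, on the assumption that `ID` values might repeat across input rows (in which case a per-`ID` counter would keep counting across those rows).

```python
import pandas as pd


def split_into_days(df: pd.DataFrame) -> pd.DataFrame:
    """Expand each row into NUMDAYS one-day rows (hypothetical helper)."""
    # Work on a copy with a fresh unique index so .loc repeats rows cleanly.
    out = df.reset_index(drop=True)
    out["FROM_DATE"] = pd.to_datetime(out["FROM_DATE"])
    # Repeat each index label NUMDAYS times and select those rows.
    out = out.loc[out.index.repeat(out["NUMDAYS"].to_numpy())]
    # Counter 0, 1, 2, ... within each source row (grouped by index label).
    offset = out.groupby(level=0).cumcount().to_numpy()
    out["FROM_DATE"] = out["FROM_DATE"] + pd.to_timedelta(offset, unit="D")
    out["TO_DATE"] = out["FROM_DATE"] + pd.Timedelta(days=1)
    out["NUMDAYS"] = 1
    return out.reset_index(drop=True)


# Sample data copied from the question.
source = pd.DataFrame({
    "ID": [1, 2],
    "PRICE": [20.5, 15.6],
    "VOLUME": [15.0, 10.0],
    "PRODUC": ["prod_1", "prod_2"],
    "FROM_DATE": ["2018-08-06", "2018-08-06"],
    "TO_DATE": ["2018-08-13", "2018-08-08"],
    "NUMDAYS": [7, 2],
})
result = split_into_days(source)
print(result)
```

Rows with `NUMDAYS` equal to 0 simply disappear from the result, since they are repeated zero times.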