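Rephrased question:
I am extracting data from several Excel files that track my daily calorie intake. I managed to generate the dates with a list comprehension, but I still need rows for the days that are missing.
My input: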
Date Time calories duration
# 0 22/5/2021 Morning 420 50
# 1 22/5/2021 Afternoon 380 40
# 2 24/5/2021 Morning 390 45
# 3 24/5/2021 Afternoon 400 50
# 4 26/5/2021 Morning 350 45
# 5 26/5/2021 Afternoon 280 50
# 6 27/5/2021 Morning 300 44
# 7 27/5/2021 Afternoon 430 58
The output should look like this:
Date Time calories duration
0 22/5/2021 Morning 420 50
1 22/5/2021 Afternoon 380 40
2 23/5/2021 Morning Nan Nan
3 23/5/2021 Afternoon Nan Nan
4 24/5/2021 Morning 390 45
5 24/5/2021 Afternoon 400 50
6 25/5/2021 Morning Nan Nan
7 25/5/2021 Afternoon Nan Nan
8 26/5/2021 Morning 350 45
9 26/5/2021 Afternoon 280 50
10 27/5/2021 Morning 300 44
11 27/5/2021 Afternoon 430 58
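Answer 0 (score: 2)
Build two DatetimeIndex objects: one spanning the first to the last date of the original dataframe (the full index), and one derived from the existing Date / Time columns (the sparse index). You can then join the two dataframes and keep the data from the calories and duration columns.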
import pandas as pd

# assumes df["Date"] already holds datetimes, e.g. after
# df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# full index from first and last dates, two slots per day (00:00 and 12:00)
# lowercase "h": the uppercase "H" alias is deprecated since pandas 2.2
dti = pd.date_range(df["Date"].min(),
                    df["Date"].max() + pd.DateOffset(hours=12),
                    freq="12h")
# new dataframe with the full index
df1 = pd.DataFrame({"Date": dti.date,
                    "Time": dti.map(lambda dt: "Afternoon" if dt.hour == 12 else "Morning")},
                   index=dti)
# set index from existing Date / Time columns
df2 = df.set_index(pd.to_datetime(df["Date"].astype(str) + " " + df["Time"]
                   .replace({"Morning": "00:00:00", "Afternoon": "12:00:00"})))
# merge dataframes and keep data
out = df1.join(df2[["calories", "duration"]]).reset_index(drop=True)
>>> out
Date Time calories duration
0 2021-05-22 Morning 420.0 50.0
1 2021-05-22 Afternoon 380.0 40.0
2 2021-05-23 Morning NaN NaN
3 2021-05-23 Afternoon NaN NaN
4 2021-05-24 Morning 390.0 45.0
5 2021-05-24 Afternoon 400.0 50.0
6 2021-05-25 Morning NaN NaN
7 2021-05-25 Afternoon NaN NaN
8 2021-05-26 Morning 350.0 45.0
9 2021-05-26 Afternoon 280.0 50.0
10 2021-05-27 Morning 300.0 44.0
11 2021-05-27 Afternoon 430.0 58.0
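The same "full index versus sparse index" idea can also be sketched without the 12-hour timestamps. The version below is a different technique from the answer above: it assumes a MultiIndex built with pd.MultiIndex.from_product and a reindex, and the name full_index is only illustrative. The sample data is retyped from the question.

```python
import pandas as pd

# Sample data mirroring the question's input (day/month/year strings).
df = pd.DataFrame({
    "Date": ["22/5/2021", "22/5/2021", "24/5/2021", "24/5/2021",
             "26/5/2021", "26/5/2021", "27/5/2021", "27/5/2021"],
    "Time": ["Morning", "Afternoon"] * 4,
    "calories": [420, 380, 390, 400, 350, 280, 300, 430],
    "duration": [50, 40, 45, 50, 45, 50, 44, 58],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Full index: every calendar day in the range, crossed with both times of day.
full_index = pd.MultiIndex.from_product(
    [pd.date_range(df["Date"].min(), df["Date"].max(), freq="D"),
     ["Morning", "Afternoon"]],
    names=["Date", "Time"],
)

# Reindexing against the full index leaves NaN in rows that had no data.
out = df.set_index(["Date", "Time"]).reindex(full_index).reset_index()
print(out)
```

This relies on each Date / Time pair appearing at most once in the input, since reindex does not accept duplicate index entries.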
Answer 1 (score: 2)
A solution using the .stack() and .unstack() methods:
Code to create the sample dataframe:
import numpy as np
import pandas as pd
from io import StringIO
data = StringIO("""
Date Time calories duration
22/5/2021 Morning 420 50
22/5/2021 Afternoon 380 40
24/5/2021 Morning 390 45
24/5/2021 Afternoon 400 50
26/5/2021 Morning 350 45
26/5/2021 Afternoon 280 50
27/5/2021 Morning 300 44
27/5/2021 Afternoon 430 58
""")
df = pd.read_table(data, sep=r'\s+')  # raw string avoids an invalid-escape warning
df
Date Time calories duration
# 0 22/5/2021 Morning 420 50
# 1 22/5/2021 Afternoon 380 40
# 2 24/5/2021 Morning 390 45
# 3 24/5/2021 Afternoon 400 50
# 4 26/5/2021 Morning 350 45
# 5 26/5/2021 Afternoon 280 50
# 6 27/5/2021 Morning 300 44
# 7 27/5/2021 Afternoon 430 58
Solution:
# convert date column to datetime
df['Date'] = pd.to_datetime(df.Date, format="%d/%m/%Y")
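(df
    .set_index(['Date', 'Time'])
    .unstack(fill_value=np.nan)           # one row per date, Time moved into the columns
    .asfreq('D', fill_value=np.nan)       # insert the missing calendar days
    .stack(dropna=False)                  # move Time back into the index, keep NaN rows
    .sort_index(ascending=[True, False])  # Morning before Afternoon within each date
    .reset_index()
)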
# Date Time calories duration
# 0 2021-05-22 Morning 420.0 50.0
# 1 2021-05-22 Afternoon 380.0 40.0
# 2 2021-05-23 Morning NaN NaN
# 3 2021-05-23 Afternoon NaN NaN
# 4 2021-05-24 Morning 390.0 45.0
# 5 2021-05-24 Afternoon 400.0 50.0
# 6 2021-05-25 Morning NaN NaN
# 7 2021-05-25 Afternoon NaN NaN
# 8 2021-05-26 Morning 350.0 45.0
# 9 2021-05-26 Afternoon 280.0 50.0
# 10 2021-05-27 Morning 300.0 44.0
# 11 2021-05-27 Afternoon 430.0 58.0
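Both answers return Date as date or datetime values, whereas the desired output in the question shows day/month/year strings without zero padding. If that display format matters, one possible sketch builds the strings from the date components. The series dates below is illustrative and not part of either answer.

```python
import pandas as pd

# Illustrative datetime series standing in for the Date column of the result.
dates = pd.Series(pd.date_range("2021-05-22", "2021-05-24", freq="D"))

# Assemble day/month/year from the components so that no zero padding is added.
formatted = (dates.dt.day.astype(str) + "/"
             + dates.dt.month.astype(str) + "/"
             + dates.dt.year.astype(str))
print(formatted.tolist())  # ['22/5/2021', '23/5/2021', '24/5/2021']
```

Converting to strings is best left as the last step, because date arithmetic and sorting work on the datetime values and not on the formatted text.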