I have a df with rows like this:
p_id m_id x_id g_id u_id
0 2 NaN 1408 7 121
1 3 1259 117 23 315
2 3 1259 221 9 718
3 3 1259 397 76 367
and two datetime objects:
start_date:
datetime.datetime(2021, 5, 25, 0, 0)
end_date:
datetime.datetime(2021, 5, 29, 0, 0)
How can I get a df like the one below, where each row is repeated once for every date from start_date to end_date?
p_id m_id x_id g_id u_id s_date
0 2 NaN 1408 7 121 2021-05-25
1 2 NaN 1408 7 121 2021-05-26
2 2 NaN 1408 7 121 2021-05-27
3 2 NaN 1408 7 121 2021-05-28
4 2 NaN 1408 7 121 2021-05-29
5 3 1259 117 23 315 2021-05-25
6 3 1259 117 23 315 2021-05-26
7 3 1259 117 23 315 2021-05-27
8 3 1259 117 23 315 2021-05-28
9 3 1259 117 23 315 2021-05-29
.
.
15 3 1259 397 76 367 2021-05-25
16 3 1259 397 76 367 2021-05-26
17 3 1259 397 76 367 2021-05-27
18 3 1259 397 76 367 2021-05-28
19 3 1259 397 76 367 2021-05-29
Answer 0 (score: 5)
date_range and a cross merge
In pandas >= 1.2, we can perform a cross merge by passing the optional parameter how='cross' to the merge function:
dates = pd.date_range(start_date, end_date)
df.merge(dates.to_series(name='s_date'), how='cross')
In pandas < 1.2, we have to create a temporary merge key to perform the cross merge:
dates = pd.date_range(start_date, end_date)
df.assign(k=1).merge(dates.to_frame(name='s_date').assign(k=1), on='k').drop(columns='k')
p_id m_id x_id g_id u_id s_date
0 2 NaN 1408 7 121 2021-05-25
1 2 NaN 1408 7 121 2021-05-26
2 2 NaN 1408 7 121 2021-05-27
3 2 NaN 1408 7 121 2021-05-28
4 2 NaN 1408 7 121 2021-05-29
5 3 1259.0 117 23 315 2021-05-25
6 3 1259.0 117 23 315 2021-05-26
7 3 1259.0 117 23 315 2021-05-27
8 3 1259.0 117 23 315 2021-05-28
9 3 1259.0 117 23 315 2021-05-29
10 3 1259.0 221 9 718 2021-05-25
11 3 1259.0 221 9 718 2021-05-26
12 3 1259.0 221 9 718 2021-05-27
13 3 1259.0 221 9 718 2021-05-28
14 3 1259.0 221 9 718 2021-05-29
15 3 1259.0 397 76 367 2021-05-25
16 3 1259.0 397 76 367 2021-05-26
17 3 1259.0 397 76 367 2021-05-27
18 3 1259.0 397 76 367 2021-05-28
19 3 1259.0 397 76 367 2021-05-29
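For anyone who wants to run the cross-merge variant end to end, here is a minimal self-contained sketch. It assumes pandas 1.2 or newer, and the sample frame is typed in by hand from the question, so treat it as an illustration rather than the only way to set this up.

```python
import datetime

import numpy as np
import pandas as pd

# Sample frame typed in from the question.
df = pd.DataFrame({
    "p_id": [2, 3, 3, 3],
    "m_id": [np.nan, 1259, 1259, 1259],
    "x_id": [1408, 117, 221, 397],
    "g_id": [7, 23, 9, 76],
    "u_id": [121, 315, 718, 367],
})
start_date = datetime.datetime(2021, 5, 25)
end_date = datetime.datetime(2021, 5, 29)

# date_range includes both endpoints by default, so this yields 5 dates.
dates = pd.date_range(start_date, end_date)

# Cross merge (pandas >= 1.2): every row of df is paired with every date.
result = df.merge(dates.to_series(name="s_date"), how="cross")

print(result.shape)  # (20, 6): 4 rows x 5 dates, 5 original columns + s_date
```

The row count of a cross merge is always len(df) * len(dates), which makes for a quick sanity check. On versions older than 1.2, how='cross' is not available and the temporary-key variant is the fallback.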
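Answer 1 (score: 2)
The way I would do it is to first create a list of all dates between the two dates, add it to the DataFrame as a new column, and then use explode to expand it into rows.
Here is an example:
from datetime import datetime
df['s_date'] = [pd.date_range(datetime(2021, 5, 25, 0, 0), datetime(2021, 5, 29, 0, 0), freq='D')] * len(df)
df = df.explode('s_date')
Output: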
id start score date
0 id1 NaN 3 2021-05-25
0 id1 NaN 3 2021-05-26
0 id1 NaN 3 2021-05-27
0 id1 NaN 3 2021-05-28
0 id1 NaN 3 2021-05-29
1 id2 12.0 1 2021-05-25
1 id2 12.0 1 2021-05-26
1 id2 12.0 1 2021-05-27
1 id2 12.0 1 2021-05-28
1 id2 12.0 1 2021-05-29
2 id3 11.0 8 2021-05-25
2 id3 11.0 8 2021-05-26
2 id3 11.0 8 2021-05-27
2 id3 11.0 8 2021-05-28
2 id3 11.0 8 2021-05-29
...
...
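As the output shows, explode repeats the original index labels (0, 0, 0, ...), whereas the question asks for a running 0..n index. Below is a hedged sketch of the same explode approach with two follow-up steps added: resetting the index and restoring the datetime dtype. The two-row frame is a reduced, hand-typed sample, and storing plain lists (rather than a DatetimeIndex) in each cell is a choice made here for illustration.

```python
import datetime

import pandas as pd

# Reduced, hand-typed sample with two of the question's columns.
df = pd.DataFrame({"p_id": [2, 3], "x_id": [1408, 117]})
start_date = datetime.datetime(2021, 5, 25)
end_date = datetime.datetime(2021, 5, 29)

dates = pd.date_range(start_date, end_date, freq="D")

# Put the same list of dates in every row, then emit one row per date.
out = df.assign(s_date=[list(dates)] * len(df)).explode("s_date")

# explode keeps the source row's index label and yields an object column,
# so renumber the rows and convert the column back to datetime.
out = out.reset_index(drop=True)
out["s_date"] = pd.to_datetime(out["s_date"])

print(out)
```

Skipping reset_index is fine if the repeated labels are useful, for example to trace each output row back to its source row.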
Answer 2 (score: 1)
The steps in my solution:
1. Create a DataFrame of the dates
2. pd.merge the two DataFrames (outer join)
import pandas as pd
from datetime import datetime, timedelta
# stand-in example for your df
a = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
a_df = pd.DataFrame(a)
start_date = datetime.strptime('2021-05-01', '%Y-%m-%d')
end_date = datetime.strptime('2021-06-01', '%Y-%m-%d')
num_of_days = (end_date - start_date).days
# one row per day; range(num_of_days) stops the day before end_date
date_df = pd.DataFrame([start_date + timedelta(days=x) for x in range(num_of_days)], columns=['date'])
# constant key so that the merge pairs every row of a_df with every date
a_df['key'] = 0
date_df['key'] = 0
a_df = a_df.merge(date_df, on='key', how='outer')
a_df = a_df.drop(columns='key')
a_df
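One detail to watch with the timedelta loop: range(num_of_days) leaves out end_date itself, while the question's expected output includes it. The sketch below compares the two ranges against pd.date_range, which includes both endpoints by default. The names exclusive and inclusive are made up for this illustration.

```python
from datetime import datetime, timedelta

import pandas as pd

start_date = datetime(2021, 5, 1)
end_date = datetime(2021, 6, 1)
num_of_days = (end_date - start_date).days  # 31

# As written in the answer: the last generated day is 2021-05-31.
exclusive = [start_date + timedelta(days=x) for x in range(num_of_days)]

# With + 1 the list also contains end_date (2021-06-01).
inclusive = [start_date + timedelta(days=x) for x in range(num_of_days + 1)]

# pd.date_range includes both endpoints, matching the inclusive list.
via_date_range = list(pd.date_range(start_date, end_date))

print(len(exclusive), len(inclusive), len(via_date_range))  # 31 32 32
```

If end_date should appear in the result, either use range(num_of_days + 1) or build date_df from pd.date_range(start_date, end_date).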