I have a pandas DataFrame, df_dates, that looks like this:
PERSON_ID MIN_DATE MAX_DATE
0 000099-48 2016-02-01 2017-03-20
1 000184 2016-02-05 2017-01-19
2 000461-48 2016-03-07 2017-03-20
3 000791-48 2016-02-01 2017-03-07
4 000986-48 2016-02-01 2017-03-17
5 001617 2016-02-01 2017-02-20
6 001768-48 2016-02-01 2017-03-20
7 001937 2016-02-01 2017-03-17
8 002223-48 2016-02-04 2017-03-16
9 002481-48 2016-02-05 2017-03-17
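To reproduce this, the frame above can be rebuilt roughly as follows. This is a sketch that assumes the two date columns are already datetime64; the question does not show the dtypes, and if they are strings, pd.to_datetime will convert them.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: MIN_DATE and MAX_DATE are datetime64 columns, not strings.
df_dates = pd.DataFrame({
    "PERSON_ID": [
        "000099-48", "000184", "000461-48", "000791-48", "000986-48",
        "001617", "001768-48", "001937", "002223-48", "002481-48",
    ],
    "MIN_DATE": pd.to_datetime([
        "2016-02-01", "2016-02-05", "2016-03-07", "2016-02-01", "2016-02-01",
        "2016-02-01", "2016-02-01", "2016-02-01", "2016-02-04", "2016-02-05",
    ]),
    "MAX_DATE": pd.to_datetime([
        "2017-03-20", "2017-01-19", "2017-03-20", "2017-03-07", "2017-03-17",
        "2017-02-20", "2017-03-20", "2017-03-17", "2017-03-16", "2017-03-17",
    ]),
})

print(df_dates.dtypes)
```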
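I'm trying to add every date between MIN_DATE and MAX_DATE as its own row for each PERSON_ID. This is what I tried:
df_dates.groupby('PERSON_ID').apply(lambda x: pd.date_range(x['MIN_DATE'].values[0], x['MAX_DATE'].values[0]))
But this is what I get. Is there a way to convert this Series into one row per date for each PERSON_ID? Or is there a better way to do this?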
PERSON_ID
0-L2ID DatetimeIndex(['2016-08-05', '2016-08-06', '20...
0-LlID DatetimeIndex(['2016-02-03', '2016-02-04', '20...
000099-48 DatetimeIndex(['2016-02-01', '2016-02-02', '20...
000184 DatetimeIndex(['2016-02-05', '2016-02-06', '20...
000276 DatetimeIndex(['2016-02-01', '2016-02-02', '20...
000461-48 DatetimeIndex(['2016-03-07', '2016-03-08', '20...
000493-48 DatetimeIndex(['2016-02-01', '2016-02-02', '20...
000615-48 DatetimeIndex(['2016-02-02', '2016-02-03', '20...
000791-48 DatetimeIndex(['2016-02-01', '2016-02-02', '20...
000986-48 DatetimeIndex(['2016-02-01', '2016-02-02', '20...
dtype: object
This is what I'm trying to achieve:
PERSON_ID Date
000099-48 2/1/2016
000099-48 2/2/2016
000099-48 2/3/2016
000099-48 2/4/2016
:
:
000099-48 3/18/2017
000099-48 3/19/2017
000099-48 3/20/2017
000184 2/5/2016
000184 2/6/2016
000184 2/7/2016
:
:
000184 1/17/2017
000184 1/18/2017
000184 1/19/2017
Answer 0 (score: 3)
You can reshape with melt, then do a groupby and a resample (here df is the question's df_dates):
# Reshape via melt to get in the proper format for a resample.
df = df.melt(id_vars=['PERSON_ID'], value_vars=['MIN_DATE', 'MAX_DATE'], value_name='DATE')
# Set the index and drop unnecessary columns.
df = df.set_index('DATE').drop('variable', axis=1)
# Perform a groupby and resample.
df = df.groupby('PERSON_ID', group_keys=False).resample('D').ffill().reset_index()
Resulting output:
DATE PERSON_ID
0 2016-02-01 000099-48
1 2016-02-02 000099-48
2 2016-02-03 000099-48
3 2016-02-04 000099-48
... ... ...
3976 2017-03-14 002481-48
3977 2017-03-15 002481-48
3978 2017-03-16 002481-48
3979 2017-03-17 002481-48
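One way to sanity-check the result is to compare its length with the number of rows you expect: each person should contribute (MAX_DATE - MIN_DATE) in days, plus one for the inclusive end date. A minimal sketch on the ten rows from the question, again assuming the date columns are datetime64:

```python
import pandas as pd

# Sample data from the question (assumed to be datetime64 columns).
df = pd.DataFrame({
    "PERSON_ID": [
        "000099-48", "000184", "000461-48", "000791-48", "000986-48",
        "001617", "001768-48", "001937", "002223-48", "002481-48",
    ],
    "MIN_DATE": pd.to_datetime([
        "2016-02-01", "2016-02-05", "2016-03-07", "2016-02-01", "2016-02-01",
        "2016-02-01", "2016-02-01", "2016-02-01", "2016-02-04", "2016-02-05",
    ]),
    "MAX_DATE": pd.to_datetime([
        "2017-03-20", "2017-01-19", "2017-03-20", "2017-03-07", "2017-03-17",
        "2017-02-20", "2017-03-20", "2017-03-17", "2017-03-16", "2017-03-17",
    ]),
})

# Inclusive day count per person: difference in days plus one.
rows_per_person = (df["MAX_DATE"] - df["MIN_DATE"]).dt.days + 1

print(rows_per_person.sum())  # 3980
```

That total agrees with the last index label, 3979, in the output above.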
Answer 1 (score: 2)
Option 1
d = pd.concat({
    p: pd.Series(pd.date_range(s, e)) for i, p, s, e in df.itertuples()
})
d.rename_axis(
    ['PERSON_ID', None]
).reset_index('PERSON_ID', name='Date').reset_index(drop=True)
PERSON_ID Date
0 000099-48 2016-02-01
1 000099-48 2016-02-02
...
414 000184 2016-02-05
415 000184 2016-02-06
...
764 000461-48 2016-03-07
765 000461-48 2016-03-08
...
1143 000791-48 2016-02-01
1144 000791-48 2016-02-02
...
1544 000986-48 2016-02-01
1545 000986-48 2016-02-02
...
1955 001617 2016-02-01
1956 001617 2016-02-02
...
2341 001768-48 2016-02-01
2342 001768-48 2016-02-02
...
2755 001937 2016-02-01
2756 001937 2016-02-02
...
Option 2
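import numpy as np

# One list of dates per row, plus the length of each list.
lol = [pd.date_range(t.MIN_DATE, t.MAX_DATE).tolist() for t in df.itertuples()]
lns = [len(l) for l in lol]
# Repeat each PERSON_ID once per date and flatten the date lists.
pd.DataFrame(dict(
    PERSON_ID=df.PERSON_ID.values.repeat(lns), Date=np.concatenate(lol)
))[['PERSON_ID', 'Date']]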
PERSON_ID Date
0 000099-48 2016-02-01
1 000099-48 2016-02-02
...
414 000184 2016-02-05
415 000184 2016-02-06
...
764 000461-48 2016-03-07
765 000461-48 2016-03-08
...
1143 000791-48 2016-02-01
1144 000791-48 2016-02-02
...
1544 000986-48 2016-02-01
1545 000986-48 2016-02-02
...
1955 001617 2016-02-01
1956 001617 2016-02-02
...
2341 001768-48 2016-02-01
2342 001768-48 2016-02-02
...
2755 001937 2016-02-01
2756 001937 2016-02-02
...
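On pandas 0.25 or later, the same repeat-and-flatten idea can also be written with DataFrame.explode. This is a different technique from the two options above, not part of the original answer, and it is sketched here on just the first two rows of the question's data, assuming datetime64 columns.

```python
import pandas as pd

# First two rows of the question's data (assumed datetime64 columns).
df = pd.DataFrame({
    "PERSON_ID": ["000099-48", "000184"],
    "MIN_DATE": pd.to_datetime(["2016-02-01", "2016-02-05"]),
    "MAX_DATE": pd.to_datetime(["2017-03-20", "2017-01-19"]),
})

# Build one DatetimeIndex per row, then explode it into one row per date.
ranges = [pd.date_range(s, e) for s, e in zip(df["MIN_DATE"], df["MAX_DATE"])]
out = (
    df.assign(Date=ranges)
      .explode("Date")[["PERSON_ID", "Date"]]
      .reset_index(drop=True)
)

# explode leaves an object column, so convert it back to datetime64.
out["Date"] = pd.to_datetime(out["Date"])

print(out.head())
```

The explicit pd.to_datetime at the end is there because explode returns the exploded column with object dtype.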
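Answer 2 (score: 0)
You can also keep going with what you were already doing, but convert each DatetimeIndex to a string and then use str.split to create the new rows.
For example: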
df = df.groupby('PERSON_ID').apply(lambda x: pd.date_range(x['MIN_DATE'].values[0], x['MAX_DATE'].values[0])).reset_index()
df_dates = df.rename(columns={0: 'Dates'})
Create a function that converts the dates to a single string.
def get_date_string(x):
    return ", ".join([d.strftime('%Y-%m-%d') for d in x])
df_dates['Dates'] = df_dates['Dates'].apply(get_date_string)
Split the string into new rows.
s = df_dates['Dates'].str.split(", ").apply(pd.Series).stack()
s.index = s.index.droplevel(-1)
s.name = 'Dates'
Join with the PERSON_ID column.
del df[0]
print(df.join(s))
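Note that because this approach round-trips through strings, the Dates column ends up holding strings rather than timestamps. If you need real dates afterwards, a conversion along these lines should work; the two-element Series here is only a stand-in for the joined result.

```python
import pandas as pd

# Stand-in for the string-valued 'Dates' column produced above.
s = pd.Series(["2016-02-01", "2016-02-02"], name="Dates")

# Parse the strings back into datetime64 values using the same format
# that get_date_string wrote them in.
dates = pd.to_datetime(s, format="%Y-%m-%d")

print(dates.dtype)
```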