我有df1
Id Data Group_Id
0 1 A 1
1 2 B 2
2 3 B 3
...
100 4 A 101
101 5 A 102
...
和df2
Timestamp Group_Id
2012-01-01 00:00:05.523 1
2013-07-01 00:00:10.757 2
2014-01-12 00:00:15.507. 3
...
2016-03-05 00:00:05.743 101
2017-12-24 00:00:10.407 102
...
我想用Group_Id
匹配这两个数据集,然后仅从df2中的date
复制Timestamp
,然后根据相应的Group_Id
粘贴到df1中的新列,将列命名为day1
。
然后我想在day1
旁边添加另外的 6列,将其命名为day2
,...,day7
,接下来的六天基于第一天看起来像这样:
Id Data Group_Id day1 day2 day3 ... day7
0 1 A 1 2012-01-01 2012-01-02 2012-01-03 ...
1 2 B 2 2013-07-01 2013-07-02 2013-07-03 ...
2 3 B 3 2014-01-12 2014-01-13 2014-01-14 ...
...
100 4 A 101 2016-03-05 2016-03-06 2016-03-07 ...
101 5 A 102 2017-12-24 2017-12-25 2017-12-26 ...
...
谢谢。
答案 0 :(得分:1)
首先我们需要merge
df1=df1.merge(df2,how='left')
s=pd.DataFrame([pd.date_range(x,periods=6,freq ='D') for x in df1.Timestamp],index=df1.index)
s.columns+=1
df1.join(s.add_prefix('Day'))
答案 1 :(得分:1)
这里的另一种方法,基本上只是合并dfs,从时间戳中获取日期,并创建6个新列,每次添加一天:
import pandas as pd
df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')
df3 = df1.merge(df2, on='Group_Id')
df3['Timestamp'] = pd.to_datetime(df3['Timestamp']) #only necessary if not already timestamp
df3['day1'] = df3['Timestamp'].dt.date
for i in (range(1,7)):
df3['day'+str(i+1)] = df3['day1'] + pd.Timedelta(i,unit='d')
输出:
Id Data Group_Id Timestamp day1 day2 day3 day4 day5 day6 day7
0 1 A 1 2012-01-01 00:00:05.523 2012-01-01 2012-01-02 2012-01-03 2012-01-04 2012-01-05 2012-01-06 2012-01-07
1 2 B 2 2013-07-01 00:00:10.757 2013-07-01 2013-07-02 2013-07-03 2013-07-04 2013-07-05 2013-07-06 2013-07-07
2 3 B 3 2014-01-12 00:00:15.507 2014-01-12 2014-01-13 2014-01-14 2014-01-15 2014-01-16 2014-01-17 2014-01-18
3 4 A 101 2016-03-05 00:00:05.743 2016-03-05 2016-03-06 2016-03-07 2016-03-08 2016-03-09 2016-03-10 2016-03-11
4 5 A 102 2017-12-24 00:00:10.407 2017-12-24 2017-12-25 2017-12-26 2017-12-27 2017-12-28 2017-12-29 2017-12-30
请注意,我已将您的数据框复制到一个csv中,并且只有5个整数,因此索引与您的示例(即100、101)不同
如果不需要,您可以删除时间戳列