How to copy values from another dataset and paste them, conditional on a column

Time: 2019-10-21 01:19:50

Tags: python pandas

I have df1

    Id Data    Group_Id
0    1 A         1
1    2 B         2
2    3 B         3
      ...
100  4 A         101
101  5 A         102
      ...

and df2

      Timestamp           Group_Id
2012-01-01 00:00:05.523    1
2013-07-01 00:00:10.757    2
2014-01-12 00:00:15.507    3
                   ...
2016-03-05 00:00:05.743    101
2017-12-24 00:00:10.407    102
                   ...

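I want to match the two datasets on Group_Id, copy only the date part of Timestamp from df2, and paste it into a new column in df1 according to the corresponding Group_Id, naming that column day1.

Then I want to add 6 more columns next to day1, named day2, ..., day7, holding the six days that follow the first day, so that the result looks like this: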

    Id Data    Group_Id    day1    day2       day3        ...    day7  
0    1 A         1      2012-01-01 2012-01-02 2012-01-03         ...
1    2 B         2      2013-07-01 2013-07-02 2013-07-03         ...
2    3 B         3      2014-01-12 2014-01-13 2014-01-14         ...
                              ...
100  4 A         101    2016-03-05 2016-03-06 2016-03-07         ...
101  5 A         102    2017-12-24 2017-12-25 2017-12-26         ...
                              ...

Thanks.

2 answers:

Answer 0: (score: 1)

First we need to merge

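import pandas as pd
# Bring Timestamp into df1 by matching Group_Id
df1 = df1.merge(df2, on='Group_Id', how='left')
# One row of 7 consecutive dates per timestamp (periods=7 covers day1..day7)
start = pd.to_datetime(df1['Timestamp']).dt.normalize()  # drop the time of day
s = pd.DataFrame([pd.date_range(x, periods=7, freq='D') for x in start], index=df1.index)
s.columns += 1  # relabel columns 0..6 as 1..7
df1 = df1.join(s.add_prefix('day'))  # assign the result; join does not modify df1 in place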
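To try this approach without the original files, here is a minimal self-contained sketch. The sample frames are rebuilt from the first three rows shown in the question, and `add_day_columns` and its `n_days` parameter are hypothetical names introduced only for illustration. It assumes every Group_Id in df1 has a match in df2, since `pd.date_range` cannot start from a missing timestamp.

```python
import pandas as pd

# Sample frames rebuilt from the rows shown in the question (assumed data).
df1 = pd.DataFrame({
    "Id": [1, 2, 3],
    "Data": ["A", "B", "B"],
    "Group_Id": [1, 2, 3],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2012-01-01 00:00:05.523",
        "2013-07-01 00:00:10.757",
        "2014-01-12 00:00:15.507",
    ]),
    "Group_Id": [1, 2, 3],
})


def add_day_columns(left, right, n_days=7):
    """Hypothetical helper: merge on Group_Id, then add day1..day{n_days}."""
    out = left.merge(right, on="Group_Id", how="left")
    # Drop the time of day so each range starts at midnight.
    start = out["Timestamp"].dt.normalize()
    # One row of consecutive dates per starting day.
    days = pd.DataFrame(
        [pd.date_range(x, periods=n_days, freq="D") for x in start],
        index=out.index,
    )
    days.columns = [f"day{i + 1}" for i in range(n_days)]
    # Timestamp is no longer needed once the day columns exist.
    return out.drop(columns="Timestamp").join(days)


result = add_day_columns(df1, df2)
print(result[["Group_Id", "day1", "day2", "day7"]])
```

Building one `date_range` per row is easy to read but loops in Python; for large frames, adding a `pd.Timedelta` to the whole day1 column at once is likely to be faster.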

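Answer 1: (score: 1)

Here is another approach: merge the dataframes, take the date from the timestamp, and create 6 new columns, adding one more day each time: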

import pandas as pd
df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')
df3 = df1.merge(df2, on='Group_Id')

df3['Timestamp'] = pd.to_datetime(df3['Timestamp'])  # only necessary if not already datetime
df3['day1'] = df3['Timestamp'].dt.date  # keep only the date part

for i in range(1, 7):  # builds day2 .. day7
    df3['day' + str(i + 1)] = df3['day1'] + pd.Timedelta(i, unit='d')

Output:

   Id Data  Group_Id               Timestamp        day1        day2        day3        day4        day5        day6        day7
0   1    A         1 2012-01-01 00:00:05.523  2012-01-01  2012-01-02  2012-01-03  2012-01-04  2012-01-05  2012-01-06  2012-01-07
1   2    B         2 2013-07-01 00:00:10.757  2013-07-01  2013-07-02  2013-07-03  2013-07-04  2013-07-05  2013-07-06  2013-07-07
2   3    B         3 2014-01-12 00:00:15.507  2014-01-12  2014-01-13  2014-01-14  2014-01-15  2014-01-16  2014-01-17  2014-01-18
3   4    A       101 2016-03-05 00:00:05.743  2016-03-05  2016-03-06  2016-03-07  2016-03-08  2016-03-09  2016-03-10  2016-03-11
4   5    A       102 2017-12-24 00:00:10.407  2017-12-24  2017-12-25  2017-12-26  2017-12-27  2017-12-28  2017-12-29  2017-12-30

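Note that I copied your dataframes into CSV files containing only the 5 rows shown, so the index differs from your example (i.e. 100, 101).

You can drop the Timestamp column if you don't need it.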
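Dropping the column could look like the sketch below. The small `df3` here is a stand-in built from two rows of the output above, not the real merged frame. The sketch also uses `dt.normalize()` instead of `dt.date`, a substitution of mine rather than part of the answer, so that the day columns keep the datetime64 dtype instead of becoming Python `date` objects.

```python
import pandas as pd

# Stand-in for the merged frame df3 (assumed data, two rows from the output).
df3 = pd.DataFrame({
    "Id": [1, 2],
    "Data": ["A", "B"],
    "Group_Id": [1, 2],
    "Timestamp": pd.to_datetime([
        "2012-01-01 00:00:05.523",
        "2013-07-01 00:00:10.757",
    ]),
})

# Midnight of each timestamp; the column keeps the datetime64 dtype.
df3["day1"] = df3["Timestamp"].dt.normalize()
for i in range(1, 7):
    df3[f"day{i + 1}"] = df3["day1"] + pd.Timedelta(days=i)

# Remove the helper column once the day columns exist.
df3 = df3.drop(columns="Timestamp")
print(df3.dtypes)
```

Keeping datetime64 columns means the `.dt` accessor and date comparisons still work on day1 through day7 afterwards.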