Pandas - how to split a date range into extra columns in a DataFrame

Asked: 2018-11-15 12:46:48

Tags: python pandas dataframe

Dataset

    import pandas as pd

    sample = {
        'operator': ['op_a'] * 4 + ['op_b'] * 4 + ['op_c'] * 4,
        'from': ['a'] * 4 + ['c'] * 4 + ['a'] * 4,
        'to': ['b'] * 4 + ['d'] * 4 + ['b'] * 4,
        'valid_from': ['13/11/2018'] * 8 + ['15/02/2019'] * 4,
        'valid_to': ['19/11/2018'] * 8 + ['21/02/2019'] * 4,
    }

df_test = pd.DataFrame(sample)
df_test

I would like to split the range between the `valid_from` and `valid_to` columns into its individual dates and add each of them to the DataFrame as its own column.

Desired output

    df3 = pd.DataFrame({
        'operator': ['op_a'] * 4 + ['op_b'] * 4 + ['op_c'] * 4,
        'from': ['a'] * 4 + ['c'] * 4 + ['a'] * 4,
        'to': ['b'] * 4 + ['d'] * 4 + ['b'] * 4,
        'valid_from': ['13/11/2018'] * 8 + ['15/02/2019'] * 4,
        'valid_1': ['14/11/2018'] * 8 + ['16/02/2019'] * 4,
        'valid_2': ['15/11/2018'] * 8 + ['17/02/2019'] * 4,
        'valid_3': ['16/11/2018'] * 8 + ['18/02/2019'] * 4,
        'valid_4': ['17/11/2018'] * 8 + ['19/02/2019'] * 4,
        'valid_5': ['18/11/2018'] * 8 + ['20/02/2019'] * 4,
        'valid_to': ['19/11/2018'] * 8 + ['21/02/2019'] * 4,
    })

df3

1 Answer:

Answer 0 (score: 1)

You can try:

# Parse the day-first date strings (e.g. '13/11/2018' is 13 November 2018).
df_test['valid_from'] = pd.to_datetime(df_test['valid_from'], dayfirst=True)
df_test['valid_to'] = pd.to_datetime(df_test['valid_to'], dayfirst=True)

# Number of days between the endpoints (taken from the first row).
diff_days = (df_test.loc[0, 'valid_to'] - df_test.loc[0, 'valid_from']).days

# Add one column per intermediate day.
for i in range(diff_days - 1):
    df_test['valid_{}'.format(i + 1)] = df_test['valid_from'] + pd.DateOffset(days=i + 1)

This solution assumes that every row spans the same number of days, since nothing else was specified.
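If the ranges can differ in length per row, a row-wise variant using `pd.date_range` avoids that assumption. This is a sketch (the helper name `intermediate_dates` and the smaller two-row frame are illustrative, not from the question); rows with shorter ranges would simply get `NaT` in the trailing columns:

```python
import pandas as pd

df = pd.DataFrame({
    'valid_from': ['13/11/2018', '15/02/2019'],
    'valid_to':   ['19/11/2018', '21/02/2019'],
})

# Parse the day-first date strings.
df['valid_from'] = pd.to_datetime(df['valid_from'], dayfirst=True)
df['valid_to'] = pd.to_datetime(df['valid_to'], dayfirst=True)

def intermediate_dates(row):
    # All dates strictly between the endpoints, as a Series of
    # valid_1, valid_2, ... for this row.
    dates = pd.date_range(row['valid_from'], row['valid_to'])[1:-1]
    return pd.Series(dates, index=['valid_{}'.format(i + 1) for i in range(len(dates))])

out = df.join(df.apply(intermediate_dates, axis=1))
```

Because each row builds its own `valid_*` Series, pandas aligns them by column name when joining, so a mix of range lengths works without precomputing a single `diff_days`.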

Output:

   from operator to valid_from    valid_to    valid_1    valid_2    valid_3  \
0     a     op_a  b 2018-11-13  2018-11-19 2018-11-14 2018-11-15 2018-11-16   
1     a     op_a  b 2018-11-13  2018-11-19 2018-11-14 2018-11-15 2018-11-16   
2     a     op_a  b 2018-11-13  2018-11-19 2018-11-14 2018-11-15 2018-11-16   
3     a     op_a  b 2018-11-13  2018-11-19 2018-11-14 2018-11-15 2018-11-16   
4     c     op_b  d 2018-11-13  2018-11-19 2018-11-14 2018-11-15 2018-11-16   
5     c     op_b  d 2018-11-13  2018-11-19 2018-11-14 2018-11-15 2018-11-16   
6     c     op_b  d 2018-11-13  2018-11-19 2018-11-14 2018-11-15 2018-11-16   
7     c     op_b  d 2018-11-13  2018-11-19 2018-11-14 2018-11-15 2018-11-16   
8     a     op_c  b 2019-02-15  2019-02-21 2019-02-16 2019-02-17 2019-02-18   
9     a     op_c  b 2019-02-15  2019-02-21 2019-02-16 2019-02-17 2019-02-18   
10    a     op_c  b 2019-02-15  2019-02-21 2019-02-16 2019-02-17 2019-02-18   
11    a     op_c  b 2019-02-15  2019-02-21 2019-02-16 2019-02-17 2019-02-18   

      valid_4    valid_5  
0  2018-11-17 2018-11-18  
1  2018-11-17 2018-11-18  
2  2018-11-17 2018-11-18  
3  2018-11-17 2018-11-18  
4  2018-11-17 2018-11-18  
5  2018-11-17 2018-11-18  
6  2018-11-17 2018-11-18  
7  2018-11-17 2018-11-18  
8  2019-02-19 2019-02-20  
9  2019-02-19 2019-02-20  
10 2019-02-19 2019-02-20  
11 2019-02-19 2019-02-20