Splitting a list of dates in pandas

Time: 2021-06-11 18:30:32

Tags: python pandas

How can I split the following df? Existing dataframe:

    t         t1
Test1   [October 22nd, 2019, February 8th, 2020, Augus...
Test2   [July 31st, 2020, September 21st, 2020, March ...

Desired Dataframe
    t         t1
Test1    October 22nd, 2019
Test1    February 8th, 2020
Test2    July 31st, 2020
Test2    September 21st, 2020

new_df.head().to_dict()
{'t': {0: 'Test1', 1: 'Test2'},
 't1': {0: [Date(22,10,2019),
   Date(8,2,2020),
   Date(8,8,2020),
   Date(8,2,2021),
   Date(11,6,2021)],
  1: [Date(31,7,2020), Date(21,9,2020), Date(21,3,2021), Date(11,6,2021)]}}

I tried the following code:

new_df["t1"]=new_df["t1"].float64.split(",")
print(new_df.explode("t1").reset_index(drop=True))

It raises this error:

AttributeError: 'Series' object has no attribute 'float64'

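1 answer:

Answer 0 (score: 0)

I'm not sure how new_df was constructed in your case, but @Henry is on the right track, and the following works for me.

First I construct the dataframe: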

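from datetime import date

import pandas as pd

# Each row of 't1' holds a Python list of date objects
data = {
    't': ['Test1','Test2'],
    't1': [
        [date(2019,10,22), date(2020,2,8), date(2020,8,8), date(2021,2,8), date(2021,6,11)],
        [date(2020,7,31), date(2020,9,21)]
    ]}

new_df = pd.DataFrame(data)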

Then use explode to get what you want:

new_df.explode("t1").reset_index(drop=True)

Out:
       t          t1
0  Test1  2019-10-22
1  Test1  2020-02-08
2  Test1  2020-08-08
3  Test1  2021-02-08
4  Test1  2021-06-11
5  Test2  2020-07-31
6  Test2  2020-09-21
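
The output above shows ISO-style dates, while the desired dataframe in the question uses text such as "October 22nd, 2019". If that text form is what you need, here is a rough sketch of one way to produce it after exploding. The helper name ordinal_date and the MONTHS tuple are my own (hypothetical, not part of pandas), and the sample data is shortened for brevity.

```python
from datetime import date

import pandas as pd

# English month names spelled out so the result does not depend on the locale
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def ordinal_date(d: date) -> str:
    """Hypothetical helper: format a date like 'October 22nd, 2019'."""
    if 11 <= d.day % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(d.day % 10, "th")
    return f"{MONTHS[d.month - 1]} {d.day}{suffix}, {d.year}"


# Shortened sample data, assuming each row of 't1' is a list of date objects
new_df = pd.DataFrame({
    "t": ["Test1", "Test2"],
    "t1": [
        [date(2019, 10, 22), date(2020, 2, 8)],
        [date(2020, 7, 31), date(2020, 9, 21)],
    ],
})

# One row per date, then turn each date into its text form
formatted = new_df.explode("t1").reset_index(drop=True)
formatted["t1"] = formatted["t1"].map(ordinal_date)
```

Note that this turns t1 into plain strings, so do it as a last step, after any sorting or filtering by date.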

As long as each row of t1 holds an array of datetimes, the explode call above should work.
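
That also means no split step is needed: t1 already holds lists, and the AttributeError in the question comes from the fact that a Series has no float64 attribute. As a small variation, and only a sketch under the same assumption about the data, explode accepts ignore_index=True (pandas 1.1 or later) in place of reset_index(drop=True), and pd.to_datetime can convert the resulting object column to a real datetime dtype if later steps need one.

```python
from datetime import date

import pandas as pd

# Shortened sample data, assuming each row of 't1' is a list of date objects
new_df = pd.DataFrame({
    "t": ["Test1", "Test2"],
    "t1": [
        [date(2019, 10, 22), date(2020, 2, 8)],
        [date(2020, 7, 31), date(2020, 9, 21)],
    ],
})

# ignore_index=True renumbers the rows 0..n-1 (requires pandas >= 1.1)
exploded = new_df.explode("t1", ignore_index=True)

# After explode, 't1' is an object column of date objects;
# convert it so the .dt accessor and date arithmetic are available
exploded["t1"] = pd.to_datetime(exploded["t1"])
```

With the datetime dtype in place, expressions such as exploded["t1"].dt.year work directly.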