Sort a list of dictionaries based on the date column of a pandas DataFrame

Time: 2020-07-16 18:52:13

Tags: python-3.x pandas dataframe

I have an input list and a DataFrame, shown below.

[{"type": "linear",
  "from": "2020-02-04T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "quadratic",
  "from": "2020-02-03T20:00:00.000Z",
  "to": "2020-02-10T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "polynomial",
  "from": "2020-02-05T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  }]

df:

Date                t_factor
2020-02-01             5
2020-02-02             23
2020-02-03             14
2020-02-04             23
2020-02-05             23
2020-02-06             23
2020-02-07             30
2020-02-08             29
2020-02-09             100
2020-02-10             38
2020-02-11             38
2020-02-12             38
2020-02-13             70
2020-02-14             70
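
To reproduce the question, the inputs can be built as shown below. This is a sketch that assumes the Date column runs day by day from 2020-02-01 to 2020-02-14, which is what the expected df_last date of 2020-02-14 implies. Date is created as a datetime64 column (not strings) so that min(), max() and day offsets work later:

```python
import pandas as pd

# The list of dictionaries from the question.
coef = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
lst = [
    {"type": "linear", "from": "2020-02-04T20:00:00.000Z",
     "to": "2020-02-03T20:00:00.000Z", "days": 3, "coef": list(coef)},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z",
     "to": "2020-02-10T20:00:00.000Z", "days": 3, "coef": list(coef)},
    {"type": "polynomial", "from": "2020-02-05T20:00:00.000Z",
     "to": "2020-02-03T20:00:00.000Z", "days": 3, "coef": list(coef)},
]

# Assumption: Date runs daily from 2020-02-01 to 2020-02-14 (14 rows).
df = pd.DataFrame({
    "Date": pd.date_range("2020-02-01", "2020-02-14", freq="D"),
    "t_factor": [5, 23, 14, 23, 23, 23, 30, 29, 100, 38, 38, 38, 70, 70],
})
```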

Step 1: Sort the list by the value of the "from" key in each dictionary.

[{"type": "quadratic",
  "from": "2020-02-03T20:00:00.000Z",
  "to": "2020-02-10T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "linear",
  "from": "2020-02-04T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "polynomial",
  "from": "2020-02-05T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  }]
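
Because every "from" value uses the same fixed-width ISO 8601 layout (YYYY-MM-DDTHH:MM:SS.000Z, all in UTC), comparing the raw strings gives the same order as comparing the parsed dates. A minimal standard-library sketch of step 1, assuming that format holds for every entry; the dictionaries are trimmed to the two relevant keys and the helper parse_iso_z is hypothetical:

```python
from datetime import datetime, timezone

records = [
    {"type": "linear", "from": "2020-02-04T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z"},
    {"type": "polynomial", "from": "2020-02-05T20:00:00.000Z"},
]

def parse_iso_z(s):
    """Hypothetical helper: parse 'YYYY-MM-DDTHH:MM:SS.fffZ' as a UTC datetime."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

# Sorting by the raw string is valid only because the format is fixed-width.
by_string = sorted(records, key=lambda d: d["from"])
# Sorting by the parsed datetime; both give the same order for this data.
by_date = sorted(records, key=lambda d: parse_iso_z(d["from"]))

print([d["type"] for d in by_string])  # ['quadratic', 'linear', 'polynomial']
```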

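Step 2: Add a dictionary whose "from" value is the minimum date of df and whose "to" value is the "from" date of the first dictionary in the sorted list, with "days" = 0 and "coef": [0.1,0.1,0.1,0.1,0.1,0.1].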

{"type": "df_first",
      "from": "2020-02-01T20:00:00.000Z",
      "to": "2020-02-03T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      }

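Step 3: Add a dictionary whose "from" value is 7 days after the minimum date of df and whose "to" value is one day after that "from" date (that is, 8 days after the minimum date).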

{"type": "df_mid",
      "from": "2020-02-08T20:00:00.000Z",
      "to": "2020-02-09T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      }

Step 4: Add a dictionary whose "from" value is the maximum date of df and whose "to" value is the same as its "from".

{"type": "df_last",
      "from": "2020-02-14T20:00:00.000Z",
      "to": "2020-02-14T20:00:00.000Z",
      "days":0,
      "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
      }
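
Steps 2 to 4 only need the minimum and maximum of the Date column plus fixed day offsets. A minimal sketch, assuming Date holds datetime values (here a few of the question's dates parsed with pd.to_datetime) and that the new dictionaries should reuse the T20:00:00.000Z time of day seen in the question's examples; the helper to_iso is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2020-02-01", "2020-02-07", "2020-02-14"])})

def to_iso(ts):
    # Hypothetical helper: keep the calendar date and append the fixed
    # 20:00 UTC time used throughout the question's examples.
    return ts.strftime("%Y-%m-%d") + "T20:00:00.000Z"

dmin, dmax = df["Date"].min(), df["Date"].max()

first_from = to_iso(dmin)                        # step 2: "from" = min date
mid_from = to_iso(dmin + pd.Timedelta(days=7))   # step 3: 7 days after min
mid_to = to_iso(dmin + pd.Timedelta(days=8))     # step 3: one day after that
last_from = last_to = to_iso(dmax)               # step 4: "from" == "to" == max

print(first_from, mid_from, mid_to, last_from)
```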

Step 5: Sort all the dictionaries by their "from" date.

Expected output:

[{"type": "df_first",
  "from": "2020-02-01T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days":0,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "quadratic",
  "from": "2020-02-03T20:00:00.000Z",
  "to": "2020-02-10T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "linear",
  "from": "2020-02-04T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "polynomial",
  "from": "2020-02-05T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "df_mid",
  "from": "2020-02-08T20:00:00.000Z",
  "to": "2020-02-09T20:00:00.000Z",
  "days":0,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "df_last",
  "from": "2020-02-14T20:00:00.000Z",
  "to": "2020-02-14T20:00:00.000Z",
  "days":0,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  }]

Step 6: Replace each dictionary's "to" value with the "from" value of the next dictionary. The last dictionary keeps its own "to" value.

Expected final output:

[{"type": "df_first",
  "from": "2020-02-01T20:00:00.000Z",
  "to": "2020-02-03T20:00:00.000Z",
  "days":0,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "quadratic",
  "from": "2020-02-03T20:00:00.000Z",
  "to": "2020-02-04T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "linear",
  "from": "2020-02-04T20:00:00.000Z",
  "to": "2020-02-05T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "polynomial",
  "from": "2020-02-05T20:00:00.000Z",
  "to": "2020-02-08T20:00:00.000Z",
  "days":3,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "df_mid",
  "from": "2020-02-08T20:00:00.000Z",
  "to": "2020-02-14T20:00:00.000Z",
  "days":0,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  },
 {"type": "df_last",
  "from": "2020-02-14T20:00:00.000Z",
  "to": "2020-02-14T20:00:00.000Z",
  "days":0,
  "coef":[0.1,0.1,0.1,0.1,0.1,0.1]
  }]

1 answer:

Answer 0 (score: 1)

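Define a function add_dct that takes a list of dictionaries lst together with _type, _from and _to, and appends a new dictionary built from them to lst:

import pandas as pd

# Make sure Date is a datetime column so min()/max() return Timestamps
df['Date'] = pd.to_datetime(df['Date'])
dmin, dmax = df['Date'].min(), df['Date'].max()

def add_dct(lst, _type, _from, _to):
    # Strings are used as-is; Timestamps are formatted like the question's
    # values, with the hour fixed at 20 (minutes/seconds come from the Timestamp)
    fmt = "%Y-%m-%dT20:%M:%S.000Z"
    lst.append({
        'type': _type,
        'from': _from if isinstance(_from, str) else _from.strftime(fmt),
        'to': _to if isinstance(_to, str) else _to.strftime(fmt),
        'days': 0,
        'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    })

Then carry out the steps as your question requires: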

# STEP 1
lst = sorted(lst, key=lambda d: pd.Timestamp(d['from']))

# STEP 2
add_dct(lst, 'df_first', dmin, lst[0]['from'])

# STEP 3
add_dct(lst, 'df_mid', dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))

# STEP 4
add_dct(lst, 'df_last', dmax, dmax)

# STEP 5
lst = sorted(lst, key=lambda d: pd.Timestamp(d['from']))
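
The code above stops at step 5. Step 6 (point each dictionary's "to" at the next dictionary's "from", leaving the last one unchanged) can be done by pairing each element with its successor. A minimal sketch in plain Python, shown on a trimmed copy of the step 5 result with only "type", "from" and "to" kept; in the answer's code the loop would run on lst right after STEP 5:

```python
# Trimmed copy of the list after step 5 (only the keys step 6 touches).
lst = [
    {"type": "df_first", "from": "2020-02-01T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z", "to": "2020-02-10T20:00:00.000Z"},
    {"type": "linear", "from": "2020-02-04T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "polynomial", "from": "2020-02-05T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "df_mid", "from": "2020-02-08T20:00:00.000Z", "to": "2020-02-09T20:00:00.000Z"},
    {"type": "df_last", "from": "2020-02-14T20:00:00.000Z", "to": "2020-02-14T20:00:00.000Z"},
]

# STEP 6: each dict's "to" becomes the next dict's "from";
# zip stops one short, so the last dict keeps its own "to".
for current, nxt in zip(lst, lst[1:]):
    current["to"] = nxt["from"]

for d in lst:
    print(d["type"], d["from"], "->", d["to"])
```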