I have an input list of dictionaries and a DataFrame, as shown below.
[{"type": "linear",
"from": "2020-02-04T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "quadratic",
"from": "2020-02-03T20:00:00.000Z",
"to": "2020-02-10T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "polynomial",
"from": "2020-02-05T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}]
df:
Date t_factor
2020-02-01 5
2020-02-02 23
2020-02-03 14
2020-02-04 23
2020-02-05 23
2020-02-06 23
2020-02-07 30
2020-02-08 29
2020-02-09 100
2020-02-10 38
2020-02-11 38
2020-02-12 38
2020-02-13 70
2020-02-14 70
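To experiment with the steps below, the sample inputs could be loaded roughly as follows. This is a sketch that assumes the df dates run consecutively from 2020-02-01 to 2020-02-14, which is what the expected df_last date of 2020-02-14 implies; the names raw, lst and df are illustrative.

```python
import json

import pandas as pd

# The question's input list, as JSON text.
raw = """[
  {"type": "linear", "from": "2020-02-04T20:00:00.000Z",
   "to": "2020-02-03T20:00:00.000Z", "days": 3, "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
  {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z",
   "to": "2020-02-10T20:00:00.000Z", "days": 3, "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
  {"type": "polynomial", "from": "2020-02-05T20:00:00.000Z",
   "to": "2020-02-03T20:00:00.000Z", "days": 3, "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
]"""
lst = json.loads(raw)

# Assumption: the dates are consecutive days from 2020-02-01 to 2020-02-14.
df = pd.DataFrame({
    "Date": pd.date_range("2020-02-01", "2020-02-14", freq="D"),
    "t_factor": [5, 23, 14, 23, 23, 23, 30, 29, 100, 38, 38, 38, 70, 70],
})
print(df["Date"].min(), df["Date"].max())
```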
Step 1: Sort the list by the value of the "from" key in each dictionary.
[
{"type": "quadratic",
"from": "2020-02-03T20:00:00.000Z",
"to": "2020-02-10T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "linear",
"from": "2020-02-04T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "polynomial",
"from": "2020-02-05T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}]
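Step 1 only needs a sort key that turns each "from" string into a comparable value. A minimal standard-library sketch, assuming every timestamp has exactly the YYYY-MM-DDTHH:MM:SS.mmmZ shape shown above; parse_ts is a hypothetical helper name.

```python
from datetime import datetime

TS_FMT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. "2020-02-03T20:00:00.000Z"


def parse_ts(s: str) -> datetime:
    """Parse one of the question's UTC timestamps (literal 'Z' suffix) into a naive datetime."""
    return datetime.strptime(s, TS_FMT)


lst = [
    {"type": "linear", "from": "2020-02-04T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z"},
    {"type": "polynomial", "from": "2020-02-05T20:00:00.000Z"},
]

# sorted() returns a new list and leaves lst untouched.
lst_sorted = sorted(lst, key=lambda d: parse_ts(d["from"]))
print([d["type"] for d in lst_sorted])  # ['quadratic', 'linear', 'polynomial']
```

Because all strings share one fixed-width format, key=lambda d: d["from"] would give the same order; parsing just makes the intent explicit and fails loudly on malformed input.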
Step 2: Add a dictionary whose "from" is the minimum date of df and whose "to" is the "from" date of the first dictionary in the sorted list, with "days": 0 and "coef": [0.1,0.1,0.1,0.1,0.1,0.1].
{"type": "df_first",
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
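Step 2 can be written as a small constructor. This sketch assumes the df minimum date is available as a datetime (a pandas Timestamp from df["Date"].min() qualifies, since Timestamp subclasses datetime.datetime) and, like the expected output, stamps the boundary at 20:00 UTC; make_boundary is a hypothetical name.

```python
from datetime import datetime

COEF = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
OUT_FMT = "%Y-%m-%dT20:00:00.000Z"  # assumption: every boundary is stamped 20:00 UTC


def make_boundary(kind: str, start, end) -> dict:
    """Build a boundary dict; start/end may be datetimes or already-formatted strings."""
    def fmt(v):
        return v if isinstance(v, str) else v.strftime(OUT_FMT)
    return {"type": kind, "from": fmt(start), "to": fmt(end), "days": 0, "coef": list(COEF)}


dmin = datetime(2020, 2, 1)  # in pandas: df["Date"].min()
sorted_lst = [{"type": "quadratic", "from": "2020-02-03T20:00:00.000Z"}]

# "from" = earliest df date, "to" = "from" of the first dict after Step 1.
df_first = make_boundary("df_first", dmin, sorted_lst[0]["from"])
print(df_first["from"], df_first["to"])  # 2020-02-01T20:00:00.000Z 2020-02-03T20:00:00.000Z
```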
Step 3: Add a dictionary whose "from" is 7 days after the minimum date of df and whose "to" is one day after that "from" (8 days after the minimum date).
{"type": "df_mid",
"from": "2020-02-08T20:00:00.000Z",
"to": "2020-02-09T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
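Step 3 is plain date arithmetic: start 7 days after the minimum date and end one day later. A standard-library sketch under the same 20:00 UTC assumption; when dmin is a pandas Timestamp, pd.Timedelta(days=7) plays the same role as timedelta(days=7).

```python
from datetime import datetime, timedelta

OUT_FMT = "%Y-%m-%dT20:00:00.000Z"  # assumption: boundaries are stamped 20:00 UTC

dmin = datetime(2020, 2, 1)  # in pandas: df["Date"].min()

df_mid = {
    "type": "df_mid",
    "from": (dmin + timedelta(days=7)).strftime(OUT_FMT),  # 7 days after the minimum date
    "to": (dmin + timedelta(days=8)).strftime(OUT_FMT),    # one day after "from"
    "days": 0,
    "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
}
print(df_mid["from"], df_mid["to"])  # 2020-02-08T20:00:00.000Z 2020-02-09T20:00:00.000Z
```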
Step 4: Add a dictionary whose "from" is the maximum date of df and whose "to" is the same as its "from".
{"type": "df_last",
"from": "2020-02-14T20:00:00.000Z",
"to": "2020-02-14T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
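For Step 4 both ends come from the maximum date. A sketch assuming a pandas DataFrame whose Date column may arrive as strings (for example from a CSV); converting it with pd.to_datetime makes max() chronological and gives a Timestamp with strftime. The three-row frame is hypothetical.

```python
import pandas as pd

OUT_FMT = "%Y-%m-%dT20:00:00.000Z"  # assumption: boundaries are stamped 20:00 UTC

# Hypothetical frame with string dates.
df = pd.DataFrame({"Date": ["2020-02-01", "2020-02-09", "2020-02-14"], "t_factor": [5, 100, 70]})
df["Date"] = pd.to_datetime(df["Date"])  # string column -> datetime64

last = df["Date"].max().strftime(OUT_FMT)
df_last = {"type": "df_last", "from": last, "to": last, "days": 0,
           "coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]}
print(df_last["from"])  # 2020-02-14T20:00:00.000Z
```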
Step 5: Sort all dictionaries by their "from" date.
Expected output:
[{"type": "df_first",
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "quadratic",
"from": "2020-02-03T20:00:00.000Z",
"to": "2020-02-10T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "linear",
"from": "2020-02-04T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "polynomial",
"from": "2020-02-05T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "df_mid",
"from": "2020-02-08T20:00:00.000Z",
"to": "2020-02-09T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "df_last",
"from": "2020-02-14T20:00:00.000Z",
"to": "2020-02-14T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
]
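Step 5 is the same sort as Step 1, applied to the combined list. A minimal sketch, assuming the same timestamp shape as before and keeping only the keys the sort needs:

```python
from datetime import datetime

TS_FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Step 1 output followed by the three appended boundary dicts (only type/from shown).
combined = [
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z"},
    {"type": "linear", "from": "2020-02-04T20:00:00.000Z"},
    {"type": "polynomial", "from": "2020-02-05T20:00:00.000Z"},
    {"type": "df_first", "from": "2020-02-01T20:00:00.000Z"},
    {"type": "df_mid", "from": "2020-02-08T20:00:00.000Z"},
    {"type": "df_last", "from": "2020-02-14T20:00:00.000Z"},
]

# list.sort() works in place; it is stable, so equal "from" values keep insertion order.
combined.sort(key=lambda d: datetime.strptime(d["from"], TS_FMT))
print([d["type"] for d in combined])
# ['df_first', 'quadratic', 'linear', 'polynomial', 'df_mid', 'df_last']
```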
Step 6: Replace each dictionary's "to" value with the "from" value of the next dictionary. The last dictionary keeps its own "to" value.
Expected final output:
[{"type": "df_first",
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "quadratic",
"from": "2020-02-03T20:00:00.000Z",
"to": "2020-02-04T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "linear",
"from": "2020-02-04T20:00:00.000Z",
"to": "2020-02-05T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "polynomial",
"from": "2020-02-05T20:00:00.000Z",
"to": "2020-02-08T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "df_mid",
"from": "2020-02-08T20:00:00.000Z",
"to": "2020-02-14T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
},
{"type": "df_last",
"from": "2020-02-14T20:00:00.000Z",
"to": "2020-02-14T20:00:00.000Z",
"days":0,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1]
}
]
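Step 6 pairs each dictionary with its successor. A minimal sketch, assuming the list is already sorted as in Step 5; zip(lst, lst[1:]) stops before the last element, so the last dictionary's "to" is left unchanged.

```python
# Step 5 output, reduced to the keys Step 6 touches.
lst = [
    {"type": "df_first", "from": "2020-02-01T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z", "to": "2020-02-10T20:00:00.000Z"},
    {"type": "linear", "from": "2020-02-04T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "polynomial", "from": "2020-02-05T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z"},
    {"type": "df_mid", "from": "2020-02-08T20:00:00.000Z", "to": "2020-02-09T20:00:00.000Z"},
    {"type": "df_last", "from": "2020-02-14T20:00:00.000Z", "to": "2020-02-14T20:00:00.000Z"},
]

# Overwrite each "to" with the next dict's "from"; the last dict has no successor.
for current, nxt in zip(lst, lst[1:]):
    current["to"] = nxt["from"]

for d in lst:
    print(d["type"], d["from"][:10], "->", d["to"][:10])
```

lst[1:] is a shallow copy holding the same dict objects, so assigning through current updates the dictionaries in lst itself.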
Answer 0 (score: 1)
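Define a function add_dct that takes a list of dictionaries lst together with _type, _from and _to, and appends a new dictionary built from those values to lst. _from and _to can be either strings already in the output format or pandas Timestamps, which are formatted to match the question's timestamps:

import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])  # Date must be datetime for min/max, Timedelta and strftime
dmin, dmax = df['Date'].min(), df['Date'].max()

def add_dct(lst, _type, _from, _to):
    # Strings are kept as-is; Timestamps are formatted with the hour fixed at 20
    lst.append({
        'type': _type,
        'from': _from if isinstance(_from, str) else _from.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'to': _to if isinstance(_to, str) else _to.strftime("%Y-%m-%dT20:%M:%S.000Z"),
        'days': 0,
        'coef': [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    })

Then perform the steps according to your requirements: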
# STEP 1
lst = sorted(lst, key=lambda d: pd.Timestamp(d['from']))
# STEP 2
add_dct(lst, 'df_first', dmin, lst[0]['from'])
# STEP 3
add_dct(lst, 'df_mid', dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))
# STEP 4
add_dct(lst, 'df_last', dmax, dmax)
# STEP 5
lst = sorted(lst, key=lambda d: pd.Timestamp(d['from']))
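The answer's code stops after Step 5. The following self-contained sketch assembles the answer's add_dct approach with the sample data (assuming the df dates run from 2020-02-01 to 2020-02-14) and appends a Step 6 loop that is not part of the original answer, producing the question's expected final output.

```python
import pandas as pd

COEF = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

lst = [
    {"type": "linear", "from": "2020-02-04T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z", "days": 3, "coef": list(COEF)},
    {"type": "quadratic", "from": "2020-02-03T20:00:00.000Z", "to": "2020-02-10T20:00:00.000Z", "days": 3, "coef": list(COEF)},
    {"type": "polynomial", "from": "2020-02-05T20:00:00.000Z", "to": "2020-02-03T20:00:00.000Z", "days": 3, "coef": list(COEF)},
]
df = pd.DataFrame({"Date": pd.date_range("2020-02-01", "2020-02-14", freq="D")})
dmin, dmax = df["Date"].min(), df["Date"].max()


def add_dct(lst, _type, _from, _to):
    """Append a boundary dict; strings pass through, Timestamps are formatted at hour 20."""
    def fmt(v):
        return v if isinstance(v, str) else v.strftime("%Y-%m-%dT20:%M:%S.000Z")
    lst.append({"type": _type, "from": fmt(_from), "to": fmt(_to), "days": 0, "coef": list(COEF)})


lst = sorted(lst, key=lambda d: pd.Timestamp(d["from"]))                           # Step 1
add_dct(lst, "df_first", dmin, lst[0]["from"])                                     # Step 2
add_dct(lst, "df_mid", dmin + pd.Timedelta(days=7), dmin + pd.Timedelta(days=8))  # Step 3
add_dct(lst, "df_last", dmax, dmax)                                                # Step 4
lst = sorted(lst, key=lambda d: pd.Timestamp(d["from"]))                           # Step 5

# Step 6 (added here, not in the original answer): chain each "to" to the next "from".
for current, nxt in zip(lst, lst[1:]):
    current["to"] = nxt["from"]

for d in lst:
    print(d["type"], d["from"], d["to"])
```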