I have a DataFrame df as shown below.
Date        t_factor
2020-02-01  5
2020-02-03  23
2020-02-06  14
2020-02-09  23
2020-02-10  23
2020-02-11  23
2020-02-13  30
2020-02-20  29
2020-02-29  100
2020-03-01  38
2020-03-10  38
2020-03-11  38
2020-03-26  70
2020-03-29  70
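To make the example reproducible, the frame above could be rebuilt roughly as follows. This is only a sketch: the post does not show the dtypes, so parsing Date with pd.to_datetime is an assumption.

```python
import pandas as pd

# Rebuild the sample frame from the table above.
df = pd.DataFrame({
    "Date": ["2020-02-01", "2020-02-03", "2020-02-06", "2020-02-09",
             "2020-02-10", "2020-02-11", "2020-02-13", "2020-02-20",
             "2020-02-29", "2020-03-01", "2020-03-10", "2020-03-11",
             "2020-03-26", "2020-03-29"],
    "t_factor": [5, 23, 14, 23, 23, 23, 30, 29, 100, 38, 38, 38, 70, 70],
})

# Assumption: Date should be a real datetime column so that it can be
# compared against the window boundaries later on.
df["Date"] = pd.to_datetime(df["Date"])
```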
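From this, I want to create a function that computes a column named t_function based on the calculated values t1, t2 and t3.
The input parameters are stored in a dictionary, as shown below.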
d1 = {'b1': {'s': '2020-02-01', 'e':'2020-02-06', 'coef':[3, 1, 0]},
'b2': {'s': '2020-02-13', 'e':'2020-02-29', 'coef':[2, 0, 1]},
'b3': {'s': '2020-03-11', 'e':'2020-03-29', 'coef':[4, 0, 0]}}
Expected output:
Date        t_factor  t1   t2   t3   t_function
2020-02-01  5         4    NaN  NaN  4
2020-02-03  23        6    NaN  NaN  6
2020-02-06  14        9    NaN  NaN  9
2020-02-09  23        NaN  NaN  NaN  0
2020-02-10  23        NaN  NaN  NaN  0
2020-02-11  23        NaN  NaN  NaN  0
2020-02-13  30        NaN  3    NaN  3
2020-02-20  29        NaN  66   NaN  66
2020-02-29  100       NaN  291  NaN  291
2020-03-01  38        NaN  NaN  NaN  0
2020-03-10  38        NaN  NaN  NaN  0
2020-03-11  38        NaN  NaN  4    4
2020-03-26  70        NaN  NaN  4    4
2020-03-29  70        NaN  NaN  4    4
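For reference, the values in the table appear to follow a0 + a1*k + a2*k**2, where k is the 1-based day offset from the window start s and [a0, a1, a2] is the coef list. That reading is inferred from the attempted code in this post, and poly is a hypothetical helper name used only for this check.

```python
from datetime import date

def poly(day, start, coef):
    """Evaluate a0 + a1*k + a2*k**2, with k the 1-based day offset from start."""
    a0, a1, a2 = coef
    k = (day - start).days + 1
    return a0 + a1 * k + a2 * k ** 2

# Row 2020-02-20 lies in window b2 (s = 2020-02-13, coef = [2, 0, 1]), so k = 8.
print(poly(date(2020, 2, 20), date(2020, 2, 13), [2, 0, 1]))  # 66
# Row 2020-02-06 lies in window b1 (s = 2020-02-01, coef = [3, 1, 0]), so k = 6.
print(poly(date(2020, 2, 6), date(2020, 2, 1), [3, 1, 0]))    # 9
```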
I tried the following code:
import numpy as np
import pandas as pd
from datetime import datetime

# Assumption: Date must hold datetimes so it can be compared with start/end.
df["Date"] = pd.to_datetime(df["Date"])

def fun(x, start="2020-02-01", end="2020-02-06", a0=3, a1=1, a2=0):
    start = datetime.strptime(start, "%Y-%m-%d")
    end = datetime.strptime(end, "%Y-%m-%d")
    if start <= x.Date <= end:
        # 1-based day offset from the window start
        t2 = (x.Date - start)/np.timedelta64(1, 'D') + 1
        diff = a0 + a1*t2 + a2*(t2)**2
    else:
        diff = np.nan  # np.NaN was removed in NumPy 2.0
    return diff

df["t1"] = df.apply(lambda x: fun(x), axis=1)
df["t2"] = df.apply(lambda x: fun(x, "2020-02-13", "2020-02-29", 2, 0, 1), axis=1)
df["t3"] = df.apply(lambda x: fun(x, "2020-03-11", "2020-03-29", 4, 0, 0), axis=1)
df["t_function"] = df['t1'].fillna(0) + df['t2'].fillna(0) + df['t3'].fillna(0)
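I would like to change the code above so that it iterates over the dictionary d1 instead of hard-coding each call.
Note:
The dictionary d1 may have more than three keys, for example 'b1', 'b2', 'b3', 'b4', in which case columns t1, t2, t3 and t4 must be created. I want to automate this by iterating over d1.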
Answer 0 (score: 1)
I would suggest storing the data as a list of tuples, like this:
params = [('2020-02-01', '2020-02-06', 3, 1, 0),
          ('2020-02-13', '2020-02-29', 2, 0, 1),
          ('2020-03-11', '2020-03-29', 4, 0, 0)]
Now all you need to do is iterate over params and add the columns to the dataframe df.
The following gives the desired output:
total = None
for i, param in enumerate(params):
    s, e, a0, a1, a2 = param
    df[f"t{i+1}"] = df.apply(lambda x: fun(x, s, e, a0, a1, a2), axis=1)
    if i == 0:
        total = df[f"t{i+1}"].fillna(0)
    else:
        total += df[f"t{i+1}"].fillna(0)
df["t_function"] = total
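If you would rather keep the dictionary d1 from the question, the same loop can run over its values directly. The sketch below is not the approach used above: it swaps the row-wise apply for vectorized column arithmetic, add_t_columns is a hypothetical helper name, and it assumes the columns should be numbered t1, t2, ... in the insertion order of the keys.

```python
import pandas as pd

def add_t_columns(df, windows):
    """Add one t<i> column per entry of `windows`, plus their sum as t_function."""
    out = df.copy()
    dates = pd.to_datetime(out["Date"])
    cols = []
    # Assumption: columns are numbered by insertion order of the dict keys.
    for i, spec in enumerate(windows.values(), start=1):
        start, end = pd.Timestamp(spec["s"]), pd.Timestamp(spec["e"])
        a0, a1, a2 = spec["coef"]
        k = (dates - start).dt.days + 1        # 1-based day offset from the window start
        inside = dates.between(start, end)     # inclusive on both ends
        out[f"t{i}"] = (a0 + a1 * k + a2 * k ** 2).where(inside)  # NaN outside the window
        cols.append(f"t{i}")
    # sum() skips NaN by default, so rows outside every window get 0.
    out["t_function"] = out[cols].sum(axis=1)
    return out

d1 = {'b1': {'s': '2020-02-01', 'e': '2020-02-06', 'coef': [3, 1, 0]},
      'b2': {'s': '2020-02-13', 'e': '2020-02-29', 'coef': [2, 0, 1]},
      'b3': {'s': '2020-03-11', 'e': '2020-03-29', 'coef': [4, 0, 0]}}

# A few rows from the question's data, enough to hit each case.
sample = pd.DataFrame({"Date": ["2020-02-01", "2020-02-09", "2020-02-20", "2020-03-26"],
                       "t_factor": [5, 23, 29, 70]})
result = add_t_columns(sample, d1)
print(result["t_function"].tolist())  # [4.0, 0.0, 66.0, 4.0]
```

Adding a 'b4' entry to d1 would produce a t4 column without any other change, since nothing in the loop depends on the number of keys.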