Question

I have a df as shown below.

Date        t_factor
2020-02-01  5
2020-02-03  23
2020-02-06  14
2020-02-09  23
2020-02-10  23
2020-02-11  23
2020-02-13  30
2020-02-20  29
2020-02-29  100
2020-03-01  38
2020-03-10  38
2020-03-11  38
2020-03-26  70
2020-03-29  70

From this, I want to write a function that computes a column named t_function from the intermediate values t1, t2 and t3.

The input parameters are stored in a dictionary, as shown below.

d1 = {'b1': {'s': '2020-02-01', 'e':'2020-02-06', 'coef':[3, 1, 0]},
     'b2': {'s': '2020-02-13', 'e':'2020-02-29', 'coef':[2, 0, 1]},
     'b3': {'s': '2020-03-11', 'e':'2020-03-29', 'coef':[4, 0, 0]}}

Expected output:

Date        t_factor  t1   t2   t3   t_function
2020-02-01  5         4    NaN  NaN  4
2020-02-03  23        6    NaN  NaN  6
2020-02-06  14        9    NaN  NaN  9
2020-02-09  23        NaN  NaN  NaN  0
2020-02-10  23        NaN  NaN  NaN  0
2020-02-11  23        NaN  NaN  NaN  0
2020-02-13  30        NaN  3    NaN  3
2020-02-20  29        NaN  66   NaN  66
2020-02-29  100       NaN  291  NaN  291
2020-03-01  38        NaN  NaN  NaN  0
2020-03-10  38        NaN  NaN  NaN  0
2020-03-11  38        NaN  NaN  4    4
2020-03-26  70        NaN  NaN  4    4
2020-03-29  70        NaN  NaN  4    4
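Each expected value is a0 + a1*t + a2*t**2, where coef = [a0, a1, a2] and t counts days from the band's start date (the start date itself is t = 1). This reading of coef is inferred from the attempted code that follows. A quick standard-library check, using a hypothetical helper named band_value:

```python
from datetime import date

def band_value(day, start, coef):
    """Evaluate a0 + a1*t + a2*t**2, where t = days since start + 1 (hypothetical helper)."""
    a0, a1, a2 = coef
    t = (day - start).days + 1
    return a0 + a1 * t + a2 * t ** 2

# Band b2: start 2020-02-13, coef [2, 0, 1]
print(band_value(date(2020, 2, 20), date(2020, 2, 13), [2, 0, 1]))  # t = 8 -> 2 + 64 = 66
print(band_value(date(2020, 2, 29), date(2020, 2, 13), [2, 0, 1]))  # t = 17 -> 2 + 289 = 291
```

2020 is a leap year, so 2020-02-29 is 16 days after 2020-02-13, which gives t = 17.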

I tried the following code:

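from datetime import datetime

import numpy as np
import pandas as pd

# Date must hold datetimes, not strings, for the range comparison below
df["Date"] = pd.to_datetime(df["Date"])

def fun(x, start="2020-02-01", end="2020-02-06", a0=3, a1=1, a2=0):
    """Return a0 + a1*t + a2*t**2 for rows inside [start, end], NaN otherwise."""
    start = datetime.strptime(start, "%Y-%m-%d")
    end = datetime.strptime(end, "%Y-%m-%d")
    if start <= x.Date <= end:
        t = (x.Date - start)/np.timedelta64(1, 'D') + 1  # day index, 1 on the start date
        diff = a0 + a1*t + a2*t**2
    else:
        diff = np.nan  # np.NaN was removed in NumPy 2.0
    return diff

df["t1"] = df.apply(fun, axis=1)
df["t2"] = df.apply(lambda x: fun(x, "2020-02-13", "2020-02-29", 2, 0, 1), axis=1)
df["t3"] = df.apply(lambda x: fun(x, "2020-03-11", "2020-03-29", 4, 0, 0), axis=1)
df["t_function"] = df['t1'].fillna(0) + df['t2'].fillna(0) + df['t3'].fillna(0)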
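For comparison, the same per-band column can be computed without a row-wise apply. This is a minimal sketch, assuming Date already holds datetimes and reading coef as [a0, a1, a2] as in fun; band_column is a hypothetical name, and the frame is a four-row excerpt of the data above:

```python
import pandas as pd

def band_column(dates, start, end, coef):
    """Vectorized alternative to fun(): polynomial inside [start, end], NaN elsewhere."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    a0, a1, a2 = coef
    t = (dates - start).dt.days + 1                   # day index, 1 on the start date
    values = a0 + a1 * t + a2 * t ** 2
    return values.where(dates.between(start, end))    # NaN outside the band (inclusive bounds)

df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-02-01", "2020-02-03", "2020-02-06", "2020-02-09"]),
    "t_factor": [5, 23, 14, 23],
})
df["t1"] = band_column(df["Date"], "2020-02-01", "2020-02-06", [3, 1, 0])
print(df)
```

Series.between is inclusive on both ends by default, matching the start <= x.Date <= end test in fun.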

I want to change the code above so that it iterates over the dictionary d1 instead.

Note:

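The dictionary d1 may have more than three keys, e.g. 'b1', 'b2', 'b3', 'b4', in which case columns t1, t2, t3 and t4 must be created. I want to automate this by iterating over d1.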

Answer 1

I suggest storing the parameters as a list of tuples, like this:


params = [('2020-02-01', '2020-02-06', 3, 1, 0), ('2020-02-13', '2020-02-29', 2, 0, 1), ('2020-03-11', '2020-03-29', 4, 0, 0)]

Now all you need to do is loop over params and add the columns to the dataframe df.
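Rather than typing params by hand, it can be built from d1 itself, so a new key such as 'b4' is picked up automatically. A sketch, assuming every entry has the 's', 'e' and three-element 'coef' fields shown in the question:

```python
d1 = {'b1': {'s': '2020-02-01', 'e': '2020-02-06', 'coef': [3, 1, 0]},
      'b2': {'s': '2020-02-13', 'e': '2020-02-29', 'coef': [2, 0, 1]},
      'b3': {'s': '2020-03-11', 'e': '2020-03-29', 'coef': [4, 0, 0]}}

# Flatten each band into (start, end, a0, a1, a2). Dicts keep insertion order
# (Python 3.7+), so b1, b2, b3 become the 1st, 2nd, 3rd tuples.
params = [(band['s'], band['e'], *band['coef']) for band in d1.values()]
print(params)
```

The resulting order follows insertion order, not key names, so keys should be added to d1 in the order the t columns are wanted.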

The following loop (using fun from the question) produces the desired output:

total = None
for i, (s, e, a0, a1, a2) in enumerate(params, start=1):
    # apply runs immediately, so the lambda sees this iteration's values
    df[f"t{i}"] = df.apply(lambda x: fun(x, s, e, a0, a1, a2), axis=1)
    col = df[f"t{i}"].fillna(0)  # treat rows outside the band as 0
    total = col if total is None else total + col
df["t_function"] = total
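The running total can also be replaced by a single row-wise sum after the loop, which works for any number of bands. A sketch on a small hypothetical frame, assuming the generated columns are named t1, t2, ... as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "t_factor": [5, 23, 30],
    "t1": [4.0, np.nan, np.nan],
    "t2": [np.nan, np.nan, 3.0],
    "t3": [np.nan, np.nan, np.nan],
})

# Select only the generated band columns; the anchored regex skips t_factor.
band_cols = df.filter(regex=r"^t\d+$")
# sum(axis=1) skips NaN by default, so a row with no active band sums to 0.
df["t_function"] = band_cols.sum(axis=1)
print(df)
```

Because the all-NaN row sums to 0 (the default min_count=0), this reproduces the 0 entries in the expected t_function column without any fillna calls.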

Create specific columns by iterating over a user-defined dictionary in pandas
