Creating a column by applying conditions to multiple other columns of dtypes datetime and integer

Time: 2018-12-26 13:00:52

Tags: python pandas datetime

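I have a dataframe named df that looks similar to this (the visit columns go up to Visit_74 and there are several hundred clients; I have simplified it here).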

Client    Visit_1     Visit_2     Visit_3     Visit_4     Visit_5     Eligible  Active     
Client_1  2016-05-10  2016-05-25  2016-06-10  2016-06-25  2016-07-10  0         0  
Client_2  2017-05-10  2017-05-25  2017-06-10  2017-06-25  2017-07-10  0         0  
Client_3  2018-09-10  2018-09-26  2018-10-10  2018-10-26  2018-11-10  1         0  
Client_4  2018-10-10  2018-10-26  2018-11-10  2018-11-26  2018-12-10  1         1  

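I want to create a new column called Visit_in_Window that takes two values, 0 and 1. I want Visit_in_Window to equal 1 if the client is Eligible (a value of 1 in the Eligible column), if the client is Active (a value of 1 in the Active column), and if any of the 5 columns Visit_1 to Visit_5 has a date between 2018-10-25 and 2018-12-15.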

So, I would like to end up with a dataframe that looks like this:

Client    Visit_1     Visit_2     Visit_3     Visit_4     Visit_5     Eligible  Active  Visit_in_Window    
Client_1  2016-05-10  2016-05-25  2016-06-10  2016-06-25  2016-07-10  0         0       0  
Client_2  2017-05-10  2017-05-25  2017-06-10  2017-06-25  2017-07-10  0         0       0  
Client_3  2018-09-10  2018-09-26  2018-10-10  2018-10-26  2018-11-10  1         0       0  
Client_4  2018-10-10  2018-10-26  2018-11-10  2018-11-26  2018-12-10  1         1       1  

I can do this for a single column with the following code:

df['Visit_in_Window'] = 0
df.loc[((df.Eligible == 1) & (df.Active == 1) &
        (df.Visit_1 > '2018-10-24') & 
        (df.Visit_1 < '2018-12-16')), 'Visit_in_Window'] = 1

However, I can't work out how to do this across multiple columns at once. Can anyone help?

2 Answers:

Answer 0: (score: 1)

I think this is certainly one way to do it:

import pandas as pd
from collections import OrderedDict

df = pd.DataFrame(OrderedDict([
    ("Client", ["Client_1", "Client_2", "Client_3", "Client_4"]),
    ("Visit_1", ["2016-05-10", "2017-05-10", "2018-09-10", "2018-10-10"]),
    ("Visit_2", ["2016-05-25", "2017-05-25", "2018-09-26", "2018-10-26"]),
    ("Visit_3", ["2016-06-10", "2017-06-10", "2018-10-10", "2018-11-10"]),
    ("Visit_4", ["2016-06-25", "2017-06-25", "2018-10-26", "2018-11-26"]),
    ("Visit_5", ["2016-07-10", "2017-07-10", "2018-11-10", "2018-12-10"]),
    ("Eligible", [0, 0, 1, 1]),
    ("Active", [0, 0, 0, 1])
]))

# Flag clients that are eligible, active, and have at least one visit in the
# window. Bounds are inclusive, matching the code in the question.
df["Visit_in_Window"] = (
    (df["Eligible"] == 1) & (df["Active"] == 1) & (
        (("2018-10-25" <= df["Visit_1"]) & (df["Visit_1"] <= "2018-12-15")) |
        (("2018-10-25" <= df["Visit_2"]) & (df["Visit_2"] <= "2018-12-15")) |
        (("2018-10-25" <= df["Visit_3"]) & (df["Visit_3"] <= "2018-12-15")) |
        (("2018-10-25" <= df["Visit_4"]) & (df["Visit_4"] <= "2018-12-15")) |
        (("2018-10-25" <= df["Visit_5"]) & (df["Visit_5"] <= "2018-12-15"))
    )
)

print(df.to_string(index=False))

This prints:

Client    Visit_1     Visit_2     Visit_3     Visit_4     Visit_5     Eligible  Active  Visit_in_Window
Client_1  2016-05-10  2016-05-25  2016-06-10  2016-06-25  2016-07-10  0         0       False
Client_2  2017-05-10  2017-05-25  2017-06-10  2017-06-25  2017-07-10  0         0       False
Client_3  2018-09-10  2018-09-26  2018-10-10  2018-10-26  2018-11-10  1         0       False
Client_4  2018-10-10  2018-10-26  2018-11-10  2018-11-26  2018-12-10  1         1       True

Update

For a variable number N of Visit_N columns, this should work:

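N = 5
# One boolean row per Visit_i column, one column per client (inclusive bounds).
visits = pd.DataFrame([
    ("2018-10-25" <= df["Visit_" + str(i)]) & (df["Visit_" + str(i)] <= "2018-12-15")
    for i in range(1, N + 1)
])
print(visits)
# any() reduces over the rows, giving one True/False value per client.
df["Visit_in_Window"] = (df["Eligible"] == 1) & (df["Active"] == 1) & visits.any()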

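The print(visits) call prints:

             0      1      2      3
Visit_1  False  False  False  False
Visit_2  False  False  False   True
Visit_3  False  False  False   True
Visit_4  False  False   True   True
Visit_5  False  False   True   True

As you can see, only columns 2 and 3 (clients 3 and 4) have True visits in the date range. any takes care of the "merging" that was previously done with the bitwise operator |.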
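With 74 visit columns, it may be simpler to select the visit columns by name and test them all at once instead of looping over column numbers. The following is only a sketch, not part of the original answer. It assumes every visit column is named Visit_ followed by a number and that the values can be parsed by pd.to_datetime. The names start, end and in_window are illustrative. It also casts the result to int so the column holds 0/1, as in the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Client": ["Client_1", "Client_2", "Client_3", "Client_4"],
    "Visit_1": ["2016-05-10", "2017-05-10", "2018-09-10", "2018-10-10"],
    "Visit_2": ["2016-05-25", "2017-05-25", "2018-09-26", "2018-10-26"],
    "Visit_3": ["2016-06-10", "2017-06-10", "2018-10-10", "2018-11-10"],
    "Visit_4": ["2016-06-25", "2017-06-25", "2018-10-26", "2018-11-26"],
    "Visit_5": ["2016-07-10", "2017-07-10", "2018-11-10", "2018-12-10"],
    "Eligible": [0, 0, 1, 1],
    "Active": [0, 0, 0, 1],
})

# Assumption: visit columns are named "Visit_<number>". The regex keeps
# "Visit_in_Window" itself from being selected if this is run twice.
visits = df.filter(regex=r"^Visit_\d+$").apply(pd.to_datetime)

# Illustrative names for the window bounds (inclusive on both ends).
start, end = pd.Timestamp("2018-10-25"), pd.Timestamp("2018-12-15")

# True for each client with at least one visit inside the window.
in_window = ((visits >= start) & (visits <= end)).any(axis=1)

# Combine with the two flag columns and store the result as 0/1.
df["Visit_in_Window"] = (
    (df["Eligible"] == 1) & (df["Active"] == 1) & in_window
).astype(int)

print(df[["Client", "Eligible", "Active", "Visit_in_Window"]])
```

Because any(axis=1) works row by row, the same code should apply unchanged however many Visit_ columns the frame has.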

Answer 1: (score: 0)

One possible way to do this is the same as what you suggested in your question, but with additional "or" conditions:

df['Visit_in_Window'] = 0
df.loc[
    (df.Eligible == 1) &
    (df.Active == 1) &
    (
        ((df.Visit_1 > '2018-10-24') & (df.Visit_1 < '2018-12-16')) |
        ((df.Visit_2 > '2018-10-24') & (df.Visit_2 < '2018-12-16')) |
        ((df.Visit_3 > '2018-10-24') & (df.Visit_3 < '2018-12-16')) |
        ((df.Visit_4 > '2018-10-24') & (df.Visit_4 < '2018-12-16')) |
        ((df.Visit_5 > '2018-10-24') & (df.Visit_5 < '2018-12-16'))
    ),
    'Visit_in_Window'
] = 1
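Writing each condition out by hand does not scale well to the 74 visit columns mentioned in the question. One way to keep the same df.loc approach is to build the chain of "or" conditions in a loop. This is only a sketch, under the assumption that the columns are named Visit_1 through Visit_N and hold ISO-formatted date strings, as in the sample data. The names N, conditions and any_visit are illustrative.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    "Client": ["Client_1", "Client_2", "Client_3", "Client_4"],
    "Visit_1": ["2016-05-10", "2017-05-10", "2018-09-10", "2018-10-10"],
    "Visit_2": ["2016-05-25", "2017-05-25", "2018-09-26", "2018-10-26"],
    "Visit_3": ["2016-06-10", "2017-06-10", "2018-10-10", "2018-11-10"],
    "Visit_4": ["2016-06-25", "2017-06-25", "2018-10-26", "2018-11-26"],
    "Visit_5": ["2016-07-10", "2017-07-10", "2018-11-10", "2018-12-10"],
    "Eligible": [0, 0, 1, 1],
    "Active": [0, 0, 0, 1],
})

N = 5  # number of Visit_ columns (74 in the real data, per the question)

# One boolean Series per visit column, using the same bounds as above.
conditions = [
    (df[f"Visit_{i}"] > "2018-10-24") & (df[f"Visit_{i}"] < "2018-12-16")
    for i in range(1, N + 1)
]

# Fold the list with "|" so the result is True if any column matched.
any_visit = reduce(operator.or_, conditions)

df["Visit_in_Window"] = 0
df.loc[(df.Eligible == 1) & (df.Active == 1) & any_visit, "Visit_in_Window"] = 1

print(df[["Client", "Visit_in_Window"]])
```

The result is the same 0/1 column as the hand-written version, with only N to change when the number of visit columns differs.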