How do I compare multiple logical conditions between dataframes?

Time: 2019-02-09 19:37:23

Tags: python-3.x pandas dataframe

I have two dataframes like this:

df1:

Email      DateTimeCompleted
2@2.com    2019-02-09T01:34:44.591Z

df2:

Email         DateTimeCompleted
b@b.com       2019-01-29T01:34:44.591Z
2@2.com       2018-01-29T01:34:44.591Z

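How can I look up the Email value from df1 in df2, check whether DateTimeCompleted is greater than today minus 90 days, and append the df1 row to df2? df2 may sometimes be empty when the row is added (in case that makes a difference).

df2 after the update:

Email         DateTimeCompleted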
b@b.com       2019-01-29T01:34:44.591Z
2@2.com       2018-01-29T01:34:44.591Z
2@2.com       2019-02-09T01:34:44.591Z

I tried:

from datetime import date    

if df1.Email in df2.Email & df2.DateTimeCompleted >= date.today()-90 :
    print('true')

I got the error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I also tried:

if df2.Email.str.contains(df1.Email.iat[0]):
    print('true')

and got the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

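2 answers:

Answer 0: (score: 1)

You can do the following:
1. merge the two dataframes on the key column Email, so you know which rows are present in both dataframes.
2. Filter the rows whose DateTimeCompleted is greater than today - 90days.
3. Combine the dataframes with pd.concat to get the final result.

Code: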

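import pandas as pd
from datetime import datetime as dt, timedelta

# Assumes DateTimeCompleted in both dataframes already holds timezone-naive datetimes
# Merge dataframes together (keeps df1 rows whose Email also appears in df2)
df3 = pd.merge(df1, df2, on=['Email'], suffixes=['', '_2'])

# Filter the rows completed within the last 90 days
df3 = df3[df3.DateTimeCompleted > (dt.today() - timedelta(90))]

# Drop the column we don't need
df3 = df3.drop(columns=['DateTimeCompleted_2'])

# Create the final dataframe by concatenating
df_final = pd.concat([df2, df3], ignore_index=True)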

    Email   DateTimeCompleted
0   b@b.com 2019-01-29 01:34:44.591
1   2@2.com 2018-01-29 01:34:44.591
2   2@2.com 2019-02-09 01:34:44.591
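
If DateTimeCompleted is still stored as ISO 8601 strings with a trailing Z (as in the question), or if df2 may be empty, a boolean-mask variant might be easier to adapt. The following is only a sketch under those assumptions: `append_recent` and its `today` and `days` parameters are hypothetical names for illustration, not part of pandas, and the timestamps are assumed to be UTC.

```python
import pandas as pd


def append_recent(df1, df2, today=None, days=90):
    """Hypothetical helper: append df1 rows to df2 when the Email already
    exists in df2 and the df1 timestamp is newer than `today - days`."""
    # Default to the current UTC time; pass `today` explicitly for repeatable runs
    today = pd.Timestamp.now(tz="UTC") if today is None else pd.Timestamp(today)
    if today.tzinfo is None:
        # Assumption: a naive `today` is meant as UTC
        today = today.tz_localize("UTC")
    cutoff = today - pd.Timedelta(days=days)

    # Parse the ISO 8601 strings; utc=True yields timezone-aware timestamps
    completed = pd.to_datetime(df1["DateTimeCompleted"], utc=True)

    # Element-wise conditions combined with &, giving one boolean per df1 row
    mask = df1["Email"].isin(df2["Email"]) & (completed > cutoff)

    # With an empty df2 the mask is all False, so nothing is appended
    return pd.concat([df2, df1[mask]], ignore_index=True)


# Usage with the data from the question, with "today" pinned to the question's date
df1 = pd.DataFrame({"Email": ["2@2.com"],
                    "DateTimeCompleted": ["2019-02-09T01:34:44.591Z"]})
df2 = pd.DataFrame({"Email": ["b@b.com", "2@2.com"],
                    "DateTimeCompleted": ["2019-01-29T01:34:44.591Z",
                                          "2018-01-29T01:34:44.591Z"]})
result = append_recent(df1, df2, today="2019-02-09T19:37:23Z")
print(result)
```

Because the mask is computed per row, no `if` on a whole Series is needed, which sidesteps both errors from the question. The original string values are kept in the appended rows; only the comparison uses parsed timestamps.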

Answer 1: (score: 0)

I wrote a function to do this.

The function takes the parameters

mailid, dataframe1, dataframe2

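import pandas as pd

def process(mailid, df1, df2):
    # Only proceed if the email already exists in df2
    if mailid in df2.Email.values:
        # First DateTimeCompleted recorded for this email in df1
        b = df1.loc[df1.Email == mailid, "DateTimeCompleted"].head(1)
        # The row qualifies only if it is newer than today minus 90 days (UTC assumed)
        cutoff = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=90)
        if (not b.empty) and (pd.to_datetime(b.iloc[0], utc=True) > cutoff):
            new_row = pd.DataFrame([[mailid, b.iloc[0]]],
                                   columns=['Email', 'DateTimeCompleted'])
            # Append the df1 row to df2, as the question asks
            df2 = pd.concat([df2, new_row], ignore_index=True)
            print("Added the row")
        else:
            print("Condition failed")
    else:
        print("The mail is not there in dataframe")
    return df2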
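
As background on the two errors in the question: `in` applied to a Series tests its index labels, so putting a whole Series on the left-hand side fails when pandas tries to hash it, and `if` needs a single True/False rather than one boolean per row. A minimal sketch of reducing a row-wise result to one value, using made-up variable names (`emails`, `matches`, `found`, `found_exact`):

```python
import pandas as pd

emails = pd.Series(["b@b.com", "2@2.com"])

# str.contains returns one boolean per row, not a single True/False
matches = emails.str.contains("2@2.com", regex=False)

# `if matches:` would raise ValueError; reduce the Series to one bool first
found = bool(matches.any())

# For exact matches, isin avoids accidental substring hits
found_exact = bool(emails.isin(["2@2.com"]).any())

if found_exact:
    print("true")
```

Whether `.any()` or `.all()` is the right reduction depends on what the check is meant to express; here "at least one row matches" seems to be the intent.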