I have two dataframes like this:
df1:
Email DateTimeCompleted
2@2.com 2019-02-09T01:34:44.591Z
df2:
Email DateTimeCompleted
b@b.com 2019-01-29T01:34:44.591Z
2@2.com 2018-01-29T01:34:44.591Z
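How can I look up the Email value in df2, check whether DateTimeCompleted is greater than today minus 90 days, and append the df1 row to df2? df2 may sometimes be empty when the row is added (if that makes a difference).
Updated df2: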
Email DateTimeCompleted
b@b.com 2019-01-29T01:34:44.591Z
2@2.com 2018-01-29T01:34:44.591Z
2@2.com 2019-02-09T01:34:44.591Z
I tried:
from datetime import date
if df1.Email in df2.Email & df2.DateTimeCompleted >= date.today()-90 :
    print('true')
I get the error:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I also tried:
if df2.Email.str.contains(df1.Email.iat[0]):
    print('true')
and got the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer 0 (score: 1)
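You can do the following:
1. merge the two dataframes on the key column Email, so you know which rows are present in both dataframes.
2. Filter the rows that are more recent than today - 90 days.
3. Append the filtered rows to df2 with pd.concat.
Code: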
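import pandas as pd
from datetime import datetime as dt, timedelta

# Merge dataframes together (the default inner join keeps emails present in both)
df3 = pd.merge(df1, df2, on=['Email'], suffixes=['', '_2'])
# Filter the rows (assumes DateTimeCompleted is already a timezone-naive datetime column)
df3 = df3[df3.DateTimeCompleted > (dt.today() - timedelta(90))]
# Drop the column we don't need
df3 = df3.drop(columns=['DateTimeCompleted_2'])
# Create the final dataframe by concatenating
df_final = pd.concat([df2, df3], ignore_index=True)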
Email DateTimeCompleted
0 b@b.com 2019-01-29 01:34:44.591
1 2@2.com 2018-01-29 01:34:44.591
2 2@2.com 2019-02-09 01:34:44.591
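The same three steps can also be sketched without a merge, as a minimal example under a few assumptions: the dates are still ISO 8601 strings as in the question, the comparison is done in UTC, and the helper name `append_recent` and its `now` and `days` parameters are made up for illustration. It uses `isin` for the membership test and `&` to combine the two element-wise conditions.

```python
import pandas as pd

def append_recent(df1, df2, now=None, days=90):
    """Append rows of df1 to df2 when the email already exists in df2 and the
    df1 row was completed within the last `days` days (hypothetical helper)."""
    now = pd.Timestamp.now(tz="UTC") if now is None else now
    cutoff = now - pd.Timedelta(days=days)
    # Parse the ISO 8601 strings into timezone-aware timestamps for comparison
    completed = pd.to_datetime(df1["DateTimeCompleted"], utc=True)
    # Two element-wise conditions combined with &, one boolean per df1 row
    mask = df1["Email"].isin(df2["Email"]) & (completed > cutoff)
    return pd.concat([df2, df1[mask]], ignore_index=True)

df1 = pd.DataFrame({"Email": ["2@2.com"],
                    "DateTimeCompleted": ["2019-02-09T01:34:44.591Z"]})
df2 = pd.DataFrame({"Email": ["b@b.com", "2@2.com"],
                    "DateTimeCompleted": ["2019-01-29T01:34:44.591Z",
                                          "2018-01-29T01:34:44.591Z"]})

# A fixed reference date keeps the example reproducible
result = append_recent(df1, df2, now=pd.Timestamp("2019-02-10", tz="UTC"))
print(result)
```

Because the membership test is a mask on df1 rather than a join, a df1 row is appended at most once even if its email appears several times in df2, and an empty df2 simply yields no matches.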
Answer 1 (score: 0)
I wrote a function to do this.
The function takes the parameters
mailid, dataframe1, dataframe2
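import pandas as pd

def process(mailid, df1, df2):
    # Only proceed when the email already exists in df2
    if mailid in df2.Email.values:
        # First DateTimeCompleted value recorded for this email in df1
        b = df1.loc[df1.Email == mailid, "DateTimeCompleted"].head(1)
        # Age in whole days, compared in UTC; None when df1 has no row for this email
        age_days = None if b.empty else (pd.Timestamp.now(tz="UTC") - pd.to_datetime(b.iloc[0], utc=True)).days
        # Completed within the last 90 days, as described in the question
        if age_days is not None and age_days < 90:
            new_row = pd.DataFrame([[mailid, b.iloc[0]]], columns=['Email', 'DateTimeCompleted'])
            df2 = pd.concat([df2, new_row], ignore_index=True)
            print("Added the row")
        else:
            print("Condition failed")
            print("False")
    else:
        print("The mail is not there in dataframe")
    return df2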
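The membership and emptiness checks that a function like this relies on are easy to get subtly wrong, so here is a small isolated sketch of them. The sample frames and the variable names `matches`, `found`, `is_empty` and `inverted` are assumptions made for the example. It shows why an `if` needs a single boolean (which avoids the "truth value of a Series is ambiguous" error) and why `not` is the safe way to negate `.empty`.

```python
import pandas as pd

df1 = pd.DataFrame({"Email": ["2@2.com"]})
df2 = pd.DataFrame({"Email": ["b@b.com", "2@2.com"]})

# Element-wise membership: one boolean per row of df1
matches = df1["Email"].isin(df2["Email"])

# Reduce the boolean Series to a single bool before using it in `if`
if matches.any():
    print("true")

# Membership of a single scalar value works against the underlying array
found = "2@2.com" in df2["Email"].values

# `.empty` is a plain Python bool: negate it with `not`, because `~` is
# bitwise and maps True to -2 and False to -1, both of which are truthy
is_empty = df2.loc[df2["Email"] == "x@x.com"].empty
inverted = ~is_empty
print(not is_empty, inverted)
```

Recent Python versions emit a deprecation warning for `~` on a bool, which is one more reason to prefer `not` for scalar flags and keep `~` for boolean Series masks.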