Question

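I want to filter for customer_id's that first appear on or after a certain date, in this case 2019-01-10, and then create a new df containing the list of new customers.

df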

date          customer_id

2019-01-01    429492
2019-01-01    344343
2019-01-01    949222
2019-01-10    429492
2019-01-10    344343
2019-01-10    129292

Output df

customer_id
129292

This is what I have tried so far, but it also gives me customer_id's that were already active before 2019-01-10:

s = df.loc[df["date"]>="2019-01-10", "customer_id"]

df_new = df[df["customer_id"].isin(s)]
df_new

Answer 1

You can use boolean indexing together with Series.isin to filter:

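import pandas as pd

df["date"] = pd.to_datetime(df["date"])

# rows on or after the cutoff date
mask1 = df["date"] >= "2019-01-10"
# rows whose customer_id already appears before the cutoff date
mask2 = df["customer_id"].isin(df.loc[~mask1, "customer_id"])

# keep rows on or after the cutoff whose customer was not seen earlier
df = df.loc[mask1 & ~mask2, ["customer_id"]]
print(df)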
   customer_id
5       129292
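
As an alternative to the two masks above, here is a minimal sketch that computes each customer's first appearance with groupby and keeps those whose first date is on or after the cutoff. The names cutoff, first_seen and new_customers are illustrative assumptions, not part of the original answer, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-10", "2019-01-10", "2019-01-10"],
    "customer_id": [429492, 344343, 949222, 429492, 344343, 129292],
})
df["date"] = pd.to_datetime(df["date"])

# Assumed cutoff date, taken from the question
cutoff = pd.Timestamp("2019-01-10")

# Earliest date on which each customer_id appears
first_seen = df.groupby("customer_id")["date"].min()

# Customers whose first appearance is on or after the cutoff
new_customers = first_seen[first_seen >= cutoff].index.to_frame(index=False)
print(new_customers)
```

This variant may be easier to extend if you later need the first-seen date itself, since first_seen keeps it for every customer.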

Answer 2

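"and then create a new df with the list of new customers": read literally, your output would be empty, because 2019-01-10 is the last date and no new customers appear after it.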

However, if you want the list of customers on or after a certain date:

import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-01', '2019-01-01',
             '2019-01-10', '2019-01-10', '2019-01-10'],
    'customer_id': [429492, 344343, 949222, 429492, 344343, 129292]
})
certain_date = pd.to_datetime('2019-01-10')
df['date'] = pd.to_datetime(df['date'])
# keep only rows dated on or after certain_date
df = df[df['date'] >= certain_date]
print(df)


           date  customer_id
3 2019-01-10       429492
4 2019-01-10       344343
5 2019-01-10       129292

Answer 3

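If your 'date' column holds datetime objects, you can simply do:

from datetime import datetime
df_new = df[df['date'] >= datetime(2019, 1, 10)]['customer_id']

If your 'date' column does not hold datetime objects, first convert it with the to_datetime method:

df['date'] = pd.to_datetime(df['date'])

Then apply the approach above.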
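
To see what this filter returns on the question's sample data, here is a small end-to-end sketch of the two steps. The variable name ids_on_or_after is an illustrative assumption. Note that the result includes every customer with a row on or after the date, including those who also appear earlier.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-10", "2019-01-10", "2019-01-10"],
    "customer_id": [429492, 344343, 949222, 429492, 344343, 129292],
})

# Step 1: convert the string column to datetime
df["date"] = pd.to_datetime(df["date"])

# Step 2: keep customer_id values from rows dated on or after 2019-01-10
ids_on_or_after = df.loc[df["date"] >= datetime(2019, 1, 10), "customer_id"]
print(ids_on_or_after.tolist())
```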
