I have a dataframe df. I want to check whether any item has no record with code B within the 6 days leading up to its most recent record.
df=
item Date code
X 3/5/2016 A
X 3/6/2016 B
X 3/10/2016 A
X 3/12/2016 B
Y 3/5/2016 B
Y 3/7/2016 A
Y 3/9/2016 A
Y 3/10/2016 A
Z 3/4/2016 B
Z 3/9/2016 A
Z 3/10/2016 A
Z 3/13/2016 A
result = [Y,Z]
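Here is my attempt: I created a column that holds the check date. I group by item, filter out the old records, and want to keep an item if it has no record with code B. But my code does not seem to do that! Any help is appreciated.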
df['Date2'] = pd.to_datetime(df['Date'])
grouped = df.groupby('item')
df['check date'] = (grouped['Date2'].transform(lambda grp: grp.max()-pd.Timedelta(days=6)))
df2 = df.loc[(df['date2'] > df['check date'])]
result=pd.Series(df2['code']<>'B')
Answer 0 (score: 1)
IIUC you need to add the condition (df['code'] != 'B') with & (and), and then use unique:
df['Date2'] = pd.to_datetime(df['Date'])
grouped = df.groupby('item')
df['check date'] = (grouped['Date2'].transform(lambda grp: grp.max()-pd.Timedelta(days=6)))
df2 = df.loc[(df['Date2'] > df['check date']) & (df['code'] != 'B')]
print(df2)
item Date code Date2 check date
2 X 3/10/2016 A 2016-03-10 2016-03-06
5 Y 3/7/2016 A 2016-03-07 2016-03-04
6 Y 3/9/2016 A 2016-03-09 2016-03-04
7 Y 3/10/2016 A 2016-03-10 2016-03-04
9 Z 3/9/2016 A 2016-03-09 2016-03-07
10 Z 3/10/2016 A 2016-03-10 2016-03-07
11 Z 3/13/2016 A 2016-03-13 2016-03-07
print(df2.item.unique())
['X' 'Y' 'Z']
Or, if you need to check that all values in each group are not B, use groupby with filter and all:
df2 = df.loc[(df['Date2'] > df['check date'])]
print(df2)
item Date code Date2 check date
2 X 3/10/2016 A 2016-03-10 2016-03-06
3 X 3/12/2016 B 2016-03-12 2016-03-06
4 Y 3/5/2016 B 2016-03-05 2016-03-04
5 Y 3/7/2016 A 2016-03-07 2016-03-04
6 Y 3/9/2016 A 2016-03-09 2016-03-04
7 Y 3/10/2016 A 2016-03-10 2016-03-04
9 Z 3/9/2016 A 2016-03-09 2016-03-07
10 Z 3/10/2016 A 2016-03-10 2016-03-07
11 Z 3/13/2016 A 2016-03-13 2016-03-07
print(df2.groupby('item').filter(lambda x: (x.code != 'B').all()))
item Date code Date2 check date
9 Z 3/9/2016 A 2016-03-09 2016-03-07
10 Z 3/10/2016 A 2016-03-10 2016-03-07
11 Z 3/13/2016 A 2016-03-13 2016-03-07
print(df2.groupby('item').filter(lambda x: (x.code != 'B').all()).item.unique())
['Z']
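For reference, the filter-and-all approach can be sketched as one self-contained script. This is only a sketch under a few assumptions: the sample data is retyped from the question, the explicit date format string and the names cutoff, recent and no_b are illustrative and not part of the original code, and a per-item boolean from all is used in place of filter.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    'item': list('XXXXYYYYZZZZ'),
    'Date': ['3/5/2016', '3/6/2016', '3/10/2016', '3/12/2016',
             '3/5/2016', '3/7/2016', '3/9/2016', '3/10/2016',
             '3/4/2016', '3/9/2016', '3/10/2016', '3/13/2016'],
    'code': list('ABABBAAABAAA'),
})

# Assumption: dates are month/day/year, as the sample suggests.
df['Date2'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Per-item cutoff: most recent date minus 6 days, broadcast to every row.
cutoff = df.groupby('item')['Date2'].transform('max') - pd.Timedelta(days=6)

# Rows inside the window (strictly after the cutoff, as in the answer).
recent = df[df['Date2'] > cutoff]

# For each item, True when none of its recent rows has code B.
no_b = recent['code'].ne('B').groupby(recent['item']).all()

# Keep only the items whose flag is True.
result = no_b[no_b].index.tolist()
print(result)  # ['Z']
```

With a strict comparison and a 6-day offset, Y's B record on 3/5/2016 still falls inside Y's window (the cutoff is 3/4/2016), which is why only Z is returned. The [Y, Z] expected in the question would follow from a narrower window, for example pd.Timedelta(days=5), but that is a guess about the intended definition.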