I have a data frame that contains some IDs, together with the start dates and end dates that belong to each ID.
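I want to check whether a row's start date falls between the start date and end date of any other row with the same ID, and to count how many such rows there are. The sample data is: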
df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
'start_date': ['2016-07-27 16:07:00','2016-10-20 08:10:00','2016-12-08 10:12:00','2017-07-16 11:07:00','2017-07-16 16:07:00','2016-07-27 16:07:00','2016-10-20 08:10:00','2016-12-08 10:12:00','2017-07-16 11:07:00','2017-07-16 16:07:00'],
'end_date': ['2016-07-29 15:07:00','2017-08-10 07:04:00','2017-03-07 12:03:00','2017-07-18 11:07:00','2017-09-20 12:09:00','2016-07-29 15:07:00','2017-08-10 07:04:00','2017-03-07 12:03:00','2017-07-18 11:07:00','2017-09-20 12:09:00']})
id start_date end_date
1 2016-07-27 16:07:00 2016-07-29 15:07:00
1 2016-10-20 08:10:00 2017-08-10 07:04:00
1 2016-12-08 10:12:00 2017-03-07 12:03:00
1 2017-07-16 11:07:00 2017-07-18 11:07:00
1 2017-07-16 16:07:00 2017-09-20 12:09:00
2 2016-07-27 16:07:00 2016-07-29 15:07:00
2 2016-10-20 08:10:00 2017-08-10 07:04:00
2 2016-12-08 10:12:00 2017-03-07 12:03:00
2 2017-07-16 11:07:00 2017-07-18 11:07:00
2 2017-07-16 16:07:00 2017-09-20 12:09:00
The resulting output should look like this:
id start_date end_date count_col
1 2016-07-27 16:07:00 2016-07-29 15:07:00 0
1 2016-10-20 08:10:00 2017-08-10 07:04:00 0
1 2016-12-08 10:12:00 2017-03-07 12:03:00 1
1 2017-07-16 11:07:00 2017-07-18 11:07:00 1
1 2017-07-16 16:07:00 2017-09-20 12:09:00 2
2 2016-07-27 16:07:00 2016-07-29 15:07:00 0
2 2016-10-20 08:10:00 2017-08-10 07:04:00 0
2 2016-12-08 10:12:00 2017-03-07 12:03:00 1
2 2017-07-16 11:07:00 2017-07-18 11:07:00 1
2 2017-07-16 16:07:00 2017-09-20 12:09:00 2
However, my attempt also compares each row with itself, and it does not restrict the comparison to rows with the same ID.
Answer 0 (score: 3)
The original row is always included in its own count, so simply subtract 1. A lambda function is not needed here either.
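One lambda-free way to express the comparison is NumPy broadcasting. The block below is only a minimal sketch under stated assumptions, not necessarily the code this answer originally used: the helper name `count_containing` is hypothetical, the sample frame is rebuilt from the question's data, and the pairwise matrix needs memory proportional to the square of the row count, so it suits small to medium frames.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question: the same five intervals for id 1 and id 2.
starts = ['2016-07-27 16:07:00', '2016-10-20 08:10:00', '2016-12-08 10:12:00',
          '2017-07-16 11:07:00', '2017-07-16 16:07:00']
ends = ['2016-07-29 15:07:00', '2017-08-10 07:04:00', '2017-03-07 12:03:00',
        '2017-07-18 11:07:00', '2017-09-20 12:09:00']
df = pd.DataFrame({'id': [1] * 5 + [2] * 5,
                   'start_date': pd.to_datetime(starts * 2),
                   'end_date': pd.to_datetime(ends * 2)})


def count_containing(frame):
    """Hypothetical helper: for each row, count the other rows with the
    same id whose [start_date, end_date] interval contains its start_date."""
    s = frame['start_date'].to_numpy()
    e = frame['end_date'].to_numpy()
    ids = frame['id'].to_numpy()
    # contains[i, j] is True when row i's interval covers row j's start date
    # and both rows share the same id.
    contains = ((s[:, None] <= s[None, :])
                & (e[:, None] >= s[None, :])
                & (ids[:, None] == ids[None, :]))
    # Exclude each row's comparison with itself instead of subtracting 1.
    np.fill_diagonal(contains, False)
    # Summing down the rows gives, per column j, the number of containing intervals.
    return contains.sum(axis=0)


df['count_col'] = count_containing(df)
print(df)
```

Clearing the diagonal rather than subtracting 1 also stays correct for a row whose end date precedes its start date, where the row would not have counted itself in the first place.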
Edit:
To test the values within each group, use:
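# Assumes pandas is imported as pd and df is the sample frame from the question.
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])
def start_date_compare(subdf):
    # Work on a copy so the group handed in by groupby is not mutated.
    subdf = subdf.copy()
    # Row i, column j is True when interval i contains the start date of row j.
    date_within = subdf.apply(lambda x: ((x['start_date'] <= subdf['start_date']) &
                                         (x['end_date'] >= subdf['start_date'])), axis=1)
    # Every interval contains its own start date, so subtract 1.
    subdf['count_col'] = date_within.sum(axis=0) - 1
    return subdf
# group_keys=False keeps the original row index instead of prepending the id.
df = df.groupby('id', group_keys=False).apply(start_date_compare)
print(df)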
id start_date end_date count_col
0 1 2016-07-27 16:07:00 2016-07-29 15:07:00 0
1 1 2016-10-20 08:10:00 2017-08-10 07:04:00 0
2 1 2016-12-08 10:12:00 2017-03-07 12:03:00 1
3 1 2017-07-16 11:07:00 2017-07-18 11:07:00 1
4 1 2017-07-16 16:07:00 2017-09-20 12:09:00 2
5 2 2016-07-27 16:07:00 2016-07-29 15:07:00 0
6 2 2016-10-20 08:10:00 2017-08-10 07:04:00 0
7 2 2016-12-08 10:12:00 2017-03-07 12:03:00 1
8 2 2017-07-16 11:07:00 2017-07-18 11:07:00 1
9 2 2017-07-16 16:07:00 2017-09-20 12:09:00 2
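The same per-ID count can also be obtained without `groupby(...).apply` by joining the frame to itself on `id` and filtering the resulting pairs. This is a hedged alternative sketch rather than part of the answer above; the column name `row` and the `_other` suffix are hypothetical labels introduced only for illustration, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data rebuilt from the question: the same five intervals for id 1 and id 2.
starts = ['2016-07-27 16:07:00', '2016-10-20 08:10:00', '2016-12-08 10:12:00',
          '2017-07-16 11:07:00', '2017-07-16 16:07:00']
ends = ['2016-07-29 15:07:00', '2017-08-10 07:04:00', '2017-03-07 12:03:00',
        '2017-07-18 11:07:00', '2017-09-20 12:09:00']
df = pd.DataFrame({'id': [1] * 5 + [2] * 5,
                   'start_date': pd.to_datetime(starts * 2),
                   'end_date': pd.to_datetime(ends * 2)})

# Expose the row label as a regular column ("row" is a hypothetical name).
labelled = df.rename_axis('row').reset_index()

# Pair every row with every row that has the same id.
pairs = labelled.merge(labelled, on='id', suffixes=('', '_other'))

# Keep pairs where a different row's interval contains this row's start date.
mask = ((pairs['row'] != pairs['row_other'])
        & (pairs['start_date_other'] <= pairs['start_date'])
        & (pairs['end_date_other'] >= pairs['start_date']))

# Count the surviving pairs per row; rows with no match get 0.
counts = pairs.loc[mask].groupby('row').size()
df['count_col'] = counts.reindex(df.index, fill_value=0)
print(df)
```

The merge only creates pairs within each ID, so the intermediate table grows with the square of the largest group rather than the square of the whole frame.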