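I'm new to pandas, so apologies if this doesn't make much sense. I have a feel for grouping by category with groupby, but I'm not sure how to run a function within a groupby.
For the date in a given row of Date1, I want to check whether any date in Date2 with the same ID falls within 7 days of it.
I thought about splitting out date1 and date2 as shown here, but I'm not sure where to go from there.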
g1 = df[['Category', 'Date1']]
g2 = df[['Category', 'Date2']]
dif = pd.Timedelta(7, unit='D')
df['isDateWithin7Days'] = np.where((g1['Category'] == g2['Category'])(df['Date1'] > g2['Date2']-dif, True, False))
I get this error:
ValueError: operands could not be broadcast together with shapes (50537,) (3,)
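The shape (3,) in the message hints that a three-element operand ended up on one side of an element-wise operation. One plausible cause, which is an assumption because the expression above is not complete, is that the condition and the two result values of np.where were grouped together instead of being passed as three separate arguments. The sketch below uses small hypothetical arrays (same_category, date_in_range) rather than the asker's data. It shows the three-argument form working, then reproduces the same kind of broadcast failure with an explicit three-element array.

```python
import numpy as np

# Hypothetical stand-ins for the two conditions (not the asker's data).
same_category = np.array([True, True, False, True, False])
date_in_range = np.array([True, False, True, True, False])

# np.where takes the condition and the two result values as three
# separate arguments: np.where(condition, value_if_true, value_if_false).
flags = np.where(same_category & date_in_range, True, False)
print(flags)

# Combining a length-5 array with a 3-element operand cannot broadcast,
# which raises the same kind of ValueError as in the question.
three_items = np.array([True, True, False])
try:
    same_category & three_items
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

Even with the parentheses fixed, comparing g1 and g2 element by element only compares values that sit on the same row, so it would not pair a Date1 with Date2 values from other rows of the same category.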
df1:
category   date1       date2
blue       1/1/2018
blue                   1/2/2018
blue                   1/5/2018
blue       2/1/2018
green      1/3/2018
green                  1/1/2018
red        12/1/2018
red                    11/1/2018
Expected result:
category   date1       date2   isDateWithin7Days?   EarliestDate?
blue       1/1/2018            True                 1/2/2018
blue       2/1/2018            False                0
green      1/3/2018            False                0
red        12/1/2018           False                0
Answer 0 (score: 2)
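IIUC, you are trying to find out, for each unique combination of category and date1, whether any date in the date2 column falls within 7 days of it. This code returns True if any such date is found and False otherwise: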
import numpy as np
import pandas as pd

# Parse the date strings; this format matches dates written like 1/1/2018
df['date1'] = pd.to_datetime(df['date1'], format='%m/%d/%Y')
df['date2'] = pd.to_datetime(df['date2'], format='%m/%d/%Y')
# Split into the rows that carry a date1 and the rows that carry a date2
df1 = df.dropna(subset=['date1']).drop(columns=['date2'])
df2 = df.dropna(subset=['date2']).drop(columns=['date1'])
# Pair every date1 with every date2 in the same category
df3 = df1.merge(df2, on='category')
# Flag pairs where date2 lies within 7 days of date1, before or after
df3['isDateWithin7Days?'] = df3['date2'].between(df3['date1'] - pd.Timedelta(days=7), df3['date1'] + pd.Timedelta(days=7))
# Count the matches per (category, date1) and turn the count into a boolean
df3 = df3.groupby(['category', 'date1'])['isDateWithin7Days?'].sum().reset_index()
df3['isDateWithin7Days?'] = np.where(df3['isDateWithin7Days?'] > 0, True, False)
Output:
category date1 isDateWithin7Days?
0 blue 2018-01-01 True
1 blue 2018-02-01 False
2 green 2018-01-03 False
3 red 2018-12-01 False
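The expected result in the question also asks for an EarliestDate? column, which the code above does not produce. A minimal sketch of one way to get it follows. It rests on two assumptions: the sample rows are assigned to date1 and date2 as read from the question, and "within 7 days" means date2 falls on or up to 7 days after date1. The column names isDateWithin7Days and EarliestDate (without the question marks) and the helper names left, right, pairs and match are chosen here for illustration.

```python
import pandas as pd

# Sample data as read from the question; which column each date belongs to
# is an assumption based on the expected result.
df = pd.DataFrame({
    "category": ["blue", "blue", "blue", "blue", "green", "green", "red", "red"],
    "date1": ["1/1/2018", None, None, "2/1/2018", "1/3/2018", None, "12/1/2018", None],
    "date2": [None, "1/2/2018", "1/5/2018", None, None, "1/1/2018", None, "11/1/2018"],
})
df["date1"] = pd.to_datetime(df["date1"], format="%m/%d/%Y")
df["date2"] = pd.to_datetime(df["date2"], format="%m/%d/%Y")

left = df.dropna(subset=["date1"])[["category", "date1"]]
right = df.dropna(subset=["date2"])[["category", "date2"]]

# Pair every date1 with every date2 of the same category. how="left" keeps
# categories that have no date2 at all; they simply end up with no match.
pairs = left.merge(right, on="category", how="left")

# Assumption: a match is a date2 in the closed interval [date1, date1 + 7 days].
window = pd.Timedelta(days=7)
in_window = pairs["date2"].between(pairs["date1"], pairs["date1"] + window)

# Blank out date2 values outside the window so min() only sees matches.
pairs["match"] = pairs["date2"].where(in_window)

# Earliest matching date2 per (category, date1); NaT when nothing matched.
result = (
    pairs.groupby(["category", "date1"], as_index=False)
    .agg(EarliestDate=("match", "min"))
)
result["isDateWithin7Days"] = result["EarliestDate"].notna()
print(result)
```

On this sample only the blue row with date1 2018-01-01 is flagged, with 2018-01-02 as its earliest match, in line with the expected result in the question. Rows without a match hold NaT rather than 0. For a symmetric window like the one used with between above, the lower bound would become date1 minus the window, and the green row would then be flagged as well.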