答案 0 :(得分:3)
IIUC,尝试:
df_out = df.assign(Item_cnt=(df['Item'] != df['Item'].shift()).cumsum())\
.drop_duplicates(['Item','Item_cnt'], keep='last')
df_out['delta T'] = df_out['datetime'] - df_out.groupby((df_out['Item'] == 'apples').cumsum())['datetime'].transform('first')
输出:
Item datetime Item_cnt delta T
2 apples 1.2 1 0.0
3 oranges 2.3 2 1.1
4 apples 2.5 3 0.0
5 bananas 2.7 4 0.2
详细信息:
使用cumsum创建分组并检查下一行是否不同,然后使用drop_duplicates保留该组中的最后一条记录。
答案 1 :(得分:2)
IIUC,
df = pd.DataFrame({'Item' : ['apples', 'apples','apples','orange','apples','bananas'],
'dateTime' : [1,1.1,1.2,2.3,2.5,2.7]})
s = df.copy()
s['dateTime'] = s['dateTime'].round()
idx = s.drop_duplicates(subset=['Item','dateTime'],keep='last').index.tolist()
df = df.loc[idx]
df.loc[df['Item'].ne('apples'), 'delta'] = abs(df['dateTime'].shift() - df['dateTime'])
print(df.fillna(0))
Item dateTime delta
2 apples 1.2 0.0
3 orange 2.3 1.1
4 apples 2.5 0.0
5 bananas 2.7 0.2
答案 2 :(得分:1)
这是df:
df = pd.DataFrame.from_dict({'Item':
['apples', 'apples', 'apples', 'oranges', 'apples', 'bananas'],
'dateTime':[1, 1.1, 1.2, 2.3, 2.5, 2.7]})
您不能使用重复版本,因为您需要保留同一项目的多个副本,因此请尝试以下操作:
df['Item_lag'] = df['Item'].shift(-1)
df = df[df['Item'] != df['Item_lag']] # get rid of repeated Items
df['deltaT'] = df['dateTime'] - df['dateTime'].shift(1).fillna(0) # calculate time diff
df.drop(['dateTime', 'Item_lag'], axis=1, inplace=True) # drop extra columns
df # display df
out:
Item deltaT
apples 1.2
oranges 1.1
apples 0.2
bananas 0.2