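I created a difference column from the differences between consecutive values in the month column, computed per propertyId: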
data_2019['difference'] = data_2019.groupby('propertyId')['month'].diff()
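Now I want to do the following:
For every row that has 1 in the difference column, keep that row and the row before it, provided both rows have the same propertyId. All other rows should be dropped.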
Answer 0 (score: 0)
Here is one way to do it:
# True for the second row of two consecutive rows
data_2019['difference+'] = data_2019.groupby('propertyId')['month'].diff()==1
# True for the first row of two consecutive rows
data_2019['difference-'] = data_2019.groupby('propertyId')['month'].diff(periods=-1)==-1
# 'keep' is True if a row is the first or the second or both
data_2019['keep'] = data_2019['difference+'] | data_2019['difference-']
Out:
   propertyId  month   occ  difference+  difference-   keep
0        a111      3  80.0        False        False  False
1        a111      5  93.0        False         True   True
2        a111      6  94.0         True         True   True
3        a111      7  95.5         True        False   True
4        a111     10  88.0        False        False  False
5        b111      2  97.0        False         True   True
6        b111      3  99.0         True        False   True
7        c116      2  97.0        False        False  False
You can then keep only the rows where data_2019['keep'] is True:
data_2019 = data_2019[data_2019['keep']==True]
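For reference, the steps above can be condensed into a minimal, self-contained sketch. It assumes the sample data shown in the output table, that the rows are already sorted by propertyId and month, and that the helper columns are not needed afterwards; the names month_by_property, follows_previous, precedes_next and kept are made up for illustration.

```python
import pandas as pd

# Sample data copied from the output table above (an assumption about the real data).
data_2019 = pd.DataFrame({
    "propertyId": ["a111"] * 5 + ["b111"] * 2 + ["c116"],
    "month": [3, 5, 6, 7, 10, 2, 3, 2],
    "occ": [80.0, 93.0, 94.0, 95.5, 88.0, 97.0, 99.0, 97.0],
})

# Hypothetical helper names; the grouping keeps diffs from crossing propertyId boundaries.
month_by_property = data_2019.groupby("propertyId")["month"]

# True for the second row of two consecutive months.
follows_previous = month_by_property.diff() == 1
# True for the first row of two consecutive months.
precedes_next = month_by_property.diff(periods=-1) == -1

# Keep a row if it is either half of a consecutive pair.
kept = data_2019[follows_previous | precedes_next]
print(kept)
```

Comparing a NaN diff with 1 gives False, so the first and last row of each group need no special handling.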
Answer 1 (score: 0)
You can try the following approach. Let me know if it doesn't work.
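# difference and propertyId of the next row, aligned with the current row
df['new_diff'] = df['difference'].shift(-1)
df['new_propertyid'] = df['propertyId'].shift(-1)
# each comparison needs its own parentheses because & and | bind tighter than ==
mask = (df['difference']==1) | ((df['new_diff']==1) & (df['new_propertyid']==df['propertyId']))
ans = df[mask]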
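A runnable sketch of this shift-based idea on the sample data from the first answer may help when checking it. It assumes the column is spelled propertyId as in the question, that the rows are sorted by propertyId and month, and that difference was built with the grouped diff() from the question; next_diff and next_property are illustrative names that replace the helper columns.

```python
import pandas as pd

# Sample data taken from the first answer's output table (an assumption about the real data).
df = pd.DataFrame({
    "propertyId": ["a111"] * 5 + ["b111"] * 2 + ["c116"],
    "month": [3, 5, 6, 7, 10, 2, 3, 2],
    "occ": [80.0, 93.0, 94.0, 95.5, 88.0, 97.0, 99.0, 97.0],
})
df["difference"] = df.groupby("propertyId")["month"].diff()

# Values of the next row, aligned with the current row (hypothetical names).
next_diff = df["difference"].shift(-1)
next_property = df["propertyId"].shift(-1)

# Every comparison is parenthesised: & and | bind tighter than == in Python.
mask = (df["difference"] == 1) | (
    (next_diff == 1) & (next_property == df["propertyId"])
)
ans = df[mask]
print(ans)
```

On this data the mask selects the same rows as the keep column in the first answer.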