Question

I created a difference column from the differences between consecutive values in the month column.

data_2019['difference'] = data_2019.groupby('propertyId')['month'].diff()
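
For reference, here is a minimal sketch of what that line produces. The sample frame is an assumption for illustration; its values are copied from the output table shown in Answer 1, not from my real data.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
data_2019 = pd.DataFrame({
    'propertyId': ['a111'] * 5 + ['b111'] * 2 + ['c116'],
    'month': [3, 5, 6, 7, 10, 2, 3, 2],
    'occ': [80.0, 93.0, 94.0, 95.5, 88.0, 97.0, 99.0, 97.0],
})

# diff() is computed within each propertyId group, so the first row
# of every group has no predecessor and gets NaN
data_2019['difference'] = data_2019.groupby('propertyId')['month'].diff()
print(data_2019)
```

Because the difference is taken per group, a value of 1 already means the previous row belongs to the same propertyId.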

current state

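Now I want to do the following:

For every row that has 1 in the difference column, keep that row and the previous row, as long as the previous row has the same propertyId value.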

desired state

Answer 1

Here is one way you can do this:

# True for the second row of two consecutive rows
data_2019['difference+'] = data_2019.groupby('propertyId')['month'].diff()==1

# True for the first row of two consecutive rows
data_2019['difference-'] = data_2019.groupby('propertyId')['month'].diff(periods=-1)==-1

# 'keep' is True if a row is the first or the second or both
data_2019['keep'] = data_2019['difference+'] | data_2019['difference-']


Out:

    propertyId  month   occ     difference+ difference- keep
0   a111        3       80.0    False       False       False
1   a111        5       93.0    False       True        True
2   a111        6       94.0    True        True        True
3   a111        7       95.5    True        False       True
4   a111        10      88.0    False       False       False
5   b111        2       97.0    False       True        True
6   b111        3       99.0    True        False       True
7   c116        2       97.0    False       False       False

Then you can keep only the rows where data_2019['keep'] is True:

data_2019 = data_2019[data_2019['keep']==True]
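
Putting the steps together, a self-contained sketch might look like the following. The helper name keep_consecutive and the sample frame are assumptions made for illustration (the values mirror the output table above); it skips the intermediate columns and filters directly with the combined mask.

```python
import pandas as pd

def keep_consecutive(df, group_col='propertyId', value_col='month'):
    """Hypothetical helper: keep rows that are part of a run of
    consecutive values within each group."""
    grouped = df.groupby(group_col)[value_col]
    # True when the row is exactly 1 after the previous row in its group
    follows_prev = grouped.diff() == 1
    # True when the row is exactly 1 before the next row in its group
    precedes_next = grouped.diff(periods=-1) == -1
    return df[follows_prev | precedes_next]

# Sample data assumed for illustration
sample = pd.DataFrame({
    'propertyId': ['a111'] * 5 + ['b111'] * 2 + ['c116'],
    'month': [3, 5, 6, 7, 10, 2, 3, 2],
    'occ': [80.0, 93.0, 94.0, 95.5, 88.0, 97.0, 99.0, 97.0],
})

result = keep_consecutive(sample)
print(result)
```

On this sample the rows with index 1, 2, 3, 5 and 6 are kept, matching the keep column above.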

Answer 2

You can try the following. Let me know if it doesn't work.


df['new_diff'] = df['difference'].shift(-1)
df['new_propertyid'] = df['propertyId'].shift(-1)

# Each comparison needs its own parentheses: & binds more tightly than ==
mask = (df['difference']==1) | ((df['new_diff']==1) & (df['new_propertyid']==df['propertyId']))

ans = df[mask]
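
A runnable sketch of this shift-based approach, under a few assumptions: the sample frame is made up for illustration, the column is named propertyId as in the question, and rows are already sorted so that each property's months are contiguous and ascending. Every comparison is wrapped in parentheses because & binds more tightly than == in Python.

```python
import pandas as pd

# Sample data assumed for illustration, sorted by propertyId and month
df = pd.DataFrame({
    'propertyId': ['a111'] * 5 + ['b111'] * 2 + ['c116'],
    'month': [3, 5, 6, 7, 10, 2, 3, 2],
})
df['difference'] = df.groupby('propertyId')['month'].diff()

# Look one row ahead: the next row's difference and propertyId
next_diff = df['difference'].shift(-1)
next_id = df['propertyId'].shift(-1)

# Keep a row if it has difference 1, or if the next row has
# difference 1 and belongs to the same property
mask = (df['difference'] == 1) | ((next_diff == 1) & (next_id == df['propertyId']))

ans = df[mask]
print(ans)
```

Using local variables instead of extra columns keeps the helper values out of the result.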

How to slice a row and the previous row based on two column values in pandas?

2 answers: