Question

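I am trying to create a counter that changes value only when the Flag differs from the previous row, or when the ID I group by changes.

Suppose I have the following dataframe: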

ID Flag   New_Column
A   NaN     1
A   0       1
A   0       1
A   0       1
A   1       2
A   1       2
A   1       2
A   0       3
A   0       3
A   0       3
A   1       4
A   1       4
A   1       4
B   NaN     1
B   0       1

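I want to create New_Column so that it is incremented by 1 every time the Flag value changes, and resets to 1 and starts over whenever the ID changes.

This is what I tried with np.select, but it does not work: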

df['New_Column'] = None
df['Flag_Lag'] = df.sort_values(by=['ID', 'Date_Time'], ascending=True).groupby(['ID'])['Flag'].shift(1)
df['ID_Lag'] = df.sort_values(by=['ID', 'Date_Time'], ascending=True).groupby(['ID'])['ID'].shift(1)


conditions = [((df['Flag'] != df['Flag_Lag']) & (df['ID'] == df['ID_Lag'])), 
              ((df['Flag'] == df['Flag_Lag']) & (df['ID'] == df['ID_Lag'])), 
              ((df['Flag_Lag'] == np.nan) & (df['New_Column'].shift(1) == 1)), 
              ((df['ID'] != df['ID_Lag']))
             ]

choices = [(df['New_Column'].shift(1) + 1), 
           (df['New_Column'].shift(1)), 
           (df['New_Column'].shift(1)), 
            1]

df['New_Column'] = np.select(conditions, choices, default=np.nan)

With this code, the first value of New_Column is 1, the second is NaN, and the rest are None.

Does anyone know a better way?

Answer 1

Group by ID and take the cumulative sum of (current value not equal to previous value):

df['new'] = df.groupby('ID') \
  .apply(lambda x: x['Flag'].fillna(0).diff().ne(0).cumsum()).reset_index(level=0, drop=True)

   ID  Flag  New_Column  new
0   A   NaN           1    1
1   A   0.0           1    1
2   A   0.0           1    1
3   A   0.0           1    1
4   A   1.0           2    2
5   A   1.0           2    2
6   A   1.0           2    2
7   A   0.0           3    3
8   A   0.0           3    3
9   A   0.0           3    3
10  A   1.0           4    4
11  A   1.0           4    4
12  A   1.0           4    4
13  B   NaN           1    1
14  B   0.0           1    1
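
As a rough, self-contained sketch of the same idea, the snippet below rebuilds the sample data from the question and splits the one-liner into named steps. It uses `transform` instead of `apply`, which is a swapped-in variant rather than what the answer wrote, and the helper name `change_counter` is made up for illustration. Under those assumptions it should reproduce the `new` column shown above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "ID": ["A"] * 13 + ["B"] * 2,
    "Flag": [np.nan, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, np.nan, 0],
})

def change_counter(flag):
    """Number the runs of equal values within one group, starting at 1.

    Hypothetical helper, not part of pandas.
    """
    filled = flag.fillna(0)        # treat NaN as 0, as the answer does
    changed = filled.diff().ne(0)  # True on the group's first row and on every change
    return changed.cumsum()        # running count of changes so far

# transform applies the helper per ID and keeps the original row index,
# so no reset_index step is needed.
df["new"] = df.groupby("ID")["Flag"].transform(change_counter).astype(int)
print(df)
```

The first row of each group always counts as a change because `diff()` yields NaN there and `NaN != 0` is true, which is what makes the counter start at 1 for every ID.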

Answer 2

If speed is not a concern and you want code that is easy to read, you can simply iterate over the dataframe and run a simple function on each row.

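def f(row):
    global previous_ID, previous_flag, previous_count

    if previous_ID is None:  # first row: start the count
        count = 1

    elif previous_ID != row['ID']:  # new ID: start the count over
        count = 1

    elif previous_flag == row['Flag']:  # same ID, same Flag
        count = previous_count

    else:  # same ID, different Flag
        count = previous_count + 1

    # remember this row for the next call
    previous_ID = row['ID']
    previous_flag = row['Flag']
    previous_count = count
    return count

You should fill the NaN values with 0, or add a special case in the function. NaN never compares equal to anything, so without this the first real Flag after a NaN would be counted as a change.

You can run the function like this:

previous_ID, previous_flag, previous_count = None, None, None

# iterrows() yields copies of the rows, so collect the results in a list
# and assign the whole column at once; NaN flags are filled with 0 first
counts = [f(row) for i, row in df.fillna({'Flag': 0}).iterrows()]
df['New_Column'] = counts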

That's it.
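
For the "special case in the function" option, here is a minimal sketch of the same loop without global variables. The function name `count_changes` is hypothetical, and treating a NaN Flag as 0 is an assumption chosen to match the expected output in the question.

```python
import numpy as np
import pandas as pd

def count_changes(df):
    """Return the per-ID change counter as a list, one entry per row.

    Hypothetical helper; expects columns named 'ID' and 'Flag'.
    """
    counts = []
    previous_id = previous_flag = None
    count = 0
    for row in df.itertuples(index=False):
        # Special case: NaN never equals itself, so map it to 0 first.
        flag = 0 if pd.isna(row.Flag) else row.Flag
        if row.ID != previous_id:      # first row overall, or a new ID
            count = 1
        elif flag != previous_flag:    # same ID, Flag changed
            count += 1
        counts.append(count)
        previous_id, previous_flag = row.ID, flag
    return counts

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "ID": ["A"] * 13 + ["B"] * 2,
    "Flag": [np.nan, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, np.nan, 0],
})
df["New_Column"] = count_changes(df)
print(df)
```

Keeping the state in local variables means the function can be called repeatedly without resetting anything by hand. It assumes the rows are already sorted so that each ID's rows are contiguous.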

How to create a change-of-value counter by group in Python
