Question

Given the following dataframe:

col_1   col_2
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   2
True    2
False   2
False   2
True    2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
False   2

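How can I create a new index that identifies when a True value appears in col_1? That is, each time a True value appears in the first column, I want to backfill a number into the new column for the rows up to and including that True. For example, this is the expected output for the dataframe above: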

   col_1  col_2 new_id
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   2   1
    True    2   1   --------- ^ (fill with 1 and increase the counter)
    False   2   2
    False   2   2
    True    2   2   --------- ^ (fill with 2 and increase the counter)
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    True    2   3   --------- ^ (fill with 3 and increase the counter)
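Put another way (an editorial restatement inferred from the expected output and its annotations, not part of the original question), each row's id is one plus the number of True values in col_1 strictly before that row, so a True row still carries the id of the group it closes:

$$\text{new\_id}_i = 1 + \sum_{j < i} \mathbf{1}\left[\text{col\_1}_j = \text{True}\right]$$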

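The problem is that, although I know pandas provides a fill method that could help with this, I don't know how to create the id in the first place. So far I have tried iterating with a simple for loop: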

count = 0
for index, row in df.iterrows():
    if row['col_1'] == False:
        print(count+1)
    else:
        print(row['col_2'] + 1)

However, I don't know how to increment the counter to the next number. I also tried writing a function and applying it to the dataframe:

def create_id(col_1, col_2):
    counter = 0
    if col_1 == True and col_2.bool() == True:
        return counter + 1
    else:
        pass

However, this way I lose control over backfilling the column.

Answer 1

Just use cumsum:

df['new_id']=(df.col_1.cumsum().shift().fillna(0)+1).astype(int)
df
Out[210]: 
    col_1  col_2  new_id
0   False      1       1
1   False      1       1
2   False      1       1
3   False      1       1
4   False      1       1
5   False      1       1
6   False      1       1
7   False      1       1
8   False      1       1
9   False      1       1
10  False      1       1
11  False      1       1
12  False      1       1
13  False      1       1
14  False      2       1
15   True      2       1
16  False      2       2
17  False      2       2
18   True      2       2
19  False      2       3
20  False      2       3
21  False      2       3
22  False      2       3
23  False      2       3
24  False      2       3
25  False      2       3
26  False      2       3
27  False      2       3
28  False      2       3
29  False      2       3
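To see why this works, here is a minimal sketch that runs the same expression step by step on a small hypothetical six-row frame (not the question's data) and keeps the intermediate Series; the names running and shifted are illustrative only:

```python
import pandas as pd

# Hypothetical example: True marks the last row of each group
df = pd.DataFrame({"col_1": [False, True, False, False, True, False]})

# Number of True values seen so far, current row included
running = df["col_1"].cumsum()
# Shift down one row so the current row is excluded; the first row becomes NaN, then 0
shifted = running.shift().fillna(0)
# Start numbering at 1 and convert back from float to int
df["new_id"] = (shifted + 1).astype(int)

print(running.tolist())        # [0, 1, 1, 1, 2, 2]
print(shifted.tolist())        # [0.0, 0.0, 1.0, 1.0, 1.0, 2.0]
print(df["new_id"].tolist())   # [1, 1, 2, 2, 2, 3]
```

The shift is what keeps each True row in the group it ends; without it, the True row would already get the next id. fillna(0) replaces the NaN that the shift introduces in the first row, and astype(int) undoes the float dtype that the NaN forced.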

Answer 2

If you want to append the new_id column to the dataframe:

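new_id = []
counter = 1
for index, row in df.iterrows():
    new_id.append(counter)   # the current row gets the current id
    if row['col_1']:         # a True row closes its group...
        counter += 1         # ...so the next row starts a new id
df['new_id'] = new_id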
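As a quick consistency check, here is a sketch that rebuilds the question's 30-row frame (True only at positions 15 and 18, col_2 switching from 1 to 2 at position 14, as in the listing above) and confirms that the loop from this answer and the cumsum expression from Answer 1 produce the same ids. The column names new_id_loop and new_id_cumsum are hypothetical, chosen only so both results can sit side by side:

```python
import pandas as pd

# Rebuild the question's frame: 14 rows with col_2 == 1, then 16 rows with col_2 == 2
df = pd.DataFrame({
    "col_1": [i in (15, 18) for i in range(30)],
    "col_2": [1] * 14 + [2] * 16,
})

# Loop approach: label the current row first, then bump the counter after a True
new_id = []
counter = 1
for _, row in df.iterrows():
    new_id.append(counter)
    if row["col_1"]:
        counter += 1
df["new_id_loop"] = new_id

# Vectorized approach from Answer 1
df["new_id_cumsum"] = (df["col_1"].cumsum().shift().fillna(0) + 1).astype(int)

print((df["new_id_loop"] == df["new_id_cumsum"]).all())  # True
```

Both give 1 for rows 0 to 15, 2 for rows 16 to 18 and 3 for rows 19 to 29. For large frames the cumsum version is usually preferable, since iterrows builds a new Series for every row.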

Problem creating an index based on two columns in a new pandas column?
