Question

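I am trying to generate a pandas DataFrame in which one column holds numeric values derived from a column of another DataFrame. Here is an example: I want to generate another DataFrame based on the column of the DataFrame df_.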

ipdb> df_ = pd.DataFrame({'c1':[False, True, False, True]})
ipdb> df_
      c1
0  False
1   True
2  False
3   True

Using df_, I want to generate another DataFrame, df1, with the following columns.

ipdb> df1
   col1  col2
0     0   NaN
1     1   0
2     2   NaN
3     3   1

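Here, 'col1' holds the ordinary index values, while 'col2' is NaN in the rows where 'c1' in df_ is False and holds sequentially increasing values in the rows where 'c1' is True.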

To generate this DataFrame, here is what I have tried.

ipdb> df_[df_['c1']==True].reset_index().reset_index()
   level_0  index    c1
0        0      1  True
1        1      3  True

However, I think there should be a better way to generate a DataFrame with two columns like df1.

Answer 1

I think you need cumsum, then subtract 1 so that the count starts from 0:

import pandas as pd

df_ = pd.DataFrame({'c1':[False, True, False, True]})

df_['col2'] = df_.loc[df_['c1'], 'c1'].cumsum().sub(1)
print (df_)
      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0
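
The question asks for a separate frame df1 with both col1 and col2, and its expected output shows integers rather than floats. Below is a minimal sketch of one way to get there with the same cumsum idea, assuming pandas' nullable Int64 dtype is acceptable (missing values then display as <NA> instead of NaN). The name counter is only for illustration and is not from the original answer.

```python
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})

# Running count of True values, shifted so it starts at 0,
# then blanked out (NaN) wherever c1 is False.
counter = df_['c1'].cumsum().sub(1).where(df_['c1'])

df1 = pd.DataFrame({
    'col1': range(len(df_)),          # plain positional counter 0..n-1
    'col2': counter.astype('Int64'),  # nullable integers instead of floats
})
print(df1)
```

Using where keeps the result aligned with the original index, so no reset_index step is needed.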

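Another solution is to count the number of True values with sum, build the sequence 0, 1, 2, ... of that length with numpy.arange, and assign it back to the filtered rows of the DataFrame: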

import numpy as np

df_.loc[df_['c1'], 'col2'] = np.arange(df_['c1'].sum())
print (df_)
      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0

Details:

print (df_['c1'].sum())
2

print (np.arange(df_['c1'].sum()))
[0 1]
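
As a quick sanity check, the sketch below compares the two approaches on a slightly longer mask with a non-default index. The sample data and the names mask, via_cumsum and via_arange are made up for illustration. Both approaches should produce the same float column, since each numbers the True rows by position.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with string labels instead of 0..n-1
mask = pd.Series([True, False, False, True, True], index=list('abcde'))

# Approach 1: cumulative count of True values, minus 1, kept only on True rows
via_cumsum = mask.cumsum().sub(1).where(mask)

# Approach 2: start from an all-NaN column and fill the True rows with 0..k-1
via_arange = pd.Series(np.nan, index=mask.index)
via_arange.loc[mask] = np.arange(mask.sum())

assert via_cumsum.equals(via_arange)
print(via_cumsum)
```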

Answer 2

Another approach:

df_.loc[df_['c1'], 'col2'] = range(len(df_[df_['c1']]))

Output:

      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0

Pandas: generate a DataFrame column whose values depend on another DataFrame column
