Question

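In my workflow there are multiple CSVs, each with four columns: OID, value, count, unique_id. I would like to know how to generate incremental values in the unique_id column. With apply() I can do something like df.apply(lambda x : x + 1) #where x = 0, but that sets every value under unique_id to 1. I am confused about how to use apply() to generate a value that increases row by row in a specific column.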

# Current Dataframe 
   OID  Value  Count  unique_id
0   -1      1      5          0
1   -1      2     46          0
2   -1      3     32          0
3   -1      4      3          0
4   -1      5     17          0

# Trying to accomplish
   OID  Value  Count  unique_id
0   -1      1      5          0
1   -1      2     46          1
2   -1      3     32          2
3   -1      4      3          3
4   -1      5     17          4

Sample code (I understand the syntax is incorrect, but it is roughly what I am trying to accomplish):

def numbers():
    for index, row in RG_Res_df.iterrows():
        return index

RG_Res_df = RG_Res_df['unique_id'].apply(numbers)

Answer 1

Don't loop. You can assign a NumPy array directly to generate the ids; here np.arange is used and passed df.shape[0], which is the number of rows:

In [113]:
df['unique_id'] = np.arange(df.shape[0])
df

Out[113]:
   OID  Value  Count  unique_id
0   -1      1      5          0
1   -1      2     46          1
2   -1      3     32          2
3   -1      4      3          3
4   -1      5     17          4
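
For reference, here is a self-contained sketch of the same assignment outside an IPython session. The frame is rebuilt by hand from the sample in the question; in the real workflow it would presumably be loaded from one of the CSVs.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (values copied from above).
df = pd.DataFrame({
    "OID": [-1, -1, -1, -1, -1],
    "Value": [1, 2, 3, 4, 5],
    "Count": [5, 46, 32, 3, 17],
    "unique_id": [0, 0, 0, 0, 0],
})

# np.arange(n) yields 0, 1, ..., n-1. The array is assigned by position,
# so each row receives its own running number without any loop or apply().
df["unique_id"] = np.arange(df.shape[0])

print(df)
```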

Or use a pure pandas approach with RangeIndex. Here start defaults to 0, so we only need to pass stop=df.shape[0]:

In [114]:
df['unique_id'] = pd.RangeIndex(stop=df.shape[0])
df

Out[114]:
   OID  Value  Count  unique_id
0   -1      1      5          0
1   -1      2     46          1
2   -1      3     32          2
3   -1      4      3          3
4   -1      5     17          4
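
A minimal standalone sketch of the RangeIndex variant follows. The second column, unique_id_from_1, is a hypothetical name added only to illustrate what changing start does; it is not part of the original data.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "OID": [-1, -1, -1, -1, -1],
    "Value": [1, 2, 3, 4, 5],
    "Count": [5, 46, 32, 3, 17],
    "unique_id": [0, 0, 0, 0, 0],
})

n = df.shape[0]

# start defaults to 0, so only stop is needed: 0, 1, ..., n-1.
df["unique_id"] = pd.RangeIndex(stop=n)

# Hypothetical extra column: the same idea with ids starting at 1.
df["unique_id_from_1"] = pd.RangeIndex(start=1, stop=n + 1)

print(df)
```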

Pandas DataFrame - generating incremental values
