Incremental ID based on the values of another column

Date: 2017-02-17 10:30:08

Tags: pandas

Given this DataFrame:

car_id    month
93829     September
27483     April
48372     October
93829     December
93829     March
48372     February
27483     March

How can I add a third column that is essentially a new id for each car, but an incremental one, like this:

car_id    month        new_incremental_car_id
93829     September    0
27483     April        1
48372     October      2
93829     December     0
93829     March        0
48372     February     2
27483     March        1

Currently I use groupby('car_id') to create a new DataFrame, add an incremental column to it, and then join it back to the original DataFrame on the car_id key.

Is there a less cumbersome, more direct way to achieve this?

Edit

The code I'm currently using:

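# Build a lookup table with one row per distinct car_id
cars_id = pd.DataFrame(list(car_sales.groupby('car_id')['car_id'].groups))
# Use the lookup table's row number as the short id
cars_id['car_short_id'] = cars_id.index
cars_id.set_index(0, inplace=True)
# Attach the short id to every row of the original frame
car_sales = car_sales.join(cars_id, on='car_id', how='left')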

2 Answers:

Answer 0 (score: 1)

Besides pd.factorize, you can also use map with a dictionary built from the unique values.

In [959]: df.car_id.map({x: i for i, x in enumerate(df.car_id.unique())})
Out[959]:
0    0
1    1
2    2
3    0
4    0
5    2
6    1
Name: car_id, dtype: int64
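For anyone who wants to run the map approach outside an interactive session, here is a minimal self-contained sketch. The sample data is copied from the question; the variable name id_map is only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "car_id": [93829, 27483, 48372, 93829, 93829, 48372, 27483],
    "month": ["September", "April", "October", "December",
              "March", "February", "March"],
})

# unique() returns values in order of first appearance, so enumerate()
# gives 0 to the first car seen, 1 to the second, and so on.
id_map = {car: i for i, car in enumerate(df["car_id"].unique())}

# map() looks up every car_id in the dictionary.
df["new_incremental_car_id"] = df["car_id"].map(id_map)
print(df)
```

Because the dictionary is built from the column itself, every car_id has an entry, so map should not introduce missing values here.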

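Alternatively, use the category dtype and its codes, but the order will not be the same: the codes follow the sorted car_id values rather than the order of first appearance.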

In [954]: df.car_id.astype('category').cat.codes
Out[954]:
0    2
1    0
2    1
3    2
4    2
5    1
6    0
dtype: int8
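If the order of first appearance matters, one possible variation (not part of the original answer) is to pass the categories explicitly to pd.Categorical. The sketch below compares both orderings on the question's data; the variable names are illustrative.

```python
import pandas as pd

# car_id column copied from the question.
car_id = pd.Series(
    [93829, 27483, 48372, 93829, 93829, 48372, 27483], name="car_id"
)

# Default behaviour: categories are sorted, so the smallest car_id gets code 0.
sorted_codes = car_id.astype("category").cat.codes

# Passing categories in order of first appearance makes the codes
# follow that order instead.
first_seen = pd.Categorical(car_id, categories=car_id.unique())
ordered_codes = pd.Series(first_seen.codes, index=car_id.index)

print(sorted_codes.tolist())
print(ordered_codes.tolist())
```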

Answer 1 (score: 1)

Use the factorize method:

In [49]: df['new_incremental_car_id'] = pd.factorize(df.car_id)[0].astype(np.uint16)

In [50]: df
Out[50]:
   car_id      month  new_incremental_car_id
0   93829  September                       0
1   27483      April                       1
2   48372    October                       2
3   93829   December                       0
4   93829      March                       0
5   48372   February                       2
6   27483      March                       1

In [51]: df.dtypes
Out[51]:
car_id                     int64
month                     object
new_incremental_car_id    uint16
dtype: object
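The session above keeps only the first element of what pd.factorize returns. As a rough standalone sketch on the question's data, the snippet below also looks at the second element, which holds the distinct values and can be used to translate a short id back to the original car_id. The uint16 cast is left out here; it appears to be an optional memory saving that assumes fewer than 65,536 distinct cars.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "car_id": [93829, 27483, 48372, 93829, 93829, 48372, 27483],
    "month": ["September", "April", "October", "December",
              "March", "February", "March"],
})

# factorize returns (codes, uniques): codes number the values in order of
# first appearance, uniques lists the distinct values in that same order.
codes, uniques = pd.factorize(df["car_id"])
df["new_incremental_car_id"] = codes

# Indexing uniques with the codes recovers the original column.
recovered = list(uniques[codes])

print(df)
print(list(uniques))
```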