Question

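I keep running into the problem of having to assign a unique ID to each group in a dataset. I have used this when zero-padding for RNNs, when generating graphs, and on many other occasions.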

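This can usually be done by concatenating the values in each grouping column. However, the number of columns that define a group, their dtypes, or the size of their values often make concatenation an impractical solution that needlessly uses up memory.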
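For illustration, the concatenation approach described above might look like the following minimal sketch. The frame and the column names A and B are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data: columns A and B together define a group.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1],
    "B": ["x", "x", "y", "z", "x"],
})

# Build one string key per row by joining the grouping columns.
# This creates a new string for every row, which is where the
# extra memory goes when there are many columns or long values.
key = df["A"].astype(str) + "_" + df["B"].astype(str)

# The key identifies the group, but it is a string rather than
# a compact integer ID.
print(key.tolist())
```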

I was wondering whether there is an easy way to assign a unique numeric ID to groups in pandas.

Answer 1

You just need ngroup, per seeiespi (or pd.factorize):

df.groupby('C').ngroup()
Out[322]: 
0    0
1    0
2    2
3    1
4    1
5    1
6    1
7    2
8    2
dtype: int64
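The frame behind this output is not shown, so here is a minimal self-contained sketch. It assumes a column C whose values were picked to reproduce the numbering above; the second frame, with columns A and B, is likewise hypothetical and shows that several columns can define a group without building a joined key.

```python
import pandas as pd

# Hypothetical frame; the values of C are chosen to match the output above.
df = pd.DataFrame({"C": ["a", "a", "c", "b", "b", "b", "b", "c", "c"]})

# One integer per group, numbered in sorted key order (a=0, b=1, c=2).
df["group_id"] = df.groupby("C").ngroup()
print(df["group_id"].tolist())

# Hypothetical two-column case: the pair (A, B) defines the group.
df2 = pd.DataFrame({"A": [1, 1, 2, 2, 1], "B": ["x", "x", "y", "z", "x"]})
df2["group_id"] = df2.groupby(["A", "B"]).ngroup()
print(df2["group_id"].tolist())
```

Because ngroup returns an integer Series aligned with the original index, it can be assigned straight back as a new column.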

More options

pd.factorize(df.C)[0]
Out[323]: array([0, 0, 1, 2, 2, 2, 2, 1, 1], dtype=int64)
df.C.astype('category').cat.codes
Out[324]: 
0    0
1    0
2    2
3    1
4    1
5    1
6    1
7    2
8    2
dtype: int8
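Note that the outputs above do not all use the same numbering: pd.factorize numbers groups in order of first appearance, while ngroup and cat.codes follow sorted key order. The sketch below, again on a hypothetical column C, shows one way to get either ordering from each tool.

```python
import pandas as pd

# Hypothetical frame; the values of C are chosen to match the outputs above.
df = pd.DataFrame({"C": ["a", "a", "c", "b", "b", "b", "b", "c", "c"]})

# Order of first appearance: a=0, c=1, b=2.
first_seen = pd.factorize(df["C"])[0]
first_seen_ngroup = df.groupby("C", sort=False).ngroup()

# Sorted key order: a=0, b=1, c=2.
sorted_codes = pd.factorize(df["C"], sort=True)[0]
sorted_cat = df["C"].astype("category").cat.codes

print(first_seen.tolist())
print(sorted_codes.tolist())
```

Which ordering is preferable depends on the use case; the IDs are unique per group in both.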

Answer 2

I came up with a simple solution that I refer to often and wanted to share:

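docRef.get()
    .then(function(doc) {
        if (doc.exists) {
            // Firestore cannot update one array element by index, so copy
            // the array, set element 0, and write the whole array back.
            var array = (doc.data().array || []).slice();
            array[0] = true;
            db.collection("collection")
                .doc("document")
                .update({ "array": array })
                .then(function() {
                    console.log("Changed !");
                });
        }
    });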

Assign unique numeric group IDs to groups in pandas

2 answers: