Question

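I want to create unique identifiers for duplicated values. The only duplicated value is 0. The idea is to convert each zero into zero plus its position (0 + 1 for the first row, 0 + 2 for the second, and so on). The problem is that the column also contains other, non-duplicated values.

I wrote this line of code to try to convert the zeros as described, but I get this error message:

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('

Here is my code: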

seller_customer['customer_id'] = np.where(seller_customer['customer_id']==0, seller_customer['customer_id'] + seller_customer.groupby(['customer_id']).cumcount().replace('0',''))

Here is a sample of my data:

{0: '7e468d618e16c6e1373fb2c4a522c969',
 1: '1c14a115bead8a332738c5d7675cca8c',
 2: '434dee65d973593dbb8461ba38202798',
 3: '4bbeac9d9a22f0628ba712b90862df28',
 4: '578d5098cbbe40771e1229fea98ccafd',
 5:  0,
 6:  0,
 7:  0}

Answer 1

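If I understand correctly, you can assign a range of values to the rows where id equals 0:

import numpy as np

# Number the rows whose id is 0 as 1, 2, 3, ... in order of appearance
df.loc[df['id']==0, 'id'] = np.arange((df['id']==0).sum()) + 1

print(df)

Output:

                                 id
0  7e468d618e16c6e1373fb2c4a522c969
1  1c14a115bead8a332738c5d7675cca8c
2  434dee65d973593dbb8461ba38202798
3  4bbeac9d9a22f0628ba712b90862df28
4  578d5098cbbe40771e1229fea98ccafd
5                                 1
6                                 2
7                                 3

A shorter but slightly slower variant is also possible.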
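
This answer mentions a shorter but slightly slower variant without showing it. One plausible form, offered as an assumption and not as the answerer's own code, takes a cumulative sum over the boolean mask. The sample frame below is shortened from the question, and the column name `id` follows this answer.

```python
import pandas as pd

# Shortened sample mirroring the question's data (column name 'id' as in the answer).
df = pd.DataFrame({'id': ['7e468d618e16c6e1373fb2c4a522c969',
                          '1c14a115bead8a332738c5d7675cca8c',
                          0, 0, 0]})

# Boolean mask marking the rows that hold the placeholder value 0.
mask = df['id'] == 0

# The running count of True values is 1, 2, 3, ... at the masked rows;
# .loc aligns the right-hand Series on the index, so only those rows change.
df.loc[mask, 'id'] = mask.cumsum()

print(df)
```

The cumulative sum is computed for every row and then filtered by the mask, which may be why such a variant would be somewhat slower than building the range directly.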

Answer 2

You can do the following:

    from pandas.util import hash_pandas_object
    import numpy as np
    # Replace each 0 with a hash of its row; the index is part of the hash by default
    df.x = np.where(df.x == 0, hash_pandas_object(df.x), df.x)
    df

The resulting values won't be sequential like the index, but they are unique (almost certainly, barring a hash collision).
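
The uniqueness claim can be checked directly. The sketch below assumes a shortened version of the question's sample and the column name `x` used in this answer. It relies on `hash_pandas_object` including the index in each row's hash by default (`index=True`), which is what lets identical zeros end up with different values.

```python
import numpy as np
import pandas as pd
from pandas.util import hash_pandas_object

# Shortened sample mirroring the question's data (column name 'x' as in the answer).
df = pd.DataFrame({'x': ['7e468d618e16c6e1373fb2c4a522c969',
                         '1c14a115bead8a332738c5d7675cca8c',
                         0, 0, 0]})

# One hash per row; with index=True (the default) the row label is mixed in,
# so rows with the same value but different labels hash differently.
hashes = hash_pandas_object(df['x'])

# Keep the original ids and substitute the hash only where the value is 0.
df['x'] = np.where(df['x'] == 0, hashes, df['x'])

# Series.is_unique reports whether any identifier is repeated.
print(df['x'].is_unique)
```

If the index itself contains duplicate labels, rows with the same value and the same label would hash identically, so this approach assumes a unique index.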

Adding unique identifiers to duplicated values in a pandas DataFrame
