Question

I have a DataFrame that looks like this:

   col1 col2 col3  col4 col5  
 0   0   1    0     1     1
 1   0   1    0     0     1

I want to assign a unique positive integer greater than 1 to each 0 entry.

So I want a DataFrame that looks like this:

      col1 col2  col3  col4 col5    
    0  2    1     3     1    1
    1  4    1     5     6    1

The integers don't have to be in order; they just need to be positive and unique.

Answer 1

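np.arange(...).reshape(df.shape) generates an array with the same shape as df, holding consecutive integers starting from 2.

df.where(cond, ...) keeps the values where cond is True and fills the rest from its second argument. Your DataFrame consists of binary indicators (zeros and ones), so df.astype(bool) is True exactly at the ones: those are kept, and the zeros are filled from the consecutive NumPy array. pandas expects a boolean condition here, hence the explicit conversion.

# optional: inplace=True
>>> df.where(df.astype(bool), np.arange(start=2, stop=df.shape[0] * df.shape[1] + 2).reshape(df.shape))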
   col1  col2  col3  col4  col5
0     2     1     4     1     1
1     7     1     9    10     1
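
For anyone who wants to paste and run this, here is a self-contained sketch of the same idea. It rebuilds the example frame from the question and passes an explicit boolean condition (df != 0); the names fill and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [[0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)

# One candidate integer per cell: 2, 3, ..., df.size + 1.
fill = np.arange(2, df.size + 2).reshape(df.shape)

# Keep cells where the condition is True (the ones), take `fill` elsewhere.
result = df.where(df != 0, fill)
print(result)
```

Because every cell gets its own candidate integer, the values that end up in the zero positions are unique, at the cost of gaps in the numbering.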

Answer 2

I think you can use numpy.arange to generate unique integers in the shape of df, and use the boolean mask produced by df == 0 to replace all the zeros:

import numpy as np

print(df)
   col1  col2  col3  col4  col5
0     0     1     0     1     1
1     0     1     0     0     1

print(df == 0)
   col1   col2  col3   col4   col5
0  True  False  True  False  False
1  True  False  True   True  False

print(df.shape)
(2, 5)

# number of cells in df, i.e. the most integers that could be needed
min_count = df.shape[0] * df.shape[1]
print(min_count)
10

# start at 2 because 0 and 1 are skipped, so the stop value is min_count + 2
print(np.arange(start=2, stop=min_count + 2).reshape(df.shape))
[[ 2  3  4  5  6]
 [ 7  8  9 10 11]]

# assign the integers 2 .. min_count + 1 to the cells that are 0
df[df == 0] = np.arange(start=2, stop=min_count + 2).reshape(df.shape)
print(df)
   col1  col2  col3  col4  col5
0     2     1     4     1     1
1     7     1     9    10     1
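
The arange approach leaves gaps, because integers are also reserved for the cells that hold 1. If the compact numbering shown in the question is preferred, one possible sketch (not part of the answer above) is a running count of the zeros in row-major order; zero_mask, counter and compact are illustrative names.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [[0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)

zero_mask = df.eq(0)

# Running count of zeros in row-major order, shifted so the first zero gets 2.
counter = zero_mask.to_numpy().ravel().cumsum().reshape(df.shape) + 1

# Replace only the cells where the mask is True.
compact = df.mask(zero_mask, counter)
print(compact)
```

On the example frame this reproduces the target from the question: 2, 3 in the first row and 4, 5, 6 in the second.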

Or use numpy.random.choice to draw unique random integers from a larger range:

# number of cells in df
min_count = df.shape[0] * df.shape[1]
print(min_count)
10
# np.arange can go up to a bigger number, e.g. 100, but the stop value must be at least min_count + 2
df[df == 0] = np.random.choice(np.arange(2, 100), replace=False, size=df.shape)
print(df)
   col1  col2  col3  col4  col5
0    17     1    53     1     1
1    39     1    15    76     1
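
A variation on the random draw, as a sketch: it samples only as many integers as there are zeros, using NumPy's Generator API instead of np.random.choice. The seed and the upper bound 100 are arbitrary choices, and rng, values and randomized are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [[0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily, for repeatability

values = df.to_numpy(copy=True)      # work on a copy so df stays unchanged
zero_mask = values == 0

# Draw exactly one distinct integer from [2, 100) per zero cell.
values[zero_mask] = rng.choice(
    np.arange(2, 100), size=int(zero_mask.sum()), replace=False
)

randomized = pd.DataFrame(values, index=df.index, columns=df.columns)
print(randomized)
```

Sampling per zero rather than per cell means the pool only needs to be as large as the number of zeros.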

Answer 3

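This works, although it isn't the best-performing approach in pandas:

import random

MAX_INT = 100

# Draw one distinct integer in [2, MAX_INT) for every zero cell.
zero_count = int((df == 0).sum().sum())
new_values = iter(random.sample(range(2, MAX_INT), zero_count))

# Iterating over df directly yields column labels, so walk the labels explicitly.
for row_label in df.index:
    for col_label in df.columns:
        if df.at[row_label, col_label] == 0:
            df.at[row_label, col_label] = next(new_values)

Something like itertuples() would be faster, but if there isn't much data this is fine.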
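
As a rough illustration of the itertuples() remark, the sketch below reads the rows with itertuples and builds a new frame instead of writing cell by cell. This is only one way to read that remark; pool and rebuilt are illustrative names, and the seed and the bound 100 are arbitrary.

```python
import random

import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [[0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)

random.seed(0)  # arbitrary seed, for repeatability

# One distinct integer in [2, 100) per zero cell.
zero_count = int((df == 0).sum().sum())
pool = iter(random.sample(range(2, 100), zero_count))

# itertuples(index=False) yields one tuple of cell values per row.
rows = [
    [next(pool) if value == 0 else value for value in row]
    for row in df.itertuples(index=False)
]

rebuilt = pd.DataFrame(rows, index=df.index, columns=df.columns)
print(rebuilt)
```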

Answer 4

df[df == 0] = np.random.choice(np.arange(2, df.size + 2), replace=False, size=df.shape)

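Lots of good answers here, but I'll throw this one out there anyway.

replace indicates whether the sample is drawn with replacement; replace=False keeps the values unique.
np.arange runs from 2 up to (but not including) df.size + 2. It starts at 2 because you want values greater than 1.
size has to match the shape of df, so I just use df.shape.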

To illustrate the array values that np.random.choice generates:

>>> np.random.choice(np.arange(2, df.size + 2), replace=False, size=df.shape)
array([[11,  4,  6,  5,  9],
       [ 7,  8, 10,  3,  2]])

Note that they are all greater than 1 and all unique.

Before:

   col1  col2  col3  col4  col5
0     0     1     0     1     1
1     0     1     0     0     1

After:

   col1  col2  col3  col4  col5
0     9     1     7     1     1
1     6     1     3    11     1
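
To check a result like the one above programmatically, a small helper along these lines could be used. is_valid_fill is a hypothetical name, not a pandas or NumPy function, and the sketch assumes both frames have the same shape.

```python
import numpy as np
import pandas as pd


def is_valid_fill(original, filled):
    """Hypothetical check: zeros replaced by unique integers > 1, ones untouched."""
    before_values = original.to_numpy()
    after_values = filled.to_numpy()
    zero_mask = before_values == 0
    new_values = after_values[zero_mask]
    untouched_ok = (after_values[~zero_mask] == before_values[~zero_mask]).all()
    all_above_one = (new_values > 1).all()
    all_unique = np.unique(new_values).size == new_values.size
    return bool(untouched_ok and all_above_one and all_unique)


before = pd.DataFrame(
    [[0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)

after = before.copy()
after[after == 0] = np.random.choice(
    np.arange(2, after.size + 2), replace=False, size=after.shape
)

print(is_valid_fill(before, after))
```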
