Replace column values with random values

Time: 2019-07-10 05:32:55

Tags: r dataframe

Hi all, I am trying to replace a certain value in a data frame with random values from a given range.

A sample data frame is given below. I want to replace every 3 with a value between 0 and 0.1.

df <- data.frame(datay = sample(1:5, 10, replace = TRUE), 
                 dataz = sample(1:10, 10, replace = TRUE))

Input:

   datay dataz
1      5     8
2      5     3
3      2     1
4      5    10
5      4     5
6      1     6
7      1     8
8      3     2
9      3     9
10     3     4

Desired output:

   datay dataz
1      5     8
2      5  0.05
3      2     1
4      5    10
5      4     5
6      1     6
7      1     8
8   0.05     2
9   0.02     9
10  0.01     4
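A minimal base-R sketch of the requested behaviour, assuming every column is numeric and using the question's 0 to 0.1 range. `set.seed(42)` is an arbitrary choice added only to make the run reproducible, and `replace_value` is a hypothetical helper name, not part of any package:

```r
# Reproducible version of the question's data (seed 42 is an arbitrary choice)
set.seed(42)
df <- data.frame(datay = sample(1:5, 10, replace = TRUE),
                 dataz = sample(1:10, 10, replace = TRUE))

# Hypothetical helper: swap every `target` in x for a uniform draw in (lo, hi)
replace_value <- function(x, target = 3, lo = 0, hi = 0.1) {
  hit <- which(x == target)                        # positions of matches (NA-safe)
  x[hit] <- runif(length(hit), min = lo, max = hi) # one independent draw per match
  x
}

df_new <- df
df_new[] <- lapply(df_new, replace_value)  # column by column, keeps the data.frame shape
df_new
```

Because the index is recomputed inside each column, a 3 in `dataz` is replaced even when `datay` in the same row is not 3, which matches row 2 of the desired output.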

2 Answers:

Answer 0 (score: 0)

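We can create a logical index from the positions where 'datay' equals 3, and replace those elements with values `sample`d from the specified `seq`: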

i1 <- df$datay == 3
df$datay[i1] <- sample(seq(0, 0.01, by = 0.001), sum(i1), replace = TRUE)
df
#   datay dataz
#1  1.000     o
#2  1.000     y
#3  1.000     y
#4  0.005     b
#5  1.000     b
#6  5.000     n
#7  4.000     q
#8  4.000     c
#9  2.000     a
#10 0.001     k
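To make the difference between the two approaches in this answer concrete: `sample(seq(0, 0.01, by = 0.001), ...)` can only return one of the 11 grid points 0, 0.001, ..., 0.01, while `runif` (used further down) draws from a continuous range. A quick check, with an arbitrary seed and an arbitrary draw count of 5:

```r
set.seed(1)  # arbitrary seed, for reproducibility only
grid <- seq(0, 0.01, by = 0.001)              # the 11 candidate values
from_grid <- sample(grid, 5, replace = TRUE)  # discrete: always one of the grid points
from_unif <- runif(5, min = 0, max = 0.01)    # continuous: any value strictly inside (0, 0.01)

# Each grid draw is (up to floating-point error) a whole multiple of 0.001
on_grid <- abs(from_grid / 0.001 - round(from_grid / 0.001)) < 1e-9
on_grid
from_unif
```

With only 11 possible values, repeated replacements will often coincide; `runif` avoids that if distinct values matter.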

If we need to do this on multiple columns (given a vector of column names):

nm1 <- c("datay", "dataz")
df[nm1] <- lapply(df[nm1], function(x) {
     i2 <- x == 3  # index of the 3s in this column, not in datay
     replace(x, i2, sample(seq(0, 0.01, by = 0.001), sum(i2), replace = TRUE))
})

Or with the tidyverse:

library(tidyverse)
df %>%
      mutate(across(all_of(nm1), ~ replace(., . == 3, sample(seq(0, 0.01,
             by = 0.001), sum(. == 3), replace = TRUE))))

Or apply it only to the numeric columns:

df %>%
    mutate(across(where(is.numeric), ~ replace(., . == 3, sample(seq(0, 0.01,
             by = 0.001), sum(. == 3), replace = TRUE))))
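Without the tidyverse, restricting the replacement to numeric columns can be sketched in base R by selecting columns with `vapply(df, is.numeric, logical(1))`. The mixed data frame below (a character `dataz`, similar to the outputs shown in this answer) is made up for illustration, and the seed is arbitrary:

```r
set.seed(7)  # arbitrary seed, for reproducibility only
# Made-up mixed data: one numeric column and one character column
df_mixed <- data.frame(datay = c(3, 1, 3, 5),
                       dataz = c("a", "3", "b", "c"),
                       stringsAsFactors = FALSE)

num_cols <- vapply(df_mixed, is.numeric, logical(1))  # TRUE only for numeric columns
df_mixed[num_cols] <- lapply(df_mixed[num_cols], function(x) {
  hit <- which(x == 3)                                 # positions of 3 in this column
  x[hit] <- sample(seq(0, 0.01, by = 0.001), length(hit), replace = TRUE)
  x
})
df_mixed  # the character "3" in dataz is left alone
```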

If we don't want to modify the original object, use `replace` inside `transform`, which returns a modified copy:

transform(df, datay = replace(datay, i1, sample(seq(0, 0.01, 
             by = 0.001), sum(i1), replace = TRUE)))
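A small sketch showing that `transform` leaves the input untouched unless the result is assigned back. The four-row data frame and the seed are made up for illustration:

```r
set.seed(3)  # arbitrary seed
# Small made-up data frame; note dataz also contains a 3
df <- data.frame(datay = c(5, 3, 2, 3), dataz = c(8, 1, 3, 4))
i1 <- df$datay == 3

# transform() returns a modified copy; df itself is untouched unless reassigned
df2 <- transform(df, datay = replace(datay, i1,
                                     sample(seq(0, 0.01, by = 0.001), sum(i1), replace = TRUE)))
df$datay   # original values, 3s still present
df2$datay  # 3s in datay replaced; dataz is copied unchanged
```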

Another option is `runif`, which draws from a continuous range rather than a fixed grid:

transform(df, datay = replace(datay, i1, runif(sum(i1), 0, 0.001)))
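`runif(n, min, max)` returns `n` draws from the uniform distribution on `[min, max]`; per its documentation it does not generate the endpoints themselves unless `max == min` or the range is tiny relative to `min`. Switching to the question's 0 to 0.1 range only means changing `max`. A sketch on a made-up vector with an arbitrary seed:

```r
set.seed(10)  # arbitrary seed
datay <- c(5, 3, 2, 3, 3)
i1 <- datay == 3

# Same replace() idiom with the question's 0 to 0.1 range; only `max` changes
datay_new <- replace(datay, i1, runif(sum(i1), min = 0, max = 0.1))
datay_new
```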

Or with data.table:

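library(data.table)
# datay starts as integer; make it double first, otherwise := coerces the fractional draws back to integer
setDT(df)[, datay := as.numeric(datay)][datay == 3, datay := sample(seq(0, 0.01, by = 0.001), .N, replace = TRUE)]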

Answer 1 (score: 0)

We can also use `runif` to generate random numbers between two values.

inds <- df$datay == 3
df$datay[inds] <- runif(sum(inds), 0, 0.001)

df
#      datay dataz
#1  0.000555     k
#2  5.000000     v
#3  4.000000     n
#4  2.000000     q
#5  1.000000     l
#6  2.000000     n
#7  0.000121     u
#8  0.000794     z
#9  1.000000     x
#10 2.000000     d

EDIT

For all columns we can do:

mat <- which(df == 3, arr.ind = TRUE)
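#If you need only selected columns, e.g. the first two, use the line below
#(its column indices are relative to df[1:2], so they line up with df only for leading columns)
#mat <- which(df[1:2] == 3, arr.ind = TRUE)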
df[mat] <- runif(nrow(mat), 0, 0.001)

df
#      datay    dataz
#1  5.000000  8.00000
#2  5.000000  0.00078
#3  2.000000  1.00000
#4  5.000000 10.00000
#5  4.000000  5.00000
#6  1.000000  6.00000
#7  1.000000  8.00000
#8  0.000144  2.00000
#9  0.000965  9.00000
#10 0.000771  4.00000
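To see what `which(..., arr.ind = TRUE)` hands to the assignment, here is a sketch on a made-up three-row data frame (the seed is arbitrary). The result is a matrix with one row per match and columns `row` and `col`, and a two-column numeric matrix used as `df[mat]` selects individual cells:

```r
# Small made-up, all-numeric data frame
df <- data.frame(datay = c(3, 1, 5), dataz = c(2, 3, 3))

mat <- which(df == 3, arr.ind = TRUE)  # one row per match, columns "row" and "col"
print(mat)

set.seed(99)                            # arbitrary seed
df[mat] <- runif(nrow(mat), 0, 0.001)   # two-column matrix index: assign cell by cell
df
```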