I have the following data frame named cars:
Brand year mpg reputation Luxury
Honda 2010 30 8.5 0.5
Honda 2011 28 8.5 0.6
Dodge 2010 20 6.5 0.6
Dodge 2011 23 7.0 0.7
Mercedes 2010 22 9.5 NA
Mercedes 2011 25 9.0 NA
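I want to replace the NA values with randomly generated real numbers between 0.9 and 1.0.
I am trying the following, but it replaces every NA with the number 0.9: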
cars[is.na(cars)] <- sample(0.9:1, sum(is.na(cars)),replace=TRUE)
The resulting data frame should look like this:
Brand year mpg reputation Luxury
Honda 2010 30 8.5 0.5
Honda 2011 28 8.5 0.6
Dodge 2010 20 6.5 0.6
Dodge 2011 23 7.0 0.7
Mercedes 2010 22 9.5 *0.91*
Mercedes 2011 25 9.0 *0.97*
Code for the data structure:
cars <- structure(list(Brand = c("Honda", "Honda", "Dodge", "Dodge", "Mercedes", "Mercedes"),
year = c(2010L, 2011L, 2010L, 2011L, 2010L, 2011L),
mpg = c(30L, 28L, 20L, 23L, 22L, 25L), reputation = c(8.5, 8.5, 6.5, 7, 9.5, 9), Luxury = c(0.5, 0.6, 0.6, 0.7, NA, NA)),
class = "data.frame", row.names = c(NA, -6L))
Answer 0 (score: 5)
Use runif instead of sample:
cars[is.na(cars)] <- runif(sum(is.na(cars)), min = 0.9, max = 1)
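As a minimal sketch of the same idea, the replacement can be limited to the Luxury column so that NAs in other columns would be left untouched. The name `cars_demo` and the seed are assumptions made for illustration; the Luxury values are copied from the table in the question.

```r
# Hypothetical stand-in for the question's data (values from its table).
cars_demo <- data.frame(
  Brand  = c("Honda", "Honda", "Dodge", "Dodge", "Mercedes", "Mercedes"),
  Luxury = c(0.5, 0.6, 0.6, 0.7, NA, NA)
)

set.seed(42)  # arbitrary seed, only to make the draw reproducible

# Locate the missing entries in the Luxury column only.
missing <- is.na(cars_demo$Luxury)

# Draw one uniform value between 0.9 and 1 per missing entry.
cars_demo$Luxury[missing] <- runif(sum(missing), min = 0.9, max = 1)

cars_demo
```

Indexing by column rather than by `is.na(cars)` over the whole data frame is a design choice: it avoids filling unrelated columns with values that only make sense for Luxury.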
Answer 1 (score: 4)
This is because 0.9:1 gives you only a single number, 0.9. Try
0.9:1
#[1] 0.9
So it replaces the NAs with 0.9.
Assuming you need a sequence
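The reason is that `from:to` counts up from `from` in steps of 1 and never goes past `to`. A short sketch of that behavior (the variable names are assumptions for illustration):

```r
# from:to steps by 1 starting at `from` and stops before exceeding `to`.
short_seq <- 0.9:1   # only 0.9, because the next step (1.9) would exceed 1
long_seq  <- 0.9:3   # 0.9, 1.9, 2.9

# With a single candidate value, every draw returns that value.
draws <- sample(0.9:1, 5, replace = TRUE)

short_seq
long_seq
draws
```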
vals <- seq(0.9, 1, 0.01)
vals
#[1] 0.90 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
Now we can sample from this sequence
df[is.na(df)] <- sample(vals, sum(is.na(df)), replace = TRUE)
df
# Brand year mpg reputation Luxury
#1 Honda 2010 30 8.5 5.00
#2 Honda 2011 28 8.5 5.50
#3 Dodge 2010 20 6.5 6.00
#4 Dodge 2011 23 7.0 6.50
#5 Mercedes 2010 22 9.5 0.91
#6 Mercedes 2011 25 9.0 0.92
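Note that sampling from a sequence restricts the fill values to the grid points, whereas the question asks for real numbers. A rough comparison, assuming a step of 0.01 as above and an arbitrary seed (the names `grid_draws` and `uniform_draws` are hypothetical):

```r
set.seed(1)  # arbitrary seed for reproducibility

vals <- seq(0.9, 1, by = 0.01)

# Draws from the grid can only take one of the 11 values in `vals`.
grid_draws <- sample(vals, 1000, replace = TRUE)

# Draws from runif are continuous between 0.9 and 1.
uniform_draws <- runif(1000, min = 0.9, max = 1)

length(unique(grid_draws))     # at most 11
length(unique(uniform_draws))  # typically close to 1000
```

If two decimal places are enough, the grid approach is fine; otherwise runif is the closer match to the request.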
Data
df <- structure(list(Brand = structure(c(2L, 2L, 1L, 1L, 3L, 3L),
.Label = c("Dodge",
"Honda", "Mercedes"), class = "factor"), year = c(2010L, 2011L,
2010L, 2011L, 2010L, 2011L), mpg = c(30L, 28L, 20L, 23L, 22L,
25L), reputation = c(8.5, 8.5, 6.5, 7, 9.5, 9), Luxury = c(5,
5.5, 6, 6.5, NA, NA)), class = "data.frame", row.names = c(NA, -6L))