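I know this is probably a common question, but I couldn't find a good way to apply existing code to my problem:
I have a dataset with two colour columns, and I want to replace every "unknown" in the colour column with the corresponding value from colour.y. Sometimes the colour and colour.y columns don't match, but I still want to keep the value in colour and only replace the unknowns.
Here is an example: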
   id  colour colour.y
1   1 unknown      red
2   2    blue     blue
3   2    blue     blue
4   3     red      red
5   4     red      red
6   4 unknown      red
7   4    blue     blue
8   5   green    green
9   5   green    green
10  5 unknown    green
11  6     red      red
12  6    blue     blue
13  6 unknown    green
Here is the code:
id = c(1,2,2,3,4,4,4,5,5,5,6,6,6)
colour = c("unknown","blue","blue","red","red","unknown","blue","green","green","unknown","red","blue","unknown")
colour.y = c("red","green","blue","green","red","red","blue","blue","blue","green","red","blue","green")
data = data.frame(cbind(id,colour,colour.y))
data
Thanks!
Answer 0: (score: 1)
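We can do this with base R. Create a logical vector that is TRUE wherever 'unknown' occurs in the 'colour' column. Use it to subset both 'colour' and 'colour.y', and replace the selected values in 'colour' with the corresponding elements of 'colour.y'.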
i1 <- data$colour == 'unknown'
data$colour[i1] <- data$colour.y[i1]
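As a quick check of the logical-index approach, here is a minimal self-contained sketch. It rebuilds the question's vectors into a data frame with character columns (an assumption made here so the assignment does not depend on factor levels), applies the replacement, and lets you confirm that rows where colour and colour.y disagree keep their original colour.

```r
# Vectors copied from the question.
id <- c(1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6)
colour <- c("unknown", "blue", "blue", "red", "red", "unknown", "blue",
            "green", "green", "unknown", "red", "blue", "unknown")
colour.y <- c("red", "green", "blue", "green", "red", "red", "blue",
              "blue", "blue", "green", "red", "blue", "green")

# Assumption: build the data frame directly (no cbind) with character columns.
data <- data.frame(id, colour, colour.y, stringsAsFactors = FALSE)

# TRUE for the rows whose colour is "unknown" (rows 1, 6, 10 and 13 here).
i1 <- data$colour == "unknown"

# Overwrite only those rows; every other row keeps its colour.
data$colour[i1] <- data$colour.y[i1]

# Row 2 had colour "blue" and colour.y "green"; it should still be "blue".
data[2, ]
```

Only the four "unknown" rows change, so a mismatch such as row 2 (blue vs. green) is left alone.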
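Another option, which updates the column by reference instead of copying it, is data.table. After converting to a data.table (setDT(data)), specify the logical condition in i and assign (:=) the values of 'colour.y' to 'colour':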
library(data.table)
setDT(data)[colour == 'unknown', colour := colour.y]
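Note: it is better to store the columns as character class rather than factor (use stringsAsFactors = FALSE when constructing the data.frame). If we really do need the factor class, then before doing the assignment, set the levels of 'colour' so that they include the levels of 'colour.y'.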
data <- data.frame(id,colour,colour.y, stringsAsFactors = FALSE)
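If the columns have to stay factors, the note above can be sketched roughly as follows. The small data frame `data_f` is a made-up example, not the question's data, and extending the levels with `union()` is just one way to do it. (Since R 4.0.0, `data.frame()` defaults to stringsAsFactors = FALSE, so you would only end up in this situation with older R versions or explicit factors.)

```r
# Hypothetical three-row example with factor columns.
colour   <- factor(c("unknown", "blue", "unknown"))
colour.y <- factor(c("red", "green", "green"))
data_f <- data.frame(colour, colour.y)

# Extend the levels of 'colour' so they also cover every level of 'colour.y';
# without this, assigning a value that is not a level yields NA with a warning.
levels(data_f$colour) <- union(levels(data_f$colour), levels(data_f$colour.y))

# Same replacement as before; as.character() makes the matching by label explicit.
i1 <- data_f$colour == "unknown"
data_f$colour[i1] <- as.character(data_f$colour.y[i1])

# Optional: drop the now-unused "unknown" level.
data_f$colour <- droplevels(data_f$colour)
data_f
```

Converting to character first and back to factor afterwards would work as well; this version just keeps the column a factor throughout.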
Answer 1: (score: 0)
Using base R:
data$colour[which(data$colour == "unknown")] <- data$colour.y[which(data$colour == "unknown")]
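One case where `which()` makes a practical difference is when 'colour' contains missing values. The comparison then returns NA for those rows, and R does not allow NA subscripts in an assignment when the replacement has more than one element, whereas `which()` silently drops the NAs. A small sketch with a made-up data frame (`data_na` is hypothetical, not from the question):

```r
# Hypothetical example: row 2 has a missing colour.
colour   <- c("unknown", NA, "blue", "unknown")
colour.y <- c("red", "green", "blue", "green")
data_na <- data.frame(colour, colour.y, stringsAsFactors = FALSE)

# A plain logical index is TRUE, NA, FALSE, TRUE here; using it for the
# assignment raises an error, which try() captures.
tmp <- data_na$colour
i <- tmp == "unknown"
logical_fails <- inherits(try(tmp[i] <- data_na$colour.y[i], silent = TRUE),
                          "try-error")

# which() returns only the positions that are TRUE, so the NA row is skipped.
idx <- which(data_na$colour == "unknown")
data_na$colour[idx] <- data_na$colour.y[idx]
data_na
```

Computing the index once and reusing it also avoids evaluating the comparison twice. If the column has no missing values, this behaves the same as the logical-index version.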