R: replace a special value in one column with the value from another column

Time: 2018-07-18 01:52:49

Tags: r replace

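I know this is probably a common question, but I could not find a good way to apply existing code to my problem:

I have a dataset with two colour columns, and I want to replace every "unknown" in the colour column with the corresponding value from colour.y. Sometimes the colour and colour.y columns do not match, but I still want to keep the value of colour and replace only the unknown values.

Here is the example: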

   id  colour colour.y
1   1 unknown      red
2   2    blue    green
3   2    blue     blue
4   3     red    green
5   4     red      red
6   4 unknown      red
7   4    blue     blue
8   5   green     blue
9   5   green     blue
10  5 unknown    green
11  6     red      red
12  6    blue     blue
13  6 unknown    green

Here is the code:

id = c(1,2,2,3,4,4,4,5,5,5,6,6,6)
colour = c("unknown","blue","blue","red","red","unknown","blue","green","green","unknown","red","blue","unknown")
colour.y = c("red","green","blue","green","red","red","blue","blue","blue","green","red","blue","green")
data = data.frame(cbind(id,colour,colour.y))
data

Thanks!

2 answers:

Answer 0: (score: 1)

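We can do this with base R. Create a logical vector that is TRUE wherever 'unknown' occurs in the 'colour' column. Use it to subset both 'colour' and 'colour.y', and replace those values in 'colour' with the corresponding elements of 'colour.y'.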

i1 <- data$colour == 'unknown'
data$colour[i1] <- data$colour.y[i1]
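
As a self-contained sketch of the two lines above, the following rebuilds the question's vectors and applies the replacement. It assumes character columns (built with `stringsAsFactors = FALSE` rather than the question's `data.frame(cbind(...))` call), and the name `demo` is only illustrative.

```r
id <- c(1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6)
colour <- c("unknown", "blue", "blue", "red", "red", "unknown", "blue",
            "green", "green", "unknown", "red", "blue", "unknown")
colour.y <- c("red", "green", "blue", "green", "red", "red", "blue",
              "blue", "blue", "green", "red", "blue", "green")

# Assumption: character columns, not factors
demo <- data.frame(id, colour, colour.y, stringsAsFactors = FALSE)

# TRUE where colour holds the placeholder "unknown"
i1 <- demo$colour == "unknown"

# Overwrite only those rows; every other colour value is left untouched,
# even where colour and colour.y disagree
demo$colour[i1] <- demo$colour.y[i1]

demo$colour
```

Rows where colour and colour.y differ but colour is not "unknown" (for example row 2, blue versus green) keep their original colour.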

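Or, as an arguably better option, use data.table, which updates the column by reference. After converting to a data.table (setDT(data)), specify the logical condition in i and assign (:=) the values of 'colour.y' to 'colour'.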

library(data.table)
setDT(data)[colour == 'unknown', colour := colour.y]
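
To make the roles of i and := concrete, here is a minimal sketch on a hypothetical three-row table (the data and the name `dt` are invented for illustration and are not the question's dataset). It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical three-row table, kept small for illustration
dt <- data.table(id = 1:3,
                 colour = c("unknown", "blue", "unknown"),
                 colour.y = c("red", "green", "blue"))

# i selects the rows where colour is "unknown";
# j uses := to overwrite colour in place for those rows only
dt[colour == "unknown", colour := colour.y]

dt$colour
```

Because := modifies the table in place, there is no need to assign the result back to `dt`.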

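Note: it is better for the columns to be of class character rather than factor (use stringsAsFactors = FALSE in the data.frame call). If we really do need the factor class, then before doing the assignment set the levels of 'colour' so that they include the levels of 'colour.y'.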
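
The factor case can be sketched as follows, using two small hypothetical vectors rather than the question's data. The idea is that a factor only accepts values that are already among its levels, so the levels of colour are extended first. Since R 4.0.0 `data.frame()` defaults to `stringsAsFactors = FALSE`, so this mainly matters for older R versions or for columns that were explicitly made factors.

```r
# Small illustrative vectors (hypothetical data, not from the question)
colour   <- factor(c("unknown", "blue"))
colour.y <- factor(c("red", "green"))

# Rebuild colour so that every level of colour.y is also a valid level;
# without this step the assignment below would produce NA with a warning
colour <- factor(colour, levels = union(levels(colour), levels(colour.y)))

i1 <- colour == "unknown"
colour[i1] <- colour.y[i1]

as.character(colour)
```

If the now-unused "unknown" level should disappear afterwards, `droplevels(colour)` removes levels that no longer occur.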

Data

data <- data.frame(id,colour,colour.y, stringsAsFactors = FALSE)

Answer 1: (score: 0)

In base R

data$colour[which(data$colour == "unknown")] <- data$colour.y[which(data$colour == "unknown")]
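
One practical difference of the which() form shows up when colour contains missing values. The sketch below uses hypothetical vectors with an NA (the question's data has none): which() keeps only the positions where the comparison is TRUE, whereas a logical index containing NA is, as far as I know, rejected by R in a replacement whose right-hand side has more than one element.

```r
# Hypothetical data with a missing colour (not part of the question's data)
colour   <- c("unknown", NA, "blue")
colour.y <- c("red", "green", "blue")

# NA == "unknown" is NA; which() drops it and returns only TRUE positions
idx <- which(colour == "unknown")

# Replace just those positions; the NA entry is left as it is
colour[idx] <- colour.y[idx]

colour
```

Computing the index once and reusing it also avoids evaluating the comparison twice.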