How to replace matches with the value from the next column in R

Posted: 2016-07-24 00:05:36

Tags: r

I have a data frame:

structure(list(City = structure(c(4L, 2L, 1L, 3L, 3L, 3L), .Label = c("Gold Cost", 
"Melbourne", "Other", "Sydney"), class = "factor"), Town = structure(c(1L, 
1L, 1L, 3L, 4L, 2L), .Label = c("", "Brighton", "Hurstville", 
"Penhurst"), class = "factor")), .Names = c("City", "Town"), class = "data.frame", row.names = c(NA, 
-6L))

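I want to replace the value in the column named City with the value from the next column (Town) in the same row, for every row where City contains the value Other.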

My output should look like this:

structure(list(City = structure(c(6L, 4L, 2L, 3L, 5L, 1L), .Label = c("Brighton", 
"Gold Cost", "Hurstville", "Melbourne", "Penhurst", "Sydney"), class = "factor"), 
    Town = structure(c(1L, 1L, 1L, 3L, 4L, 2L), .Label = c("", 
    "Brighton", "Hurstville", "Penhurst"), class = "factor")), .Names = c("City", 
"Town"), class = "data.frame", row.names = c(NA, -6L))

I haven't written any functions before, but I guess it should look something like this:

for(data1 in 1:nrow(data1)) {
        if(data1$City[i] == 'Other') {
                data1$city[i] <- data1$Town[i]
        } else {
                break
        }
}
  1. Where did I go wrong?
  2. What thought process should I follow to solve this kind of problem in the future?
  3. How do I get the desired result?

1 answer:

Answer 0 (score: 1)

There are 2 errors and 2 inefficiencies.

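Error 1: You wrote for(data1 instead of for(i. (Note also that R names are case-sensitive, so data1$city in the loop body should be data1$City.)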
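A minimal sketch of why for(data1 ...) breaks things, using a hypothetical two-row data frame named toy in place of data1: R assigns each loop index to the loop variable itself, so naming the loop variable after the data frame replaces the data frame with a plain integer.

```r
# Hypothetical toy data frame standing in for data1
toy <- data.frame(City = c("Sydney", "Other"),
                  Town = c("", "Brighton"),
                  stringsAsFactors = FALSE)

# 1:nrow(toy) is evaluated once, before the first iteration, so the loop
# still runs twice; but every iteration assigns the index to toy itself
for (toy in 1:nrow(toy)) {
  # the original loop body would run here
}

# toy is no longer a data frame, only the last loop index
print(class(toy))  # "integer"
print(toy)         # 2

# Any toy$City inside the loop would therefore fail with an error
res <- tryCatch(toy$City, error = function(e) conditionMessage(e))
print(res)
```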

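Error 2: City is of class factor, and you are trying to add new levels to it. Convert it to character before doing this; otherwise the new values are not valid factor levels and are turned into NA. There are other ways around this, but they are less efficient, and you can always convert the column back to a factor afterwards.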
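A small sketch of the factor problem on hypothetical toy vectors (not the original data): assigning a value that is not already a level gives NA (R emits a warning), while doing the same on a character vector works and can be turned back into a factor afterwards.

```r
# Hypothetical toy factors, not the original data
city <- factor(c("Sydney", "Other"))
town <- factor(c("", "Brighton"))

# "Brighton" is not a level of city, so R warns and stores NA instead
city_f <- city
city_f[2] <- "Brighton"
print(city_f)

# Converting to character first avoids the problem
city_c <- as.character(city)
city_c[2] <- as.character(town[2])
print(city_c)               # "Sydney" "Brighton"

# Convert back to a factor afterwards if needed; levels are rebuilt
city_back <- factor(city_c)
print(levels(city_back))    # "Brighton" "Sydney"
```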

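Inefficiency 1: You don't need the else statement either. In fact it does harm here: break exits the whole loop at the first row that is not "Other" (row 1, "Sydney"), so no replacement would ever be made.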
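A quick check of the break behaviour, assuming a hypothetical character vector in the same order as data1$City. In R, break leaves the loop entirely (next would only skip the current iteration), so a counter shows how many rows the loop actually reaches.

```r
# Hypothetical toy vector in the same order as data1$City
city <- c("Sydney", "Melbourne", "Gold Cost", "Other", "Other", "Other")
visited <- 0

for (i in seq_along(city)) {
  visited <- visited + 1
  if (city[i] == "Other") {
    # the replacement would happen here
  } else {
    break  # leaves the loop entirely, not just this iteration
  }
}

print(visited)  # 1: the loop stopped at row 1 ("Sydney")
```

Simply dropping the else branch lets the loop visit all six rows.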

Inefficiency 2: You can do this without a for loop at all (that is, in a vectorized way).

data1 <- structure(list(
  City = structure(c(4L, 2L, 1L, 3L, 3L, 3L),
                   .Label = c("Gold Cost", "Melbourne", "Other", "Sydney"),
                   class = "factor"),
  Town = structure(c(1L, 1L, 1L, 3L, 4L, 2L),
                   .Label = c("", "Brighton", "Hurstville", "Penhurst"),
                   class = "factor")),
  .Names = c("City", "Town"), class = "data.frame",
  row.names = c(NA, -6L))

desired_output <- structure(list(
  City = structure(c(6L, 4L, 2L, 3L, 5L, 1L),
                   .Label = c("Brighton", "Gold Cost", "Hurstville",
                              "Melbourne", "Penhurst", "Sydney"),
                   class = "factor"),
  Town = structure(c(1L, 1L, 1L, 3L, 4L, 2L),
                   .Label = c("", "Brighton", "Hurstville", "Penhurst"),
                   class = "factor")),
  .Names = c("City", "Town"), class = "data.frame",
  row.names = c(NA, -6L))

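# Convert both factor columns to character so new values are not turned into NA
data1$City <- as.character(data1$City)
data1$Town <- as.character(data1$Town)
# Visit every row; where City is "Other", copy the Town value from the same row
for (i in 1:nrow(data1)) {
  if (data1$City[i] == 'Other') {
    data1$City[i] <- data1$Town[i]
  }
}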

data1
        City       Town
1     Sydney           
2  Melbourne           
3  Gold Cost           
4 Hurstville Hurstville
5   Penhurst   Penhurst
6   Brighton   Brighton
data1 == desired_output
     City Town
[1,] TRUE TRUE
[2,] TRUE TRUE
[3,] TRUE TRUE
[4,] TRUE TRUE
[5,] TRUE TRUE
[6,] TRUE TRUE

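Now for the vectorized solution. Avoiding the loop typically makes the code considerably faster, because the comparison and the assignment each run once over the whole column in compiled code instead of once per row in interpreted R, and you also have less code to type. Like the loop, it relies on City and Town having already been converted to character.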

data1$City[data1$City == "Other"] <- data1$Town[data1$City == "Other"]
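Since the question mentions not having written functions before, here is a minimal sketch wrapping the same vectorized idea in a reusable function. The name replace_with_column and its arguments are hypothetical, chosen for illustration; the data frame toy_data rebuilds the question's data with data.frame() instead of dput output.

```r
# Hypothetical helper: in column `col`, replace every `value` with the entry
# from column `from` in the same row. Both columns are converted to character
# first so that new values are not turned into NA (see Error 2).
replace_with_column <- function(df, col, value, from) {
  df[[col]] <- as.character(df[[col]])
  df[[from]] <- as.character(df[[from]])
  hit <- which(df[[col]] == value)  # row positions, computed once; NA rows are skipped
  df[[col]][hit] <- df[[from]][hit]
  df
}

# The question's data, rebuilt with data.frame() for readability
toy_data <- data.frame(
  City = factor(c("Sydney", "Melbourne", "Gold Cost", "Other", "Other", "Other")),
  Town = factor(c("", "", "", "Hurstville", "Penhurst", "Brighton"))
)

result <- replace_with_column(toy_data, "City", "Other", "Town")
print(result$City)
```

Computing the index once with which() avoids evaluating the comparison twice, as the one-liner above does, and keeps the assignment safe if City ever contains NA.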