I have a data frame:
structure(list(City = structure(c(4L, 2L, 1L, 3L, 3L, 3L), .Label = c("Gold Cost",
"Melbourne", "Other", "Sydney"), class = "factor"), Town = structure(c(1L,
1L, 1L, 3L, 4L, 2L), .Label = c("", "Brighton", "Hurstville",
"Penhurst"), class = "factor")), .Names = c("City", "Town"), class = "data.frame", row.names = c(NA,
-6L))
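I want to find every row where the City column contains the value Other, and replace that value with the value from the next column (Town) in the same row.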
My output should look like this:
structure(list(City = structure(c(6L, 4L, 2L, 3L, 5L, 1L), .Label = c("Brighton",
"Gold Cost", "Hurstville", "Melbourne", "Penhurst", "Sydney"), class = "factor"),
Town = structure(c(1L, 1L, 1L, 3L, 4L, 2L), .Label = c("",
"Brighton", "Hurstville", "Penhurst"), class = "factor")), .Names = c("City",
"Town"), class = "data.frame", row.names = c(NA, -6L))
I haven't written a function before, but I guess it should look something like this:
for(data1 in 1:nrow(data1)) {
if(data1$City[i] == 'Other') {
data1$city[i] <- data1$Town[i]
} else {
break
}
}
Answer (score: 1)
There are 2 errors and 2 inefficiencies.
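Error 1: you wrote for(data1 instead of for(i. (Note also that R is case-sensitive, so data1$city and data1$City are different columns; the assignment should go to data1$City.)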
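A minimal sketch of what the for(data1 typo does, using a hypothetical two-row data frame: 1:nrow(data1) is evaluated once before the loop starts, and then each index is assigned to data1 itself, so the data frame is overwritten by an integer.

```r
# Hypothetical two-row data frame, used only to show the effect of the typo.
data1 <- data.frame(City = c("Sydney", "Other"), Town = c("", "Brighton"))

# 1:nrow(data1) is computed once, before the first iteration, so the loop runs,
# but every iteration stores the loop index in data1 itself.
for (data1 in 1:nrow(data1)) {
  print(data1)  # prints 1, then 2
}

# After the loop, data1 holds the last index; the data frame is gone,
# so any later data1$City would fail with "$ operator is invalid for atomic vectors".
print(class(data1))
```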
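Error 2: City is of class factor, and you are trying to add new levels to it. Convert it to character instead; otherwise every value that is not already a level of the factor is turned into NA (with the warning "invalid factor level, NA generated"). There are other ways around this, but they are less efficient, and you can always convert the column back to a factor afterwards.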
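A small sketch of the factor problem and the character workaround, on a hypothetical two-row data frame (not the asker's full data). The final factor() call is one way to convert the column back afterwards; its levels are re-derived from the new values.

```r
# Hypothetical example data with explicit factor columns.
df <- data.frame(
  City = factor(c("Sydney", "Other")),
  Town = factor(c("", "Brighton"))
)

# Assigning a value that is not an existing level of City yields NA
# (R also emits the warning "invalid factor level, NA generated").
bad <- df
bad$City[2] <- bad$Town[2]
print(bad$City)

# Workaround: convert both columns to character, replace, then convert back.
good <- df
good$City <- as.character(good$City)
good$Town <- as.character(good$Town)
good$City[good$City == "Other"] <- good$Town[good$City == "Other"]
good$City <- factor(good$City)  # levels are rebuilt from the current values
print(levels(good$City))
```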
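Inefficiency 1: you don't need the else statement. In fact, else break exits the loop as soon as it reaches a row whose City is not Other; since row 1 is Sydney, the loop would stop immediately without replacing anything.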
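A minimal sketch, on a hypothetical character vector, of how else break cuts the loop short: the loop records which indices it visits, and it never gets past the first element that is not "Other".

```r
# Hypothetical vector: only the first element is not "Other".
city <- c("Sydney", "Other", "Other")
visited <- integer(0)

for (i in seq_along(city)) {
  visited <- c(visited, i)          # remember every index the loop reaches
  if (city[i] == "Other") {
    city[i] <- "replaced"
  } else {
    break                           # leaves the whole loop, not just this iteration
  }
}

print(visited)  # only index 1 was visited
print(city)     # nothing was replaced
```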
Inefficiency 2: you can do this without a for loop, i.e. in a vectorized way.
data1 <- structure(list(City = structure(c(4L, 2L, 1L, 3L, 3L, 3L),
.Label = c("Gold Cost", "Melbourne", "Other", "Sydney"), class = "factor"),
Town = structure(c(1L, 1L, 1L, 3L, 4L, 2L),
.Label = c("", "Brighton", "Hurstville", "Penhurst"), class = "factor")),
.Names = c("City", "Town"), class = "data.frame",
row.names = c(NA, -6L))
desired_output <- structure(list(City = structure(c(6L, 4L, 2L, 3L, 5L, 1L),
.Label = c("Brighton", "Gold Cost", "Hurstville", "Melbourne", "Penhurst", "Sydney"),
class = "factor"), Town = structure(c(1L, 1L, 1L, 3L, 4L, 2L),
.Label = c("", "Brighton", "Hurstville", "Penhurst"), class = "factor")),
.Names = c("City", "Town"), class = "data.frame",
row.names = c(NA, -6L))
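# Convert both factor columns to character so that new values can be assigned
data1$City <- as.character(data1$City)
data1$Town <- as.character(data1$Town)
# Loop over row indices (i, not data1) and copy Town into City where City is "Other"
for(i in 1:nrow(data1)){
  if(data1$City[i]=='Other'){
    data1$City[i]<- data1$Town[i]
  }
}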
data1
        City       Town
1     Sydney           
2  Melbourne           
3  Gold Cost           
4 Hurstville Hurstville
5   Penhurst   Penhurst
6   Brighton   Brighton
data1 == desired_output
     City Town
[1,] TRUE TRUE
[2,] TRUE TRUE
[3,] TRUE TRUE
[4,] TRUE TRUE
[5,] TRUE TRUE
[6,] TRUE TRUE
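Now for the vectorized solution. Avoiding the loop means less code to type and, on large data, typically much faster execution, because the comparison and the assignment each happen once over the whole column instead of once per row. Like the loop, this assumes City and Town have already been converted to character.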
data1$City[data1$City == "Other"] <- data1$Town[data1$City == "Other"]
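A rough sketch for checking the speed claim yourself: it builds a synthetic data frame (the 10,000-row size and the random City/Town values are arbitrary assumptions, not the asker's data), runs the loop and the vectorized replacement, and confirms they give the same result. The elapsed times depend on the machine, so none are shown here.

```r
set.seed(1)
n <- 1e4
big <- data.frame(
  City = sample(c("Sydney", "Melbourne", "Other"), n, replace = TRUE),
  Town = sample(c("Brighton", "Hurstville", "Penhurst"), n, replace = TRUE),
  stringsAsFactors = FALSE  # character columns, as required for the replacement
)

# Loop version: one comparison and one assignment per row.
loop_fix <- function(d) {
  for (i in seq_len(nrow(d))) {
    if (d$City[i] == "Other") d$City[i] <- d$Town[i]
  }
  d
}

# Vectorized version: one logical index, one assignment for the whole column.
vec_fix <- function(d) {
  is_other <- d$City == "Other"
  d$City[is_other] <- d$Town[is_other]
  d
}

t_loop <- system.time(res_loop <- loop_fix(big))
t_vec  <- system.time(res_vec  <- vec_fix(big))

print(identical(res_loop, res_vec))  # both approaches should agree
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```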