Generating labels for a variable in R

Time: 2013-08-13 12:33:22

Tags: r label

I'm looking for a better/faster way than this to generate labels for a variable:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)
df[sapply(df$a,function(x) !(x %in% pick)),"a"] <- "a"
df[sapply(df$a,function(x) x==0),"a"] <- "b"
df[sapply(df$a,function(x) x==1 | x==2 | x==3),"a"] <- "c"
df[sapply(df$a,function(x) x==10),"a"] <- "d"

df$a
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c" 

For simplicity this example has only one variable. My real data set has many more, but I only want to change one particular variable.

4 answers:

Answer 0 (score: 2)

You don't need sapply:

df$a[!df$a %in% pick] <- "a"
df$a[df$a==0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a==10] <- "d"
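
To check that dropping sapply changes only the speed and not the result, here is a minimal sketch that runs both versions on a larger input and compares them. The helper names label_sapply and label_vectorized and the repeated input vals are assumptions made for illustration, not part of the question. Timings depend on the machine, so none are shown.

```r
# Hypothetical larger input: the question's values repeated 1000 times
vals <- rep(c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2), times = 1000)
pick <- c(0, 1, 2, 3, 10)

# The question's approach: one sapply() call per label
label_sapply <- function(v) {
  d <- data.frame(a = v)
  d[sapply(d$a, function(x) !(x %in% pick)), "a"] <- "a"
  d[sapply(d$a, function(x) x == 0), "a"] <- "b"
  d[sapply(d$a, function(x) x == 1 | x == 2 | x == 3), "a"] <- "c"
  d[sapply(d$a, function(x) x == 10), "a"] <- "d"
  d$a
}

# Vectorized indexing; labels go into a separate character vector,
# so every comparison is made against the original numbers
label_vectorized <- function(v) {
  out <- rep("a", length(v))  # default label for values not in pick
  out[v == 0] <- "b"
  out[v %in% 1:3] <- "c"
  out[v == 10] <- "d"
  out
}

# Both approaches should produce the same character vector
same <- identical(label_sapply(vals), label_vectorized(vals))

system.time(label_sapply(vals))
system.time(label_vectorized(vals))
```

Writing the labels into a separate vector also avoids relying on the implicit numeric-to-character coercion that takes place once the first label is assigned into df$a.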

You can also use factors to produce the same result:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))

# the above method
a <- df$a
a[!df$a %in% pick] <- "a"
a[df$a==0] <- "b"
a[df$a %in% 1:3] <- "c"
a[df$a==10] <- "d"

# one way that gives a warning in older versions of R
# (duplicated labels are accepted by factor() since R 3.4.0)
b1 <- factor(df$a, levels=0:10, labels=c("b",rep("c",3),rep("a",6),"d"))

# another way that won't give a warning
b2 <- factor(df$a)
levels(b2) <- c("b",rep("c",3),rep("a",4),"d")
b2 <- as.character(b2)

# a third strategy using `library(car)`
b3 <- car::recode(df$a,"0='b';1:3='c';10='d';else='a'")

# check that all strategies are the same
all.equal(a,as.character(b1))
# [1] TRUE
all.equal(as.character(b1),as.character(b2))
# [1] TRUE
all.equal(as.character(b1),as.character(b3))
# [1] TRUE

Answer 1 (score: 2)

You might also consider mapvalues or revalue from plyr, especially if you are dealing with more labels:

library(plyr)  # provides mapvalues() and revalue()
df$a <- mapvalues(df$a, c(0, 1, 2, 3, 10), c("b", "c", "c", "c", "d"))
df$a[! df$a %in% c("b", "c", "d")] <- "a" # every value not in pick

Answer 2 (score: 2)

Here is another fairly straightforward solution:

names(pick) <- c("b", "c", "c", "c", "d")
x <- names(pick[match(df$a, pick)])
x[is.na(x)] <- "a"
x
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

It is even more direct if you include NA in the "pick" object:

pick <- c(NA, 0, 1, 2, 3, 10)
names(pick) <- c("a", "b", "c", "c", "c", "d")
names(pick[match(df$a, pick, nomatch = 1)])
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

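If you use the second alternative, note that nomatch takes the integer position to fall back on when a value is not matched. Here, nomatch points to the NA, which sits in the first position of the "pick" vector. If the NA were in the last position, you would enter nomatch = 6 instead.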
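
As a minimal sketch of that last case, the same lookup with NA moved to the end of the vector could look like this. The names pick_last and labels_last are hypothetical, chosen so the example does not overwrite pick.

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

# Same lookup table as above, but with the NA entry in the last position
pick_last <- c(0, 1, 2, 3, 10, NA)
names(pick_last) <- c("b", "c", "c", "c", "d", "a")

# Unmatched values fall back to position 6, i.e. the NA entry labelled "a"
labels_last <- names(pick_last[match(df$a, pick_last, nomatch = 6)])
labels_last
```

Writing nomatch = length(pick_last) instead of the literal 6 keeps the fallback pointing at the last entry if more values are added to the table later.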

Answer 3 (score: 0)

You can also use the ifelse function:

with(df,ifelse(a==0,"b",ifelse(a %in% c(1,2,3),"c",ifelse(a==10,"d","a"))))
 [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"