I have a data.frame in R that looks like this:
fruits
X1 X2 X3
aa kiwi 15
ba orange 25
cc lemon 23
ba apple 17
cc lemon 19
cc orange 18
cc orange 21
ba banana 17
I want to replace every value in X2 that is not "orange" or "lemon" with "other". How can I do this in R?
Sample data:
fruits <- structure(list(X1 = structure(c(1L, 2L, 3L, 2L, 3L, 3L, 3L, 2L
), .Label = c("aa", "ba", "cc"), class = "factor"), X2 = structure(c(3L,
5L, 4L, 1L, 4L, 5L, 5L, 2L), .Label = c("apple", "banana", "kiwi",
"lemon", "orange"), class = "factor"), X3 = c(15L, 25L, 23L,
17L, 19L, 18L, 21L, 17L)), .Names = c("X1", "X2", "X3"), class = "data.frame", row.names = c(NA,
-8L))
Answer 0 (score: 5)
First, create a variable that marks the rows to change, for example:
shouldBecomeOther<-!(fruits$X2 %in% c("orange", "lemon"))
Then use it as an index:
fruits$X2[shouldBecomeOther]<- "other"
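Note that if the column is a factor (very likely, and true for the sample data), this needs a little more work, because "other" is not yet one of the factor's levels:
tmp<-as.character(fruits$X2)
tmp[shouldBecomeOther]<-"other"
fruits$X2<-factor(tmp)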
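To see why the extra step matters, here is a minimal sketch on a small stand-in vector (the name x2 and its values are made up for illustration, not taken from the question's data). It compares direct assignment on a factor with the round trip through character:

```r
# Hypothetical stand-in for the X2 column, explicitly stored as a factor
x2 <- factor(c("kiwi", "orange", "lemon", "apple"))

# Elements to change: everything that is NOT orange or lemon
shouldBecomeOther <- !(x2 %in% c("orange", "lemon"))

# Direct assignment: "other" is not an existing level, so R stores NA.
# Without suppressWarnings() R also warns "invalid factor level, NA generated".
direct <- x2
suppressWarnings(direct[shouldBecomeOther] <- "other")

# Round trip through character, as in the answer above
tmp <- as.character(x2)
tmp[shouldBecomeOther] <- "other"
fixed <- factor(tmp)

print(direct)  # the changed positions are NA
print(fixed)   # the changed positions are "other"
```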
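Answer 1 (score: 2)
A simple approach is to coerce the factor to a character vector, find the elements that are not in the required classes and replace them with "other", and finally coerce the result back to a factor.
There are two variations on this theme. The first uses the replace() function: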
transform(fruits,
X2 = factor(replace(as.character(X2),
list = !X2 %in% c("orange","lemon"),
values = "other")))
which gives:
> transform(fruits, X2 = factor(replace(as.character(X2),
+ list = !X2 %in% c("orange","lemon"),
+ values = "other")))
X1 X2 X3
1 aa other 15
2 ba orange 25
3 cc lemon 23
4 ba other 17
5 cc lemon 19
6 cc orange 18
7 cc orange 21
8 ba other 17
Or you can do it by hand:
fruits <- transform(fruits,
X2 = {x <- as.character(X2)
x[!x %in% c("orange","lemon")] <- "other"
factor(x)})
> fruits
X1 X2 X3
1 aa other 15
2 ba orange 25
3 cc lemon 23
4 ba other 17
5 cc lemon 19
6 cc orange 18
7 cc orange 21
8 ba other 17
I use transform() here so that the work happens in an environment where X2 is visible, without having to type things like fruits$X2.
Answer 2 (score: 1)
How about:
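As a rough illustration of that point, the sketch below runs the replace() variant twice on a cut-down copy of the data: once through transform(), once spelled out with fruits$X2. The helper vector keep and the four-row data frame are assumptions made for the example.

```r
# Cut-down, hypothetical copy of the question's data
fruits <- data.frame(
  X1 = c("aa", "ba", "cc", "ba"),
  X2 = factor(c("kiwi", "orange", "lemon", "apple")),
  X3 = c(15L, 25L, 23L, 17L)
)
keep <- c("orange", "lemon")  # values that must stay as they are

# With transform(), X2 is looked up inside the data frame
a <- transform(fruits,
               X2 = factor(replace(as.character(X2), !X2 %in% keep, "other")))

# The same operation without transform(), repeating fruits$X2 each time
b <- fruits
b$X2 <- factor(replace(as.character(fruits$X2), !fruits$X2 %in% keep, "other"))

identical(a$X2, b$X2)  # TRUE
```

Both forms produce the same column; transform() only saves the repeated fruits$ prefix and returns a modified copy instead of changing fruits in place.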
R> fruits = data.frame(X1 = 1:3, X2 = c("kiwi", "orange", "lemon"))
R> fruits$X2 = as.character(fruits$X2)
R> fruits[!(fruits$X2 %in% c("lemon", "orange")),]$X2 = "Other"
R> fruits
X1 X2
1 1 Other
2 2 orange
3 3 lemon
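In the solution above I converted the factor to character. You don't have to do that. You could also:
If you read the data with read.csv, use its stringsAsFactors argument so that the column is not turned into a factor in the first place.
Work with the factor directly:
R> fruits$X2 = factor(fruits$X2, levels = union(as.character(fruits$X2), "Other"))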
R> fruits[!(fruits$X2 %in% c("lemon", "orange")),]$X2 = "Other"
R> fruits
X1 X2
1 1 Other
2 2 orange
3 3 lemon
Note that I extended the factor's levels in the first line.
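A minimal sketch of working with the factor directly, using a made-up vector x2 rather than the thread's data. The first variant follows the idea above (extend the levels, then assign). The second is a different base R technique, not taken from the answers: renaming levels through levels<-, which merges levels that end up with the same name.

```r
# Hypothetical stand-in for a factor column
x2 <- factor(c("kiwi", "orange", "lemon", "apple", "lemon"))
keep <- c("lemon", "orange")

# Variant 1: add the new level first, then assign to the unwanted elements
a <- factor(x2, levels = c(levels(x2), "Other"))
a[!(a %in% keep)] <- "Other"
a <- droplevels(a)  # drop the now-unused "kiwi" and "apple" levels

# Variant 2: rename every unwanted level; identical names are merged
b <- x2
levels(b)[!(levels(b) %in% keep)] <- "Other"

print(a)
print(b)
```

The two results hold the same values but may list their levels in a different order, which matters only if later code depends on level order.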