I have a data frame DF:
DF <- data.frame(V1 = factor(c("Yes", "No", "Yes", "No", "No")),
V2 = factor(c("Yes", "No", "No", "Yes", "No")),
Location = factor(c("London", "Paris", "No", "Dallas", "No")),
V3 = factor(c("No", "Yes", "No", "No", "No")),
V4 = factor(c("No", "Yes", "No", "No", "No")))
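I want to change the value "No" to "X" in the variables V1, V2, V3 and V4 (but not in Location). I can easily rename the levels by hand in each column, but that is very time-consuming on a large dataset. If I use revalue, however, every "No" is changed to "X", including the ones in Location that I want to keep as they are: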
library("plyr")
as.data.frame(lapply(DF, function(x) { revalue(x, c("No"="X")) }))
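Is there a way to specify the variables to be recoded by their position in the dataset (here columns 1:2 and 4:5)?
Answer 0 (score: 2)
Another option uses dplyr's quosure-style lambda ~ fun(.) as the .funs argument, combined with forcats::fct_recode: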
library("dplyr")
library("forcats")
(DF <- DF %>%
mutate_at(vars(-Location), ~fct_recode(., "X" = "No")))
# V1 V2 Location V3 V4
# 1 Yes Yes London X X
# 2 X X Paris Yes Yes
# 3 Yes X No X X
# 4 X Yes Dallas X X
# 5 X X No X X
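One detail that is easy to trip over: fct_recode() takes its pairs as new = old, the reverse of the c("No" = "X") (old = new) vector passed to revalue in the question. A minimal sketch on a standalone factor, where f and f_x are made-up names for illustration and not part of the original data:

```r
library(forcats)

# hypothetical example factor, not taken from the question's data frame
f <- factor(c("Yes", "No", "No"))

# fct_recode(): new level name on the left, existing level name on the right
f_x <- fct_recode(f, "X" = "No")

levels(f)    # "No"  "Yes"
levels(f_x)  # "X"   "Yes"
```

The original factor f is left untouched; only the returned copy carries the renamed level.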
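Update for dplyr 1.0:
The new across() supersedes the "scoped variants" such as mutate_at.
across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside summarise() and mutate().
Applied to the question, here are two variants that achieve this: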
DF %>%
mutate(across((!Location), ~fct_recode(., "X" = "No")))
DF %>%
mutate(across(c(1:2,4:5), ~fct_recode(., "X" = "No")))
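To check that selecting by name and selecting by position cover the same columns, the two variants can be compared directly. This is only a sketch, assuming dplyr 1.0 or later and forcats are installed; by_name and by_pos are names introduced here for illustration:

```r
library(dplyr)
library(forcats)

DF <- data.frame(V1 = factor(c("Yes", "No", "Yes", "No", "No")),
                 V2 = factor(c("Yes", "No", "No", "Yes", "No")),
                 Location = factor(c("London", "Paris", "No", "Dallas", "No")),
                 V3 = factor(c("No", "Yes", "No", "No", "No")),
                 V4 = factor(c("No", "Yes", "No", "No", "No")))

# variant 1: every column except Location, selected by name
by_name <- DF %>% mutate(across(!Location, ~ fct_recode(.x, "X" = "No")))

# variant 2: the same four columns, selected by position
by_pos <- DF %>% mutate(across(c(1:2, 4:5), ~ fct_recode(.x, "X" = "No")))

# both selections should have touched the same columns
identical(by_name, by_pos)

# Location was excluded, so "No" should still be one of its levels
"No" %in% levels(by_name$Location)
```

Selecting by name tends to survive column reordering better than hard-coded positions, which is worth weighing on a large dataset.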
Answer 1 (score: 1)
Perhaps someone can suggest a more elegant solution, but the following works without changing each variable by hand:
change.vec <- c("V1", "V2", "V3", "V4")
for (col in change.vec) {
# assumes each column has exactly the levels "No", "Yes" in that (alphabetical) order
levels(DF[[col]]) <- c("X", "Yes")
}
>DF
V1 V2 Location V3 V4
1 Yes Yes London X X
2 X X Paris Yes Yes
3 Yes X No X X
4 X Yes Dallas X X
5 X X No X X
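The loop above assigns the whole levels vector positionally, so it only gives the intended result when every column has exactly the levels "No" and "Yes" in that order. A sketch of a variant that renames by level name instead, shown on a made-up factor x with an extra level (not part of the question's data):

```r
# hypothetical factor whose levels are not just "No", "Yes"
x <- factor(c("Maybe", "No", "Yes"))

# rename only the level called "No"; all other levels keep their names
levels(x)[levels(x) == "No"] <- "X"

levels(x)  # "Maybe" "X" "Yes"
```

The same assignment can be used inside the loop in place of the positional one if the columns might not share identical levels.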
Answer 2 (score: 1)
Simply specify the numbers of the columns that the revalue function should be applied to:
cols_to_update <- c(1:2,4:5)
DF[, cols_to_update] <- lapply(DF[,cols_to_update], function(x) plyr::revalue(x, c("No"="X")))
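If hard-coding c(1:2,4:5) feels fragile, the positions can be derived from the column names. A minimal sketch, assuming plyr is installed and that Location is the only column to leave alone:

```r
DF <- data.frame(V1 = factor(c("Yes", "No", "Yes", "No", "No")),
                 V2 = factor(c("Yes", "No", "No", "Yes", "No")),
                 Location = factor(c("London", "Paris", "No", "Dallas", "No")),
                 V3 = factor(c("No", "Yes", "No", "No", "No")),
                 V4 = factor(c("No", "Yes", "No", "No", "No")))

# positions of every column except Location
cols_to_update <- which(names(DF) != "Location")
cols_to_update  # 1 2 4 5

# apply revalue only to those columns; Location is not passed to lapply at all
DF[cols_to_update] <- lapply(DF[cols_to_update],
                             function(x) plyr::revalue(x, c("No" = "X")))
```

Calling plyr::revalue with its namespace avoids attaching plyr, which matters when dplyr is loaded in the same session.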
Answer 3 (score: 1)
You can also do this with a tidyverse approach:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
DF <- data.frame(V1 = factor(c("Yes", "No", "Yes", "No", "No")),
V2 = factor(c("Yes", "No", "No", "Yes", "No")),
Location = factor(c("London", "Paris", "No", "Dallas", "No")),
V3 = factor(c("No", "Yes", "No", "No", "No")),
V4 = factor(c("No", "Yes", "No", "No", "No")))
# revalue is called as plyr::revalue so that plyr is not attached after dplyr,
# which would mask dplyr functions such as mutate and summarise
(DF <- DF %>%
mutate_at(.vars = vars(-Location),
.funs = function(t) plyr::revalue(x = t,
replace = c("No" = "X"))))
#> V1 V2 Location V3 V4
#> 1 Yes Yes London X X
#> 2 X X Paris Yes Yes
#> 3 Yes X No X X
#> 4 X Yes Dallas X X
#> 5 X X No X X
Created on 2019-03-17 by the reprex package (v0.2.1)