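I have a data frame with a large number of variables (82), many of which feed into further calculations. I am therefore trying to convert them to numeric, but working out the distinct values of each variable and then assigning numbers by hand is a huge amount of work.
I would like to know whether there is a more automated way, since I do not care which number is assigned to which value, as long as no number is repeated.
My approach so far (dummy data for clarity):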
df <- data.frame(original.var1 = c("display","memory","software","display","disk","memory"),
original.var2 = c("skeptic","believer","believer","believer","skeptic","believer"),
original.var3 = c("round","square","triangle","cube","sphere","hexagon"),
original.var4 = c(10,20,30,40,50,60))
Given that this works correctly
library(dplyr)
library(magrittr)
df$NEW1 <- as.numeric(interaction(df$original.var1, drop=TRUE))
I tried to adapt it to dplyr and pipes like this
df %<>% mutate(VAR1= as.numeric(interaction(original.var1, drop=TRUE))) %>%
mutate(VAR2= as.numeric(interaction(original.var2, drop=TRUE))) %>%
mutate(VAR3= as.numeric(interaction(original.var2, drop=TRUE)))
but the third variable, VAR3, comes out wrong
df %>% dplyr::group_by(original.var1,VAR1) %>% tally()
# A tibble: 4 x 3
# Groups: original.var1 [?]
original.var1 VAR1 n
<fctr> <dbl> <int>
1 disk 1 1
2 display 2 2
3 memory 3 2
4 software 4 1
> df %>% dplyr::group_by(original.var2,VAR2) %>% tally()
# A tibble: 2 x 3
# Groups: original.var2 [?]
original.var2 VAR2 n
<fctr> <dbl> <int>
1 believer 1 4
2 skeptic 2 2
> df %>% dplyr::group_by(original.var3,VAR3) %>% tally()
# A tibble: 6 x 3
# Groups: original.var3 [?]
original.var3 VAR3 n
<fctr> <dbl> <int>
1 cube 1 1
2 hexagon 1 1
3 round 2 1
4 sphere 2 1
5 square 1 1
6 triangle 1 1
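Is there any method or package that recodes without a mapping declared beforehand?
Answer 0: (score: 1)
Use purrr to keep only the factor columns and operate on them. Finally, merge the result with the numeric columns.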
df %>% purrr::keep(is.factor) %>% mutate_all(funs(as.numeric(interaction(., drop = TRUE))))
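The line above recodes the factor columns but does not show the final merge with the numeric columns. A minimal sketch of that last step follows, assuming dplyr 1.0 or later (for `across()` and `where()`) and assuming the columns really are factors; since R 4.0 `data.frame()` no longer creates factors by default, so `stringsAsFactors = TRUE` is set explicitly here. The names `recoded` and `result` are illustrative only.

```r
library(dplyr)

# Dummy data from the question; stringsAsFactors = TRUE is an assumption
# needed on R >= 4.0 so that the text columns are stored as factors.
df <- data.frame(
  original.var1 = c("display", "memory", "software", "display", "disk", "memory"),
  original.var2 = c("skeptic", "believer", "believer", "believer", "skeptic", "believer"),
  original.var3 = c("round", "square", "triangle", "cube", "sphere", "hexagon"),
  original.var4 = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = TRUE
)

# Step 1: keep only the factor columns and recode each one to numeric codes.
recoded <- df %>%
  select(where(is.factor)) %>%
  mutate(across(everything(), ~ as.numeric(interaction(.x, drop = TRUE))))

# Step 2: bind the untouched numeric columns back on.
result <- bind_cols(recoded, select(df, where(is.numeric)))
```

Because the rows are never reordered, binding by position should be safe here; a join would only be needed if the two pieces were filtered or sorted separately.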
Answer 1: (score: 1)
You can use mutate_if,
library(dplyr)
mutate_if(df, is.factor, funs(as.numeric(interaction(., drop = TRUE))))
which gives,
  original.var1 original.var2 original.var3 original.var4
1             2             2             3            10
2             3             1             5            20
3             4             1             6            30
4             2             1             1            40
5             1             2             4            50
6             3             1             2            60
Alternatively, you can read your data frame with stringsAsFactors = FALSE and use is.character as the predicate, but it amounts to the same thing.
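As a rough illustration of that character-column variant, the sketch below builds the dummy data with `stringsAsFactors = FALSE` and selects on `is.character`. It swaps `interaction()` for a plain `factor()` call, which for a single column should yield the same alphabetical codes, and it uses a formula lambda instead of `funs()`, which is deprecated in recent dplyr releases. The names `df_chr` and `out_chr` are made up for this example.

```r
library(dplyr)

# Same dummy data, but with the text columns kept as character vectors.
df_chr <- data.frame(
  original.var1 = c("display", "memory", "software", "display", "disk", "memory"),
  original.var2 = c("skeptic", "believer", "believer", "believer", "skeptic", "believer"),
  original.var3 = c("round", "square", "triangle", "cube", "sphere", "hexagon"),
  original.var4 = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Recode every character column: factor() assigns one level per distinct
# value (sorted), and as.numeric() returns the level index.
out_chr <- mutate_if(df_chr, is.character, ~ as.numeric(factor(.x)))
```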
To address your comment: if you also want to keep the original columns, then
mutate_if(df, is.factor, funs(new = as.numeric(interaction(., drop = TRUE))))
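For readers on a newer dplyr, the same idea can probably be written with `across()` and its `.names` argument. This is only a sketch, assuming dplyr 1.0 or later and factor columns (hence the explicit `stringsAsFactors = TRUE`); the `_new` suffix and the name `out` are arbitrary choices for illustration.

```r
library(dplyr)

# Dummy data from the question, with text columns forced to factors.
df <- data.frame(
  original.var1 = c("display", "memory", "software", "display", "disk", "memory"),
  original.var2 = c("skeptic", "believer", "believer", "believer", "skeptic", "believer"),
  original.var3 = c("round", "square", "triangle", "cube", "sphere", "hexagon"),
  original.var4 = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = TRUE
)

# Recode every factor column into a new "<column>_new" column,
# leaving the original columns in place.
out <- df %>%
  mutate(across(
    where(is.factor),
    ~ as.numeric(interaction(.x, drop = TRUE)),
    .names = "{.col}_new"
  ))
```

Applying one expression to all selected columns also sidesteps the copy-and-paste slip in the question, where VAR3 was computed from original.var2 rather than original.var3, which is why it only took the values 1 and 2.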