A faster way to change multiple variables

Time: 2019-05-12 17:22:34

Tags: r

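I'm new to R and Stack, so please let me know about any etiquette I may have unintentionally overlooked.

I have multiple variables that need to be recoded. They are consecutive columns. I have been using the code below, and I tried mutate (with 2:20 to select those consecutive variables) but couldn't get it to work. amer is my df.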

amer$ir01 <- recode(amer$ir01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$ir02 <- recode(amer$ir02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$ir03 <- recode(amer$ir03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t01 <- recode(amer$t01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t02 <- recode(amer$t02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t03 <- recode(amer$t03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t04 <- recode(amer$t04, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m01 <- recode(amer$m01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m02 <- recode(amer$m02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m03 <- recode(amer$m03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")

3 Answers:

Answer 0 (score: 2)

This should help:

amer <- data.frame(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20)

library(car) # recode() with this "old = new; ..." string syntax is car::recode; memisc::recode uses a different syntax
apply(amer, 2, function(x) recode(x, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA"))

To keep the data frame class when running apply, assign into amer[] as suggested by @Rui Barradas:

amer[] <- apply(amer, 2, function(x) recode(x, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA"))

This assumes your data looks like:

> amer
   ir01 ir02 ir03 ...
1     1    1    1 ...
2     2    2    2 ...
3     3    3    3 ...
4     4    4    4 ...
5     5    5    5 ...
6     6    6    6 ...
7     7    7    7 ...
8     8    8    8 ...
9     9    9    9 ...
10   10   10   10 ...
11   11   11   11 ...
12   12   12   12 ...
13   13   13   13 ...
14   14   14   14 ...
15   15   15   15 ...
16   16   16   16 ...
17   17   17   17 ...
18   18   18   18 ...
19   19   19   19 ...
20   20   20   20 ...

The first apply call returns a matrix:

      ir01 ir02 ir03
 [1,]    4    4    4
 [2,]    3    3    3
 [3,]    2    2    2
 [4,]    1    1    1
 [5,]    5    5    5
 [6,]    6    6    6
 [7,]    7    7    7
 [8,]   NA   NA   NA
 [9,]   NA   NA   NA
[10,]   10   10   10
[11,]   11   11   11
[12,]   12   12   12
[13,]   13   13   13
[14,]   14   14   14
[15,]   15   15   15
[16,]   16   16   16
[17,]   17   17   17
[18,]   18   18   18
[19,]   19   19   19
[20,]   20   20   20
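
The difference between the two calls above can be checked without any package. The following is a minimal sketch, not the answer's own code: reverse_code is a hypothetical helper standing in for recode with the same mapping, and lapply is swapped in for apply on the assignment line, since lapply works column by column without first converting the data frame to a matrix.

```r
amer <- data.frame(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20)

# Hypothetical stand-in for recode(): reverse 1-4, turn 8 and 9 into NA,
# leave every other value unchanged
reverse_code <- function(x) {
  ifelse(x %in% 8:9, NA_integer_, ifelse(x %in% 1:4, 5L - x, x))
}

# apply() coerces the data frame to a matrix and returns a matrix
as_matrix <- apply(amer, 2, reverse_code)
is.matrix(as_matrix)

# Assigning into amer[] replaces the columns but keeps the data frame
amer[] <- lapply(amer, reverse_code)
is.data.frame(amer)
head(amer, 10)
```

Both forms produce the same recoded values; only the class of the result differs.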

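Answer 1 (score: 1)

You can define the variables to change in a vector named recode, then lapply over those columns with an ifelse that does a little arithmetic.

Assume this data frame: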

head(df1)
#   ir01 ir02 dont.change.me
# 1    1    4              1
# 2    8    8              2
# 3    1    8              3
# 4    1    8              4
# 5    2    4              5
# 6    4    2              6

Define the recode vector:

recode <- c("ir01", "ir02")

Then lapply over the defined columns:

df1[recode] <- lapply(df1[recode], function(x) ifelse(x %in% 8:9, NA, abs(x - 5)))
head(df1)
#   ir01 ir02 dont.change.me
# 1    4    1              1
# 2   NA   NA              2
# 3    4   NA              3
# 4    4   NA              4
# 5    3    1              5
# 6    1    3              6

The values are reversed, and only in the columns that should change!
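
The abs(x - 5) arithmetic works here because the codes 1 to 4 are simply mirrored. When a mapping cannot be written as arithmetic, a named lookup vector is one alternative. This is a minimal sketch under that assumption; lookup, recode_with_lookup and cols are hypothetical names that do not appear in the answer, and the data is only the first six rows shown above.

```r
# Hypothetical lookup table: names are the old codes, values the new codes
lookup <- c("1" = 4L, "2" = 3L, "3" = 2L, "4" = 1L, "8" = NA, "9" = NA)

# Hypothetical helper: index the lookup by the old code, as a string.
# Note: any code missing from the lookup also becomes NA.
recode_with_lookup <- function(x) unname(lookup[as.character(x)])

df1 <- data.frame(ir01 = c(1L, 8L, 1L, 1L, 2L, 4L),
                  ir02 = c(4L, 8L, 8L, 8L, 4L, 2L),
                  dont.change.me = 1:6)

cols <- c("ir01", "ir02")
df1[cols] <- lapply(df1[cols], recode_with_lookup)
df1
```

Unlike the ifelse version, this form makes every old-to-new pair explicit, at the cost of having to list each code that should survive.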

Factors?

Sometimes these columns are factors:

df1$ir01 <- as.factor(df1$ir01)  # intentionally change `ir01` into factor
str(df1)
# 'data.frame': 20 obs. of  3 variables:
#  $ ir01          : Factor w/ 6 levels "1","2","3","4",..: 1 5 1 1 2 4 2 2 1 4 ...
#  $ ir02          : int  4 8 8 8 4 2 4 3 2 1 ...
#  $ dont.change.me: int  1 2 3 4 5 6 7 8 9 10 ...

We can extend the function to handle them:

df1[recode] <- lapply(df1[recode], 
                      function(x) {
                        if (is.factor(x))
                          x <- as.numeric(levels(x))[x]
                        ifelse(x %in% 8:9, NA, abs(x - 5))
                      })
head(df1)
#   ir01 ir02 dont.change.me
# 1    4    1              1
# 2   NA   NA              2
# 3    4   NA              3
# 4    4   NA              4
# 5    3    1              5
# 6    1    3              6
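
The as.numeric(levels(x))[x] step matters because calling as.numeric directly on a factor returns the internal level codes, not the original values. A small illustration with a made-up factor f (not part of the answer's data):

```r
f <- factor(c(1, 8, 1, 2))   # levels are "1", "2", "8"

codes  <- as.numeric(f)             # internal level codes: 1 3 1 2
values <- as.numeric(levels(f))[f]  # original values:      1 8 1 2

codes
values
```

Here the value 8 is stored as level code 3, so recoding the codes instead of the values would silently miss it.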

Data

df1 <- structure(list(ir01 = c(1L, 8L, 1L, 1L, 2L, 4L, 2L, 2L, 1L, 4L, 
                               1L, 8L, 9L, 4L, 2L, 2L, 3L, 1L, 1L, 3L), 
                      ir02 = c(4L, 8L, 8L, 8L, 4L, 2L, 4L, 3L, 2L, 1L, 
                               2L, 9L, 3L, 9L, 2L, 4L, 4L, 9L, 2L, 8L), 
                      dont.change.me = 1:20), class = "data.frame", 
                 row.names = c(NA, -20L))

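Answer 2 (score: 0)

You may also want to consider a data.table solution for this problem. It works well for large datasets, for example ones with more than 100,000 rows. I use recode from the car package because it works with data.table. I got an error when using memisc with the recode_key syntax below. In any case, you can put it all together: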

library(data.table)
library(car)
amer <- data.table(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20) #read data in as a data.table

recode_key <- c("1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA") # modify this to add other recodes
recode_cols <- c("ir01", "ir02") # to change only specific columns, list them here

amer[, eval(recode_cols) := lapply(.SD, function(x) recode(x, recode_key)), .SDcols = recode_cols] # updates the columns of the data.table in place

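Note that I use eval to make sure it does not create a new column named recode_cols! The special symbol .SD is then used so that the recode function iterates over the columns of the data.table. If you want to apply the recode to all columns, you can leave out the .SDcols argument, drop the eval(recode_cols):= part and start from lapply. That form returns a new data.table rather than updating amer in place, so assign the result if you want to keep it.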
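
As a sketch of the same in-place update, the left-hand side of := can also be a character vector wrapped in parentheses, which data.table likewise treats as column names. This assumes data.table is installed; reverse_code is a hypothetical stand-in for car's recode with the key above, and original is an illustrative copy kept only for comparison.

```r
library(data.table)

amer <- data.table(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20)
original <- copy(amer)  # explicit copy, unaffected by later := updates

# Hypothetical stand-in for recode(x, recode_key)
reverse_code <- function(x) {
  ifelse(x %in% 8:9, NA_integer_, ifelse(x %in% 1:4, 5L - x, x))
}

recode_cols <- c("ir01", "ir02")

# (recode_cols) is evaluated to the column names; no new column is created
amer[, (recode_cols) := lapply(.SD, reverse_code), .SDcols = recode_cols]

head(amer, 10)
```

Only ir01 and ir02 change; ir03 and the copy in original keep their values.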

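One last thing to note is that I did not need to assign the result of the last line of code to a variable. data.table is fast because it updates the original data by reference, so no copy is needed. Be careful, though: if you run the last line of code twice, the codes 1 to 4 are reversed back to their original values and only the NAs remain changed. Let me know if this explanation makes sense.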
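
The run-twice caveat can be seen on a plain vector. This is a minimal sketch in base R; reverse_code is a hypothetical stand-in for recode with the key above, and raw is made-up data.

```r
# Hypothetical stand-in for recode(x, recode_key)
reverse_code <- function(x) {
  ifelse(x %in% 8:9, NA_integer_, ifelse(x %in% 1:4, 5L - x, x))
}

raw   <- c(1L, 2L, 3L, 4L, 8L, 9L)
once  <- reverse_code(raw)   # 4 3 2 1 NA NA
twice <- reverse_code(once)  # 1 2 3 4 NA NA

once
twice
```

Because the update happens by reference, one way to stay safe is to keep an untouched copy of the raw columns and always recode from that copy.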