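I'm new to R and Stack, so please let me know about any etiquette I may have unintentionally overlooked.
I have multiple variables that need to be recoded. They are consecutive. I have been using the code below, and I tried using mutate (with 2:20 to pick up those consecutive variables) but couldn't get it to work. amer is my df.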
amer$ir01 <- recode(amer$ir01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$ir02 <- recode(amer$ir02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$ir03 <- recode(amer$ir03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t01 <- recode(amer$t01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t02 <- recode(amer$t02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t03 <- recode(amer$t03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$t04 <- recode(amer$t04, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m01 <- recode(amer$m01, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m02 <- recode(amer$m02, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
amer$m03 <- recode(amer$m03, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA")
Answer 0: (score: 2)
This should help:
amer <- data.frame(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20)
library(car) # recode() with the "old = new; ..." string syntax comes from car
apply(amer, 2, function(x) recode(x, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA"))
To keep the data frame class when running the apply function, use the following (credit to @Rui Barradas):
amer[] <- apply(amer, 2, function(x) recode(x, "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA"))
This assumes your data looks like
> amer
ir01 ir02 ir03 ...
1 1 1 1 ...
2 2 2 2 ...
3 3 3 3 ...
4 4 4 4 ...
5 5 5 5 ...
6 6 6 6 ...
7 7 7 7 ...
8 8 8 8 ...
9 9 9 9 ...
10 10 10 10 ...
11 11 11 11 ...
12 12 12 12 ...
13 13 13 13 ...
14 14 14 14 ...
15 15 15 15 ...
16 16 16 16 ...
17 17 17 17 ...
18 18 18 18 ...
19 19 19 19 ...
20 20 20 20 ...
which returns,
ir01 ir02 ir03
[1,] 4 4 4
[2,] 3 3 3
[3,] 2 2 2
[4,] 1 1 1
[5,] 5 5 5
[6,] 6 6 6
[7,] 7 7 7
[8,] NA NA NA
[9,] NA NA NA
[10,] 10 10 10
[11,] 11 11 11
[12,] 12 12 12
[13,] 13 13 13
[14,] 14 14 14
[15,] 15 15 15
[16,] 16 16 16
[17,] 17 17 17
[18,] 18 18 18
[19,] 19 19 19
[20,] 20 20 20
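Because apply first coerces the data frame to a matrix, a column-wise lapply is a common alternative that keeps each column's type and lets you limit the recode to some columns. The following is only a sketch under stated assumptions: the column other and the helper recode_values are hypothetical, and the helper is a base R stand-in for the recode string used above, not the recode function itself.

```r
# Toy data mirroring the example above; `other` is a hypothetical
# extra column that should stay untouched.
amer <- data.frame(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20,
                   other = letters[1:20])

# Hypothetical base R helper equivalent to the spec
# "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA":
# values not mentioned in the spec are left as they are.
recode_values <- function(x) {
  out <- x
  flip <- x %in% 1:4
  out[flip] <- 5L - x[flip]   # 1 <-> 4, 2 <-> 3
  out[x %in% 8:9] <- NA       # 8 and 9 become missing
  out
}

# Pick the columns to recode by name pattern, then recode column by column.
cols <- grep("^ir", names(amer), value = TRUE)
amer[cols] <- lapply(amer[cols], recode_values)

head(amer, 9)
```

Assigning into amer[cols] keeps amer a data frame and leaves every other column alone.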
Answer 1: (score: 1)
You can define the variables to change in a vector recode, and lapply over them an ifelse that does a little arithmetic.
Assume this data frame
head(df1)
# ir01 ir02 dont.change.me
# 1 1 4 1
# 2 8 8 2
# 3 1 8 3
# 4 1 8 4
# 5 2 4 5
# 6 4 2 6
Define the recode vector,
recode <- c("ir01", "ir02")
and lapply over the columns defined in it:
df1[recode] <- lapply(df1[recode], function(x) ifelse(x %in% 8:9, NA, abs(x - 5)))
head(df1)
# ir01 ir02 dont.change.me
# 1 4 1 1
# 2 NA NA 2
# 3 4 NA 3
# 4 4 NA 4
# 5 3 1 5
# 6 1 3 6
The values are reversed, and only in the columns that should change!
Factors?
Sometimes these columns are factors,
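The arithmetic can be checked on a small vector. This is a sketch, not part of the original answer: for the codes 1 to 4, abs(x - 5) equals 5 - x, which is the reversal. It also assumes the columns only contain 1 to 4, 8 and 9; any other value would be changed as well, unlike with the recode string in the question.

```r
# Codes that appear in the example data
x <- c(1, 2, 3, 4, 8, 9)
recoded <- ifelse(x %in% 8:9, NA, abs(x - 5))
print(recoded)   # 4 3 2 1 NA NA

# Caveat: values outside 1:4 and 8:9 are transformed too,
# e.g. 5 becomes 0, 6 becomes 1, 7 becomes 2.
other_values <- c(5, 6, 7)
print(abs(other_values - 5))   # 0 1 2
```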
df1$ir01 <- as.factor(df1$ir01) # intentionally change `ir01` into factor
str(df1)
# 'data.frame': 20 obs. of 3 variables:
# $ ir01 : Factor w/ 6 levels "1","2","3","4",..: 1 5 1 1 2 4 2 2 1 4 ...
# $ ir02 : int 4 8 8 8 4 2 4 3 2 1 ...
# $ dont.change.me: int 1 2 3 4 5 6 7 8 9 10 ...
We can extend the function to handle them:
df1[recode] <- lapply(df1[recode],
                      function(x) {
                        # convert factors back to their original numeric values
                        if (is.factor(x))
                          x <- as.numeric(levels(x))[x]
                        ifelse(x %in% 8:9, NA, abs(x - 5))
                      })
head(df1)
# ir01 ir02 dont.change.me
# 1 4 1 1
# 2 NA NA 2
# 3 4 NA 3
# 4 4 NA 4
# 5 3 1 5
# 6 1 3 6
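The conversion as.numeric(levels(x))[x] matters because calling as.numeric directly on a factor returns the internal level codes, not the original values. A small illustration on a made-up factor f (hypothetical data, not taken from df1):

```r
# A factor whose labels are the numbers 1, 2, 4, 8 and 9
f <- factor(c(1, 8, 1, 2, 4, 9))

codes  <- as.numeric(f)              # internal level codes: 1 4 1 2 3 5
values <- as.numeric(levels(f))[f]   # original values:      1 8 1 2 4 9

# Equivalent, slightly more readable form
values_chr <- as.numeric(as.character(f))

print(codes)
print(values)
```

Recoding the codes instead of the values would silently produce wrong results, for example treating the label 8 as a 4.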
Data
df1 <- structure(list(ir01 = c(1L, 8L, 1L, 1L, 2L, 4L, 2L, 2L, 1L, 4L,
1L, 8L, 9L, 4L, 2L, 2L, 3L, 1L, 1L, 3L),
ir02 = c(4L, 8L, 8L, 8L, 4L, 2L, 4L, 3L, 2L, 1L,
2L, 9L, 3L, 9L, 2L, 4L, 4L, 9L, 2L, 8L),
dont.change.me = 1:20), class = "data.frame",
row.names = c(NA, -20L))
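Answer 2: (score: 0)
You may also want to consider a data.table solution for this problem. It works well for large data sets, for example ones with more than 100,000 rows. I use recode from the car package because it works with data.table. I got errors when using memisc with the recode_key syntax below. Anyway, you can put it all together: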
library(data.table)
library(car)
amer <- data.table(ir01 = 1:20, ir02 = 1:20, ir03 = 1:20) #read data in as a data.table
recode_key<-c("1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA") #modify this to add other recodes
recode_cols<-c("ir01","ir02") #If you want to only make changes to specific columns list them here
amer[,eval(recode_cols):=lapply(.SD,function(x) recode(x,recode_key)),.SDcols=recode_cols] #This will change the columns in the data.table
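Note that I used eval to make sure it did not create a new column named recode_cols! Then the special symbol .SD is used so that the recode function iterates over the columns of the data.table. If you want to apply the recode to all columns, you can leave out the .SDcols argument, remove eval(recode_cols):= and start from lapply (in that case the result is returned as a new data.table rather than updating amer in place, so assign it).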
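One last thing to note is that I did not need to assign the last line of code to a variable. data.table is fast partly because := updates the original data in place by reference, so no copy is needed. Be careful, though: if you run the last line of code twice, the values 1 to 4 are flipped back to their originals, while the 8s and 9s stay NA. Let me know if this explanation makes sense.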
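The caveat about running the recode twice can be checked without data.table. In this sketch, recode_once is a hypothetical base R stand-in for the recode string above; it is not the car function.

```r
# Hypothetical stand-in for "1 = 4; 2 = 3; 3 = 2; 4 = 1; 8 = NA; 9 = NA"
recode_once <- function(x) {
  ifelse(x %in% 8:9, NA, ifelse(x %in% 1:4, 5 - x, x))
}

x <- c(1, 2, 3, 4, 5, 8, 9)
once  <- recode_once(x)      # 4 3 2 1 5 NA NA
twice <- recode_once(once)   # 1 2 3 4 5 NA NA

print(once)
print(twice)
```

The second pass undoes the reversal of 1 to 4, but the information that a value was 8 or 9 is already lost. Since := modifies the table in place, it is safer to run that line exactly once per session, or to recode a copy of the data.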