I have a data frame (df) with variables such as CA, VT, NC, AZ, CAvalue, VTvalue, NCvalue and AZvalue.
In Stata, I can use a foreach loop to generate new variables:
foreach x in CA VT NC AZ {
gen `x'1 = 0
replace `x'1 = 1 if `x'value > 1
}
When I tried to translate this code to R, I ran into trouble.
This is what I wrote:
x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1
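Creating the new variables ending in "1" is not a problem, but I don't know how to translate the line that starts with "replace". I tried creating another vector with CAtime, VTtime, NCtime and AZtime, but I don't know how to work them into a loop without writing the same code four times.
Update: originally, my data looks like this: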
df=as.data.frame(matrix(runif(200,1,150),ncol=8,nrow=25))
name=c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df)=name
Then I want to create four new variables, CA1, VT1, NC1 and AZ1, in a new data frame m1:
x=c("CA","VT","NC","AZ")
x_1=paste(x,"1",sep="")
m1=as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df)))
colnames(m1)=x_1
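All values in m1 are 0.
Then, if CAtime > 1, I want the corresponding cell in CA1 to be 1. The same applies to all four variables CAtime, VTtime, NCtime and AZtime. I don't want to write four loops, which is why I am stuck.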
Answer 0 (score: 4)
First, build a sample dataset df that matches your description:
set.seed(1)
x <- c("CA","VT","NC","AZ")
df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),simplify=FALSE)),
c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue","AZvalue"))
df
# CA VT NC AZ CAvalue VTvalue NCvalue AZvalue
#1 0 2 0 1 2 1 1 2
#2 1 2 0 2 0 0 1 2
#3 1 1 2 2 1 1 1 0
#4 2 1 1 1 0 2 0 2
#5 0 0 2 2 0 1 2 1
Now use lapply to check whether each value in the "value" columns is > 1, and assign the results to new variables whose names have 1 appended:
df[paste0(x,"1")] <- lapply(df[paste0(x,"value")], function(n) as.numeric(n > 1) )
df
# CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1 0 2 0 1 2 1 1 2 1 0 0 1
#2 1 2 0 2 0 0 1 2 0 0 0 1
#3 1 1 2 2 1 1 1 0 0 0 0 0
#4 2 1 1 1 0 2 0 2 0 1 0 1
#5 0 0 2 2 0 1 2 1 0 0 1 0
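For readers coming from Stata, the same result can also be written as an explicit loop that mirrors the foreach block line by line. This is only a minimal sketch: the data frame df_loop and its values are made up for illustration, and the loop variable s is a hypothetical name.

```r
x <- c("CA", "VT", "NC", "AZ")

# Small hand-made example (illustrative values only, not the data above)
df_loop <- data.frame(
  CAvalue = c(2, 0, 1),
  VTvalue = c(1, 0, 2),
  NCvalue = c(1, 1, 1),
  AZvalue = c(2, 2, 0)
)

for (s in x) {
  new_col <- paste0(s, "1")      # e.g. "CA1"
  src_col <- paste0(s, "value")  # e.g. "CAvalue"
  df_loop[[new_col]] <- 0                          # gen `x'1 = 0
  df_loop[[new_col]][df_loop[[src_col]] > 1] <- 1  # replace `x'1 = 1 if `x'value > 1
}
df_loop
```

One difference worth keeping in mind: Stata treats missing values as larger than any number, so `if x > 1` is true for missing values, whereas in this loop rows where the source column is NA keep the value 0.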
Answer 1 (score: 3)
Here is a possible option using set from data.table, which can be more efficient because it updates the columns by reference.
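library(data.table)
# x and x1 are defined in the data block at the end of this answer
setDT(df)[, (x1) := NA]            # convert to data.table and add placeholder columns
x2 <- paste0(x, 'value')
indx <- match(x1, names(df))       # positions of the new columns
for(j in seq_along(x2)){
  # overwrite each placeholder column by reference with the 0/1 indicator
  set(df, i=NULL, j=indx[j], value=as.numeric(df[[x2[j]]]>1))
}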
df
# CA VT NC AZ CAvalue VTvalue NCvalue AZvalue CA1 VT1 NC1 AZ1
#1: 0 2 0 1 2 1 1 2 1 0 0 1
#2: 1 2 0 2 0 0 1 2 0 0 0 1
#3: 1 1 2 2 1 1 1 0 0 0 0 0
#4: 2 1 1 1 0 2 0 2 0 1 0 1
#5: 0 0 2 2 0 1 2 1 0 0 1 0
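The loop over set can probably also be collapsed into a single `:=` call with `.SDcols`, which some readers find easier to read. This is a sketch under assumptions: the table dt and its values are invented for illustration, and it is not claimed to be faster than the set loop.

```r
library(data.table)

x  <- c("CA", "VT", "NC", "AZ")
x1 <- paste0(x, "1")
x2 <- paste0(x, "value")

# Small hand-made example (illustrative values only)
dt <- data.table(
  CAvalue = c(0, 2, 1),
  VTvalue = c(2, 2, 0),
  NCvalue = c(1, 1, 1),
  AZvalue = c(3, 0, 2)
)

# .SD contains only the columns named in .SDcols; the list returned by
# lapply() is assigned by reference to the columns named in x1.
dt[, (x1) := lapply(.SD, function(v) as.numeric(v > 1)), .SDcols = x2]
dt
```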
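If the new columns are needed in a separate dataset, we can subset the result into one. Or, using the updated example from the question (df1 and df2 are defined in the data block at the end):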
setDT(df1)
setDT(df2)
x2 <- paste0(x, 'time')
for(j in seq_along(x2)){
set(df2, i=NULL, j=j, value=as.numeric(df1[[x2[j]]] >1))
}
head(df2,4)
# CA1 VT1 NC1 AZ1
#1: 0 0 1 1
#2: 0 1 1 0
#3: 0 0 0 1
#4: 1 1 0 0
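If data.table is not available, the separate-data-frame case from the question's update can also be sketched in base R by comparing all the "time" columns at once. The data frame df_time below is a hand-made stand-in with invented values, not the question's random data.

```r
x <- c("CA", "VT", "NC", "AZ")

# Small hand-made stand-in for the question's data (illustrative values only)
df_time <- data.frame(
  CAtime = c(0.5, 3.0, 1.0),
  VTtime = c(2.0, 0.2, 7.5),
  NCtime = c(1.5, 1.0, 0.0),
  AZtime = c(0.9, 4.0, 1.1)
)

# Comparing a data frame with a number yields a logical matrix;
# unary plus turns TRUE/FALSE into 1/0.
m1 <- as.data.frame(+(df_time[paste0(x, "time")] > 1))
colnames(m1) <- paste0(x, "1")
m1
```

Because the comparison is done on the whole block of columns, there is no need to pre-fill m1 with zeros or to loop over the four states.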
Data used in the examples above:
set.seed(1)
x <- c("CA","VT","NC","AZ")
x1 <- paste0(x, 1)
df <- setNames(data.frame(replicate(8,sample(0:2,5,replace=TRUE),
simplify=FALSE)),c("CA","VT","NC","AZ","CAvalue","VTvalue","NCvalue",
"AZvalue"))
set.seed(425)
df1 <- as.data.frame(matrix(rnorm(200,1,150),ncol=8,nrow=25))
name <- c("CA","VT","NC","AZ","CAtime","VTtime", "NCtime","AZtime")
colnames(df1) <- name
df2 <- as.data.frame(matrix(0,ncol=length(x),nrow=NROW(df1)))
colnames(df2) <- x1