Suppose I have the following data frame:
tmp <- data.frame(
code = c("11","111","112"),
label = c("sector a","industry a1","industry a2"),
sector = c("11","11","11"),
industry = c("NA","111","112")
)
which looks like this:
> tmp
  code       label sector industry
1   11    sector a     11       NA
2  111 industry a1     11      111
3  112 industry a2     11      112
I want to create a variable that holds the sector label. In this simple example all industries belong to the same sector, so
> tmp$sector.alpha <- c(rep("sector a",3))
produces:
> tmp
  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     11      112     sector a
But suppose a more complex case with two or more sectors, each containing an arbitrary number of industries.
How do I generate the correct labels?
Answer 0: (score: 1)
For example:
library(plyr)  # provides ddply() and .()
ddply(tmp,.(sector),transform,sector.alpha=label[1])
  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     11      112     sector a
Changing the data a little to introduce more sectors:
tmp <- data.frame(
code = c("11","111","112","121"),
label = c("sector a","industry a1","industry a2","indstry 14"),
sector = c("11","11","12","12"),
industry = c("NA","111","112","212")
)
library(plyr)
ddply(tmp,.(sector),transform,sector.alpha=label[1])
  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     12      112  industry a2
4  121  indstry 14     12      212  industry a2
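Note that `label[1]` simply takes the first label in each sector group, which is why sector 12 above ends up labelled "industry a2": that group has no sector row of its own. Below is a minimal base R sketch of the same idea that first moves each sector's own row to the top of its group. The data frame is hypothetical (I added a "sector b" row with code "12"), and it assumes the sector row is the one whose `code` equals its `sector`.

```r
# Hypothetical data: two sectors, rows deliberately out of order.
tmp <- data.frame(
  code   = c("111", "11", "121", "12", "112"),
  label  = c("industry a1", "sector a", "industry b1", "sector b", "industry a2"),
  sector = c("11", "11", "12", "12", "11"),
  stringsAsFactors = FALSE
)

# Sort by sector, with the sector's own row (code == sector) first in each group.
# FALSE sorts before TRUE, so rows where code != sector come later.
tmp <- tmp[order(tmp$sector, tmp$code != tmp$sector), ]

# Within each sector group, repeat the first label for every row.
tmp$sector.alpha <- ave(tmp$label, tmp$sector, FUN = function(x) x[1])
```

Without the sorting step the result depends entirely on row order, so it is probably worth checking that every sector actually has its own row before relying on this.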
Answer 1: (score: 0)
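You can use the cut command to convert a numeric variable into a categorical variable with several categories. See ?cut for details. Let's try the following code.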
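x <- sample(0:100, 10)  # draws 10 random integers between 0 and 100
# Named x.cat rather than cat to avoid masking base::cat; include.lowest=TRUE keeps x == 0 in the first interval
x.cat <- cut(x, breaks=c(0,40,50,60,70,80,100), labels=c("a","b","c","d","e","f"), include.lowest=TRUE)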
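The cut command splits the variable at the breaks you specify and assigns each resulting class interval the corresponding label. This may help. You can do the same with a data frame: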
x<-sample(0:100,10)
y<-sample(200:300,10)
dat<-data.frame(x,y)
dat$cat <- cut(dat$x, breaks=c(0,40,50,60,70,80,100), labels=c("a","b","c","d","e","f"), include.lowest=TRUE)
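To make the interval behaviour easier to see, here is a small sketch with fixed values instead of random ones. The input vector is made up for illustration. By default `cut` builds intervals that are open on the left and closed on the right, so a value equal to the lowest break falls outside every interval unless `include.lowest = TRUE` is set.

```r
# Fixed, hypothetical values chosen to sit on and between the breaks.
x      <- c(0, 40, 41, 50, 75, 100)
breaks <- c(0, 40, 50, 60, 70, 80, 100)
labs   <- c("a", "b", "c", "d", "e", "f")

# Default intervals are (0,40], (40,50], ...; x == 0 matches none of them.
default_cut <- cut(x, breaks = breaks, labels = labs)

# include.lowest = TRUE makes the first interval [0,40], so 0 gets label "a".
closed_cut <- cut(x, breaks = breaks, labels = labs, include.lowest = TRUE)
```

Keep in mind this bins a numeric variable; it does not look up a label from another row, so it only applies if the categories can be expressed as numeric ranges.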
Answer 2: (score: -1)
This also works:
tmp$sector.a <- tmp[match(tmp$sector,tmp$code),"label"]
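The `match` call finds, for each row, the position of the row whose `code` equals that row's `sector`, and the label is then read from that position. Here is a short sketch on a two-sector data frame. The data are hypothetical (a "sector b" row with code "12" is my addition), and it assumes every sector value also appears exactly once in `code`.

```r
# Hypothetical data: each sector has its own row whose code equals the sector.
tmp <- data.frame(
  code   = c("11", "111", "112", "12", "121"),
  label  = c("sector a", "industry a1", "industry a2", "sector b", "industry b1"),
  sector = c("11", "11", "11", "12", "12"),
  stringsAsFactors = FALSE
)

# For each row, the index of the row that defines its sector.
idx <- match(tmp$sector, tmp$code)

# Look up the sector label by that index; row order does not matter here.
tmp$sector.alpha <- tmp$label[idx]
```

If a sector has no matching row in `code`, `match` returns NA for it and the label comes out as NA, which at least makes the gap visible.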