Replace values in a data frame using a concordance

Time: 2014-03-07 18:09:32

Tags: r dataframe data-manipulation

Suppose I have the following data frame:

tmp <- data.frame(
code = c("11","111","112"),
label = c("sector a","industry a1","industry a2"),
sector = c("11","11","11"),
industry = c("NA","111","112")
)

so that:

> tmp
  code       label sector industry
1   11    sector a     11       NA
2  111 industry a1     11      111
3  112 industry a2     11      112

I want to create a variable that holds the sector label. In this simple example all industries belong to the same sector, so

> tmp$sector.alpha <- c(rep("sector a",3))

produces:

> tmp
  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     11      112     sector a

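But suppose a more complex example with two or more sectors, each containing an arbitrary number of industries.

How can I generate the correct labels?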

3 answers:

Answer 0: (score: 1)

For example:

library(plyr)  # provides ddply() and .()
ddply(tmp,.(sector),transform,sector.alpha=label[1])
  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     11      112     sector a

Changing the data a little to introduce more sectors:

tmp <- data.frame(
  code = c("11","111","112","121"),
  label = c("sector a","industry a1","industry a2","indstry 14"),
  sector = c("11","11","12","12"),
  industry = c("NA","111","112","212")
)

library(plyr)
ddply(tmp,.(sector),transform,sector.alpha=label[1])

  code       label sector industry sector.alpha
1   11    sector a     11       NA     sector a
2  111 industry a1     11      111     sector a
3  112 industry a2     12      112  industry a2
4  121  indstry 14     12      212  industry a2
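
Note that label[1] takes whatever row comes first within each sector group, which is why sector 12 above ends up labelled "industry a2". A minimal sketch of one way to avoid depending on row order, assuming every sector has a row whose code equals the sector code; the data frame tmp2 and the "sector b" / "industry b1" labels are made up for illustration and are not from the question:

```r
# Hypothetical data: the sector rows are deliberately NOT first in their group
tmp2 <- data.frame(
  code   = c("111", "11", "121", "12"),
  label  = c("industry a1", "sector a", "industry b1", "sector b"),
  sector = c("11", "11", "12", "12"),
  stringsAsFactors = FALSE
)

# Rows that describe a sector itself (code equals sector)
is_sector <- tmp2$code == tmp2$sector

# Named lookup vector: sector code -> sector label
lookup <- setNames(tmp2$label[is_sector], tmp2$code[is_sector])

# Index the lookup by each row's sector code
tmp2$sector.alpha <- unname(lookup[tmp2$sector])
```

This is base R only, so it does not need plyr. A sector without its own row would get NA rather than a wrong label.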

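Answer 1: (score: 0)

You can use the cut command to convert a numeric variable into a categorical variable with several categories. See ?cut for details. Let's try the following code.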

x<-sample(0:100,10) #Generates random data between 0 and 100 of size 10

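# include.lowest=TRUE keeps a value of exactly 0 in the first interval instead of returning NA
cat<-cut(x,breaks=c(0,40,50,60,70,80,100),labels=c("a","b","c","d","e","f"),include.lowest=TRUE)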

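The cut command splits the variable at the breaks you specify and assigns each label to the corresponding class interval. This may help. You can do the same with a data frame: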

x<-sample(0:100,10)
y<-sample(200:300,10)
dat<-data.frame(x,y)
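# include.lowest=TRUE keeps a value of exactly 0 in the first interval instead of returning NA
dat$cat<-cut(x,breaks=c(0,40,50,60,70,80,100),labels=c("a","b","c","d","e","f"),include.lowest=TRUE)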
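
Because sample() gives different values on every run, it can be easier to see how cut assigns labels with a few fixed values. A small sketch, using hand-picked inputs (my choice, not from the answer) that sit on the interval boundaries:

```r
brk <- c(0, 40, 50, 60, 70, 80, 100)
lab <- c("a", "b", "c", "d", "e", "f")
x   <- c(0, 40, 41, 100)  # hand-picked boundary values

# By default intervals are right-closed: (0,40], (40,50], ..., (80,100]
# so 0 falls outside every interval and becomes NA
default_cut <- as.character(cut(x, breaks = brk, labels = lab))
# NA "a" "b" "f"

# include.lowest = TRUE closes the first interval on the left: [0,40]
closed_cut <- as.character(cut(x, breaks = brk, labels = lab, include.lowest = TRUE))
# "a" "a" "b" "f"
```

Whether a 0 should land in category "a" or be treated as missing depends on the data, so the choice of include.lowest is a judgment call rather than a fixed rule.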

Answer 2: (score: -1)

This also works:

tmp$sector.a <- tmp[match(tmp$sector,tmp$code),"label"]
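
To see what that line does, it may help to split it into two steps. The sketch below reuses the four-row data from Answer 0; stringsAsFactors = FALSE is my addition so the columns are plain character vectors:

```r
tmp <- data.frame(
  code   = c("11", "111", "112", "121"),
  label  = c("sector a", "industry a1", "industry a2", "indstry 14"),
  sector = c("11", "11", "12", "12"),
  stringsAsFactors = FALSE
)

# For each row, find the position of its sector code in the code column
idx <- match(tmp$sector, tmp$code)
# 1 1 NA NA  (no row has code "12", so those lookups fail)

# Use those positions to pull the label of the matching row
tmp$sector.a <- tmp[idx, "label"]
# "sector a" "sector a" NA NA
```

So with this data the match approach returns NA for sector 12, because the data frame has no row whose code is "12", whereas the ddply approach falls back to the first label in the group. With the three-row data from the question, every row gets "sector a".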