Create rows in a data frame for factor levels that were observed but not explicitly recorded

Time: 2013-07-17 05:21:57

Tags: r plyr reshape

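I have created a variable that classifies species into groups (domestic, wild, or exotic) in a data frame where each row represents a species found at a unique site (siteID). For each siteID, I want to insert rows into the data frame so that any group not observed at that site is reported with a value of 0. In other words, this is the data frame I have: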

df.start <- data.frame(species = c("dog","deer","toucan","dog","deer","toucan"), 
    siteID = c("a","b","b","c","c","c"), 
    group = c("domestic", "wild", "exotic", "domestic", "wild", "exotic"), 
    value = c(2:7))

df.start
#   species siteID    group value
# 1     dog      a domestic     2
# 2    deer      b     wild     3
# 3  toucan      b   exotic     4
# 4     dog      c domestic     5
# 5    deer      c     wild     6
# 6  toucan      c   exotic     7

This is the data frame I want:

df.end <-data.frame(species=c("dog","NA","NA","NA","deer",
                              "toucan","dog","deer","toucan"),
    siteID = c("a","a","a","b","b","b","c","c","c"),
    group = rep(c("domestic", "wild", "exotic"),3), 
    value = c(2,0,0,0,3,4,5,6,7))

df.end
#   species siteID    group value
# 1     dog      a domestic     2
# 2      NA      a     wild     0
# 3      NA      a   exotic     0
# 4      NA      b domestic     0
# 5    deer      b     wild     3
# 6  toucan      b   exotic     4
# 7     dog      c domestic     5
# 8    deer      c     wild     6
# 9  toucan      c   exotic     7

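The reason is that I want to use plyr functions to summarise the mean value by group, and I realised that some group-site combinations are missing their zeros, which inflates my estimates. Perhaps I am missing a more obvious workaround?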
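To make the inflation concrete, here is a minimal sketch using base R (`tapply`) instead of plyr, so it runs without extra packages. The names `naive`, `n.sites`, and `corrected` are illustrative and not part of the question. It assumes every site should count toward every group's mean, which is what the desired `df.end` implies.

```r
df.start <- data.frame(species = c("dog","deer","toucan","dog","deer","toucan"),
    siteID = c("a","b","b","c","c","c"),
    group = c("domestic", "wild", "exotic", "domestic", "wild", "exotic"),
    value = c(2:7))

# Mean over the recorded rows only: sites where a group was absent are ignored,
# so each mean is taken over 2 rows instead of 3 sites.
naive <- tapply(df.start$value, df.start$group, mean)

# Mean over all sites: absent combinations contribute 0 to the sum,
# so dividing the group total by the number of sites is equivalent
# to filling in the zero rows first.
n.sites <- length(unique(df.start$siteID))
corrected <- tapply(df.start$value, df.start$group, sum) / n.sites

print(naive)
print(corrected)
```

For this data the naive means are 3.5, 5.5 and 4.5 for domestic, exotic and wild, while the per-site means are about 2.33, 3.67 and 3. Dividing the group totals by the number of sites may be enough if only the means are needed; the explicit zero rows are still useful for other summaries.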

2 Answers:

Answer 0: (score: 1)

Using base R functions:

result <- merge(  
  with(df.start, expand.grid(siteID=unique(siteID),group=unique(group))),
  df.start,
  by=c("siteID","group"),
  all.x=TRUE
)
result$value[is.na(result$value)] <- 0

> result
  siteID    group species value
1      a domestic     dog     2
2      a   exotic    <NA>     0
3      a     wild    <NA>     0
4      b domestic    <NA>     0
5      b   exotic  toucan     4
6      b     wild    deer     3
7      c domestic     dog     5
8      c   exotic  toucan     7
9      c     wild    deer     6
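The same approach can be read as three separate steps: build the full siteID-by-group grid, left-join the observed rows onto it, then replace the missing values with 0. The sketch below spells those steps out and finishes with the per-group mean the question was after. The intermediate names `grid` and `group.means` are illustrative and not from the answer, and `aggregate` stands in for the plyr summary.

```r
df.start <- data.frame(species = c("dog","deer","toucan","dog","deer","toucan"),
    siteID = c("a","b","b","c","c","c"),
    group = c("domestic", "wild", "exotic", "domestic", "wild", "exotic"),
    value = c(2:7),
    stringsAsFactors = FALSE)

# Step 1: every siteID x group combination (3 sites x 3 groups = 9 rows).
grid <- expand.grid(siteID = unique(df.start$siteID),
                    group = unique(df.start$group),
                    stringsAsFactors = FALSE)

# Step 2: left join. all.x = TRUE keeps grid rows with no match in df.start;
# their species and value columns come back as NA.
result <- merge(grid, df.start, by = c("siteID", "group"), all.x = TRUE)

# Step 3: an unobserved combination means a value of 0.
result$value[is.na(result$value)] <- 0

# Mean per group, now taken over all three sites.
group.means <- aggregate(value ~ group, data = result, FUN = mean)
print(group.means)
```

The species column is deliberately left as NA for the filled rows, since no species was recorded for those combinations.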

Answer 1: (score: 1)

 df.sg <- data.frame(xtabs(value~siteID+group, data=df.start))
 merge(df.start[-4], df.sg, by=c("siteID", "group"), all.y=TRUE)
#-------------
  siteID    group species Freq
1      a domestic     dog    2
2      a   exotic    <NA>    0
3      a     wild    <NA>    0
4      b domestic    <NA>    0
5      b   exotic  toucan    4
6      b     wild    deer    3
7      c domestic     dog    5
8      c   exotic  toucan    7
9      c     wild    deer    6 

The xtabs function returns a table, which lets the as.data.frame.table method handle it. Very handy.
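As a rough illustration of what happens in between, the sketch below shows the intermediate table and its long-format conversion. The `responseName` argument of `as.data.frame.table` is used here only to keep the column called `value` instead of the default `Freq`; the names `tab` and `df.long` are illustrative. One caveat worth stating as an assumption: `xtabs` sums `value` within each cell, so this only matches the merge approach when each siteID-group combination appears at most once.

```r
df.start <- data.frame(species = c("dog","deer","toucan","dog","deer","toucan"),
    siteID = c("a","b","b","c","c","c"),
    group = c("domestic", "wild", "exotic", "domestic", "wild", "exotic"),
    value = c(2:7))

# Cross-tabulate: one cell per siteID x group, holding the summed value.
# Combinations that never occur in the data get a 0.
tab <- xtabs(value ~ siteID + group, data = df.start)
print(tab)

# Convert the table to long format: one row per cell.
df.long <- as.data.frame(tab, responseName = "value")
print(df.long)

# Re-attach species (df.start[-4] drops the original value column).
result <- merge(df.start[-4], df.long, by = c("siteID", "group"), all.y = TRUE)
print(result)
```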