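Count with multiple conditions from another data.frame

Date: 2015-04-21 13:20:05

Tags: r dataframe

I am migrating an analysis from Excel to R and would like to know the best way to do the equivalent of Excel's COUNTIFS in R.

I have two data.frames, statedf and memberdf.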

statedf=data.frame(state=c('MD','MD','MD','NY','NY','NY'), week = 5:7) 
memberdf=data.frame(memID = 1:15, state = c('MD','MD','NY','NY','MD'),
              finalweek = c(3,3,5,3,3,5,3,5,3,5,6,5,2,3,5),
              orders = c(1,2,3))

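This data is for a subscription-based business. I would like to know the number of newly lapsed members for each week/state combination in statedf, where newly lapsed is defined as statedf$week - 1 == memberdf$finalweek. In addition, I would like a separate count for each orders value (1, 2, 3).

The desired output would look like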

out <- data.frame(state=c('MD','MD','MD','NY','NY','NY'), week = 5:7,
               oneorder = c(0,1,0,0,0,0),
               twoorder = c(0,0,1,0,1,0),
               threeorder = c(0,3,0,0,1,0))

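I asked (and got a great answer to) a simpler version of this question yesterday; the answer revolved around creating a new data.frame based on memberdf. However, I need to append the data to statedf, because statedf has state/week combinations that don't exist in memberdf, and vice versa. If this were in Excel I would use COUNTIFS, but I am struggling to find a solution in R.

Thanks.

2 answers:

Answer 0: (score: 2)

We can create a new variable ('week1') in the 'statedf' dataset, merge 'memberdf' with 'statedf', and then reshape from 'long' to 'wide' format with dcast. I changed the 'orders' column to match the column names in 'out'.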

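# Members who lapse in a given week had their final week one week earlier
statedf$week1 <- statedf$week - 1
# all.y=TRUE keeps every state/week row of statedf, even those with no lapsed members
df1 <- merge(memberdf[-1], statedf, by.x=c('state', 'finalweek'), 
                 by.y=c('state', 'week1'), all.y=TRUE)
# Relabel orders 1, 2, 3 as 'oneorder', 'twoorder', 'threeorder'
lvls <- paste0(c('one', 'two', 'three'), 'order')
df1$orders <- factor(lvls[df1$orders], levels=lvls)
library(reshape2)
# Count rows per state/week and order level; [-6] drops the extra NA column
# produced by the unmatched state/week rows
out1 <- dcast(df1, state+week~orders, value.var='orders', length)[-6]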
out1
#     state week oneorder twoorder threeorder
#1    MD    5        0        0          0
#2    MD    6        1        0          3
#3    MD    7        0        1          0
#4    NY    5        0        0          0
#5    NY    6        0        1          1
#6    NY    7        0        0          0

all.equal(out, out1)
#[1] TRUE
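
For readers who want something closer to how COUNTIFS is written in Excel, the same counts can probably be obtained in base R by evaluating the three conditions once per row of statedf. This is only a minimal sketch, not part of the answer above: `countifs` is a hypothetical helper name and `out2` is an illustrative result name, and the example data is rebuilt from the question so the block runs on its own.

```r
# Example data, rebuilt from the question
statedf <- data.frame(state = rep(c("MD", "NY"), each = 3), week = rep(5:7, 2))
memberdf <- data.frame(memID = 1:15,
                       state = rep(c("MD", "MD", "NY", "NY", "MD"), 3),
                       finalweek = c(3, 3, 5, 3, 3, 5, 3, 5, 3, 5, 6, 5, 2, 3, 5),
                       orders = rep(1:3, 5))

# countifs() is a hypothetical helper, not a base R function.
# For each row of statedf it counts the members in the same state whose
# final week was the week before and whose orders value equals n_orders.
countifs <- function(n_orders) {
  mapply(function(s, w) {
    sum(memberdf$state == s &
          memberdf$finalweek == w - 1 &
          memberdf$orders == n_orders)
  }, as.character(statedf$state), statedf$week, USE.NAMES = FALSE)
}

# Append one count column per orders value to a copy of statedf
out2 <- statedf
out2$oneorder   <- countifs(1)
out2$twoorder   <- countifs(2)
out2$threeorder <- countifs(3)
out2
```

This loops over statedf in R rather than merging, so it is easy to read but is likely to be slower than the merge-based approach on large tables.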

Answer 1: (score: 2)

Here is a solution using the dplyr and tidyr packages:

library(tidyr) ; library(dplyr)

counts <- memberdf %>%
  mutate(lapsedweek = finalweek + 1) %>%
  group_by(state, lapsedweek, orders) %>%
  tally()
counts <- counts %>% spread(orders, n, fill = 0)    
out <- left_join(statedf, counts, by = c("state", "week" = "lapsedweek"))
out[is.na(out)] <- 0 # convert rows with all NAs to 0s
names(out)[3:5] <- paste0("order", names(out)[3:5]) # rename columns
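
In more recent tidyr releases `spread()` has been superseded by `pivot_wider()`, so the same idea might be written as follows. This is a hedged sketch rather than the answer's own code: it assumes dplyr 1.0.0 or later (for `across()`) and tidyr 1.0.0 or later (for `pivot_wider()`), and `out3` is an illustrative name. The example data is rebuilt from the question so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data, rebuilt from the question
statedf <- data.frame(state = rep(c("MD", "NY"), each = 3), week = rep(5:7, 2))
memberdf <- data.frame(memID = 1:15,
                       state = rep(c("MD", "MD", "NY", "NY", "MD"), 3),
                       finalweek = c(3, 3, 5, 3, 3, 5, 3, 5, 3, 5, 6, 5, 2, 3, 5),
                       orders = rep(1:3, 5))

# Count members per state, lapse week and orders value, then spread the
# orders values into columns order1, order2, order3
counts <- memberdf %>%
  mutate(lapsedweek = finalweek + 1) %>%
  count(state, lapsedweek, orders) %>%
  pivot_wider(names_from = orders, values_from = n,
              names_prefix = "order", values_fill = list(n = 0L))

# Keep every state/week row of statedf; rows without any lapsed
# members come back as NA and are set to 0
out3 <- statedf %>%
  left_join(counts, by = c("state", "week" = "lapsedweek")) %>%
  mutate(across(starts_with("order"), ~ coalesce(.x, 0L)))
out3
```

Passing `names_prefix` to `pivot_wider()` names the count columns directly, which removes the need for the separate renaming step.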