I'm migrating an analysis from Excel to R and would like to know the best way to do something like Excel's COUNTIFS in R.
I have two data.frames, statedf and memberdf.
statedf=data.frame(state=c('MD','MD','MD','NY','NY','NY'), week = 5:7)
memberdf=data.frame(memID = 1:15, state = c('MD','MD','NY','NY','MD'),
finalweek = c(3,3,5,3,3,5,3,5,3,5,6,5,2,3,5),
orders = c(1,2,3))
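This data is for a subscription-based business. I want to know the number of newly lapsed members for each week/state combination in statedf, where newly lapsed is defined as statedf$week - 1 == memberdf$finalweek. In addition, I want this counted separately for each orders value (1, 2, 3).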
The desired output would look like this:
out <- data.frame(state=c('MD','MD','MD','NY','NY','NY'), week = 5:7,
oneorder = c(0,1,0,0,0,0),
twoorder = c(0,0,1,0,1,0),
threeorder = c(0,3,0,0,1,0))
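To make the definition concrete, a single cell of that table is one multi-condition count, which is what a single COUNTIFS formula returns. A minimal base R sketch for the `threeorder` value of MD in week 6 (the state, week and orders value are picked by hand here, and `w` is just an illustrative variable name):

```r
memberdf <- data.frame(memID = 1:15, state = c('MD','MD','NY','NY','MD'),
                       finalweek = c(3,3,5,3,3,5,3,5,3,5,6,5,2,3,5),
                       orders = c(1,2,3))

w <- 6  # the statedf week being checked
# A member counts only when all three conditions hold at once, as in COUNTIFS:
# same state, finalweek one week before w, and the given orders value.
n_md_w6_three <- sum(memberdf$state == "MD" &
                     memberdf$finalweek == w - 1 &
                     memberdf$orders == 3)
n_md_w6_three
# [1] 3
```

The three matching members are memID 6, 12 and 15, which agrees with the 3 in the desired output.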
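I asked (and got a great answer to) a simpler version of this question yesterday; the answer revolved around creating a new data.frame based on memberdf. However, I need to attach the counts to statedf, because some state/week combinations in statedf don't exist in memberdf, and vice versa. If this were Excel I would use COUNTIFS, but I'm struggling to find a solution in R.
Thanks.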
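A direct translation of COUNTIFS into base R is to write the multi-condition count as a small function and evaluate it once per statedf row and per orders value. This is a minimal sketch, not taken from either answer below; `countifs_lapsed` is a hypothetical helper name.

```r
statedf <- data.frame(state = c('MD','MD','MD','NY','NY','NY'), week = 5:7)
memberdf <- data.frame(memID = 1:15, state = c('MD','MD','NY','NY','MD'),
                       finalweek = c(3,3,5,3,3,5,3,5,3,5,6,5,2,3,5),
                       orders = c(1,2,3))

# Hypothetical COUNTIFS-style helper: members in state s who are newly
# lapsed in week w (finalweek == w - 1) and have exactly k orders.
countifs_lapsed <- function(s, w, k) {
  sum(memberdf$state == s & memberdf$finalweek == w - 1 & memberdf$orders == k)
}

res <- statedf
cols <- paste0(c('one', 'two', 'three'), 'order')
for (k in 1:3) {
  # One evaluation per statedf row, like filling a COUNTIFS formula down a column
  res[[cols[k]]] <- vapply(seq_len(nrow(statedf)), function(i)
    countifs_lapsed(as.character(statedf$state[i]), statedf$week[i], k),
    integer(1))
}
res
```

`res` ends up with the same counts as `out`. This mirrors COUNTIFS closely, but it scans all of memberdf for every cell (rows of statedf × 3 orders values × rows of memberdf comparisons), so on large tables the join-based answers are likely to be faster.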
Answer 0 (score: 2)
We can create a new variable ('week1') in the 'statedf' dataset, merge 'memberdf' with 'statedf', and then reshape from 'long' to 'wide' format with dcast. I changed the 'orders' column to match the column names in 'out'.
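library(reshape2)  # provides dcast()

# week1 is the finalweek a member must have to count as newly lapsed in 'week'
statedf$week1 <- statedf$week-1
# all.y=TRUE keeps every statedf row; rows with no lapsed members get NA orders
df1 <- merge(memberdf[-1], statedf, by.x=c('state', 'finalweek'),
             by.y=c('state', 'week1'), all.y=TRUE)
# Relabel orders 1/2/3 with the column names used in 'out'
lvls <- paste0(c('one', 'two', 'three'), 'order')
df1$orders <- factor(lvls[df1$orders],levels=lvls)
# Count rows per state/week/orders; [-6] drops the extra column for NA orders
out1 <- dcast(df1, state+week~orders, value.var='orders', length)[-6]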
out1
# state week oneorder twoorder threeorder
#1 MD 5 0 0 0
#2 MD 6 1 0 3
#3 MD 7 0 1 0
#4 NY 5 0 0 0
#5 NY 6 0 1 1
#6 NY 7 0 0 0
all.equal(out, out1)
#[1] TRUE
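If you want to keep this merge-based approach without the reshape2 dependency, the wide count table can also be built with base R's `table()`. This is a sketch assuming the same `statedf` and `memberdf` as in the question; the `key` variable is an illustrative helper, not part of the answer above. Using statedf's own state/week pairs as the factor levels is what keeps the rows with zero lapsed members, and because `table()` skips NA values by default, there is no leftover column for the unmatched rows to drop afterwards.

```r
statedf <- data.frame(state = c('MD','MD','MD','NY','NY','NY'), week = 5:7)
memberdf <- data.frame(memID = 1:15, state = c('MD','MD','NY','NY','MD'),
                       finalweek = c(3,3,5,3,3,5,3,5,3,5,6,5,2,3,5),
                       orders = c(1,2,3))

# Same merge as above: keep every statedf row, NA orders where nobody lapsed
statedf$week1 <- statedf$week - 1
df1 <- merge(memberdf[-1], statedf, by.x = c('state', 'finalweek'),
             by.y = c('state', 'week1'), all.y = TRUE)
lvls <- paste0(c('one', 'two', 'three'), 'order')
df1$orders <- factor(lvls[df1$orders], levels = lvls)

# Illustrative key: statedf's state/week pairs as the factor levels, so every
# statedf row gets a row in the table, in statedf's order
key <- factor(paste(df1$state, df1$week),
              levels = paste(statedf$state, statedf$week))
# table() leaves out NA orders by default, so unmatched rows count as 0
counts <- as.data.frame.matrix(table(key, df1$orders))
out2 <- cbind(statedf[c('state', 'week')], counts)
rownames(out2) <- NULL
out2
```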
Answer 1 (score: 2)
Here is a solution using the dplyr and tidyr packages:
library(tidyr) ; library(dplyr)
counts <- memberdf %>%
mutate(lapsedweek = finalweek + 1) %>%
group_by(state, lapsedweek, orders) %>%
tally()
counts <- counts %>% spread(orders, n, fill = 0)
out <- left_join(statedf, counts, by = c("state", "week" = "lapsedweek"))
out[is.na(out)] <- 0 # convert rows with all NAs to 0s
names(out)[3:5] <- paste0("order", names(out)[3:5]) # rename columns